lrberge / fixest

Fixed-effects estimations
https://lrberge.github.io/fixest/

Feature request: More information with coefplots(only.params = TRUE)$prms #137

Open adamaltmejd opened 3 years ago

adamaltmejd commented 3 years ago

Hey!

I'm running a model with both multiple outcome variables and fsplit, and it's hard to figure out which "id" in coefplot(only.params = TRUE) comes from which model. I would love it if you could add columns with the name of the dependent variable and perhaps the name of the split cell (if split is used).

One reason it's confusing is that I'm not sure how to find each model when running both multiple y and split:

library(fixest)      # feols() and the base_did data set
library(data.table)  # data.table() and :=
data(base_did)
DT <- data.table(base_did)
DT[, y2 := y^2]  # second outcome variable
fit1 <- feols(c(y, y2) ~ x1 + i(treat, period, 5) | id + period, data = DT, lean = TRUE)
fit2 <- feols(y ~ x1 + i(treat, period, 5) | id + period, data = DT, fsplit = ~ post, lean = TRUE)
fit3 <- feols(c(y, y2) ~ x1 + i(treat, period, 5) | id + period, data = DT, fsplit = ~ post, lean = TRUE)
> names(fit1)                                   
[1] "y"  "y2"
> names(fit2)                                   
[1] "Full sample" "0"           "1"     
> names(fit3)                                   
[1] "Full sample" "0"           "1"     

So fit2 and fit3 report the same length. When I print fit3, all 6 estimations show up, yet length(fit3) is 3, so I would have guessed it was a nested list. But fit3[1] prints only one outcome variable, the second is in fit3[2], and fit3[6] actually outputs the last one. The best way to identify the results in coefplot(only.params = TRUE) is to follow these ids, but it's messy.
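The flat numbering described above can be sketched in base R without any fixest objects. This is only an illustration: the values are copied from the names() output, and the ordering (outcome varying fastest within each sample) is an assumption inferred from the behaviour of fit3[1] to fit3[6], not a documented guarantee.

```r
# Stand-in values copied from the names() output above; no fixest objects are used.
lhs     <- c("y", "y2")
samples <- c("Full sample", "0", "1")

# expand.grid() varies its first argument fastest, which reproduces the
# assumed y, y2, y, y2, ... ordering of the six estimations.
layout <- expand.grid(lhs = lhs, sample = samples, stringsAsFactors = FALSE)
layout$index <- seq_len(nrow(layout))
layout <- layout[, c("index", "sample", "lhs")]

print(layout)
```

Under that assumption, row k of layout describes the estimation that fit3[k] would return.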

adamaltmejd commented 3 years ago

Now I'm using the following to extract the dep var:

> sapply(fit3, function(x) all.vars(x$fml)[1])  
[1] "y"  "y2" "y"  "y2" "y"  "y2"
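
One caveat with the all.vars() approach, shown here on two hypothetical formulas rather than on fixest objects: all.vars() returns variable names only, so a transformed outcome such as log(y) would be reported as "y". Deparsing the left-hand side keeps the full expression. A minimal sketch:

```r
# Hypothetical formulas used only to illustrate base R behaviour.
f_plain <- y2 ~ x1 + x2
f_trans <- log(y) ~ x1 + x2

# all.vars() returns variable names, so the log() on the left-hand side is lost.
dep_var_names <- c(all.vars(f_plain)[1], all.vars(f_trans)[1])

# The second element of a two-sided formula is its left-hand side;
# deparse() turns it back into text, transformation included.
dep_var_exprs <- c(deparse(f_plain[[2]]), deparse(f_trans[[2]]))

print(dep_var_names)
print(dep_var_exprs)
```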

lrberge commented 3 years ago

Hi, that's something I'll improve natively in coefplot (I'll change many things actually) but I can't at the moment.

One way to find out exactly which model is which is to use summary() with the "compact" option:

summary(fit1, "compact")[, 1:5]
#>   lhs                x1 treat:period::1 treat:period::2 treat:period::3
#> 1  y  0.973*** (0.0457)     -1.4 (1.11)    -1.25 (1.09)   -0.273 (1.11)
#> 2  y2   4.4*** (0.685)     -1.22 (10.1)   -0.764 (11.2)    -3.43 (9.73)
summary(fit2, "compact")[, 1:5]
#>   lhs      sample                x1 treat:period::1 treat:period::2
#> 1   y Full sample 0.973*** (0.0457)     -1.4 (1.11)    -1.25 (1.09)
#> 2   y 0           0.988*** (0.0678)    -1.39 (1.11)    -1.24 (1.09)
#> 3   y 1               1*** (0.0678)                                
summary(fit3, "compact")[, 1:5]
#>        sample lhs                x1 treat:period::1 treat:period::2
#> 1 Full sample  y  0.973*** (0.0457)     -1.4 (1.11)    -1.25 (1.09)
#> 2 Full sample  y2   4.4*** (0.685)     -1.22 (10.1)   -0.764 (11.2)
#> 3 0            y  0.988*** (0.0678)    -1.39 (1.11)    -1.24 (1.09)
#> 4 0            y2   2.13** (0.788)     -2.65 (10.1)    -1.74 (11.1)
#> 5 1            y      1*** (0.0678)                                
#> 6 1            y2  7.11*** (1.09) 

That's not super handy, I know... I've been thinking for a while about adding a function that gives the structure of the fixest_multi object, but I'm still struggling to find a proper name. Or maybe, instead of compact, it could be a structure option?

adamaltmejd commented 3 years ago

If it's a list, couldn't you use element names that convey both lhs and split?

lrberge commented 3 years ago

I don't see what you mean. An example?

adamaltmejd commented 3 years ago

Basically that names(fit3) would produce something like c("Full sample - y", "Full sample - y2", "0 - y", "0 - y2", "1 - y", "1 - y2"). But maybe that would just be too confusing. I think part of the problem though for me was that I didn't even realize there was a fit3[5] since both length() and names() indicated only 3 elements.

adamaltmejd commented 3 years ago

The summary() with compact is super useful though, but I guess this all depends on the use case. Mine is a function that produces multiple different coefplots with ggplot, where I split by different variables and use multiple outcomes. Thus I want to programmatically label the right coefs, which is why I was looking for the labels in coefplot(only.params = TRUE). And the second place I would look is names(fit3).
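The labelling step described here can be sketched as a join between the plotting parameters and a per-model key. Everything below is a made-up stand-in: the column names (id, estimate_names, estimate) and the values are assumptions for illustration, not the documented structure of coefplot(only.params = TRUE)$prms.

```r
# Made-up stand-in for a plotting-parameters table with one row per
# coefficient and an integer model id (column names are assumptions).
prms <- data.frame(
  id             = rep(1:6, each = 2),
  estimate_names = rep(c("x1", "x2"), times = 6),
  estimate       = (1:12) / 10
)

# One row per model: which sample and which outcome each id refers to.
# The id ordering (outcome varying fastest) is assumed from the thread.
model_key <- data.frame(
  id     = 1:6,
  sample = rep(c("Full sample", "0", "1"), each = 2),
  lhs    = rep(c("y", "y2"), times = 3)
)

# Attach the identifiers to every coefficient row and build a plot label.
labelled <- merge(prms, model_key, by = "id")
labelled$label <- paste(labelled$sample, labelled$lhs, sep = " - ")

print(head(labelled))
```

The resulting label column could then serve as a facet or colour variable in the ggplot code.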

lrberge commented 3 years ago

Thanks for clarifying!

First, this information should eventually be in coefplot, since I'll use that information too.

I completely understand your confusion about the length and names, and the fact that the fixest_multi_object[digit] method can go up to 6. I 100% agree it's not intuitive and should be changed. Also, the fact that [digit] returns single objects goes against R standards, which is not good. So far I enforce the object to be a strongly regular nested list, but I think that's a limitation and I'll drop it in the future -- so I need to change how objects are accessed anyway.
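For reference, the base R convention alluded to here is that single brackets on a list return a sub-list, while double brackets return the element itself. A minimal sketch with a plain list (the object and its contents are placeholders, not fixest output):

```r
# A plain list standing in for a collection of models.
models <- list(a = 1:3, b = letters[1:2])

sub  <- models[1]    # single bracket: still a list, of length 1
elem <- models[[1]]  # double bracket: the element itself

print(class(sub))
print(class(elem))
```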

Regarding your specific request, I don't know what the best way to get the identifiers would be. I'm not sure names() is the right tool, because several dimensions would need to be coerced into a single vector. In the meantime, the following simple function does that:

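my_names = function(x, sep = " -- "){
    # the compact summary has one row per estimation
    xx = summary(x, "compact")
    # columns that identify a model (as opposed to coefficient columns)
    root_list = c("lhs", "rhs", "sample", "iv", "fixef")
    keep = colnames(xx) %in% root_list
    # paste the identifier columns of each row into a single name
    new_names = apply(xx[, keep, drop = FALSE], 1, paste, collapse = sep)
    # collapse runs of spaces left by column padding
    gsub(" +", " ", new_names)
}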
my_names(fit1)
#> [1] "0 -- y " "0 -- y2" "1 -- y " "1 -- y2"
my_names(fit2)
#> [1] "y -- Full sample" "y -- 0 "          "y -- 1 "         
my_names(fit3)
#> [1] "Full sample -- y " "Full sample -- y2" "0 -- y "           "0 -- y2"           "1 -- y "           "1 -- y2"  
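
The same logic can be tried without fixest on a hand-written data frame. This is a sketch under assumptions: compact below only mimics the identifier columns of the compact summary shown earlier (including the padded "y "), and it swaps in trimws() so the names come out without the trailing spaces.

```r
# Hand-written stand-in for summary(fit3, "compact"); the padded "y "
# mimics the trailing space visible in the output above.
compact <- data.frame(
  sample = rep(c("Full sample", "0", "1"), each = 2),
  lhs    = rep(c("y ", "y2"), times = 3),
  x1     = c("0.973***", "4.4***", "0.988***", "2.13**", "1***", "7.11***"),
  stringsAsFactors = FALSE
)

# Keep only the columns that identify a model, in their existing order.
id_cols <- intersect(colnames(compact), c("lhs", "rhs", "sample", "iv", "fixef"))

# Trim the padding in each column, then paste the columns row by row.
parts <- unname(lapply(compact[id_cols], trimws))
model_names <- do.call(paste, c(parts, sep = " -- "))

print(model_names)
```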

I'll let you know when it's in coefplot.