lrberge / fixest

Fixed-effects estimations
https://lrberge.github.io/fixest/

Feature request: ability to combine fixest_multi objects #342

Closed turbanisch closed 2 years ago

turbanisch commented 2 years ago

I think it would be great if several fixest_multi objects could be combined into a new fixest_multi object, preserving the ability to use all the nice sorting and filtering functionality of its [ method.

In this toy example, I would like to estimate both OLS and PPML for the same three subsets and sort the final regression output by sample, but the combined object is no longer of class fixest_multi.

library(fixest)

# create dummy data
base = iris
names(base) = c("y1", "y2", "x1", "x2", "species")

# run OLS and PPML on (the same) 3 subsamples each
multi_ols = feols(y1 ~ x1 + x2, data = base, split = ~species)
multi_ppml = fepois(y1 ~ x1 + x2, data = base, split = ~species)

# append fixest_multi objects
mods <- c(
  multi_ols,
  multi_ppml
)

# combined object is not of class fixest_multi and method does not work
sloop::s3_class(mods)
#> [1] "list"
mods[sample = 1]
#> $`sample.var: species; sample: setosa`
#> OLS estimation, Dep. Var.: y1
#> Observations: 50 
#> Sample (species): setosa
#> Standard-errors: IID 
#>             Estimate Std. Error  t value   Pr(>|t|)    
#> (Intercept) 4.247508   0.411434 10.32368 1.1341e-13 ***
#> x1          0.398979   0.295774  1.34893 1.8382e-01    
#> x2          0.712135   0.487404  1.46108 1.5065e-01    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> RMSE: 0.328876   Adj. R2: 0.07393

Created on 2022-07-31 by the reprex package (v2.0.1)

Session info

``` r
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.2.0 (2022-04-22)
#>  os       macOS Monterey 12.4
#>  system   aarch64, darwin20
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       Europe/Berlin
#>  date     2022-07-31
#>  pandoc   2.18 @ /Applications/RStudio.app/Contents/MacOS/quarto/bin/tools/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version    date (UTC) lib source
#>  cli           3.3.0      2022-04-25 [1] CRAN (R 4.2.0)
#>  digest        0.6.29     2021-12-01 [1] CRAN (R 4.2.0)
#>  dreamerr      1.2.3      2020-12-05 [1] CRAN (R 4.2.0)
#>  evaluate      0.15       2022-02-18 [1] CRAN (R 4.2.0)
#>  fastmap       1.1.0      2021-01-25 [1] CRAN (R 4.2.0)
#>  fixest      * 0.10.5     2022-06-07 [1] https://fastverse.r-universe.dev (R 4.2.0)
#>  Formula       1.2-4      2020-10-16 [1] CRAN (R 4.2.0)
#>  fs            1.5.2      2021-12-08 [1] CRAN (R 4.2.0)
#>  glue          1.6.2      2022-02-24 [1] CRAN (R 4.2.0)
#>  highr         0.9        2021-04-16 [1] CRAN (R 4.2.0)
#>  htmltools     0.5.3      2022-07-18 [1] CRAN (R 4.2.0)
#>  knitr         1.39       2022-04-26 [1] CRAN (R 4.2.0)
#>  lattice       0.20-45    2021-09-22 [1] CRAN (R 4.2.0)
#>  magrittr      2.0.3      2022-03-30 [1] CRAN (R 4.2.0)
#>  nlme          3.1-157    2022-03-25 [1] CRAN (R 4.2.0)
#>  numDeriv      2016.8-1.1 2019-06-06 [1] CRAN (R 4.2.0)
#>  Rcpp          1.0.9      2022-07-08 [1] CRAN (R 4.2.0)
#>  reprex        2.0.1      2021-08-05 [1] CRAN (R 4.2.0)
#>  rlang         1.0.4      2022-07-12 [1] CRAN (R 4.2.0)
#>  rmarkdown     2.14       2022-04-25 [1] CRAN (R 4.2.0)
#>  rstudioapi    0.13       2020-11-12 [1] CRAN (R 4.2.0)
#>  sandwich      3.0-1      2021-05-18 [1] CRAN (R 4.2.0)
#>  sessioninfo   1.2.2      2021-12-06 [1] CRAN (R 4.2.0)
#>  sloop         1.0.1      2019-02-17 [1] CRAN (R 4.2.0)
#>  stringi       1.7.8      2022-07-11 [1] CRAN (R 4.2.0)
#>  stringr       1.4.0      2019-02-10 [1] CRAN (R 4.2.0)
#>  withr         2.5.0      2022-03-03 [1] CRAN (R 4.2.0)
#>  xfun          0.31       2022-05-10 [1] CRAN (R 4.2.0)
#>  yaml          2.3.5      2022-02-21 [1] CRAN (R 4.2.0)
#>  zoo           1.8-10     2022-04-15 [1] CRAN (R 4.2.0)
#>
#>  [1] /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/library
#>
#> ──────────────────────────────────────────────────────────────────────────────
```

grantmcdermott commented 2 years ago

Laurent should obviously have the last say, but I'm not sure this is workable without a significant code overhaul. fixest_multi follows a strict tree ordering that helps to preserve the logic for downstream function dispatch. I could imagine that arbitrarily layering fixest_multi objects on top of each other might trigger internal confusion (which doesn't arise when they are combined as lists or simply concatenated).

Is there a particular use case you have in mind? Simply combining them as you have above should still support a lot (most? all?) of the post-estimation functions. E.g.

> etable(mods, vcov = 'hc1')
                 sample.var: spe.. sample.var: spec.. sample.var: spec...1 sample.var: spe...1 sample.var: spec...2 sample.var: spec...3
Sample (species)            setosa         versicolor            virginica              setosa           versicolor            virginica
Dependent Var.:                 y1                 y1                   y1                  y1                   y1                   y1

Constant         4.248*** (0.4737)  2.381*** (0.4227)      1.052. (0.5391)   1.459*** (0.0956)    1.167*** (0.0732)    1.056*** (0.0863)
x1                 0.3990 (0.3253) 0.9342*** (0.1659)   0.9946*** (0.0898)     0.0797 (0.0654)   0.1616*** (0.0280)   0.1474*** (0.0140)
x2                0.7121. (0.4175)   -0.3200 (0.3638)      0.0071 (0.2047)    0.1401. (0.0821)     -0.0576 (0.0607)      0.0039 (0.0307)
________________ _________________ __________________   __________________   _________________   __________________   __________________
Family                         OLS                OLS                  OLS             Poisson              Poisson              Poisson
S.E. type        Heteroskeda.-rob. Heteroskedas.-rob.   Heteroskedas.-rob.   Heteroskeda.-rob.   Heteroskedas.-rob.   Heteroskedas.-rob.
Observations                    50                 50                   50                  50                   50                   50
Squared Cor.               0.11173            0.57432              0.74689             0.10975              0.57405              0.74274
Pseudo R2                  0.16181            0.57126              0.71852             0.00077              0.00695              0.01165
BIC                         42.423             43.785               38.648              186.79               194.83               199.73
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

turbanisch commented 2 years ago

Thanks for the comment! It occurred to me the minute I made the suggestion that this might be easier said than done when someone tries to combine models that are wildly different from each other.

The use case I had in mind was just about the tree ordering: I would like to sort the six models within etable() first with respect to sample (species) and then with respect to the model family, such that OLS and PPML estimations appear side-by-side for each sample.
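
To make the target ordering concrete, here is a rough base-R sketch of the index permutation I mean. It assumes the combined list keeps the order of c(multi_ols, multi_ppml), i.e. three OLS models followed by three PPML models; the variable names (n_samples, n_families, sample_id, idx) are just for illustration.

```r
# Assumption: the six models sit in family-major order,
# i.e. OLS (3 samples) first, then PPML (3 samples)
n_samples  <- 3  # setosa, versicolor, virginica
n_families <- 2  # OLS, PPML

# Sample index of each model in the family-major list: 1 2 3 1 2 3
sample_id <- rep(seq_len(n_samples), times = n_families)

# Sorting by sample keeps ties in their original order, so within each
# sample the OLS model still comes before the PPML model
idx <- order(sample_id)
print(idx)
#> [1] 1 4 2 5 3 6
```

Indexing the plain list with idx would give the sample-major order; I have not checked how every downstream function treats the reordered list.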

grantmcdermott commented 2 years ago

> The use case I had in mind was just about the tree ordering: I would like to sort the six models within etable() first with respect to sample (species) and then with respect to the model family, such that OLS and PPML estimations appear side-by-side for each sample.

It's not a single convenience function, but you can "zip" the two fixest_multi objects element-by-element (with Map), flatten (with unlist), and then pass on to etable.

> Map(list, multi_ols, multi_ppml) |>  ## zip
      unlist(recursive = FALSE) |>     ## flatten
      etable()

                 sample.var: sp..1 sample.var: s..2 sample.var: spe..1 sample.var: s..2.1 sample.var: spe..1.1 sample.var: ..2
Sample (species)            setosa           setosa         versicolor         versicolor            virginica       virginica
Dependent Var.:                 y1               y1                 y1                 y1                   y1              y1

Constant         4.248*** (0.4114) 1.459** (0.5541)  2.381*** (0.4493)    1.167* (0.5633)      1.052* (0.5139) 1.056. (0.6234)
x1                 0.3990 (0.2958)  0.0797 (0.3978) 0.9342*** (0.1693)    0.1616 (0.2093)   0.9946*** (0.0893) 0.1474 (0.1062)
x2                 0.7121 (0.4874)  0.1401 (0.6488)   -0.3200 (0.4024)   -0.0576 (0.4902)      0.0071 (0.1795) 0.0039 (0.2184)
________________ _________________ ________________ __________________   ________________   __________________ _______________
Family                         OLS          Poisson                OLS            Poisson                  OLS         Poisson
S.E. type                      IID              IID                IID                IID                  IID             IID
Observations                    50               50                 50                 50                   50              50
Squared Cor.               0.11173          0.10975            0.57432            0.57405              0.74689         0.74274
Pseudo R2                  0.16181          0.00077            0.57126            0.00695              0.71852         0.01165
BIC                         42.423           186.79             43.785             194.83               38.648          199.73
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

grantmcdermott commented 2 years ago

@turbanisch were you able to try the Map -> unlist solution? If it works for you, then I think we should close this issue to avoid dev clutter.
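
For anyone who wants to see what the zip and flatten steps do in isolation, here is a minimal sketch on plain lists. The character strings are hypothetical stand-ins for the fitted models, not real fixest objects, and the names ols, ppml, zipped and flat are made up for the illustration.

```r
# Stand-ins for the two fixest_multi objects: one element per sample
ols  <- list(setosa = "OLS setosa",
             versicolor = "OLS versicolor",
             virginica = "OLS virginica")
ppml <- list(setosa = "PPML setosa",
             versicolor = "PPML versicolor",
             virginica = "PPML virginica")

# zip: pair the i-th element of each list -> 3 pairs
zipped <- Map(list, ols, ppml)

# flatten one level only: 3 pairs -> 6 consecutive elements
flat <- unlist(zipped, recursive = FALSE)

# Sample-major order, with the two estimators side by side
print(unname(unlist(flat)))
#> [1] "OLS setosa"      "PPML setosa"     "OLS versicolor"  "PPML versicolor"
#> [5] "OLS virginica"   "PPML virginica"
```

The recursive = FALSE argument matters: it removes only the pairing level added by Map, so each element stays intact instead of being flattened further.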

turbanisch commented 2 years ago

Absolutely, it does! I feel that for a small regression table, writing out each model might be slightly more transparent than the Map-unlist approach (at least to my eye), but for bigger tables it will certainly come in handy.
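
For comparison, a sketch of the written-out version for the toy example, assuming the same data and models as in my first post. It pulls the individual models out of each fixest_multi object with [[ and passes them to etable() in the desired order.

```r
library(fixest)

# Same dummy data as in the original example
base = iris
names(base) = c("y1", "y2", "x1", "x2", "species")

# OLS and PPML on the same three subsamples
multi_ols = feols(y1 ~ x1 + x2, data = base, split = ~species)
multi_ppml = fepois(y1 ~ x1 + x2, data = base, split = ~species)

# Spell out the order by hand: OLS and PPML side by side for each sample
etable(
  multi_ols[[1]], multi_ppml[[1]],  # setosa
  multi_ols[[2]], multi_ppml[[2]],  # versicolor
  multi_ols[[3]], multi_ppml[[3]]   # virginica
)
```

With three samples and two estimators this stays readable; with more of either, the argument list grows quickly and the Map-unlist version scales better.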