lrberge / fixest

Fixed-effects estimations
https://lrberge.github.io/fixest/
361 stars 59 forks source link

Feature request: VCV for multiple outcomes #424

Open jonathandroth opened 1 year ago

jonathandroth commented 1 year ago

Laurent -- thanks again for the fantastic package!

I think it would be great if fixest could return the joint variance-covariance matrix when estimating multiple regressions at once. I.e., I run

feols(c(y1,y2) ~ x,...)

Now I'd like to know the covariance between the coefficient on x when using y1 as an outcome and y2 as an outcome. (This is what's sometime called seeming unrelated regression.) There are some wrap-arounds available (e.g. here), but it would be great if this could easily be done natively.

kylebutts commented 12 months ago

@jonathandroth, here's a basic version (no clustering). Two questions and I can make this more robust:

  1. What is the correct way to account for degrees of freedom? Is it sum of number of rows in each est - sum of number of covariates in each est?
  2. Would clustered standard errors be useful? I believe you would do sum $X_i' e_i$ for each g and premultiply by $(X'X)^{-1}$ to get a $N_g$ by $k$ matrix of influence functions (instead of N by k).
library(fixest)

#' Get joint variance-covariance matrix of seemingly unrelated regressions
#' 
#' @param ests List of `feols` estimates.
#' 
#' @return Variance-covariance matrix of the combined set of coefficients in
#' all models in `ests`.
#' 
vcovSUR <- function(ests) {
  if (inherits(ests, "fixest") | inherits(ests, "fixest_multi")) { 
    ests = list(ests)
  } 

  inf_list <- lapply(ests, get_influence_func)
  inf_funcs <- do.call("rbind", inf_list)

  vcov = tcrossprod(inf_funcs)
  return(vcov)
}

get_influence_func <- function(est) {
  # (X'X)^{-1} X'e
  if (inherits(est, "fixest_multi")) {

    inf_list = lapply(est, function(x) { 
      (x$cov.iid / x$sigma2) %*% t(x$scores)
    })
    do.call("rbind", inf_list)

  } else if (inherits(est, "fixest")) {

    (est$cov.iid / est$sigma2) %*% t(est$scores)

  } else { # code taken partially from `sandwich`

    X <- stats::model.matrix(est)
    if (any(alias <- is.na(stats::coef(est)))) X <- X[, !alias, drop = FALSE]
    if (!is.null(w <- stats::weights(est))) X <- X * sqrt(w)
    scores = X * stats::resid(est)
    solve(crossprod(X), t(scores))

  }
}

mtcars$w = runif(nrow(mtcars), 0.15, 0.85)
est1 <- feols(mpg ~ wt, mtcars)
est2 <- feols(mpg ~ wt + i(cyl), mtcars)
est3 <- lm(mpg ~ wt + i(cyl), mtcars)
est_w <- feols(mpg ~ wt + i(cyl), mtcars, weights = ~w)
est_multi <- feols(c(mpg, hp)  ~ wt + i(cyl), mtcars)

vcov(est1, "hc1", ssc = ssc(adj = FALSE, cluster.adj = FALSE))
#>             (Intercept)        wt
#> (Intercept)    4.516833 -1.305717
#> wt            -1.305717  0.401577
vcov(est2, "hc1", ssc = ssc(adj = FALSE, cluster.adj = FALSE))
#>             (Intercept)         wt     cyl::6     cyl::8
#> (Intercept)   3.2700821 -0.9321293 -0.3715946  0.3477177
#> wt           -0.9321293  0.3783737 -0.2442125 -0.5329499
#> cyl::6       -0.3715946 -0.2442125  1.2716154  1.3082462
#> cyl::8        0.3477177 -0.5329499  1.3082462  1.9913114
vcovSUR(list(est1, est2))
#>             (Intercept)          wt (Intercept)         wt     cyl::6
#> (Intercept)   4.5168333 -1.30571703   3.3911740 -0.8658345 -0.6364905
#> wt           -1.3057170  0.40157696  -0.9446715  0.2607989  0.1249679
#> (Intercept)   3.3911740 -0.94467154   3.2700821 -0.9321293 -0.3715946
#> wt           -0.8658345  0.26079887  -0.9321293  0.3783737 -0.2442125
#> cyl::6       -0.6364905  0.12496790  -0.3715946 -0.2442125  1.2716154
#> cyl::8       -0.2795933  0.06152524   0.3477177 -0.5329499  1.3082462
#>                  cyl::8
#> (Intercept) -0.27959331
#> wt           0.06152524
#> (Intercept)  0.34771768
#> wt          -0.53294991
#> cyl::6       1.30824624
#> cyl::8       1.99131136

# Works with weights
vcov(est_w, "hc1", ssc = ssc(adj = FALSE, cluster.adj = FALSE))
#>             (Intercept)          wt      cyl::6     cyl::8
#> (Intercept)   5.0756301 -1.12152509 -1.59657428 -0.4062194
#> wt           -1.1215251  0.33655783  0.07906278 -0.2880801
#> cyl::6       -1.5965743  0.07906278  1.47597744  1.2937823
#> cyl::8       -0.4062194 -0.28808008  1.29378231  1.9053256
vcovSUR(est_w)
#>             (Intercept)          wt      cyl::6     cyl::8
#> (Intercept)   5.0756301 -1.12152509 -1.59657428 -0.4062194
#> wt           -1.1215251  0.33655783  0.07906278 -0.2880801
#> cyl::6       -1.5965743  0.07906278  1.47597744  1.2937823
#> cyl::8       -0.4062194 -0.28808008  1.29378231  1.9053256

# Works with multi
lapply(est_multi, function(est) vcov(est, "hc1", ssc = ssc(adj = FALSE, cluster.adj = FALSE)))
#> $`lhs: mpg`
#>             (Intercept)         wt     cyl::6     cyl::8
#> (Intercept)   3.2700821 -0.9321293 -0.3715946  0.3477177
#> wt           -0.9321293  0.3783737 -0.2442125 -0.5329499
#> cyl::6       -0.3715946 -0.2442125  1.2716154  1.3082462
#> cyl::8        0.3477177 -0.5329499  1.3082462  1.9913114
#> 
#> $`lhs: hp`
#>             (Intercept)         wt    cyl::6    cyl::8
#> (Intercept)    481.0195 -189.14317 135.81828  479.2033
#> wt            -189.1432   80.30526 -73.10224 -221.1762
#> cyl::6         135.8183  -73.10224 174.73028  230.6635
#> cyl::8         479.2033 -221.17619 230.66346  730.4108
vcovSUR(est_multi)
#>             (Intercept)         wt     cyl::6      cyl::8  (Intercept)
#> (Intercept)   3.2700821 -0.9321293 -0.3715946   0.3477177   -3.6102659
#> wt           -0.9321293  0.3783737 -0.2442125  -0.5329499    0.2846401
#> cyl::6       -0.3715946 -0.2442125  1.2716154   1.3082462    2.1105066
#> cyl::8        0.3477177 -0.5329499  1.3082462   1.9913114   -1.6623714
#> (Intercept)  -3.6102659  0.2846401  2.1105066  -1.6623714  481.0194872
#> wt            0.2846401 -0.2300793  0.7005150   2.4442424 -189.1431698
#> cyl::6        2.1105066  0.7005150 -6.1116792  -6.4158361  135.8182788
#> cyl::8       -1.6623714  2.4442424 -6.4158361 -13.5626121  479.2033017
#>                       wt     cyl::6      cyl::8
#> (Intercept)    0.2846401   2.110507   -1.662371
#> wt            -0.2300793   0.700515    2.444242
#> cyl::6         0.7005150  -6.111679   -6.415836
#> cyl::8         2.4442424  -6.415836  -13.562612
#> (Intercept) -189.1431698 135.818279  479.203302
#> wt            80.3052563 -73.102243 -221.176193
#> cyl::6       -73.1022434 174.730281  230.663464
#> cyl::8      -221.1761929 230.663464  730.410817

Created on 2023-07-10 with reprex v2.0.2

kylebutts commented 12 months ago

@grantmcdermott @vincentarelbundock, is there a way to call marginaleffects::hypothesis providing a vector of coefficients and the variance-covariance matrix?

vincentarelbundock commented 12 months ago

@kylebutts maybe the FUN argument in hypotheses() can help. (Not sure.)

jonathandroth commented 12 months ago

Thanks, @kylebutts !

Re d.o.f., Stata's sureg command has the following options:

image

Yes, I do think clustered SEs would be a nice add-on.

kylebutts commented 12 months ago

@vincentarelbundock That worked with defining custom get_coef and set_coef methods! Thanks!

vincentarelbundock commented 12 months ago

Cool! Is that too niche, or is it something that should be merged in the package, you think?

kylebutts commented 12 months ago

@jonathandroth See https://github.com/kylebutts/vcovSUR :-)

✔️ Variance-covariance matrix estimation ✔️ Clustering (which I realized is necessary for clustering by observation id) ✔️ Support for fixest_multi objects ✔️ Hypothesis Tests from {marginaleffects} ✖︎ No small-sample degree of freedom correction (yet).

jonathandroth commented 11 months ago

Awesome! My own 2 cents is it would be useful to have a native fixest feature for doing hypothesis tests when using multiple outcomes, but I'll leave it up to the package development experts here. Either way this will be a great resource

On Wed, Jul 12, 2023 at 11:18 AM Kyle F Butts @.***> wrote:

@jonathandroth https://github.com/jonathandroth See https://github.com/kylebutts/vcovSUR :-)

✔️ Variance-covariance matrix estimation ✔️ Clustering (which I realized is necessary for clustering by observation id) ✔️ Support for fixest_multi objects ✔️ Hypothesis Tests from {marginaleffects} ✖︎ No small-sample degree of freedom correction (yet).

— Reply to this email directly, view it on GitHub https://github.com/lrberge/fixest/issues/424#issuecomment-1632736446, or unsubscribe https://github.com/notifications/unsubscribe-auth/AE6EXFCGYE76O27U4ZRVGWDXP252VANCNFSM6AAAAAAZR2EXXE . You are receiving this because you were mentioned.Message ID: @.***>