lrberge / fixest

Fixed-effects estimations
https://lrberge.github.io/fixest/

Weighted Dependent Variable Mean In etable #409

Open JackLandry opened 1 year ago

JackLandry commented 1 year ago

Currently, using the argument fitstat = c("my") in etable gives the unweighted dependent variable mean, even if the estimated regression is using weights. I think in most situations, users would want a weighted dependent variable mean if they are running regressions with weights. I don't think there is any option to do this, so it would be great if one could be added. (Really amazingly flexible package otherwise).

I realize I could hypothetically add the weighted mean as an option using fitstat_register, but from a quick look it seems not straightforward at the very least (and maybe not even possible) to compute the weighted mean from a fixest estimation. Is there a way to add the weighted mean option using fitstat_register? If not, is the best path forward in the near term to do some manual work with the extralines argument?
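For the manual route, the weighted mean has to be computed by hand on the rows the regression actually uses. A minimal base-R sketch follows. The helper name `manual_weighted_mean` and the toy data are hypothetical, and the sketch assumes the only rows dropped from the estimation are those with missing values in the model variables, which may not hold for every fixest model.

```r
# Hypothetical helper (not part of fixest): weighted mean of the dependent
# variable over the rows with no missing values in the model variables.
manual_weighted_mean = function(data, depvar, weight, regressors){
  vars = c(depvar, weight, regressors)
  # assumption: the estimation sample is exactly the complete cases
  keep = complete.cases(data[, vars, drop = FALSE])
  y = data[[depvar]][keep]
  w = data[[weight]][keep]
  sum(w * y) / sum(w)
}

# Toy data: the second row is dropped because x is missing
toy = data.frame(y = c(1, 2, 3, 4),
                 x = c(0.5, NA, 1.5, 2.0),
                 w = c(1, 1, 2, 4))

manual_weighted_mean(toy, "y", "w", "x")
#> [1] 3.285714
```

The resulting number would then have to be pasted into the table by hand for each model, which is why a registered fit statistic is the more convenient option.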

(Edited to add second paragraph, prematurely posted trying to add a new line)

lrberge commented 1 year ago

Apologies in advance for the terse answer (I'm not yet back to working on the package). TL;DR: your comment makes sense, I'll look into it. In the meantime, here's something that should work.

library(fixest)

base = setNames(iris, c("y", "x1", "x2", "w", "species"))
# make the sample 'dirty' to show it works
base$x1[4:8] = NA
est = feols(y ~ x1, base, weights = ~w)
est_no_w = feols(y ~ x1, base)

# function to compute the weighted mean
dep_mean_weighted = function(x){
  # NOTE: I may create a `depvar` function, which would be more intuitive
  #       (I don't know why it does not yet exist in the `stats` package [or maybe
  #        it does but I don't know the function name!])
  y = model.matrix(x, type = "lhs")
  if(!"weights" %in% names(x)){
    return(mean(y))
  }

  # NOTE: I'll add an argument to weights.fixest governing which sample to return,
  #       to avoid subsetting it; the default will also be to return unit weights
  w = weights(x)[obs(x)]

  sum(w * y) / sum(w)
}

# extra function of interest:
obs_weighted = function(x){
  if(!"weights" %in% names(x)){
    return(nobs(x))
  }

  sum(weights(x)[obs(x)])
}

# registering it
fitstat_register("wmy", dep_mean_weighted, "Mean DV (weighted)")
fitstat_register("wobs", obs_weighted, "Observations (weighted)")

# summoning them
etable(est, est_no_w, fitstat = ~. + wobs + my + wmy)
#>                                        est          est_no_w
#> Dependent Var.:                          y                 y
#> 
#> Constant                 4.643*** (0.4787) 6.399*** (0.4856)
#> x1                      0.5547*** (0.1610)  -0.1722 (0.1580)
#> _______________________ __________________ _________________
#> S.E. type                              IID               IID
#> Observations                           145               145
#> R2                                 0.07668           0.00824
#> Adj. R2                            0.07022           0.00131
#> Observations (weighted)             178.60               145
#> Dep. Var. mean                      5.8752            5.8752
#> Mean DV (weighted)                  6.2804            5.8752
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
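
As a cross-check on the table above, the same quantities can be recomputed with base R alone. This is only a sketch: it assumes the estimation sample is exactly the rows without missing values in `y`, `x1`, and `w`, and it does not call fixest at all.

```r
# Rebuild the example data with base R only (no fixest needed)
base = setNames(iris, c("y", "x1", "x2", "w", "species"))
base$x1[4:8] = NA

# Assumption: the rows used in the estimation are the complete cases
keep = complete.cases(base[, c("y", "x1", "w")])
y = base$y[keep]
w = base$w[keep]

sum(keep)             # observations used (the table reports 145)
sum(w)                # weighted observation count (the table reports 178.60)
mean(y)               # unweighted mean (the table reports 5.8752)
sum(w * y) / sum(w)   # weighted mean, same formula as dep_mean_weighted
weighted.mean(y, w)   # base-R equivalent of the line above
```

The last two lines should agree with each other and with the "Mean DV (weighted)" row for the weighted model.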

JackLandry commented 1 year ago

Works beautifully, thank you so much!