Closed: vronizor closed this issue 3 years ago
Here is an MWE:
library(data.table)
library(fixest)
DT = as.data.table(airquality)
est = feols(Ozone ~ Solar.R + Wind + Temp | Month + Day,
DT, cluster = ~Day)
#> NOTE: 42 observations removed because of NA values (LHS: 37, RHS: 7).
# Method 1
DT[, used := TRUE]
DT[na.action(est), used := FALSE]
#> Null data.table (0 rows and 0 cols)
# Method 2
esample = rownames(as.matrix(resid(est)))
DT[esample]
#> Null data.table (0 rows and 0 cols)
Created on 2021-05-13 by the reprex package (v2.0.0)
I noticed the NOTE: 42 observations removed because of NA values (LHS: 37, RHS: 7), so there might be a way to retrieve that info :)
Hi vronizor, I am also new to R and I had to do this today. I found that the "obsRemoved" entry in the est object returned by feols can be used to retrieve the sample as follows (retrieves all columns): sample <- DT[-est[["obsRemoved"]], ]
or, for just a list of the rows used, it should be: sample <- setdiff(1:est[["nobs_origin"]], est[["obsRemoved"]])
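For readers without the fitted object at hand, the index arithmetic behind the two one-liners above can be sketched in base R. Here `removed` and `n_origin` are hypothetical stand-ins for est[["obsRemoved"]] and est[["nobs_origin"]], rebuilt by hand from the NA pattern of the four model variables; they are assumptions for illustration, not values read from a fixest object.

```r
# Stand-in for est[["obsRemoved"]]: rows with an NA in any model variable.
# (Assumption: built by hand here, not extracted from a fixest object.)
vars <- c("Ozone", "Solar.R", "Wind", "Temp")
removed <- which(!complete.cases(airquality[, vars]))

# Stand-in for est[["nobs_origin"]]: number of rows before any removal.
n_origin <- nrow(airquality)

# Rows kept, as positions in the original data.
used_rows <- setdiff(seq_len(n_origin), removed)

# Same selection via negative indexing. Guard against an empty `removed`,
# because x[-integer(0), ] selects zero rows rather than all of them.
sample_df <- if (length(removed) > 0) airquality[-removed, ] else airquality

length(removed)
nrow(sample_df)
```

With airquality this should report 42 removed rows, in line with the NOTE printed in the MWE. The guard on an empty `removed` matters for the negative-indexing form whenever an estimation drops nothing.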
Thanks @24thronin, works perfectly! I need to get used to digging into these post-estimation objects, they are very handy!
Hi @vronizor, and thanks John for bringing a solution.
I would just add a note of caution: the obsRemoved only considers NA values or obs. removed due to only 0 outcomes in fixed-effects for non-linear models (in Poisson for instance).
This means that it does not contain observations removed due to: a) the subset argument, b) the split argument, and c) NA/only-0 in multiple estimations (because a very specific delayed treatment is applied).
But it should work in most cases. To let you know, in 0.9.0 I'll add the obs() function to get the vector of observations used in the estimation, and it should account for everything.
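For later readers, a minimal sketch of how obs() might be used once available, assuming fixest 0.9.0 or later is installed and that obs() returns row positions in the original data, as described above. The `used` column and the `df` copy are hypothetical names introduced for illustration.

```r
library(fixest)

# Same model as in the MWE, on a plain data.frame.
est <- feols(Ozone ~ Solar.R + Wind + Temp | Month + Day,
             airquality, cluster = ~Day)

# Positions (in the original data) of the rows used in the estimation.
# Assumption: requires fixest >= 0.9.0.
used_rows <- obs(est)

# Flag the estimation sample, in the spirit of Stata's e(sample).
df <- airquality
df$used <- FALSE
df$used[used_rows] <- TRUE

# Mean of the dependent variable over the estimation sample only.
mean(df$Ozone[df$used])
```

Storing a logical flag rather than subsetting keeps the full data intact, so other group means can be computed from the same column.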
I'm happy you found a solution and sorry for the delay!
Was just looking for this -- looking forward to obs() :), will it work even if lean = TRUE?
@adamaltmejd: of course not! :-D The information is possibly of order n so it is removed.
But in the long run, any command will work even if lean = TRUE, it's only processing time that will be longer.
Hi @lrberge, thanks a lot for the great package!
Disclaimer: I am new to R and this might be a more general question on regressions in R, sorry if it doesn't belong here.
Coming from Stata, I'm used to the e(sample) command, which lets the user identify, post-estimation, the sample used to run the regression. This can be useful to then compute the average of the dependent variable for the control group included in the estimation, for example.
I haven't found a way to do that with fixest. I've tried several proposed solutions but always ended up with a NULL result for an estimation I knew used only part of the full sample.
Is this at all possible? Might it be that the objects returned by lm as given in the links above are not the same as the ones returned by feols?
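For comparison, this is roughly what the lm-based approach looks like in base R. It is only a sketch of the lm side of the question (the model below drops the fixed effects and is not the feols estimation); `used` is a hypothetical name for the e(sample)-style flag.

```r
# Plain lm on the same variables, without fixed effects.
fit <- lm(Ozone ~ Solar.R + Wind + Temp, data = airquality)

# For lm, na.action() returns the positions of the rows dropped because of
# NA values (NULL when nothing was dropped).
dropped <- na.action(fit)

# Logical flag marking the rows that entered the regression.
used <- rep(TRUE, nrow(airquality))
if (!is.null(dropped)) used[as.integer(dropped)] <- FALSE

# Sanity check, then the mean of the dependent variable on that sample.
sum(used) == nobs(fit)
mean(airquality$Ozone[used])
```

This works because lm stores the NA handling in the fitted object, which is what na.action() reads; an object from another package need not store it the same way.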