sergiocorreia / reghdfe

Linear, IV and GMM Regressions With Any Number of Fixed Effects
http://scorreia.com/software/reghdfe/
MIT License

compact option causes error #194

Closed AliKaro closed 5 years ago

AliKaro commented 5 years ago

I'm running models with a very large number of fixed effects (only some of them absorbed) and run out of memory. When I try the compact option, I always get the error "variable tempID not found". There seems to be a bug related to preserving the dataset when using the compact option.

Simple example model:

. reghdfe pun i.per, absorb(idnr) cluster(kom) compact verbose(4)

Parsing varlist: pun i.per

macros:
    r(basevars) : "pun per"
    r(indepvars) : "i.per"
    r(fe_format) : "%9.2f"
    r(depvar) : "pun"

Parsing vce(cluster kom)

macros:
    s(base_clustervars) : "kom"
    s(clustervars) : "kom"
    s(num_clusters) : "1"
    s(vcetype) : "cluster"

Parsing absvars and HDFE options

macros:
    s(precondition) : "0"
    s(poolsize) : "."
    s(compute_rre) : "0"
    s(dofadjustments) : "pairwise clusters continuous"
    s(report_constant) : "1"
    s(G) : "1"
    s(has_intercept) : "1"
    s(save_any_fe) : "0"
    s(save_all_fe) : "0"
    s(absvars) : " "idnr""
    s(ivars) : " "idnr""
    s(cvars) : " """
    s(targets) : " """
    s(intercepts) : " 1"
    s(num_slopes) : " 0"
    s(extended_absvars) : "1.idnr"

Initializing Mata object for 1 fixed effects

+-----------------------------------------------------------------------------------+
|  i | g | Name  | Int? | #Slopes |   Obs.    |   Levels   | Sorted? | #Drop Singl. |
|----+---+-------+------+---------+-----------+------------+---------+--------------|
|  1 | 1 | idnr  | Yes  |    0    |  7719206  |    2179    |   No    |      0       |
+-----------------------------------------------------------------------------------+

Initializing panelsetup() for each fixed effect

Estimating degrees-of-freedom absorbed by the fixed effects

Saving e(sample)

Parsing and expanding indepvars: i.per

macros:
    r(not_omitted) : "0 1 1 1 1 1 1 1 1 1 1"
    r(varlist) : "1bn.per 2bn.per 3bn.per 4bn.per 5bn.per 6bn.per 7bn.per 8bn..."
    r(fullvarlist) : "0b.per 1.per 2.per 3.per 4.per 5.per 6.per 7.per 8.per 9.per.."

Preserving dataset

variable tempID not found
r(111);

sergiocorreia commented 5 years ago

What version of reghdfe, ftools, and Stata are you using?

(which <COMMANDNAME>)

AliKaro commented 5 years ago

I'm using:

Stata 16, 30 Sept 2019 update
reghdfe version 5.7.2, 29jul2019
ftools version 2.37.0, 16aug2019

sergiocorreia commented 5 years ago

Ok, so the tempID thing got me confused initially (because it's not present anywhere in reghdfe or ftools).

The likely problem is that the data is xtset but the xtset variable has been dropped. For instance, see this example to understand what's going on:

clear
set obs 8
gen id = 1 + (_n > 4)
bys id: gen t = _n
xtset id t
gen other_id = ceil(runiform() * 4)
gen y = rnormal()
gen x = rnormal()
list, sepby(id)

* Works
reghdfe y x, a(other_id)
reghdfe y x, a(other_id) compact
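* Drop the xtset variables; the xtset settings themselves stay in the dataset
drop id t
* Now fails, because the data is still xtset on variables that no longer exist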
reghdfe y x, a(other_id) compact

The solution is to either type xtset, clear before running reghdfe, or to not drop tempID so the xtset is preserved.
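Applied to the toy example above, the first workaround might look like the sketch below. The variable names (y, x, other_id) are the ones from that example, not from the original model, and the snippet assumes no time-series operators are needed in the regression.

```stata
* Workaround sketch: clear the stale xtset settings before calling reghdfe.
* Assumes the xtset variables were already dropped and are not needed.
xtset, clear
reghdfe y x, absorb(other_id) compact
```

If the model does use time-series operators, the other route (keeping the xtset variables in the dataset) is the one that applies.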

Now, for a deeper solution, I would have to a) detect if there's an xtset (needed in case you have time series operators), and b) detect whether the xtset variables still exist in the dataset. So far I'm only doing a), but b) can be added.
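For anyone who wants to guard against this in their own do-files in the meantime, a check along those lines could be sketched as follows. This is only an illustration, not reghdfe's actual code: it assumes that xtset stores the panel and time variable names in the dataset characteristics _dta[_TSpanel] and _dta[_TStvar], and it simply clears the xtset settings if either variable is gone.

```stata
* Hypothetical pre-check, not part of reghdfe or ftools.
* Assumption: xtset/tsset record their variables in the dataset
* characteristics _dta[_TSpanel] (panel) and _dta[_TStvar] (time).
local panelvar : char _dta[_TSpanel]
local timevar  : char _dta[_TStvar]

* Loop over whichever of the two names are set (possibly none)
foreach v in `panelvar' `timevar' {
    capture confirm variable `v', exact
    if (_rc) {
        display as text "xtset variable `v' no longer exists; clearing xtset"
        xtset, clear
        continue, break
    }
}
```

Running this right before reghdfe leaves a valid xtset untouched and only clears settings that point at a missing variable.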

AliKaro commented 5 years ago

That works! Thanks a lot for the fix and for the great package!