sgaure / lfe

Source code repository for the R package lfe on CRAN.

felm: centering threads hang-up #23

Open averydo opened 5 years ago

averydo commented 5 years ago

Hello, I'm currently running a model on gridded/tabularized spatial data where each row, for example, corresponds to a 5x5 km square area in a given year -- in a given country. I'm running a felm model where I am estimating the outcome of a particular kind of conflict event (ACLED battles, protests, civil unrest, etc.) with country-year fixed effects:

# just an example
library(lfe)
library(magrittr)  # for %>%

fmla <- "any_acled_battles ~ any_infrastructure + any_electrification |
  cell_number_5x5 + country_year | 0 | cell_number_5x5" %>% as.formula()

felm(fmla, data = my_data)

This model runs perfectly fine at the 5x5 km level. My issue with the centering threads (demeaning) appears at the 10x10 km level. When I aggregate up, the model seems to run indefinitely, which is peculiar because aggregating up means fewer rows to process. I haven't let it run overnight, because it drives my CPU temperatures above 90C after ~20 minutes (the fans spin up loudly, and this is a higher-spec 2017 iMac). When I cancel, the console prints something like "..stopping centering threads..", which I take as evidence that the demeaning step is getting stuck. By contrast, the 5x5 model completes in about 2 minutes at most. Looking through the documentation, I found I can successfully run the 10x10 models if I loosen the tolerance with options(lfe.eps = 1e-2), as in the sketch below. It then completes in a reasonable amount of time, about the same as (or slightly less than) the 5x5 equivalent at the default 1e-8.
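For reference, here is roughly what we run now to get the 10x10 model to finish. This is only a sketch: fmla_10x10 and my_data_10x10 are placeholders for our actual aggregated formula and data, and 1e-2 is much looser than the 1e-8 default, so the resulting estimates are only approximate.

library(lfe)

# loosen the centering convergence tolerance (default is 1e-8)
options(lfe.eps = 1e-2)
# optionally cap the number of centering threads
options(lfe.threads = 4)

fit_10x10 <- felm(fmla_10x10, data = my_data_10x10)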

Considering that precision is important to our team's work, we gave demeanlist() a shot and ran a model on the demeaned values, roughly as sketched below. With demeanlist() we can set the eps tolerance to anything; 1e-8 works, and there is no difference between the resulting objects when the tolerance is set to, say, 1e-1 versus 1e-50. However, when we feed the demeaned dataset back into felm(), we still have to set options(lfe.eps = 1e-2) or the computation hangs. Note: running felm() on the original dataset produces slightly different estimates than running it on the demeanlist() version; the differences are small (mostly beyond the fourth decimal place), but they are there.
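The demeanlist() route looks roughly like this. Again a sketch only: the column and factor names are placeholders for our real 10x10 variables, and the lm() call is just to show the idea (its standard errors are not corrected for the absorbed fixed effects).

library(lfe)

# the fixed effects to sweep out
fl <- list(cell = factor(my_data_10x10$cell_number_10x10),
           cy   = factor(my_data_10x10$country_year))

# center the outcome and regressors on both sets of fixed effects
dm <- demeanlist(my_data_10x10[, c("any_acled_battles",
                                   "any_infrastructure",
                                   "any_electrification")],
                 fl, eps = 1e-8)

# regressing the centered outcome on the centered regressors (no intercept)
# should reproduce the felm() point estimates
fit_dm <- lm(any_acled_battles ~ any_infrastructure + any_electrification - 1,
             data = as.data.frame(dm))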

Would you have any recommendations as to what's going on, or insight into the best way forward? In particular, do these tolerances really matter if demeanlist() is properly outputting a demeaned dataset? I apologize in advance if this is a vague or unhelpful illustration of the problem; the data are proprietary, so it is tricky to put together a reproducible issue. I can provide more information or potentially anonymize the data if anybody thinks actual examples would help (from the outset I have no idea what's going on, so I couldn't easily re-create an example). Many thanks for your time.

[Screenshot, 2019-08-22: left, modeling the default data; right, modeling the "demeaned" data]