Error when using an indicator factor variable

NilsEnevoldsen commented 7 years ago

. version
version 14.2

. which reghdfe
/Users/nilsschool/Library/Application Support/Stata/ado/plus/r/reghdfe.ado
*! version 4.1.0 28feb2017

. 
. sysuse auto, clear
(1978 Automobile Data)

. 
. reg price weight i.rep78 1.foreign //Model

      Source |       SS           df       MS      Number of obs   =        69
-------------+----------------------------------   F(6, 62)        =     10.38
       Model |   288985161         6  48164193.5   Prob > F        =    0.0000
    Residual |   287811798        62  4642125.77   R-squared       =    0.5010
-------------+----------------------------------   Adj R-squared   =    0.4527
       Total |   576796959        68  8482308.22   Root MSE        =    2154.6

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      weight |   3.392021    .436278     7.77   0.000     2.519914    4.264128
             |
       rep78 |
          2  |   542.3997   1706.922     0.32   0.752     -2869.69    3954.489
          3  |   836.6638   1580.637     0.53   0.598    -2322.985    3996.312
          4  |    521.878   1650.986     0.32   0.753    -2778.396    3822.152
          5  |   1096.374   1759.422     0.62   0.535    -2420.661    4613.409
             |
     foreign |
    Foreign  |   3530.574   852.8057     4.14   0.000     1825.839    5235.308
       _cons |  -5950.765   2037.208    -2.92   0.005    -10023.09   -1878.444
------------------------------------------------------------------------------

. reghdfe price weight, absorb(i.rep78 1.foreign) //Error
assert failed: ("i"=="c")
                 stata():  3598  Stata returned error
         fixed_effects():     -  function returned error
                 <istmt>:     -  function returned error
r(3598);

sergiocorreia commented 7 years ago

I think the error is correct in this case.

You can do reghdfe 1.foreign ... or reghdfe price 1.foreign ..., but putting it in absorb() would pick out just one level of all the possible levels of that variable, which is not what absorb() does.
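As a minimal sketch of that suggestion, using the same auto dataset as the report: the indicator goes in the regressor list and the full factor goes in absorb(). Pairing it with absorb(rep78) is an assumption made here to mirror the reg model above; it is not a command taken from the thread.

```stata
sysuse auto, clear

* 1.foreign is an ordinary regressor; rep78 is absorbed as a full set of fixed effects
reghdfe price weight 1.foreign, absorb(rep78)
```

Since absorbing rep78 spans the same columns as i.rep78 plus the constant, the coefficient on weight should agree with the reg output shown earlier.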

NilsEnevoldsen commented 7 years ago

The behavior I expect is that reghdfe price, absorb(3.rep78) absorbs an indicator for rep78 == 3. I expect this because it seems symmetrical with the behavior of reg price 3.rep78. Do you disagree?

sergiocorreia commented 7 years ago

The problem is that internally 3.rep78 works more like a normal regressor than as a factor variable.

In the medium term, I have thought about adding a partial() option that would cover this case. Something like what I'm doing in an ivreg2 demo:

ivreg2hdfe price weight 3.rep, absorb(turn) partial(3.rep)

(On the downside, ivreg2 requires partialled-out regressors to be both on the RHS and in partial().)

NilsEnevoldsen commented 7 years ago

That gets tricky if I want to do, say, ivreg2hdfe price weight 3.rep, absorb(turn#3.rep).

sergiocorreia commented 7 years ago

Agreed. I was playing with a similar example and saw that it only accepts the i. and c. prefixes. One of the reasons for being so strict is to prevent bugs in absorb(), since writing parsing commands is a real pain in the ass (and in this case required 266 lines of code just for testing).

I'll think about what can be changed to make it more flexible without risking introducing bugs.

sergiocorreia commented 7 years ago

So the code for the regressors has been streamlined and should be mostly free of issues, but the code for absorb() is still tricky: there is no way it can be treated as a normal varlist because it's not a normal varlist:

- Always requires a factor variable
- By default absorb(turn) is like absorb(i.turn), which is not the case for general varlists (where it is equivalent to c.turn)
- i.turn##c.gear should not expand the ## because that means something different (compute two different sets of FEs instead of a joint one with intercept and slope)
- Similarly for i.turn##c.(gear weight) (don't expand what's inside the parens)
- There are also target variables, so we can do absorb(FE1=turn FE2=trunk)

Thus, we have a lot of special cases and can't rely on syntax/fvunab/fvexpand/etc.
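The special cases above can be sketched with the auto dataset. This is an illustration only: the variable name gear_ratio is spelled out in full as an assumption (the comment abbreviates it to gear), and the exact behavior may differ across reghdfe versions.

```stata
sysuse auto, clear

* A bare variable in absorb() is treated as categorical, like i.turn
reghdfe price weight, absorb(turn)
reghdfe price weight, absorb(i.turn)

* Target variables: store the estimated fixed effects as FE1 and FE2
reghdfe price weight, absorb(FE1=turn FE2=trunk)

* ## inside absorb(): one joint set of FEs with intercept and slope per turn level
reghdfe price weight, absorb(i.turn##c.gear_ratio)
```

None of these forms would survive a pass through the standard varlist tools unchanged, which is the reason given for the hand-written parser.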

Since it looks like a huge hassle for little gain, I'll mark it as wontfix for now, and perhaps at some point I or someone else will go through the work of improving it.

sergiocorreia / reghdfe

Error when using an indicator factor variable #85