lrberge / fixest

Fixed-effects estimations
https://lrberge.github.io/fixest/

Using variable names with special symbols in backticks? #441

Open skranz opened 10 months ago

skranz commented 10 months ago

First of all, thanks for your great package! I wonder whether it is possible to use variable names with special symbols when the variable is put in backticks in the formula. The following examples work nicely with lm:

library(tibble)
dat = tibble(`L@y`=rnorm(10), `L°y`=rnorm(10), x=rnorm(10))
lm(`L@y`~x, data=dat)
lm(`L°y`~x, data=dat)

But they both throw errors with fixest (version 0.11.1):

feols(`L@y`~x, data=dat)

Error in feols(`L@y` ~ x, data = dat) : 
  Evaluation of the left-hand-side (equal to L@y) raises an error: 

  trying to get slot "y" from an object of a basic class ("function") with no
slots

feols(`L°y`~x, data=dat)

Error in feols(`L°y` ~ x, data = dat) : 
  Evaluation of the left-hand-side (equal to L°y) raises an error: 

  In str2lang(lhs_text2eval): <text>:1:2: unexpected input
1: L°

Is there a way to use such non-standard variable names also in fixest? I was thinking of perhaps using feols.fit instead of feols but I am not sure whether I can specify instrumental variables with feols.fit.

Background: You may wonder why I use such weird variable names. I am exploring automatic translation of regressions from Stata to R (and possibly, in the future, between different R commands). For this it seems best to develop a canonical format for naming regression variables, like L@y (the lag of the variable y) or x1=a#x2=b (the dummy for an interaction between factor variable x1 taking the level a and factor variable x2 taking the level b), and so on. In particular, to ensure that the same factor levels are dropped in R and Stata, it seems easier to generate the corresponding terms manually than to map the interaction term formulas etc. onto the feols formula. It would be great to simply add those non-standard variables to the feols formula. Of course, I could use workarounds like replacing the special symbols before I call feols, e.g. replacing @ by .._. and hoping that the names stay unique. But if there were a way to use those variable names directly, it would be less error-prone.

etiennebacher commented 10 months ago

FYI I don't have any error with your code:

library(fixest)

dat = tibble::tibble(`L@y`=rnorm(10), `L°y`=rnorm(10),x=rnorm(10) )
lm(`L@y`~x, data=dat)
#> 
#> Call:
#> lm(formula = `L@y` ~ x, data = dat)
#> 
#> Coefficients:
#> (Intercept)            x  
#>     -0.1536       0.3295
lm(`L°y`~x, data=dat)
#> 
#> Call:
#> lm(formula = `L°y` ~ x, data = dat)
#> 
#> Coefficients:
#> (Intercept)            x  
#>      0.5439      -0.1922

skranz commented 10 months ago

The lm commands also work nicely without error on my computer. The errors are thrown when using feols from fixest instead of lm.

etiennebacher commented 10 months ago

Oh right, my bad

lrberge commented 9 months ago

Hi Sebastian! Thanks for spelling out why non-standard variable names could be useful. And I completely agree with you in principle.

Unfortunately, this is currently not possible in the package. I use a lot of non-standard parsing of the formula which, to work, requires variables to be conventionally named. Simply allowing non-standard names would introduce bugs (depending on the name).
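To illustrate the kind of problem this creates, here is a minimal base-R sketch. It is not fixest's actual code; it only assumes, based on the `str2lang(lhs_text2eval)` call visible in the error message above, that the left-hand side is turned into text and parsed again. The variable names `lhs_text`, `reparsed`, `quoted` and `bad` are made up for the example.

```r
# Build the formula from the issue; nothing is evaluated here.
f <- `L@y` ~ x
lhs <- f[[2]]                     # the symbol `L@y`

# By default, deparse() does not backtick a bare symbol ...
lhs_text <- deparse(lhs)          # "L@y"

# ... so re-parsing the text yields the slot-access call `@`(L, y),
# which matches the "trying to get slot" error reported above.
reparsed <- str2lang(lhs_text)

# Keeping the backticks when deparsing preserves the symbol.
quoted <- str2lang(deparse(lhs, backtick = TRUE))

# The degree sign (U+00B0) is not valid in an unquoted name, so the
# unquoted text does not parse at all ("unexpected input").
bad <- tryCatch(str2lang("L\u00b0y"), error = function(e) e)
```

Whether a given name breaks therefore depends on how its unquoted text happens to parse, which is why the failure differs between `L@y` and `L°y`.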

That said, I could change the technology I use to parse the formula. But the price for this change is pretty substantial and other features currently have higher priority. I'm not saying it will never be allowed, but the time horizon for the implementation is rather long.

What I will do, in the short run, is to add a proper error message which is long overdue! (Please keep this issue open until I fix this :-))

skranz commented 9 months ago

Hi Laurent, thanks for your reply. Yes, I can imagine that it may not be easy to adapt the parser so that it generally allows backticked non-standard variables without unintentionally introducing errors along the way. And as I wrote, there are workarounds for me: renaming the variables before and after using fixest.
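A minimal sketch of such a renaming workaround, using base R only. It uses `lm` so that it runs without fixest installed; the assumption is that the same renamed data and formula would then be passed to `feols`. The names `orig`, `safe`, `lookup` and `est` are hypothetical and not part of any package.

```r
set.seed(1)

# Data with non-standard column names; "\u00b0" is the degree sign,
# written as an escape so the script does not depend on file encoding.
orig <- c("L@y", "L\u00b0y", "x")
dat <- data.frame(rnorm(10), rnorm(10), rnorm(10))

# Map every name to a syntactic, unique one and remember the mapping.
safe <- make.names(orig, unique = TRUE)
lookup <- setNames(orig, safe)    # safe name -> original name
names(dat) <- safe

# Estimate with the safe names (x regressed on the two renamed columns).
fml <- reformulate(safe[1:2], response = safe[3])
est <- coef(lm(fml, data = dat))

# Translate the coefficient names back to the original ones.
hit <- names(est) %in% names(lookup)
names(est)[hit] <- unname(lookup[names(est)[hit]])
```

Passing `unique = TRUE` to `make.names` avoids having to hope that the replaced names stay distinct: here both `L@y` and `L°y` would otherwise collapse to similar syntactic names.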

I am not sure whether it is necessary to change the error message, though. In particular, it may not be helpful to generally check for non-standard names and always throw an error.

Currently, many non-standard variables work without problems. E.g. when reversing the dependent and explanatory variables in the example above, the code runs without problems:

library(fixest)
library(tibble)
dat = tibble(`L@y`=rnorm(10), `L°y`=rnorm(10), x=rnorm(10))
feols(x~`L@y`, data=dat)
feols(x~`L°y`, data=dat)

From my point of view, it would be nice if the non-standard variable names that currently work remained allowed in fixest and were not ruled out by a general error message. (It is nicer to need a workaround only for a single dependent variable than for all variables. Also, tighter checking might break backward compatibility of some existing code.) I also find the current error messages informative enough already: it always seemed pretty clear from them that the non-standard name of the dependent variable was the problem.