lrberge / fixest

Fixed-effects estimations
https://lrberge.github.io/fixest/

String reference category in i() within functions #426

Closed kennchua closed 5 months ago

kennchua commented 1 year ago

Hello, I was hoping to ask for clarity on the error message I'm receiving when passing a string argument to ref in i() within a function.

When i() is not used inside a function, the following works fine:

feols(.["Sepal.Length"] ~ i(.["Species"], ref = "setosa"),
      data = iris)

But when i() is used inside a function, I get the error: The variable 'refgrp' is in the RHS of the formula but not in the data set.

est_reg <- function(df, yvar, xvar, refgrp) {

  reg <- feols(.[yvar] ~ i(.[xvar], ref = refgrp),
               data = df)

  return(reg)
}

est_reg(iris, "Sepal.Length", "Species", refgrp = "setosa")

Is ref looking for a variable in the dataset as opposed to a value of the variable when used within a function but not when used directly?

Thanks in advance for your help!

(P.S. FWIW, there is no issue when I am passing a reference category that is numeric.)

est_reg <- function(df, yvar, xvar, refgrp) {

  reg <- feols(.[yvar] ~ i(.[xvar], ref = refgrp),
               data = df)

  return(reg)
}

est_reg(mtcars, "mpg", "am", refgrp = 0)

kylebutts commented 1 year ago

The reason this is hard is that the formula is evaluated against df, so refgrp is looked up among the columns of df and not found. Of course, you have the .[] notation, which lets you refer to variables in df through variables from the calling environment (yvar and xvar).
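A base R sketch of the symptom: all.vars() lists every symbol in a formula, so a bare refgrp is indistinguishable from a column name, whereas a literal string is not a symbol at all. This only illustrates why the name ends up treated as a right-hand-side variable; fixest's own formula parsing is more involved, and nothing is estimated here.

```r
# Base R only: `i` is just an unevaluated call inside the formula,
# so fixest does not need to be loaded for this illustration.
fml_symbol  <- Sepal.Length ~ i(Species, ref = refgrp)
fml_literal <- Sepal.Length ~ i(Species, ref = "setosa")

# Symbols found in each formula (function names such as `i` are excluded).
vars_symbol  <- all.vars(fml_symbol)
vars_literal <- all.vars(fml_literal)

print(vars_symbol)   # includes "refgrp"
print(vars_literal)  # only the two column names
```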

However, this ref problem is a bit different. You want to insert the character string "setosa" into the formula, which is done here: https://github.com/lrberge/fixest/blob/6b852fa277b947cea0bad8630986225ddb2d6f1b/R/fixest_env.R#L559-L594


@lrberge, I believe the bug is here. Specifying mode = "numeric" makes exists() match only numeric objects, not character ones. Replacing it as shown in the next lines fixes this problem (though it could cause bugs if var in call_env is a function, for example).

# if(exists(var, envir = call_env, mode = "numeric")){
if(exists(var, envir = call_env)){
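A small base R sketch of how the mode argument of exists() behaves, which is the mechanism described above. The environment and the names ref_num, ref_chr, ref_fun and is_plain_value are hypothetical and only for illustration; this is not the code that fixest uses, just one way to see why a character value is skipped and why dropping mode entirely also lets functions through.

```r
# A stand-in for the calling environment (hypothetical, isolated from globals).
call_env <- new.env(parent = emptyenv())
assign("ref_num", 6, envir = call_env)
assign("ref_chr", "setosa", envir = call_env)
assign("ref_fun", function() NULL, envir = call_env)

# With mode = "numeric", only the numeric object is found.
num_found <- exists("ref_num", envir = call_env, mode = "numeric")
chr_found <- exists("ref_chr", envir = call_env, mode = "numeric")

# Without mode, everything is found, including the function.
chr_any <- exists("ref_chr", envir = call_env)
fun_any <- exists("ref_fun", envir = call_env)

# Hypothetical stricter check: accept any atomic value (numeric, character,
# logical, ...) but reject functions and other non-atomic objects.
is_plain_value <- function(var, env) {
  exists(var, envir = env, inherits = FALSE) &&
    is.atomic(get(var, envir = env, inherits = FALSE))
}

print(c(num_found, chr_found, chr_any, fun_any))
print(is_plain_value("ref_chr", call_env))
print(is_plain_value("ref_fun", call_env))
```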

Here is a minimal reprex in case you think of an easy fix, Laurent:

# Works: numeric reference passed through a variable
refgrp = 6
fixest::feols(
  mpg ~ i(cyl, ref = refgrp),
  data = mtcars
)
# Does not work: character reference passed through a variable
refgrp = "setosa"
fixest::feols(
  Sepal.Length ~ i(Species, ref = refgrp),
  data = iris
)

@kennchua Here are two workarounds:

  1. Use setFixest_fml and specify the ref value by hand. You can change ..speciesDummies multiple times in the script and it will "swap" out the correct formula.
setFixest_fml(..speciesDummies = ~ i(Species, ref = "setosa"))
est_reg <- function(df, yvar) {
  reg <- feols(
    .[yvar] ~ ..speciesDummies,
    data = df
  )

  return(reg)
}
est_reg(iris, "Sepal.Length")
  2. You could create a string and then convert it to a formula.
est_reg <- function(df, yvar, xvar, refgrp) {
  fml = as.formula(paste0(
    ".[yvar] ~ i(.[xvar], ref = '", refgrp, "')"
  ))
  reg <- feols(
    fml,
    data = df
  )

  return(reg)
}
est_reg(iris, "Sepal.Length", "Species", "setosa")
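A possible variant of the second workaround, sketched in base R only: bquote() can splice the value of refgrp into the formula as a literal, which avoids hand-written quoting (the paste0() version would break on a value containing a single quote). The sketch uses plain column names and does not call feols; whether it combines cleanly with the .[] notation is an assumption that is not tested here.

```r
refgrp <- "setosa"

# .(refgrp) is replaced by the value of refgrp before the formula is built,
# so the formula carries the string itself rather than the symbol `refgrp`.
fml <- eval(bquote(Sepal.Length ~ i(Species, ref = .(refgrp))))

# The `ref` argument of the right-hand-side call is now a character literal.
ref_in_formula <- fml[[3]][["ref"]]
print(fml)
print(ref_in_formula)

# A value with an embedded quote is spliced in just as safely.
tricky <- "O'Brien"
fml_tricky <- eval(bquote(y ~ i(x, ref = .(tricky))))
print(all.vars(fml_tricky))  # only the symbols y and x remain
```

The resulting fml would then be passed as the formula argument, in the same place as the as.formula() result in the workaround above.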

kennchua commented 12 months ago

Hi @kylebutts, thanks for carefully explaining what is going on under the hood and providing workarounds. Both solutions look good. Appreciate your help!

kylebutts commented 12 months ago

@kennchua, happy to help! Could you reopen this issue so that Laurent can see it?

lrberge commented 5 months ago

Thanks @kennchua for reporting, your use case was totally valid. Thanks @kylebutts for finding the problem. I agree with you that it was too restrictive. I also took the opportunity to fix a bug on these lines!

Thanks all!