lrberge / fixest

Fixed-effects estimations
https://lrberge.github.io/fixest/

Does 2SLS Endogenous Interactions Use Fitted Values? #420

Closed Garcese closed 1 year ago

Garcese commented 1 year ago

Hi! I'll try to keep this short. I have the following regression that I've simplified for the example:

feols(y ~ x1 + i(x1, x2, "reference") | x2 ~ instrument, data = myData)

Basically, I am running a 2SLS, where I am using "instrument" as an instrument for one of my variables, x2, but I also want to interact the fitted values for x2 against another variable, x1. When I run this code, the output calls x2 "fit_x2" like normal, but it only says "x2" for the interaction terms. I'm unsure if the function is actually using the fitted values for x2 from the 1st stage for the interaction terms, and I wish to have this clarified.

Econometrically, I'm 90% sure it is OK to have interactions with an instrumented variable, so I believe I am seeking out sane results. But, if I am mistaken, and it simply does not make sense, please excuse me, and just politely let me know!

Thanks!

kylebutts commented 1 year ago

If you want to interact an endogenous variable with an exogenous variable, you should do so in the IV formula as well (technically you have multiple endogenous variables).

I think it should be something like this:

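library(fixest)
set.seed(1)  # make the simulated instrument reproducible
# Toy instrument for illustration only: hp plus noise
mtcars$instrument = mtcars$hp + rnorm(nrow(mtcars))
feols(
  mpg ~ i(cyl) | i(cyl, hp) ~ i(cyl, instrument),
  mtcars
)

Garcese commented 1 year ago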

@kylebutts Thank you for the reply! I've played around with the function a bit more myself and came close to something like your answer. I just realized, however, I made a mistake in my original question, as I really have a formula that looks closer to this:

feols(y ~ x1 + x2 + i(x1, x2, "reference") + controls, data = myData)

Basically, I am using this endogenous variable, x2, both as a standalone variable and in the interaction with the exogenous categorical variable x1. So, I'd really like to use a variable z as an instrument for x2, and also use the interaction of z and x1 as an instrument for the interaction of x1 and x2. I can specify the formula in the following way to get something close to this:

feols(y ~ x1 + controls | x2 + i(x1, x2, "reference") ~ z + i(x1, z, "reference"), data = myData)

Which performs a first-stage regression for my x2 variable, and for each category of i(x1, x2, "reference"), minus the reference category. This makes sense to me, since, like you said, I technically have multiple endogenous variables, just from the categorical variable alone. However, in each first-stage regression, it seems z is always replacing both the standalone instances of x2 and the x2s in the interaction term. That is, I get first-stage equations that look like this:

x2 = x1 + z + i(x1, z, "reference") + controls

Which I believe is the correct specification, since z is the instrument for x2, so you simply replace x2 with z wherever it appears in the RHS formula. But I am not so sure about the formulas for the interaction term. I get a first-stage equation that looks like this, using the first category of x1 as an example:

x2::category1 = x1 + z + i(x1, z, "reference") + controls

Where i(x1, z, "reference") represents z::category1, z::category2, etc., just to be clear. To me, this seems incorrect, since I believe z::category1 should be the only instrument for x2::category1, and since x2::category1 appears only once in the RHS, that should be the only variable replaced. Instead, z replaces x2 everywhere, and I get terms like z::category2 where I believe x2::category2 should remain.

So, what I really want to ask is: is my reasoning about the right way to specify the first-stage regressions correct? I can see two scenarios here. In one, my logic is sound and I have simply specified the instruments incorrectly in the feols() call. In the other, the function is specifying the first-stage regressions correctly and I am making an econometrics blunder. I think it might be the latter.

Since this is primarily an econometrics question, please don't feel pressured to answer. I just figured you or someone else in the fixest community would know and could help me out. Thank you.

kylebutts commented 1 year ago

I understand where the logic is coming from, but all instruments are used for each endogenous variable. So for example, you predict x2 using all 4 instruments. That's just the mechanics of IV.
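
A minimal base-R sketch of that mechanic, on simulated data. All variable names and coefficient values here are made up for illustration, and it uses lm() rather than fixest so that both stages are visible. Each endogenous term gets its own first stage, and every first stage has the same right-hand side: the exogenous variables plus all the instruments.

```r
# Simulated data; names and coefficients are illustrative assumptions.
set.seed(42)
n  <- 5000
g  <- factor(sample(c("a", "b", "c"), n, replace = TRUE))  # exogenous category (role of x1)
z  <- rnorm(n)                                             # instrument
u  <- rnorm(n)                                             # unobserved confounder
x2 <- 0.8 * z + u + rnorm(n)                               # endogenous regressor
y  <- 1 + 0.5 * x2 + 0.7 * x2 * (g == "b") - 0.3 * x2 * (g == "c") + u + rnorm(n)

d <- data.frame(y, g, z, x2)
# Interactions with the non-reference categories ("a" is the reference).
for (lev in c("b", "c")) {
  d[[paste0("x2_", lev)]] <- d$x2 * (d$g == lev)
  d[[paste0("z_", lev)]]  <- d$z  * (d$g == lev)
}

endo  <- c("x2", "x2_b", "x2_c")
insts <- c("z", "z_b", "z_c")

# First stage: one regression per endogenous term, each on g plus ALL instruments.
first_stage <- lapply(endo, function(v) {
  lm(reformulate(c("g", insts), response = v), data = d)
})
names(first_stage) <- endo

# Second stage: swap each endogenous term for its first-stage fitted value.
for (v in endo) d[[paste0("fit_", v)]] <- fitted(first_stage[[v]])
second_stage <- lm(y ~ g + fit_x2 + fit_x2_b + fit_x2_c, data = d)

round(coef(second_stage), 2)
```

The point estimates are the 2SLS estimates, but the standard errors from this manual second stage are not valid, because lm() treats the fitted values as if they were data. In practice it is safer to let feols() run both stages.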

BTW, you don't need to include x2 if you don't include a reference value in x1.
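
A small base-R sketch of why the standalone term becomes redundant, again on simulated data with illustrative names and values, and using OLS for brevity. With a reference category, the model needs a baseline x2 slope plus deviations for the other categories. Without a reference, one slope per category spans the same column space.

```r
# Simulated data; names and coefficients are illustrative assumptions.
set.seed(1)
n  <- 300
g  <- factor(sample(c("a", "b", "c"), n, replace = TRUE))
x2 <- rnorm(n)
y  <- 1 + 0.5 * x2 + 0.7 * x2 * (g == "b") - 0.3 * x2 * (g == "c") + rnorm(n)

# Parametrization 1: baseline slope for reference category "a", plus deviations.
m_ref <- lm(y ~ g + x2 + g:x2)
# Parametrization 2: no reference, one x2 slope per category, no standalone x2.
m_all <- lm(y ~ g + g:x2)

# Both parametrizations produce the same fitted values.
all.equal(fitted(m_ref), fitted(m_all))

# Slope for category "b" under each parametrization.
c(with_ref = coef(m_ref)[["x2"]] + coef(m_ref)[["gb:x2"]],
  no_ref   = coef(m_all)[["gb:x2"]])
```

In fixest notation this would roughly correspond to writing i(x1, x2) in place of x2 + i(x1, x2, "reference"). Since 2SLS is unaffected by an invertible re-expression of the endogenous terms and of the instruments, the same column-space argument carries over to the IV formula.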

Garcese commented 1 year ago

@kylebutts Ahh, I see, thank you for the reply. For some reason, I was having a difficult time understanding that. I sort of suspected the function was doing the right thing all along, and I was making a mistake!

Many thanks for your help. I'm a huge fan of the fixest package, it makes working with regressions such a breeze!