jmbejara / comp-econ-sp18

Main Course Repository for Computational Methods in Economics (Econ 21410, Spring 2018)

Fixed effects using OLS vs linearmodels #75

Closed: ethanmetzger closed this issue 6 years ago

ethanmetzger commented 6 years ago

Here I try to include state and year fixed effects using OLS:

df['year'] = df['year'].astype('category')
reg3 = smf.ols('fatalityrate ~ sb_useage + speed65 + speed70 + ba08 + drinkage21 + np.log(income) + age + state + year + EntityEffects + TimeEffects', df).fit()
reg3.summary()

Here I try to do the same thing but using the linearmodels method:

df2 = df.dropna().set_index(['state', 'year'])
md = linearmodels.RandomEffects.from_formula("fatalityrate ~ 1 + sb_useage + speed65 + speed70 + ba08 + drinkage21 + np.log(income)", data=df2)
mdf = md.fit() 
mdf

Why do I get different results? Why is one of them (the OLS way, I'm guessing) wrong? I'm especially confused because in R I got the same results whether I used OLS or plm.

ethanmetzger commented 6 years ago

I'm also getting slightly different results for these two regressions:

reg2 = smf.ols('fatalityrate ~ sb_useage + speed65 + speed70 + ba08 + drinkage21 + np.log(income) + age + state', df.dropna()).fit()
reg2.summary()

and...

md = smf.mixedlm("fatalityrate ~ sb_useage + speed65 + speed70 + ba08 + drinkage21 + np.log(income) + age", data=df.dropna(), groups=df["state"]) 
mdf = md.fit() 
mdf.summary()

jmbejara commented 6 years ago

I'm not sure why this works:

df['year'] = df['year'].astype('category')
reg3 = smf.ols('fatalityrate ~ sb_useage + speed65 + speed70 + ba08 + drinkage21 + np.log(income) + age + state + year + EntityEffects + TimeEffects', df).fit()
reg3.summary()

It seems like this would fail: the ols method doesn't know what to do with the EntityEffects and TimeEffects terms in the formula, so I'm confused that it ran. To compute the fixed effects regression with those terms, you need to use linearmodels.PanelOLS. Also note that linearmodels.RandomEffects and linearmodels.PanelOLS are not the same thing: a fixed effects model is not the same as a random effects model (though they are closely related).
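A minimal sketch of the failure described above, on made-up data (the `toy` frame and its columns are hypothetical, not the course dataset). The statsmodels formula interface resolves each name in the formula against the data and the calling namespace, so a name like EntityEffects that exists in neither should raise an error rather than add fixed effects:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical toy data; only the column names y and x exist.
rng = np.random.default_rng(0)
toy = pd.DataFrame({"y": rng.normal(size=20), "x": rng.normal(size=20)})

try:
    # EntityEffects is not a column and not a Python name, so the
    # formula parser cannot evaluate it.
    smf.ols("y ~ x + EntityEffects", toy).fit()
    formula_failed = False
except Exception as exc:
    formula_failed = True
    print(type(exc).__name__, exc)
```

The exact exception type depends on the formula backend that statsmodels is using, which is why the sketch catches a generic `Exception`.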

jmbejara commented 6 years ago

For your second question, including state in the OLS regression doesn't seem like it should make sense. It's a categorical variable, right?
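A small sketch, on made-up data, of why the two regressions in the second question can differ. In the statsmodels formula interface a string column such as `state` is expanded into dummy variables, so the OLS call estimates one intercept shift per state, while `mixedlm` with `groups` fits a random intercept per state. The variable names and the data-generating process below are assumptions for illustration only:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n_states, n_years = 8, 12
state = np.repeat([f"S{i}" for i in range(n_states)], n_years)
state_effect = np.repeat(rng.normal(size=n_states), n_years)

# x is deliberately correlated with the unobserved state effect.
x = state_effect + rng.normal(size=n_states * n_years)
y = 2.0 * x + 3.0 * state_effect + rng.normal(size=n_states * n_years)
toy = pd.DataFrame({"y": y, "x": x, "state": state})

# OLS with state dummies (fixed effects via dummy variables).
ols_fe = smf.ols("y ~ x + state", toy).fit()

# Random-intercept model; groups comes from the same frame as data.
mixed = smf.mixedlm("y ~ x", data=toy, groups=toy["state"]).fit()

print(len(ols_fe.params))  # intercept + (n_states - 1) dummies + x
print(ols_fe.params["x"], mixed.params["x"])
```

Note that the sketch takes `groups` from the same frame that is passed as `data`; if rows are dropped from one but not the other, the two no longer line up.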

ethanmetzger commented 6 years ago

Re: "It seems like this would fail," I mistakenly included Entity and Time Effects in that line of code. I meant to write:

df['year'] = df['year'].astype('category')
reg3 = smf.ols('fatalityrate ~ sb_useage + speed65 + speed70 + ba08 + drinkage21 + np.log(income) + age + state + year', df).fit()
reg3.summary()

ethanmetzger commented 6 years ago

I've since tried:

df2 = df.set_index(['state', 'year'])
reg = linearmodels.PanelOLS.from_formula("fatalityrate ~ sb_useage + speed65 + speed70 + ba08 + drinkage21 + np.log(income) + EntityEffects + TimeEffects", data=df2)
reg.fit()

I am getting something closer to what I am getting through

df['year'] = df['year'].astype('category')
reg3 = smf.ols('fatalityrate ~ sb_useage + speed65 + speed70 + ba08 + drinkage21 + np.log(income) + age + state + year', df).fit()
reg3.summary()

but still slightly different (not sure if you're okay with discussing precise values on this forum).

Re: "For your second question, including state in the OLS regression doesn't seem like it should make sense. It's a categorical variable, right?" I was under the impression that by making state and year categorical variables and running an OLS regression, I'm essentially doing the same thing that I'd be doing if I were to use something like linearmodels.PanelOLS. Is this incorrect?

jmbejara commented 6 years ago

If you're turning year and state into individual dummy variables, then it's the same thing. It doesn't look like that's what you're doing. So, in that case, it would be different.