jmbejara / comp-econ-sp19

Main Course Repository for Computational Methods in Economics (Econ 21410, Spring 2019)

HW3: Two Fixed Effects #26

Closed abbywh closed 5 years ago

abbywh commented 5 years ago

I've found documentation in the 4/22 notes which showed how to add fixed effects in a regression. However, there seems to be little documentation (in or out of class notes) on how to account for two fixed effects at once, like problem 3 requires. Could someone point me towards resources/the correct way to group the data?

richard-archer commented 5 years ago

I suppose this is a similar question: is adding fixed effects for a variable (functionally) just adding a bunch of indicators for that variable (i.e., are year fixed effects just a coefficient on whether or not the year is 1987, whether or not the year is 1988, etc.)? If that's the case, then

reg = smf.ols('fatalityrate ~ sb_useage+state+str_year+speed65+speed70+ba08+drinkage21+log_income+age', seatbelts).fit()

works fine (where str_year was generated by seatbelts['str_year'] = seatbelts['year'].apply(str))

But if I've misinterpreted what fixed effects are, then that might not be what we want.

jmbejara commented 5 years ago

Hi @jwhitty32. Are you referring to this notebook? https://github.com/jmbejara/comp-econ-sp19/blob/master/lectures/4-23_Panel_Data/Fixed-and-Random-Effects-Rosetta-Stone.ipynb

If not, check that out. It should help.

jmbejara commented 5 years ago

Hi @richard-archer. A fixed effect is like adding a dummy indicator variable for each year. What you've described isn't quite that. You would need a binary indicator for each year (leaving one out to avoid collinearity). However, note that adding an indicator for every state and every year forces Python to run the regression in a way that might not be computationally efficient. Using the proper fixed effects method will perform the calculation efficiently.
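To make the link between indicator variables and the "proper fixed effects method" concrete, here is a minimal sketch on simulated data (not the seatbelts data). It assumes a balanced panel, and every variable name in it is made up for illustration. It estimates the same slope twice: once with explicit state and year dummies, and once by subtracting state and year means (the two-way within transformation) so that no dummy columns are needed.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_years = 5, 4

# Balanced panel: one row per (state, year) pair, ordered state by state.
state = np.repeat(np.arange(n_states), n_years)
year = np.tile(np.arange(n_years), n_states)

# Simulated data: y depends on x plus a state effect and a year effect.
alpha = rng.normal(size=n_states)   # state effects
gamma = rng.normal(size=n_years)    # year effects
x = rng.normal(size=n_states * n_years) + alpha[state]
y = 2.0 * x + alpha[state] + gamma[year] + 0.1 * rng.normal(size=x.size)

# Approach 1: explicit dummies. Intercept, x, and one indicator per state
# and per year, leaving one state and one year out to avoid collinearity.
state_dummies = (state[:, None] == np.arange(1, n_states)).astype(float)
year_dummies = (year[:, None] == np.arange(1, n_years)).astype(float)
X = np.column_stack([np.ones_like(x), x, state_dummies, year_dummies])
beta_dummies = np.linalg.lstsq(X, y, rcond=None)[0][1]

# Approach 2: two-way within transformation. Subtract state means and year
# means, then add back the grand mean. This simple formula is only exact
# for a balanced panel.
def demean_two_way(v):
    grid = v.reshape(n_states, n_years)
    demeaned = (grid
                - grid.mean(axis=1, keepdims=True)   # state means
                - grid.mean(axis=0, keepdims=True)   # year means
                + grid.mean())                       # grand mean
    return demeaned.ravel()

x_w = demean_two_way(x)
y_w = demean_two_way(y)
beta_within = (x_w @ y_w) / (x_w @ x_w)

print(beta_dummies, beta_within)
```

The two slopes agree up to floating-point error. The second approach never builds the dummy columns, which is where the efficiency gain comes from once there are many states and years.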

richard-archer commented 5 years ago

I see. To perform that regression in a computationally efficient way, we should use something of the form:

reg = linearmodels.PanelOLS.from_formula("y~ x1+ x2+ EntityEffects + TimeEffects", data=df)

? That's the example that was provided, but I don't understand how Python would identify (or rather, how we should designate) which variable is treated as the entity effect and which as the time effect.

jmbejara commented 5 years ago

Right. It uses the variables in the DataFrame index. When there are two, you need a pandas multi-index.

jmbejara commented 5 years ago

I can't remember the ordering off the top of my head, and my connection is too poor right now to look it up. The first level of the multi-index might be the time effects and the other the entity effects. The names don't matter so much, as long as you treat them consistently.
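On the ordering question: as far as I know from the linearmodels documentation, the entity goes in the first (outer) level of the MultiIndex and the time period in the second (inner) level, which is the reverse of the guess above. The sketch below uses a made-up toy DataFrame. The column names `state` and `year` mirror the ones used earlier in the thread, while `x1`, `y`, and the helper `fit_two_way` are hypothetical. linearmodels is imported inside the helper, so the indexing step runs with pandas alone.

```python
import pandas as pd

# Hypothetical toy panel: three states observed in three years.
df = pd.DataFrame({
    "state": ["AL", "AL", "AL", "AK", "AK", "AK", "AZ", "AZ", "AZ"],
    "year": [1987, 1988, 1989] * 3,
    "x1": [1.0, 2.0, 4.0, 2.0, 2.5, 5.0, 0.5, 3.0, 3.5],
    "y": [2.1, 3.9, 8.2, 4.5, 5.1, 9.8, 1.2, 6.3, 7.1],
})

# Entity in the first (outer) level, time in the second (inner) level.
panel = df.set_index(["state", "year"])
print(panel.index.names)


def fit_two_way(data):
    """Fit y ~ x1 with entity and time effects (needs linearmodels installed)."""
    from linearmodels import PanelOLS  # lazy import: only needed for the fit

    model = PanelOLS.from_formula(
        "y ~ x1 + EntityEffects + TimeEffects", data=data
    )
    return model.fit()
```

With this index, `EntityEffects` corresponds to state fixed effects and `TimeEffects` to year fixed effects. Calling `fit_two_way(panel)` and printing the result should show which effects were included.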

erineidschun commented 5 years ago

Are state fixed effects the same as entity effects? When you say add "firm fixed effects" or "state fixed effects", how does that change the linearmodels.PanelOLS.from_formula inputs? The panel data lecture is quite confusing to me.

For example, Question 2 asks to consider state fixed effects. Below, seatb is the name of the data:

seatb2 = seatb[['fatalityrate' , 'sb_useage', 'speed65', 'speed70', 'ba08', 'drinkage21','log_income', 'age']].dropna()

reg = linearmodels.PanelOLS.from_formula("fatalityrate ~ sb_useage+ speed65 + speed70+ ba08+ drinkage21+log_income+age + EntityEffects", data=seatb2)

This produces an error in the second line:

Error evaluating factor: NameError: name 'fatalityrate' is not defined
    0 + fatalityrate
        ^^^^^^^^^^^^
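A likely cause, though the thread does not confirm it, is that `seatb2` has no (state, year) MultiIndex. The column subset drops both identifiers, so linearmodels has nothing to read the entity from. Below is a minimal sketch of the data preparation, assuming the identifier columns are named `state` and `year` as in the earlier `smf.ols` example. The helper `prepare_panel` and the toy values are made up for illustration.

```python
import pandas as pd


def prepare_panel(df, variables, entity="state", time="year"):
    """Keep the identifiers with the model variables, drop incomplete rows,
    and index the result by (entity, time)."""
    subset = df[[entity, time] + list(variables)].dropna()
    return subset.set_index([entity, time])


# Toy stand-in for the seatbelts data, with one missing value.
seatb = pd.DataFrame({
    "state": ["AL", "AL", "AK", "AK"],
    "year": [1987, 1988, 1987, 1988],
    "fatalityrate": [0.025, 0.024, 0.030, None],
    "sb_useage": [0.40, 0.45, 0.50, 0.55],
})

seatb2 = prepare_panel(seatb, ["fatalityrate", "sb_useage"])
print(seatb2)
```

The row with the missing `fatalityrate` is dropped, and the remaining rows are indexed by state and year. A frame prepared this way can then be passed as `data=` to the `PanelOLS.from_formula` call above.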