jmbejara / comp-econ-sp19

Main Course Repository for Computational Methods in Economics (Econ 21410, Spring 2019)

Index Error for Fixed Effects Regression #27

Closed: IshaanW closed this issue 5 years ago

IshaanW commented 5 years ago

I'm following the code used in lecture for a fixed effects regression using statsmodels to generate the fixed effects regression for the SeatBelts dataset. My line of code is:

fixedeffectsfit = smf.mixedlm('fatalityrate ~ sb_useage + speed65 + speed70 + ba08 + drinkage21 + np.log(income) + age', seatbelts, groups=seatbelts['state']).fit()

However, I'm getting the following error after running this code:


    IndexError                                Traceback (most recent call last)

    in ()
    ----> 1 fixedeffectsfit = smf.mixedlm('fatalityrate ~ sb_useage + speed65 + speed70 + ba08 + drinkage21 + np.log(income) + age + C(state)', seatbelts, groups=seatbelts['state']).fit()

    ~/anaconda3/lib/python3.6/site-packages/statsmodels/regression/mixed_linear_model.py in from_formula(cls, formula, data, re_formula, vc_formula, subset, use_sparse, *args, **kwargs)
        918 exog_re=exog_re,
        919 exog_vc=exog_vc,
    --> 920 *args, **kwargs)
        921
        922 # expand re names to account for pairs of RE

    ~/anaconda3/lib/python3.6/site-packages/statsmodels/base/model.py in from_formula(cls, formula, data, subset, drop_cols, *args, **kwargs)
        172 'formula': formula, # attach formula for unpckling
        173 'design_info': design_info})
    --> 174 mod = cls(endog, exog, *args, **kwargs)
        175 mod.formula = formula
        176

    ~/anaconda3/lib/python3.6/site-packages/statsmodels/regression/mixed_linear_model.py in __init__(self, endog, exog, groups, exog_re, exog_vc, use_sqrt, missing, **kwargs)
        687
        688 # Split the data by groups
    --> 689 self.endog_li = self.group_list(self.endog)
        690 self.exog_li = self.group_list(self.exog)
        691 self.exog_re_li = self.group_list(self.exog_re)

    ~/anaconda3/lib/python3.6/site-packages/statsmodels/regression/mixed_linear_model.py in group_list(self, array)
        976 if array.ndim == 1:
        977 return [np.array(array[self.row_indices[k]])
    --> 978 for k in self.group_labels]
        979 else:
        980 return [np.array(array[self.row_indices[k], :])

    ~/anaconda3/lib/python3.6/site-packages/statsmodels/regression/mixed_linear_model.py in (.0)
        976 if array.ndim == 1:
        977 return [np.array(array[self.row_indices[k]])
    --> 978 for k in self.group_labels]
        979 else:
        980 return [np.array(array[self.row_indices[k], :])

    IndexError: index 556 is out of bounds for axis 1 with size 556

I'm not really sure what this error is referring to, since there's nothing with index 556 in the dataset or regression specifications.
When I instead use the following (found on this [site](http://aeturrell.com/2018/02/20/econometrics-in-python-partII-fixed-effects/)): `fitFE1 = smf.ols('fatalityrate ~ sb_useage + speed65 + speed70 + ba08 + drinkage21 + np.log(income) + age + C(state)', seatbelts).fit()`, I get a working regression. Could I get some help with what I should fix in the first line of code, and whether there is anything wrong with the second method of generating fixed effects?

jmbejara commented 5 years ago

Make sure to drop missing data first. Check out the issue on GitHub regarding dropping data. You want to drop as little as possible. Some regression functions won't automatically drop missing data for you.
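
To illustrate "drop as little as possible", here is a minimal sketch using pandas only. The toy DataFrame, its values, the `notes` column, and the names `model_cols`, `clean`, and `too_aggressive` are made up for illustration and are not from the course data. The idea is to drop a row only when it is missing a value in a column the regression actually uses.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the SeatBelts data; all values here are invented.
seatbelts = pd.DataFrame({
    "state": ["AK", "AK", "AL", "AL", "AZ"],
    "fatalityrate": [0.021, 0.019, 0.025, np.nan, 0.022],
    "sb_useage": [np.nan, 0.55, 0.48, 0.52, 0.60],
    "income": [20000.0, 21000.0, 15000.0, 15500.0, 17000.0],
    "notes": [np.nan, np.nan, "x", np.nan, np.nan],  # hypothetical column, not in the model
})

# Only the columns that appear in the regression formula, plus the grouping column.
model_cols = ["state", "fatalityrate", "sb_useage", "income"]

# Drop a row only if it is missing a value the model needs.
clean = seatbelts.dropna(subset=model_cols)

# A bare dropna() also discards rows that are only missing 'notes'.
too_aggressive = seatbelts.dropna()

print(len(seatbelts), len(clean), len(too_aggressive))  # 5 3 1
```

After this, the cleaned frame would be passed to the model for both the data and the groups (for example `groups=clean['state']`), so the two have the same number of rows. A plausible reading of the IndexError above is that the formula interface dropped rows with missing values while `groups` still had the full length, though that is an assumption rather than something confirmed in this thread.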

IshaanW commented 5 years ago

I hadn't understood that dropna() does not permanently drop NaN values. Once I realized that, I was able to fix the issue. Thank you!
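
For anyone who runs into the same confusion, a small sketch of the behavior (the DataFrame `df` and its column `x` are made up for illustration): `dropna()` returns a new DataFrame and leaves the original alone unless the result is assigned back.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value.
df = pd.DataFrame({"x": [1.0, np.nan, 3.0]})

# Calling dropna() without assigning the result leaves df unchanged.
df.dropna()
rows_before = len(df)          # still includes the NaN row

# Reassigning keeps the cleaned result.
df = df.dropna()
rows_after = len(df)           # NaN row is gone

print(rows_before, rows_after)
```

`df.dropna(inplace=True)` is another way to modify the frame directly, but reassignment tends to be easier to follow in a notebook.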