jmbejara / comp-econ-sp19

Main Course Repository for Computational Methods in Economics (Econ 21410, Spring 2019)

Hw 4 #24, 11 #35

Closed. erineidschun closed this issue 5 years ago

erineidschun commented 5 years ago

My code for #24 does not seem to be working:

shift = -1
inner_means = (df
               .groupby(by=['YEAR', 'age_binned', 'educ_binned'])
               .apply(lambda x: np.average(x['real_wage'], weights=x.ASECWT))
              )

pd.DataFrame(inner_means)

#Create Bin Weight Sums
weights_2000 = (df[df.YEAR == 2000]
                .dropna()
                .groupby(by=['YEAR','age_binned', 'educ_binned'])
                .ASECWT
                .sum())

adj_series = (inner_means
              .groupby(level='YEAR')
              .apply(lambda x: np.average(x, weights=weights_2000)))
# Lag, since we use "last year's weeks worked", etc.
adj_series = adj_series.tshift(shift)
tdf['adj_ave_wages'] = adj_series

I suspect it has to do with this line:

.groupby(by=['YEAR','age_binned', 'educ_binned'])

In that line, I've also tried grouping by only 'age_binned' and 'educ_binned'. Either way, I get the error "Axis must be specified when shapes of a and weights differ." The traceback points to the .apply(lambda x: np.average... line.

Additionally, what exactly are we supposed to compute for the employment variable in #11? I used in_labor_force and took np.average of it, but I'm not sure if this is what you wanted.

jmbejara commented 5 years ago

Since you select out the year 2000 (df.YEAR == 2000), you don't need to group by year afterward. You can, which is why the code still runs at that point. The problem arises when you use weights_2000 as the weights: grouping by year puts YEAR in as a level of the MultiIndex. Just don't group by year, and that should help!
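
For reference, here is a minimal sketch of the fixed-weight average on made-up data. The toy values and the helper name fixed_weight_mean are hypothetical, not from the homework. It groups the year-2000 weights by the two bin columns only. As an added assumption of this sketch, it also drops the YEAR level and reindexes the weights so they match the cell means by bin label rather than by position:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the homework's CPS extract.
df = pd.DataFrame({
    "YEAR": [2000, 2000, 2000, 2000, 2001, 2001, 2001, 2001],
    "age_binned": ["young", "young", "old", "old"] * 2,
    "educ_binned": ["hs", "college"] * 4,
    "real_wage": [10.0, 20.0, 15.0, 30.0, 11.0, 22.0, 16.0, 33.0],
    "ASECWT": [1.0, 2.0, 3.0, 4.0, 2.0, 2.0, 2.0, 2.0],
})
bins = ["age_binned", "educ_binned"]

# Weighted mean wage within each (YEAR, age bin, education bin) cell.
inner_means = (
    df.groupby(["YEAR"] + bins)[["real_wage", "ASECWT"]]
    .apply(lambda g: np.average(g["real_wage"], weights=g["ASECWT"]))
)

# Base-year weights, grouped by the bins only, so there is no YEAR level.
weights_2000 = df[df["YEAR"] == 2000].groupby(bins)["ASECWT"].sum()

def fixed_weight_mean(x):
    """Average one year's cell means using the year-2000 bin weights."""
    cell_means = x.droplevel("YEAR")
    # Align by bin label; a bin missing from 2000 would give a NaN weight.
    w = weights_2000.reindex(cell_means.index)
    return np.average(cell_means, weights=w)

adj_series = inner_means.groupby(level="YEAR").apply(fixed_weight_mean)
print(adj_series)
```

np.average works on the underlying arrays and does not align on the index, so the explicit reindex guards against bins appearing in a different order or missing in some years.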

henryli78 commented 5 years ago

Actually, I had the same bug, but I think the cause is the "YEAR" variable, so look at it more closely. "YEAR" is actually a date, not a 4-digit year, which tripped me up. I think we should use df.YEAR == '2000-01-01' instead of df.YEAR == 2000.
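
A quick way to check this, sketched on a hypothetical toy frame (the values are made up, and it assumes YEAR is stored as a pandas datetime column): inspect the dtype, then filter either by the date string or by the year component.

```python
import pandas as pd

# Hypothetical toy frame; assumes YEAR holds dates rather than integers.
toy = pd.DataFrame({
    "YEAR": pd.to_datetime(
        ["1999-01-01", "2000-01-01", "2000-01-01", "2001-01-01"]
    ),
    "ASECWT": [1.0, 2.0, 3.0, 4.0],
})

# Inspect the dtype first to see whether YEAR is a date or an integer.
print(toy["YEAR"].dtype)

# Option 1: compare against a date string, which pandas parses.
by_string = toy[toy["YEAR"] == "2000-01-01"]

# Option 2: compare the year component, which also covers other days in 2000.
by_year = toy[toy["YEAR"].dt.year == 2000]

print(by_string)
```

If the dtype turns out to be an integer instead, the original df.YEAR == 2000 comparison is the appropriate one.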

Hope this makes sense and fixes the errors.