Typos in HW 4 - Githubissues

jmbejara commented 6 years ago

Here is a list of known typos and other improvements to make in HW 4. I haven't corrected these particular ones yet. If anyone notices any other typos, please let me know here.

Histograph should be histogram
Be clear about using the ASEC samples and not the regular monthly samples (from March).
Be clearer about where we should be dropping missing values. Some people have had trouble matching my numbers, and the discrepancy seems to depend on when we drop missing values in our code (before or after dropping certain columns).
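
To illustrate why the order matters, here is a minimal sketch on toy data. The column names and values are made up for illustration (UNUSED is a hypothetical column that gets dropped); it is not the HW code.

```python
import numpy as np
import pandas as pd

# Toy data; names and values are assumptions, not the HW dataset.
df = pd.DataFrame({
    "AGE": [30, 41, 52, 38],
    "INCWAGE": [40000.0, np.nan, 55000.0, 61000.0],
    "UNUSED": [1.0, 2.0, np.nan, 4.0],  # hypothetical column dropped later
})

# Order A: drop rows with any missing value first, then drop the column.
rows_a = df.dropna().drop(columns=["UNUSED"])

# Order B: drop the column first, then drop rows with any missing value.
rows_b = df.drop(columns=["UNUSED"]).dropna()

# Order A also loses the row whose only missing value was in UNUSED.
print(len(rows_a), len(rows_b))
```

Any statistic computed afterwards (for example via df.describe()) can differ between the two orders, because they keep different rows.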

jmbejara commented 6 years ago

Q10. I am using a strict inequality to compare real wages to quantiles. The problem description is not clear about this.
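
A small sketch of the difference between strict and weak inequalities when filtering on quantiles. The quantile levels (0.25 and 0.75) and the data are assumptions chosen for illustration; the levels used in Q10 are not stated in this thread.

```python
import pandas as pd

# Toy data; real_wage values are made up.
df = pd.DataFrame({"real_wage": [1.0, 2.0, 3.0, 4.0, 5.0]})

# Hypothetical cutoffs, not necessarily the ones Q10 asks for.
lower = df["real_wage"].quantile(0.25)
upper = df["real_wage"].quantile(0.75)

# Strict inequality: rows exactly equal to a cutoff are excluded.
strict = df[(df["real_wage"] > lower) & (df["real_wage"] < upper)]

# Weak inequality: rows exactly equal to a cutoff are kept.
weak = df[(df["real_wage"] >= lower) & (df["real_wage"] <= upper)]

print(len(strict), len(weak))
```

The two versions differ only for observations that sit exactly on a cutoff, which can still shift summary statistics slightly.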

jmbejara commented 6 years ago

In the HW, I have code that drops UHRSWORKLY, even though we are told to use it later. We should not drop UHRSWORKLY.

jmbejara commented 6 years ago

Q8. (Plot a histogram of the average weekly hours and the average annual hours worked.) This should ask for a histogram of UHRSWORKLY and for a histogram of annual_hours.

jmbejara commented 6 years ago

This is how I am subsetting based on GQ. I don't include GQ = 0 because the codebook says something about 0 being NIU. My comment in the HW should reflect this.

# GQ = 0 for vacant units, 1 for Households, 2 for group quarters
df = df[df.GQ == 1]

afgong commented 6 years ago

In Q15, do you mean compute three correlations? So the correlation between ave_wages and median_wages, ave_wages and employment, and median_wages and employment?

jmbejara commented 6 years ago

There will be a matrix of correlations. The entries of the matrix will have the correlations between each combination of pairs. There is a single command for this.
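
As a sketch, assuming the single command meant here is pandas' DataFrame.corr() (the thread does not name it) and using made-up values for the three series:

```python
import pandas as pd

# Toy yearly statistics; the numbers are made up for illustration.
stats = pd.DataFrame({
    "ave_wages": [20.0, 22.0, 25.0, 27.0],
    "median_wages": [18.0, 19.0, 22.0, 23.0],
    "employment": [0.60, 0.62, 0.61, 0.65],
})

# Pairwise Pearson correlations between every pair of columns.
corr_matrix = stats.corr()
print(corr_matrix)
```

The result is a symmetric 3x3 matrix with ones on the diagonal, so the three correlations asked about appear as the off-diagonal entries.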

afgong commented 6 years ago

In Q20, are we looking to space the bins like [25, 30, 35, 40, 45, 50, 55]? Also, is educ_bins supposed to correspond to the codebook values? So educ_bins should be a list of 5 elements?

jmbejara commented 6 years ago

Age bins look right to me.
The numerical order of the education codes seems to match the ascending levels of education. For that reason, I use the binning function to group the education codes into my custom-defined education groups. There are 5 education groups, so there should be 6 numbers in the list.
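
A minimal sketch of the binning, assuming the binning function is pandas' pd.cut. The education edges and all labels except Bachelors_Degree are hypothetical and should be checked against the codebook; whether the age intervals are closed on the left or the right is also an assumption.

```python
import pandas as pd

# Toy data; AGE and EDUC values are made up.
df = pd.DataFrame({"AGE": [25, 29, 30, 44, 54], "EDUC": [2, 73, 81, 111, 125]})

# Seven edges give six age groups; right=False makes them [25, 30), [30, 35), ...
age_bins = [25, 30, 35, 40, 45, 50, 55]
df["age_group"] = pd.cut(df["AGE"], bins=age_bins, right=False)

# Five education groups need six edges. These edges are hypothetical.
educ_bins = [0, 72, 73, 110, 122, 125]
educ_labels = [
    "Less_Than_HS",      # hypothetical label
    "High_School",       # hypothetical label
    "Some_College",      # hypothetical label
    "Bachelors_Degree",
    "Advanced_Degree",   # hypothetical label
]
df["educ_group"] = pd.cut(df["EDUC"], bins=educ_bins, labels=educ_labels)
print(df)
```

pd.cut raises an error if the number of labels is not one less than the number of edges, which is a quick check on the "6 numbers for 5 groups" rule.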

afgong commented 6 years ago

[Image: screen shot 2018-05-02 at 12 27 27]

Is the graph from Q16 supposed to look something like this?

afgong commented 6 years ago

Also, for Q22, how do you remove the average_wage above the Bachelors_Degree, so that in the heatmap in the following question, it doesn't look like this:

[Image: screen shot 2018-05-02 at 13 09 07]

[Image: screen shot 2018-05-02 at 13 11 37]

jmbejara commented 6 years ago

This is what mine looks like:

[Image: q16]

This might help. Here I have run df.describe() at various points.

At Q7:

Before Q11: [Image: beforeq11]

jmbejara commented 6 years ago

With respect to multiindexing, you can do this:

[Image: q21_1]

[Image: q21_2]
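
As a rough sketch of multiindexing with a two-key groupby (not necessarily the approach in the images above), assuming toy data and hypothetical column names age_group, educ_group, and real_wage:

```python
import pandas as pd

# Toy data; names and values are assumptions for illustration.
df = pd.DataFrame({
    "age_group": ["25-29", "25-29", "30-34", "30-34"],
    "educ_group": ["High_School", "Bachelors_Degree", "High_School", "Bachelors_Degree"],
    "real_wage": [15.0, 25.0, 18.0, 30.0],
})

# Grouping by two keys gives a Series indexed by a two-level MultiIndex.
average_wage = df.groupby(["age_group", "educ_group"])["real_wage"].mean()

# Unstacking a Series yields a table with single-level column labels.
table = average_wage.unstack("educ_group")

# Unstacking a one-column DataFrame adds an extra top level to the columns;
# selecting that top-level label removes it again.
wide = df.groupby(["age_group", "educ_group"])[["real_wage"]].mean().unstack("educ_group")
flat = wide["real_wage"]
print(table)
```

Either table or flat can then be passed to a heatmap without the extra label sitting above the education groups.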

jmbejara commented 6 years ago

To change the order of the columns, I am doing it manually like this:
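
A minimal sketch of manual reordering, assuming hypothetical column labels (this is not necessarily the exact code used in the HW solution):

```python
import pandas as pd

# Toy table whose columns come out in alphabetical order.
table = pd.DataFrame(
    {"Bachelors_Degree": [25.0, 30.0], "High_School": [15.0, 18.0]},
    index=["25-29", "30-34"],
)

# List the columns in the desired order and index with that list.
column_order = ["High_School", "Bachelors_Degree"]
table = table[column_order]
print(table)
```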

afgong commented 6 years ago

Yea, our summary statistics are diverging very mildly at Q7...will go back and double check what's going on. This is what I have right now from Q4 and Q5, respectively:

[Image: screen shot 2018-05-02 at 13 28 13]

[Image: screen shot 2018-05-02 at 13 28 28]

jmbejara commented 6 years ago

Everything here looks good to me. hmm. I don't know. What are you getting for df.describe() at the point at which it diverges?

afgong commented 6 years ago

Q7:

[Image: screen shot 2018-05-02 at 13 56 17]

jmbejara commented 6 years ago

I think you need to rerun your code from the beginning. My Q7 real_wage max is much larger. It looks like you have already dropped the observations described in Q10 at this point.

Jacob-Bishop commented 6 years ago

What is the employment variable supposed to measure? My assumption was LABFORCE, but we dropped that variable earlier on in the code.

jmbejara commented 6 years ago

employment was created from the variable in_labor_force. This is because LABFORCE takes the values 0, 1, or 2, whereas the variable we created is True or False (1 or 0).
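
A sketch of that conversion, under the assumption that LABFORCE code 2 means "in the labor force" (worth confirming against the codebook; the thread does not state the mapping):

```python
import pandas as pd

# Toy data; the mapping of codes to meanings is an assumption.
df = pd.DataFrame({"LABFORCE": [0, 1, 2, 2]})

# Assumed coding: 2 = in the labor force; 0 and 1 are treated as not in it.
df["in_labor_force"] = df["LABFORCE"] == 2
print(df)
```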

Jacob-Bishop commented 6 years ago

Okay. Do you want it to be the average (fraction employed) or the sum (total employed)?

afgong commented 6 years ago

@jmbejara hmm, I don't know what's going on. At which question(s) did you drop the missing values? I dropped (df.dropna(axis=0, how='any')) at the end of my code at Q4 OR at the beginning of my code at Q5.

jmbejara commented 6 years ago

@Jacob-Bishop I was looking for the fraction. Also, be sure to take a weighted average.
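
A minimal sketch of a weighted fraction per group. The weight column name ASECWT and grouping by YEAR are assumptions, as are the toy values:

```python
import pandas as pd

# Toy data; ASECWT (assumed weight column) and YEAR (assumed grouping) are hypothetical here.
df = pd.DataFrame({
    "YEAR": [2000, 2000, 2001, 2001],
    "in_labor_force": [True, False, True, True],
    "ASECWT": [100.0, 300.0, 200.0, 200.0],
})

# Weighted average = sum(weight * indicator) / sum(weight), computed per year.
weighted_sum = (df["in_labor_force"] * df["ASECWT"]).groupby(df["YEAR"]).sum()
total_weight = df["ASECWT"].groupby(df["YEAR"]).sum()
employment = weighted_sum / total_weight

# Unweighted fraction, for comparison.
unweighted = df.groupby("YEAR")["in_labor_force"].mean()
print(employment, unweighted)
```

In the toy data the weighted fraction for 2000 is 0.25 while the unweighted one is 0.5, which shows how much the weights can matter.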

jmbejara commented 6 years ago

@afgong Sorry for this confusion. I have updated my code so that it only drops missing values at the specific points where I say to drop them in the problem descriptions. This changes my Q7 describe to the following: [Image: q7_df_describe]

At this point, I only drop rows at the end of Q6, calling df = df.dropna()

Sorry about this. If your answers look reasonably close, I wouldn't worry too much about this. I've instructed Philip to be generous with the grading in this regard. (Also, it's been interesting to me how little things like this can make replication so challenging.)

afgong commented 6 years ago

Thank you so much!!!

jmbejara / comp-econ-sp18

Typos in HW 4 #39