larsvilhuber / MobZ

https://larsvilhuber.github.io/MobZ/

Compare results to results on server #32

Closed andrewfoote closed 4 years ago

andrewfoote commented 4 years ago

Need to figure out exactly how to match summary stats.

https://github.com/larsvilhuber/MobZ/blob/ef3fb89fc88ee055ca607ba3f9d9d941465588ec/programs/06_qcew/01_regressions_table.log#L754-L756

andrewfoote commented 4 years ago

@larsvilhuber I believe that I figured out the source of the discrepancy. In the log, it indicates that we have 1990 to 2019. However, the regressions that we run in the paper are 1990 to 2016. I am going to make an edit to the do-file to make that restriction, and then you can re-run.
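The year restriction described above can be sketched in pandas (illustrative data and column names only; the actual edit is in the Stata do-file):

```python
import pandas as pd

# Hypothetical panel; the logged data run 1990-2019, the paper uses 1990-2016.
df = pd.DataFrame({
    "year": [1989, 1990, 2000, 2016, 2017, 2019],
    "emp":  [10, 11, 12, 13, 14, 15],
})

# Restrict to the paper's sample: 1990-2016 inclusive.
analysis = df[df["year"].between(1990, 2016)]
print(analysis["year"].tolist())  # -> [1990, 2000, 2016]
```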

andrewfoote commented 4 years ago

@larsvilhuber #14 The SDs are the same now, but the means and p50 are different. Not sure where the discrepancy is. It isn't large.
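The pattern reported here (identical SDs but shifted means and medians, consistent with a small level difference between the two extracts) can be sketched in pandas with purely illustrative numbers:

```python
import pandas as pd

# Two hypothetical versions of the same variable (local run vs. server run).
local  = pd.Series([1.0, 2.0, 3.0, 4.0])
server = pd.Series([1.1, 2.1, 3.1, 4.1])  # uniform shift: same SD, different mean/p50

stats = pd.DataFrame({
    "local":  [local.mean(),  local.median(),  local.std()],
    "server": [server.mean(), server.median(), server.std()],
}, index=["mean", "p50", "sd"])
stats["diff"] = stats["server"] - stats["local"]
print(stats)
```

A pure level shift moves the mean and p50 by the same amount while leaving the SD unchanged, which is one way to get exactly this signature.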

andrewfoote commented 4 years ago

@larsvilhuber

Looking at log files here, and comparing to summary stats on server. Some notes:

~~- QCEW_earnings.dta has very different values for avg_annual_pay (28054 vs 9.9 million) https://github.com/larsvilhuber/MobZ/blob/lars/programs/06_qcew/00_qcew_post_extraction.log#L279 (However, it is unclear when avg_annual_pay is used - this may be irrelevant)~~ QCEW_earnings is irrelevant, and we never use the file - we should probably remove it from the code.

andrewfoote commented 4 years ago

@larsvilhuber

Any chance you could upload the final dataset to github (maybe before CZ aggregation) so that I could compare values? I just don't understand why we are getting such different numbers since the sum stats are so similar.

larsvilhuber commented 4 years ago

@andrewfoote : OK.

larsvilhuber commented 4 years ago

The two files (before and after computation/reduction to yearly) are at https://www.dropbox.com/sh/d9ba3fyzl2g0ksk/AADBhuwDZxh4EIcb57dqk1lra?dl=0

andrewfoote commented 4 years ago

@larsvilhuber So I believe the difference between the two QCEW_county files is that in my file, I just kept quarter 1, while you averaged employment over the quarters.

Your way is probably better, but that is what is driving the discrepancy. We could test that directly if we wanted, by re-doing the extraction just keeping Q1.

https://github.com/larsvilhuber/MobZ/blob/lars/programs/06_qcew/00_qcew_extraction.sas#L9
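The two extraction choices described above can be sketched in pandas (hypothetical county-quarter data; the real extraction is the SAS program linked here):

```python
import pandas as pd

# Hypothetical QCEW county-quarter employment: one county, one year.
qcew = pd.DataFrame({
    "county":  ["001"] * 4,
    "year":    [1995] * 4,
    "quarter": [1, 2, 3, 4],
    "emp":     [100, 110, 120, 130],
})

# Approach 1: keep only Q1 (the original extraction).
q1_only = (qcew[qcew["quarter"] == 1]
           .groupby(["county", "year"])["emp"].first())

# Approach 2: average employment over the four quarters (the server extraction).
q_avg = qcew.groupby(["county", "year"])["emp"].mean()

print(q1_only.iloc[0], q_avg.iloc[0])  # -> 100 115.0
```

With any within-year growth in employment, the Q1-only series sits below the quarterly average, so the two extracts produce systematically different county-year levels.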

andrewfoote commented 4 years ago

Re-ran things on my end, averaging emp_month1 over the year, and got very similar results to my original ones, which suggests the discrepancy is in uireceipt.

However, the SD Bartik was more different using that approach.

My suggestion is...maybe to just go with your results and note that in replication, we had to change them. It doesn't change the takeaway of the paper.

larsvilhuber commented 4 years ago

@andrewfoote Your pick. I agree that if the takeaway is the same, even when we slightly change the regression, either approach is fine. I can go either way.

andrewfoote commented 4 years ago

@larsvilhuber I vote for just going with the results on ECCO - I have more confidence in them, and they are replicable. I prefer that over trying to figure out exactly why our coefficients are slightly different, since the marginal benefit is low.

larsvilhuber commented 4 years ago

@andrewfoote Sounds good. Let me push what we've got into Overleaf, and you can take care of the writing while I tie down the other loose ends. Feel free to wait until I'm done with all of it.

andrewfoote commented 4 years ago

Adding my thoughts here to test them tomorrow:

Need to adjust the regression loop programs to report these summary stats, so that we can see whether something funny is happening in some weird boundary case.

andrewfoote commented 4 years ago

Finished!!! Closing this ticket after @larsvilhuber checks in the final version.