@larsvilhuber I believe I've figured out the source of the discrepancy. The log indicates that we have 1990 to 2019, but the regressions we run in the paper cover 1990 to 2016. I am going to edit the do-file to impose that restriction, and then you can re-run.
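For reference, a minimal Stata sketch of the kind of restriction meant here; the variable name `year` is an assumption, not taken from the repo:

```stata
* Hypothetical sketch: restrict the sample to the paper's 1990-2016 window.
* The variable name `year` is an assumption about the dataset.
keep if inrange(year, 1990, 2016)
```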
@larsvilhuber #14 The SDs match now, but the means and p50 are different. Not sure where the discrepancy comes from; it isn't large.
@larsvilhuber
Looking at the log files here and comparing to the summary stats on the server. Some notes:
- UI receipt has slightly fewer observations in my data (a few hundred)
- QCEW_county.dta has different values for avg_annual_employment (2089 vs 2100) https://github.com/larsvilhuber/MobZ/blob/lars/programs/06_qcew/00_qcew_post_extraction.log#L199
- ~~QCEW_earnings.dta has very different values for avg_annual_pay (28054 vs 9.9 million) https://github.com/larsvilhuber/MobZ/blob/lars/programs/06_qcew/00_qcew_post_extraction.log#L279 (However, it is unclear when avg_annual_pay is used - this may be irrelevant)~~ QCEW_earnings is irrelevant, and we never use the file - we should probably remove it from the code.
@larsvilhuber
Any chance you could upload the final dataset to GitHub (maybe before the CZ aggregation) so that I could compare values? I just don't understand why we are getting such different numbers when the summary stats are so similar.
@andrewfoote: OK, the two files (before and after computation/reduction to yearly) are at https://www.dropbox.com/sh/d9ba3fyzl2g0ksk/AADBhuwDZxh4EIcb57dqk1lra?dl=0
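One way to diff the uploaded files against the local ones would be Stata's `cf` command; a minimal sketch, assuming both files have the same variables, sort order, and observation count (file and variable names below are placeholders):

```stata
* Hypothetical sketch: compare two versions of the county file
* observation by observation. File names and sort keys are placeholders.
use QCEW_county_mine.dta, clear
sort fipscounty year
cf _all using QCEW_county_dropbox.dta, verbose
```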
@larsvilhuber So I believe the difference between the two QCEW_county files is that in my file I just kept quarter 1, while you averaged employment over the quarters.
Your way is probably better, but that is what is driving the discrepancy. We could test that directly if we wanted, by re-doing the extraction keeping only Q1 (see the sketch below the link).
https://github.com/larsvilhuber/MobZ/blob/lars/programs/06_qcew/00_qcew_extraction.sas#L9
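For concreteness, a minimal Stata sketch of the two approaches; the dataset name and the variables `qtr` and `fipscounty` are assumptions here (only `emp_month1` comes from the thread), not taken from the extraction code:

```stata
* Approach A (my original file): keep only first-quarter employment.
use qcew_quarterly.dta, clear
keep if qtr == 1
rename emp_month1 avg_annual_employment

* Approach B (current extraction): average emp_month1 over the four quarters.
use qcew_quarterly.dta, clear
collapse (mean) avg_annual_employment = emp_month1, by(fipscounty year)
```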
Re-ran things on my end, averaging emp_month1 over the year, and got very similar results to my original ones, which suggests the discrepancy is in uireceipt.
However, the SD of the Bartik was more different using that approach.
My suggestion: maybe just go with your results and note in the replication that we had to change them. It doesn't change the takeaway of the paper.
@andrewfoote Your pick. I agree that if the takeaway is the same even when we slightly change the regression, that's good enough. I can go either way.
@larsvilhuber I vote for just going with the results on ECCO - I have more confidence in them, and they are replicable. I prefer that over trying to figure out exactly why our coefficients are slightly different, since the marginal benefit is low.
@andrewfoote Sounds good. Let me push what we've got into Overleaf, and you can take care of the writing while I tie down the other loose ends. Feel free to wait until I'm done with all of it.
Adding my thoughts here to test tomorrow:
Need to adjust the regression-loop programs to report these summary stats so that we can see if something funny is happening in some weird boundary case.
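A minimal sketch of what that could look like; the outcome list, `bartik`, and `cz` are hypothetical placeholders, not the repo's actual variable names:

```stata
* Hypothetical sketch: after each regression, print summary stats
* restricted to the observations the regression actually used.
foreach y of varlist outcome1 outcome2 {
    regress `y' bartik i.year, vce(cluster cz)
    summarize `y' bartik if e(sample), detail
}
```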
Finished!!! Closing this ticket after @larsvilhuber checks in the final version.
Need to figure out exactly how to match the summary stats (see the sketch after the link).
https://github.com/larsvilhuber/MobZ/blob/ef3fb89fc88ee055ca607ba3f9d9d941465588ec/programs/06_qcew/01_regressions_table.log#L754-L756
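A minimal sketch of producing the same stats block (n, mean, sd, p50) on both runs so the two logs can be compared line by line; the variable names are placeholders:

```stata
* Hypothetical sketch: reproduce the log's summary-stats block so the
* two runs can be diffed. Variable names are placeholders.
tabstat avg_annual_employment bartik, statistics(n mean sd p50) columns(statistics)
```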