larsvilhuber / MobZ

https://larsvilhuber.github.io/MobZ/

Point 4: MOE questions #6

Closed andrewfoote closed 4 years ago

andrewfoote commented 4 years ago

4) The authors use the published margins of error from the 2009-2013 ACS Journey to Work data to demonstrate the sensitivity of clustering results to the errors in the underlying flow data. Given the MOE data are developed about two decades after the 1990 Journey to Work data, you might want to provide justification of the data choice. Specifically, are there changes in data collection since 1990 which might impact MOE? How does this affect your estimates? Related, how robust are the findings over time?

larsvilhuber commented 4 years ago

This might involve doing a bit of code work as well.

andrewfoote commented 4 years ago
  1. Given the MOE data are developed about two decades after the 1990 Journey to Work data, you might want to provide justification of the data choice. Specifically, are there changes in data collection since 1990 which might impact MOE? How does this affect your estimates?

Proposed referee response:

As we outline in footnote 10, the 1990 Census Long Form sampled 1 in 6 households, while the pooled 5-year ACS estimates are based on a sample of 1 in 10. The MOEs reported in the ACS tables are therefore an upper bound on the error in the 1990 data and overstate its uncertainty. However, there is no clear way to deflate the MOEs to reflect the larger 1990 sample, so our results are a worst case for the underlying variability.
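
A back-of-the-envelope sketch of why the ACS MOEs act as an upper bound. It assumes simple random sampling from the same population, so that the MOE scales with one over the square root of the sampling fraction. That assumption is ours for illustration only: the real designs differ (weighting, design effects, population change), which is why this ratio is not a usable deflator. The function name `srs_moe_ratio` is hypothetical.

```python
import math


def srs_moe_ratio(frac_a, frac_b):
    """Approximate MOE_a / MOE_b for two sampling fractions.

    Assumes simple random sampling from the same population, so the MOE
    is proportional to 1 / sqrt(sampling fraction). Ignores design effects
    and the finite population correction.
    """
    return math.sqrt(frac_b / frac_a)


# Sampling fractions quoted in the response: 1 in 6 (1990 Long Form)
# versus 1 in 10 (pooled 5-year ACS).
ratio = srs_moe_ratio(frac_a=1 / 6, frac_b=1 / 10)
print(f"Approximate 1990 MOE relative to ACS MOE: {ratio:.2f}")
```

Under these assumptions the ratio is about 0.77, that is, below 1, which is consistent with treating the published ACS MOEs as a worst case for 1990.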

andrewfoote commented 4 years ago

Related, how robust are the findings over time?

Not sure how to address this within the scope of the paper. Perhaps cite Fowler, who does CZ work over time?

larsvilhuber commented 4 years ago

Can we re-run this with later Journey to Work data (ACS-based, contemporaneous) and show a similar graph? That would be additional work, so maybe a stretch goal.

andrewfoote commented 4 years ago

@larsvilhuber We can definitely do that, the code is pretty flexible; turns out I had already run it on 2009 as well, presumably because why not.

larsvilhuber commented 4 years ago

@andrewfoote Let's do it on ECCO, but let's do it AFTER I get the code to run on ECCO...

andrewfoote commented 4 years ago

Some interesting results: running the SAME procedure on JTW 1990 and 2009 (with the JTW 1990 MOEs imputed, of course), we get vastly different results. The two graphs attached to this comment show that the issues with MOEs are huge when using true MOEs.

Not sure what to do about this, but putting it out there. It seems like our procedure understates the MOEs for the 1990 JTW data.

Attached figures: mismatch_jtw1990, mismatch_jtw2009

andrewfoote commented 4 years ago

Update: I figured out what was wrong in the code, which illustrates a larger problem: I shouldn't have been swapping between Stata and SAS (we started this project when my SAS coding abilities were much worse).

We had been taking a draw from a truncated normal (truncated so it can't go below zero, with a positive mean) and accidentally labelling that draw as the MOE.

I have fixed that problem, and am re-running the bootstrap code. Also cursing at myself loudly. I am really sorry - I think in an old version of the code, we didn't keep the draw.
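
For reference, a minimal Python sketch of the kind of re-sampling step described above, keeping the draw and the MOE in separate fields so one cannot be mistaken for the other. This is not the project's SAS/Stata code: the function names, the field names (`estimate`, `moe`, `se`, `draw`), and the example flows are all hypothetical. It assumes the MOEs are at the 90 percent confidence level, as published ACS MOEs are, so the standard error is MOE / 1.645.

```python
import random

Z_90 = 1.645  # critical value for a 90 percent confidence level


def moe_to_se(moe):
    """Convert a 90 percent margin of error to a standard error."""
    return moe / Z_90


def draw_truncated_normal(mean, se, rng, lower=0.0):
    """Draw from normal(mean, se) restricted to values >= lower.

    Uses simple rejection sampling, which is fast when mean >= lower
    (as with non-negative flow estimates truncated at zero).
    """
    if se <= 0:
        return max(mean, lower)
    while True:
        x = rng.gauss(mean, se)
        if x >= lower:
            return x


def resample_flows(flows, seed=0):
    """Return a copy of each flow with 'se' and 'draw' added.

    The original 'moe' field is left untouched; the random draw is stored
    under its own key.
    """
    rng = random.Random(seed)
    out = []
    for row in flows:
        se = moe_to_se(row["moe"])
        draw = draw_truncated_normal(row["estimate"], se, rng)
        out.append({**row, "se": se, "draw": draw})
    return out


# Hypothetical commuting flows (made-up numbers, for illustration only).
flows = [
    {"home": "A", "work": "B", "estimate": 120.0, "moe": 45.0},
    {"home": "A", "work": "C", "estimate": 15.0, "moe": 20.0},
]
resampled = resample_flows(flows)
for row in resampled:
    print(row["home"], row["work"], row["moe"], round(row["draw"], 1))
```

Storing the draw under its own key and never overwriting `moe` makes the mislabelling described above harder to repeat.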

andrewfoote commented 4 years ago

Update with results: the 1990 JTW re-sampling looks much different now that the code is fixed. The share_mismatch graph is attached here for reference.

Attached figure: mismatch_jtw1990

andrewfoote commented 4 years ago

Re-did the replication graphs for ADH, which look much more logical. (Another note: our Table 3, Column 5 is incorrect and needs to be updated.)

Attached figures: 1990_distribution, 1990_tstat_distribution