ummel / fusionData

Data backend for fusionACS platform
https://ummel.github.io/fusionData/
GNU General Public License v3.0

Compare county-level consumption estimates to validation data #42

Open ummel opened 2 years ago

ummel commented 2 years ago

Do an initial 2015 RECS-ACS fusion with a limited set of fusion variables, similar to the step in #47 but for electricity and natural gas consumption rather than expenditure.

Aggregate the fusion output to the county level for states with disclosed utility data. Keep in mind that the geo-processed/concordance/geo_concordance.fst file contains complete geographic nesting information. You can use that file to compute county values as a weighted average of the underlying PUMAs, where the weight is the number of households ("hus10" variable) in the spatial intersection of each county and PUMA. This will also quickly tell you which counties are "splitting up" PUMAs to arrive at an estimate.
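A minimal sketch of the household-weighted aggregation described above, in plain Python so the arithmetic is easy to check. Only "hus10" comes from the concordance file as described; the "county" and "puma" keys, the function name, and the toy numbers are hypothetical stand-ins, not the actual columns of geo_concordance.fst or real fusion output.

```python
from collections import defaultdict

def county_estimates(concordance, puma_values):
    """Household-weighted county averages from PUMA-level values.

    concordance: iterable of dicts with (assumed) keys 'county', 'puma'
                 and 'hus10' (households in that county-PUMA intersection;
                 finer-grained rows are simply summed).
    puma_values: dict mapping PUMA id -> mean consumption per household.
    """
    weighted_sum = defaultdict(float)
    households = defaultdict(float)
    counties_by_puma = defaultdict(set)
    pumas_by_county = defaultdict(set)

    for row in concordance:
        county, puma, hh = row["county"], row["puma"], row["hus10"]
        weighted_sum[county] += hh * puma_values[puma]
        households[county] += hh
        counties_by_puma[puma].add(county)
        pumas_by_county[county].add(puma)

    result = {}
    for county, hh in households.items():
        # A county "splits" a PUMA if that PUMA also overlaps another county.
        splits = any(len(counties_by_puma[p]) > 1 for p in pumas_by_county[county])
        result[county] = {"estimate": weighted_sum[county] / hh, "splits_puma": splits}
    return result

# Toy inputs (illustrative only): PUMA P2 straddles counties A and B.
toy_concordance = [
    {"county": "A", "puma": "P1", "hus10": 300},
    {"county": "A", "puma": "P2", "hus10": 100},
    {"county": "B", "puma": "P2", "hus10": 400},
    {"county": "C", "puma": "P3", "hus10": 250},
]
toy_puma_values = {"P1": 10000.0, "P2": 12000.0, "P3": 8000.0}
toy_county = county_estimates(toy_concordance, toy_puma_values)
```

In this toy case county A is (300 × 10000 + 100 × 12000) / 400 = 10500, and both A and B are flagged because they share P2. The flag marks counties whose estimate relies on only part of a PUMA, which are likely the ones to scrutinize first.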

Compare county estimates with the validation data. Don't worry about absolute "error" initially; focus on the correlation across counties. Is the general spatial pattern replicated by the fusion estimates? Do discrepancies discovered in #41 help explain which counties are "off"? Think critically about the validation data here. Don't assume the issue is necessarily on the fusion side. Recall that the pattern of expenditures across PUMAs (#40) shows pretty decent agreement between fused data and plausible ACS-derived estimates. That exercise isn't definitive, but it should give us some default confidence in the fusion output.
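One possible way to look at the cross-county pattern rather than absolute error is to compute both a Pearson and a rank (Spearman) correlation over the counties present in both sources. The sketch below is plain Python with hypothetical names and made-up numbers; it is not taken from the fusionData code base, and the choice of these two statistics is an assumption about what "correlation across counties" should mean here.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """1-based ranks, with ties given the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def compare_counties(fused, validation):
    """Correlate two {county: value} dicts over their shared counties."""
    shared = sorted(set(fused) & set(validation))
    x = [fused[c] for c in shared]
    y = [validation[c] for c in shared]
    return {
        "n": len(shared),
        "pearson": pearson(x, y),
        "spearman": pearson(average_ranks(x), average_ranks(y)),
    }

# Made-up county values; county E has validation data but no fused estimate.
toy_fused = {"A": 10500, "B": 12000, "C": 8000, "D": 9000}
toy_validation = {"A": 11000, "B": 15000, "C": 7000, "D": 9500, "E": 5000}
toy_stats = compare_counties(toy_fused, toy_validation)
```

A high rank correlation alongside a weaker Pearson value would suggest the spatial ordering is reproduced even where levels differ. Counties that fall far from the overall pattern are the ones to check against the discrepancies from #41 and against the quality of the validation data itself.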

Technically, the appropriate way to compute county estimates is to use the microACS package. But that's like using a sledgehammer to drive a finishing nail.