ummel / fusionData

Data backend for fusionACS platform
https://ummel.github.io/fusionData/
GNU General Public License v3.0

Fuse realistic expenditure variables #45

Open ummel opened 2 years ago

ummel commented 2 years ago

A challenge related to #44 is how to generate plausible fused values for both energy consumption and expenditure variables, since (presumably) they should be linked in some logical way across space. For example, if two households in the same PUMA have identical fused/simulated total electricity consumption (kWh), we would reasonably expect them to have identical electricity expenditures ($), under the assumption that households in the same place face the same pricing tiers.
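To make the expectation concrete, here is a minimal sketch of a check for it. The field names (`puma`, `kwh`, `dollars`), the tolerance, and the sample records are all hypothetical, invented for illustration; none of them come from fusionData or RECS.

```python
from collections import defaultdict

def inconsistent_groups(households, tol=0.01):
    """Return (puma, kwh) keys whose households report differing expenditures."""
    spend_by_key = defaultdict(list)
    for hh in households:
        spend_by_key[(hh["puma"], hh["kwh"])].append(hh["dollars"])
    # A key is inconsistent when identical consumption in the same PUMA
    # maps to expenditures that differ by more than the tolerance.
    return sorted(
        key for key, spend in spend_by_key.items()
        if max(spend) - min(spend) > tol
    )

# Hypothetical fused records (illustrative values only).
households = [
    {"puma": "A", "kwh": 9000, "dollars": 1100.0},
    {"puma": "A", "kwh": 9000, "dollars": 1350.0},  # same place, same kWh, different $
    {"puma": "B", "kwh": 9000, "dollars": 1500.0},
    {"puma": "B", "kwh": 9000, "dollars": 1500.0},
]

print(inconsistent_groups(households))
```

With these made-up records, only the PUMA "A" pair is flagged; households in different PUMAs are never compared with each other.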

However, the fusion process itself will not produce this pattern, both because it introduces a random component and (more fundamentally) because the RECS sample is far too small to provide clear, location-specific relationships between consumption and expenditure.

In general, we expect physical consumption to be better modeled than expenditures, for the simple reason that energy prices and pricing tiers vary considerably across space and are not well-represented by our spatial predictors. Consequently, it makes sense to fuse consumption variables first (per one of the strategies in #44) and then, separately, derive expenditures.

My suggested approach: After identifying a preferred approach in #44 for fusing consumption (among other variables), employ an additional fusion model for just the RECS total expenditure variables (total electricity spending, etc.), using all other variables as predictors. Then, for each PUMA, fit a monotonic GAM -- using the scam package -- with consumption as the x-variable and the fused expenditure values as the y-variable. Each model defines a generic relationship between consumption and expenditure specific to the PUMA. Replace the fused expenditure values with the values predicted by the local model. That is, we force a single relationship between consumption and expenditure for all households in the same PUMA.
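The per-PUMA replacement step could look roughly like the sketch below. Note the substitution: instead of a monotonic GAM from the R scam package, it uses a plain isotonic (non-decreasing, least-squares) fit via pool-adjacent-violators, which yields a step function rather than a smooth curve. The field names (`puma`, `kwh`, `dollars`), the function names, and the sample values are hypothetical and for illustration only.

```python
from collections import defaultdict

def isotonic_fit(x, y):
    """Non-decreasing least-squares fit of y on x (pool adjacent violators).

    Stand-in for a monotonic GAM. Returns fitted values aligned with the inputs.
    """
    # Pool identical x values first so the fit is a function of x.
    pooled = defaultdict(lambda: [0.0, 0])
    for xi, yi in zip(x, y):
        pooled[xi][0] += yi
        pooled[xi][1] += 1
    blocks = []  # each block: [sum of y, count, x values covered]
    for xi in sorted(pooled):
        total, count = pooled[xi]
        blocks.append([total, count, [xi]])
        # Merge backwards while the previous block's mean exceeds this one's.
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count, xs = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][2].extend(xs)
    level = {xv: total / count for total, count, xs in blocks for xv in xs}
    return [level[xi] for xi in x]

def harmonize_by_puma(households):
    """Replace fused expenditures with the local monotone fit, PUMA by PUMA."""
    index_by_puma = defaultdict(list)
    for i, hh in enumerate(households):
        index_by_puma[hh["puma"]].append(i)
    result = [dict(hh) for hh in households]  # copy; input order is preserved
    for idx in index_by_puma.values():
        fitted = isotonic_fit([households[i]["kwh"] for i in idx],
                              [households[i]["dollars"] for i in idx])
        for i, value in zip(idx, fitted):
            result[i]["dollars"] = value
    return result

# Hypothetical fused records (illustrative values only).
fused = [
    {"puma": "A", "kwh": 4000, "dollars": 500.0},
    {"puma": "A", "kwh": 6000, "dollars": 800.0},
    {"puma": "A", "kwh": 6000, "dollars": 700.0},
    {"puma": "A", "kwh": 8000, "dollars": 720.0},
    {"puma": "A", "kwh": 10000, "dollars": 1200.0},
    {"puma": "B", "kwh": 5000, "dollars": 900.0},
    {"puma": "B", "kwh": 7000, "dollars": 850.0},
]

for hh in harmonize_by_puma(fused):
    print(hh["puma"], hh["kwh"], hh["dollars"])
```

After harmonizing, households with equal consumption in the same PUMA share one expenditure value, and expenditure never decreases as consumption rises within a PUMA. A smooth monotone fit, as proposed with scam, would avoid the flat steps this stand-in produces.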

Is this realistic? Not necessarily -- but it enforces a minimum standard of plausibility: households in the same location face identical pricing schemes for electricity and natural gas (not sure about the other energy goods captured in RECS).