ld-archer / E_FEM

This is the repository for the English version of the Future Elderly Model, originally developed at the Leonard D. Schaeffer Center for Health Policy and Microsimulation.
MIT License

Add socioeconomic variables for next project #70

Closed ld-archer closed 2 years ago

ld-archer commented 3 years ago

Next project is looking at links between socioeconomic status and health, so will need to add back variables related to wealth and income.

Vars to add:

Down the line we might consider disaggregating the combined variables, but this will be a good start.

ld-archer commented 3 years ago

Wealth and income are both back in. Most of the work was already done, as they were still in the input data; I just had to re-add them to the model. An inflation multiplier will need to be included, but I think I can get away with only including it in reshape_long. The calculations to account for inflation can be done in that script, and all input populations will then inherit the adjusted values.
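As a rough illustration of the inflation step described above, here is a minimal Python sketch (the repository's own scripts are Stata .do files, so this is not the actual reshape_long code). The price index values, the base year, and the names `PRICE_INDEX`, `inflation_multiplier` and `to_real` are all hypothetical and exist only for this example.

```python
# Hypothetical price index by interview year (NOT real CPI figures).
# The base year is an arbitrary choice for illustration.
PRICE_INDEX = {2002: 80.0, 2004: 84.0, 2006: 88.0, 2008: 100.0}
BASE_YEAR = 2008


def inflation_multiplier(year, index=PRICE_INDEX, base_year=BASE_YEAR):
    """Factor that converts a nominal amount from `year` into base-year prices."""
    return index[base_year] / index[year]


def to_real(records):
    """Return copies of long-format records with wealth and income in real terms."""
    adjusted = []
    for rec in records:
        m = inflation_multiplier(rec["year"])
        adjusted.append(
            {**rec, "wealth": rec["wealth"] * m, "income": rec["income"] * m}
        )
    return adjusted


# Two waves for one hypothetical respondent, values in thousands.
long_data = [
    {"id": 1, "year": 2002, "wealth": 200.0, "income": 16.0},
    {"id": 1, "year": 2008, "wealth": 250.0, "income": 20.0},
]
real_data = to_real(long_data)
```

Doing the conversion once on the long-format data means anything derived from it afterwards is already in constant prices, which is the inheritance idea mentioned in the comment.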

ld-archer commented 3 years ago

This has definitely not worked as we wanted; see the table below for cross-validation T-tests for income and wealth.

Total Family Wealth (thou.)

| wave | fem_mean | elsa_mean | p_value |
|------|----------|-----------|---------|
| 1 | 248.5051 | 205.6091 | 0 |
| 2 | 186.0797 | 243.5422 | 0 |
| 3 | 193.403 | 276.8282 | 0 |
| 4 | 206.8602 | 280.1831 | 0 |
| 5 | 216.302 | 293.1316 | 0 |
| 6 | 225.9 | 314.7851 | 0 |
| 7 | 232.6775 | 347.3489 | 0 |
| 8 | 238.6429 | 397.9592 | 0 |
| 9 | 240.8246 | 441.0225 | 0 |

Total Family Income (thou.)

| wave | fem_mean | elsa_mean | p_value |
|------|----------|-----------|---------|
| 1 | 19.73524 | 18.9182 | 0.00137 |
| 2 | 21.48945 | 19.49766 | 0 |
| 3 | 21.78878 | 20.9715 | 0.01026 |
| 4 | 22.14617 | 21.40408 | 0.01726 |
| 5 | 22.23113 | 21.83524 | 0.21191 |
| 6 | 22.31243 | 23.81972 | 1E-05 |
| 7 | 22.32754 | 25.30901 | 0 |
| 8 | 22.30042 | 26.44413 | 0 |
| 9 | 22.22436 | 27.41521 | 0 |

Need to look into this. It looks like the model projections for both income and wealth fall relative to ELSA wave by wave (FEM wealth drops sharply at wave 2 and recovers only slowly, and FEM income plateaus at about 22 thousand), whereas the actual ELSA data show a fairly substantial increase with each wave. The first thing to check is the populations we use for the T-tests. It's possible we are not using exactly the same populations on each side; if they differ, a difference in the age distribution or something similar could cause this.
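The population check could be sketched roughly as follows: restrict both sides to respondents who appear in both, then compute the test statistic on the matched samples. This is a minimal standard-library Python illustration, not the project's cross-validation script. The data, the dictionary layout (respondent id mapped to wealth) and the function names are hypothetical, and Welch's unequal-variance t statistic is used here as an assumption about the kind of T-test involved.

```python
from math import sqrt
from statistics import mean, variance


def restrict_to_common_ids(sim, obs):
    """Keep only respondents present on both sides of the comparison."""
    common = sorted(sim.keys() & obs.keys())
    return [sim[i] for i in common], [obs[i] for i in common]


def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom for two samples."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical wealth values (thousands) keyed by respondent id.
# Respondent 4 is only in the simulation and 5 only in the survey.
simulated = {1: 10.0, 2: 12.0, 3: 14.0, 4: 50.0}
observed = {1: 11.0, 2: 13.0, 3: 15.0, 5: 90.0}

sim_vals, obs_vals = restrict_to_common_ids(simulated, observed)
t_stat, dof = welch_t(sim_vals, obs_vals)
```

Comparing `len(simulated)` and `len(observed)` with `len(sim_vals)` shows immediately how many respondents were unmatched, which is a cheap first diagnostic before looking at age distributions.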

ld-archer commented 2 years ago

After meeting with Bryan, a few potential reasons for this poor performance have been identified:

  1. Not accounting for inflation
  2. Converting values to log when the raw values include negatives and zero
  3. Values are at the benefit unit level (i.e. within couple if in couple, or single if single)

To fix these problems, we need to:

ld-archer commented 2 years ago

Big improvement even just after removing the log values - see commit 1075c6d
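Reason 2 in the list above can be seen in a small Python sketch: the natural log is undefined for zero and negative amounts, so any respondent with no wealth or with net debt has no logged value. The wealth figures and the `safe_log` helper are hypothetical and purely illustrative.

```python
from math import log

# Hypothetical wealth values in thousands, including net debt and zero.
wealth = [-12.5, 0.0, 3.0, 150.0]


def safe_log(x):
    """Natural log of x, or None where the log is undefined (x <= 0)."""
    return log(x) if x > 0 else None


logged = [safe_log(w) for w in wealth]

# Number of observations that a log transform cannot represent.
n_lost = sum(v is None for v in logged)
```

Keeping the raw values, as in the commit referenced above, avoids losing these observations.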

ld-archer commented 2 years ago

Both inflation adjustment and benefit unit adjustment are now complete in enhancement/70-socioeconomic-vars (see commit ba28bff in that branch).

Inflation adjustment had a big positive impact on the T-tests; however, the couple benefit unit adjustment seems to have undone some of that good work. The last step here, to see what kind of impact we can make, is to deal with the topcoded data from ELSA.

ELSA provides a special code of .t for total family income above £900,000 for anonymity purposes (see p599 of Harmonized ELSA codebook G.2). We will replace these values in reshape_long.do and crossvalidation_ELSA_core.do with the topcoded value (£900,000). In the future if this is not working or we want to be more clever about it, we can look into tobit models for predicting what these values would be over the threshold.
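The topcode replacement described above might look roughly like this in Python (the real change would be made in the Stata scripts named in the comment). Representing the special code as the string ".t" and ordinary missing values as `None` is an assumption made for this sketch, and `replace_topcoded` is a hypothetical helper.

```python
TOPCODE_MARKER = ".t"      # assumed in-memory representation of the special code
TOPCODE_VALUE = 900_000.0  # threshold in pounds, taken from the comment above


def replace_topcoded(values, marker=TOPCODE_MARKER, cap=TOPCODE_VALUE):
    """Replace the topcode marker with the threshold and leave everything else alone."""
    return [cap if v == marker else v for v in values]


# Hypothetical income column: two ordinary values, one topcoded, one missing.
income = [18_500.0, ".t", 42_000.0, None]
cleaned = replace_topcoded(income)
```

Only the topcode marker is replaced; other missing values stay missing, so they are not mistaken for very high incomes.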