ld-archer closed 3 years ago
Wealth and income are both back in. Most of the work was already done, as they were still in the input data; I just had to re-add them to the model. An inflation multiplier will need to be included, but I think I can get away with only including it in reshape_long. If the inflation calculations are done in that script, all input populations will inherit the adjustment.
This has definitely not worked as we wanted; see the table below for cross-validation T-tests for income and wealth.
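A minimal sketch of the inflation adjustment described above, written in Python for illustration rather than as the Stata code in reshape_long. The index values, the base year, and the names `inflation_multiplier` and `to_real_terms` are all hypothetical placeholders, not figures from ELSA or from any official price index.

```python
# Hypothetical price index by year (placeholder numbers, NOT real CPI data).
PRICE_INDEX = {2002: 80.0, 2004: 85.0, 2006: 90.0, 2008: 100.0}
BASE_YEAR = 2008  # assumed base year; all values end up in these prices


def inflation_multiplier(year, base_year=BASE_YEAR, index=PRICE_INDEX):
    """Factor that converts a nominal value from `year` into base-year prices."""
    return index[base_year] / index[year]


def to_real_terms(nominal, year):
    """Scale a nominal income or wealth value into base-year prices."""
    return nominal * inflation_multiplier(year)


# Example: 17 (thou.) observed in 2004, expressed in 2008 prices.
real_income = to_real_terms(17.0, 2004)
```

Applying the multiplier once, at the point where the long-format data is built, means every downstream population sees values in the same prices.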
variable | fem_mean_wave1 | elsa_mean_wave1 | p_value_wave1 | fem_mean_wave2 | elsa_mean_wave2 | p_value_wave2 | fem_mean_wave3 | elsa_mean_wave3 | p_value_wave3 | fem_mean_wave4 | elsa_mean_wave4 | p_value_wave4 | fem_mean_wave5 | elsa_mean_wave5 | p_value_wave5 | fem_mean_wave6 | elsa_mean_wave6 | p_value_wave6 | fem_mean_wave7 | elsa_mean_wave7 | p_value_wave7 | fem_mean_wave8 | elsa_mean_wave8 | p_value_wave8 | fem_mean_wave9 | elsa_mean_wave9 | p_value_wave9 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Total Family Wealth (thou.) | 248.5051 | 205.6091 | 0 | 186.0797 | 243.5422 | 0 | 193.403 | 276.8282 | 0 | 206.8602 | 280.1831 | 0 | 216.302 | 293.1316 | 0 | 225.9 | 314.7851 | 0 | 232.6775 | 347.3489 | 0 | 238.6429 | 397.9592 | 0 | 240.8246 | 441.0225 | 0 |
Total Family Income (thou.) | 19.73524 | 18.9182 | 0.00137 | 21.48945 | 19.49766 | 0 | 21.78878 | 20.9715 | 0.01026 | 22.14617 | 21.40408 | 0.01726 | 22.23113 | 21.83524 | 0.21191 | 22.31243 | 23.81972 | 1E-05 | 22.32754 | 25.30901 | 0 | 22.30042 | 26.44413 | 0 | 22.22436 | 27.41521 | 0 |
Need to look into this. It looks like the model projections show both income and wealth reducing wave by wave, whereas the actual ELSA data shows a fairly significant increase with each wave. The first thing to check is the populations we use for the T-tests. It's possible we are not using exactly the same populations on each side, and if the populations aren't the same, a difference in the age distribution or something similar could cause this.
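One way to run the population check described above is to restrict both sides to the respondents present in each, then compare a background variable such as age. The sketch below is in Python for illustration only. The record layout (dicts with hypothetical `id` and `age` fields), the function names, and the toy values are all assumptions, not the structure of the actual cross-validation script.

```python
from statistics import mean


def common_population(fem_rows, elsa_rows, key="id"):
    """Keep only rows whose identifier appears on both sides."""
    shared = {r[key] for r in fem_rows} & {r[key] for r in elsa_rows}
    fem = [r for r in fem_rows if r[key] in shared]
    elsa = [r for r in elsa_rows if r[key] in shared]
    return fem, elsa


def mean_age_gap(fem_rows, elsa_rows, age_field="age"):
    """Mean age on the FEM side minus mean age on the ELSA side."""
    fem_mean = mean(r[age_field] for r in fem_rows)
    elsa_mean = mean(r[age_field] for r in elsa_rows)
    return fem_mean - elsa_mean


# Made-up records for a single wave.
fem_wave = [{"id": 1, "age": 60}, {"id": 2, "age": 70}, {"id": 3, "age": 80}]
elsa_wave = [{"id": 2, "age": 70}, {"id": 3, "age": 80}, {"id": 4, "age": 50}]

gap_before = mean_age_gap(fem_wave, elsa_wave)
fem_common, elsa_common = common_population(fem_wave, elsa_wave)
gap_after = mean_age_gap(fem_common, elsa_common)
```

A non-zero gap before matching that disappears afterwards would suggest the two sides of the T-test were drawn from different populations.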
After meeting with Bryan, a few potential reasons for this poor performance have been identified:
To fix these problems, we need to:
Big improvement even just after removing the log values - see commit 1075c6d
Both inflation adjustment and benefit unit adjustment are now complete in enhancement/70-socioeconomic-vars (see commit ba28bff in that branch).
Inflation adjustment had a big positive impact on the T-tests; however, the couple benefit unit adjustment seems to have undone some of that good work. The last step here, to see what kind of impact we can make, is to deal with the topcoded data from ELSA.
ELSA assigns a special code of .t to total family income above £900,000 for anonymity purposes (see p599 of Harmonized ELSA codebook G.2). We will replace these values in reshape_long.do and crossvalidation_ELSA_core.do with the topcoded value (£900,000). If this does not work, or we want to be more clever about it in future, we can look into tobit models to predict what these values would be above the threshold.
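The replacement described above amounts to mapping the special code onto the threshold before any means are computed. A minimal Python sketch follows, for illustration only, since the actual change belongs in the Stata scripts named above. Representing the .t code as the string `".t"` and the helper name `apply_topcode` are assumptions, and the income values are made up.

```python
TOPCODE_VALUE = 900_000  # threshold in pounds, as stated in the comment above
TOPCODE_MARKER = ".t"    # assumed in-memory representation of the special code


def apply_topcode(values, marker=TOPCODE_MARKER, cap=TOPCODE_VALUE):
    """Replace topcoded entries with the cap; leave all other values untouched."""
    return [cap if v == marker else v for v in values]


incomes = [18_500, ".t", 42_000, None]  # made-up values; None = ordinary missing
cleaned = apply_topcode(incomes)
```

Because the true values lie above the threshold, substituting the cap gives a lower bound for those respondents, which is why a tobit model is the natural follow-up.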
The next project looks at links between socioeconomic status and health, so we will need to add back variables related to wealth and income.
Vars to add:
Down the line we might consider disaggregating the combined variables, but this will be a good start.