ld-archer / E_FEM

This is the repository for the English version of the Future Elderly Model, originally developed at the Leonard D. Schaeffer Center for Health Policy and Microsimulation.
MIT License

Improve T-tests #45

Closed ld-archer closed 3 years ago

ld-archer commented 3 years ago

Currently, T-tests between the CV1 output and ELSA show poor agreement between the two. We need to improve these to have more confidence in the model, which will probably involve changing some transition models and identifying where we may have issues in the data.

The first step in investigating this is to run the minimal models and calculate T-tests between the minimal model output and ELSA. Then we can iteratively change the transition models or sample selections and see what improves or worsens the T-tests.
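As a rough illustration of the comparison described above, here is a minimal Python sketch of a two-sample T-test between a simulated and an observed variable. It is not the code used to produce the tables in this issue. The function name `welch_t_test` and the toy data are hypothetical, and the p-value uses a normal approximation to the t distribution (an assumption that is only reasonable for large samples) so the sketch needs nothing beyond the standard library.

```python
import math
from statistics import mean, variance


def welch_t_test(sample_a, sample_b):
    """Return (t_statistic, approx_two_sided_p) for two independent samples.

    Uses Welch's unequal-variance statistic. The p-value comes from a
    normal approximation to the t distribution, which is an assumption
    that only holds well for large samples.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    # Standard error of the difference in means (sample variances, n - 1).
    se = math.sqrt(variance(sample_a) / n_a + variance(sample_b) / n_b)
    if se == 0:
        raise ValueError("both samples are constant; the statistic is undefined")
    t_stat = (mean(sample_a) - mean(sample_b)) / se
    # Two-sided tail probability of a standard normal: erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(t_stat) / math.sqrt(2))
    return t_stat, p_value


# Hypothetical toy data: 1 = male, 0 = female (not taken from FEM or ELSA).
fem_male = [1] * 460 + [0] * 540
elsa_male = [1] * 465 + [0] * 535

t_stat, p_value = welch_t_test(fem_male, elsa_male)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
```

Running this once per variable and wave would give a table of the same shape as the ones in this thread, with a small p-value flagging a variable where the model and the survey disagree.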

ld-archer commented 3 years ago

CV1

| variable | fem_mean_wave3 | elsa_mean_wave3 | p_value_wave3 | fem_mean_wave5 | elsa_mean_wave5 | p_value_wave5 | fem_mean_wave8 | elsa_mean_wave8 | p_value_wave8 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Age at interview | 67.9368 | 67.52273 | 0.01298 | 71.30965 | 71.19435 | 0.50952 | 75.96992 | 74.70768 | 0 |
| Male | 0.46029 | 0.46478 | 0.60558 | 0.45732 | 0.45324 | 0.67087 | 0.45317 | 0.45476 | 0.89073 |
| White | 0.96646 | 0.97296 | 0.02269 | 0.96597 | 0.97037 | 0.18125 | 0.96462 | 0.97228 | 0.04602 |

Minimal

| variable | fem_mean_wave3 | elsa_mean_wave3 | p_value_wave3 | fem_mean_wave5 | elsa_mean_wave5 | p_value_wave5 | fem_mean_wave8 | elsa_mean_wave8 | p_value_wave8 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Age at interview | 67.17301 | 67.52273 | 0.03573 | 69.99119 | 71.19435 | 0 | 74.22039 | 74.70768 | 0.00525 |
| Male | 0.45861 | 0.46478 | 0.4789 | 0.4553 | 0.45324 | 0.83081 | 0.44705 | 0.45476 | 0.50767 |
| White | 0.96532 | 0.97296 | 0.00772 | 0.96377 | 0.97037 | 0.04639 | 0.96118 | 0.97228 | 0.0043 |

These are the demographic variables being compared. Gender seems to be reasonably accurate in both scenarios, so we can leave that alone for now. Race was always going to be difficult because the breakdown is only 'white' vs 'non-white', and this variable was also missing a considerable proportion of data, so it might not be reliable.

Age is important here, and something we really need to understand better. The problem is with the age distribution and how it is affected by related processes such as mortality. If people die in the model at a quicker rate than in ELSA, we will most likely see a lower mean age in the FEM output than in ELSA. The first thing to check, then, is the mortality model.
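To make the link between mortality and mean age concrete, here is a small deterministic sketch in Python. Everything in it is hypothetical: the cohort, the `toy_death_prob` function and the `scale` multiplier are illustrative assumptions, not FEM or ELSA quantities. It assumes the probability of dying rises with age, which is what makes faster mortality pull the mean age of survivors down.

```python
def survivor_mean_age(ages, counts, death_prob, scale=1.0):
    """Expected mean age of survivors after one step.

    death_prob(age) gives the probability of dying at that age, and
    scale multiplies it (capped at 1.0) to mimic a model that kills
    people faster or slower than the reference.
    """
    survivors = [
        n * (1.0 - min(1.0, scale * death_prob(age)))
        for age, n in zip(ages, counts)
    ]
    return sum(a * s for a, s in zip(ages, survivors)) / sum(survivors)


# Hypothetical cohort: 100 people at each age 50, 55, ..., 95.
ages = list(range(50, 100, 5))
counts = [100] * len(ages)


def toy_death_prob(age):
    # Hypothetical hazard: rises linearly from 0.01 at age 50 to 0.28 at age 95.
    return 0.01 + 0.006 * (age - 50)


baseline = survivor_mean_age(ages, counts, toy_death_prob, scale=1.0)
faster = survivor_mean_age(ages, counts, toy_death_prob, scale=1.5)
print(f"baseline mortality: mean age {baseline:.2f}")
print(f"1.5x mortality:     mean age {faster:.2f}")
```

Because the oldest groups lose the most people, scaling mortality up lowers the mean age of those who remain, and scaling it down raises it. If mortality did not depend on age, the mean would not move at all.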

ld-archer commented 3 years ago

CV1

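| variable | fem_mean_wave3 | elsa_mean_wave3 | p_value_wave3 | fem_mean_wave4 | elsa_mean_wave4 | p_value_wave4 | fem_mean_wave5 | elsa_mean_wave5 | p_value_wave5 | fem_mean_wave6 | elsa_mean_wave6 | p_value_wave6 | fem_mean_wave7 | elsa_mean_wave7 | p_value_wave7 | fem_mean_wave8 | elsa_mean_wave8 | p_value_wave8 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Died | 0.02196 | 0.04115 | 0 | 0.02248 | 0.05095 | 0 | 0.02571 | 0.06018 | 0 | 0.03113 | 0.05517 | 0 | 0.0371 | 0 | 0 | 0.04326 | 0 | 0 |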

Minimal

| variable | fem_mean_wave3 | elsa_mean_wave3 | p_value_wave3 | fem_mean_wave4 | elsa_mean_wave4 | p_value_wave4 | fem_mean_wave5 | elsa_mean_wave5 | p_value_wave5 | fem_mean_wave6 | elsa_mean_wave6 | p_value_wave6 | fem_mean_wave7 | elsa_mean_wave7 | p_value_wave7 | fem_mean_wave8 | elsa_mean_wave8 | p_value_wave8 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Died | 0.04763 | 0.04115 | 0.05145 | 0.05268 | 0.05095 | 0.64978 | 0.05568 | 0.06018 | 0.27368 | 0.06062 | 0.05517 | 0.18228 | 0.06519 | 0 | 0 | 0.07199 | 0 | 0 |

In CV1, the model is consistently underestimating mortality rates, which could be causing a lot more errors downstream. (Ignore waves 7-8; no deaths have been reported for these waves yet.) Let's see what effect we can have by tweaking the estimation model.
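A quick way to see the size of the gap is to take the ratio of the FEM death rate to the ELSA death rate for each wave. The sketch below uses the CV1 "Died" values for waves 3-6 from the table above. The variable names are just for illustration.

```python
# CV1 "Died" means for waves 3-6, copied from the table above.
waves = [3, 4, 5, 6]
fem_died = [0.02196, 0.02248, 0.02571, 0.03113]
elsa_died = [0.04115, 0.05095, 0.06018, 0.05517]

# Ratio below 1 means the model predicts fewer deaths than ELSA reports.
ratios = {wave: fem / elsa for wave, fem, elsa in zip(waves, fem_died, elsa_died)}

for wave, ratio in ratios.items():
    print(f"wave {wave}: FEM/ELSA mortality ratio = {ratio:.2f}")
```

In each of these waves the FEM rate comes out at roughly half the ELSA rate, so the shortfall is systematic rather than confined to a single wave.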

ld-archer commented 3 years ago

This paper from The Lancet investigated the best predictors of 5-year mortality in UK Biobank participants. Among the most predictive was self-reported health. This is in ELSA, so I'm going to add this variable and reduce the model down to the bare minimum of chronic diseases and risk behaviours.

ld-archer commented 3 years ago

Small improvement in the mortality model after adding self-reported health var:

CV1

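| variable | fem_mean_wave3 | elsa_mean_wave3 | p_value_wave3 | fem_mean_wave4 | elsa_mean_wave4 | p_value_wave4 | fem_mean_wave5 | elsa_mean_wave5 | p_value_wave5 | fem_mean_wave6 | elsa_mean_wave6 | p_value_wave6 | fem_mean_wave7 | elsa_mean_wave7 | p_value_wave7 | fem_mean_wave8 | elsa_mean_wave8 | p_value_wave8 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Died | 0.02352 | 0.04115 | 0 | 0.02406 | 0.05095 | 0 | 0.02802 | 0.06018 | 0 | 0.03235 | 0.05517 | 0 | 0.03833 | 0 | 0 | 0.0451 | 0 | 0 |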

Next I will try to cut the predictors down to just self-reported health and demographics.

ld-archer commented 3 years ago

Update

Seen a significant improvement in the mortality model, due to a couple of changes:

| variable | fem_mean_wave3 | elsa_mean_wave3 | p_value_wave3 | fem_mean_wave4 | elsa_mean_wave4 | p_value_wave4 | fem_mean_wave5 | elsa_mean_wave5 | p_value_wave5 | fem_mean_wave6 | elsa_mean_wave6 | p_value_wave6 | fem_mean_wave7 | elsa_mean_wave7 | p_value_wave7 | fem_mean_wave8 | elsa_mean_wave8 | p_value_wave8 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Died | 0.03427 | 0.04115 | 0.03607 | 0.04016 | 0.05095 | 0.00405 | 0.04378 | 0.06018 | 6E-05 | 0.04849 | 0.05517 | 0.09864 | 0.05431 | 0 | 0 | 0.05979 | 0 | 0 |
| Age at interview | 67.46056 | 67.52273 | 0.70889 | 68.98596 | 69.68942 | 9E-05 | 70.49633 | 71.19435 | 7E-05 | 71.99042 | 72.26346 | 0.11387 | 73.44135 | 73.47485 | 0.847 | 74.86029 | 74.70768 | 0.38166 |
| Cancer ever | 0.10529 | 0.08634 | 0.00013 | 0.12412 | 0.0972 | 0 | 0.14332 | 0.11917 | 0.00013 | 0.16537 | 0.12381 | 0 | 0.18708 | 0.1466 | 0 | 0.20995 | 0.16397 | 0 |
| Diabetes ever | 0.1134 | 0.0937 | 0.00012 | 0.12937 | 0.11321 | 0.00658 | 0.14451 | 0.12803 | 0.01107 | 0.15961 | 0.13755 | 0.00165 | 0.17455 | 0.15406 | 0.00933 | 0.18565 | 0.15097 | 4E-05 |
| Heart disease ever | 0.22072 | 0.17494 | 0 | 0.24355 | 0.1983 | 0 | 0.26794 | 0.22183 | 0 | 0.29371 | 0.22953 | 0 | 0.32054 | 0.25602 | 0 | 0.34867 | 0.28985 | 0 |
| Hypertension ever | 0.48354 | 0.4483 | 5E-05 | 0.51649 | 0.48162 | 0.00019 | 0.54569 | 0.50639 | 5E-05 | 0.57324 | 0.50793 | 0 | 0.59931 | 0.53135 | 0 | 0.62452 | 0.5384 | 0 |
| Lung disease ever | 0.08997 | 0.07187 | 7E-05 | 0.09918 | 0.07859 | 5E-05 | 0.10748 | 0.08631 | 0.00011 | 0.1182 | 0.08619 | 0 | 0.12758 | 0.09168 | 0 | 0.13645 | 0.08975 | 0 |
| Stroke ever | 0.05877 | 0.05037 | 0.02895 | 0.06486 | 0.06373 | 0.8043 | 0.07241 | 0.06743 | 0.30549 | 0.07907 | 0.07311 | 0.25971 | 0.08516 | 0.0833 | 0.75748 | 0.09339 | 0.08899 | 0.50812 |
| BMI | | | | 28.10038 | 28.36905 | 0.01215 | | | | 27.98555 | 28.31962 | 0.00399 | | | | 27.89503 | 27.98109 | 0.50058 |

ld-archer commented 3 years ago

This issue is important but badly organised, so I'm going to close it and open new ones for groups of variables or individual variables.