Closed ld-archer closed 3 years ago
CV1
variable | fem_mean_wave3 | elsa_mean_wave3 | p_value_wave3 | fem_mean_wave5 | elsa_mean_wave5 | p_value_wave5 | fem_mean_wave8 | elsa_mean_wave8 | p_value_wave8 |
---|---|---|---|---|---|---|---|---|---|
Age at interview | 67.9368 | 67.52273 | 0.01298 | 71.30965 | 71.19435 | 0.50952 | 75.96992 | 74.70768 | 0 |
Male | 0.46029 | 0.46478 | 0.60558 | 0.45732 | 0.45324 | 0.67087 | 0.45317 | 0.45476 | 0.89073 |
White | 0.96646 | 0.97296 | 0.02269 | 0.96597 | 0.97037 | 0.18125 | 0.96462 | 0.97228 | 0.04602 |
Minimal
variable | fem_mean_wave3 | elsa_mean_wave3 | p_value_wave3 | fem_mean_wave5 | elsa_mean_wave5 | p_value_wave5 | fem_mean_wave8 | elsa_mean_wave8 | p_value_wave8 |
---|---|---|---|---|---|---|---|---|---|
Age at interview | 67.17301 | 67.52273 | 0.03573 | 69.99119 | 71.19435 | 0 | 74.22039 | 74.70768 | 0.00525 |
Male | 0.45861 | 0.46478 | 0.4789 | 0.4553 | 0.45324 | 0.83081 | 0.44705 | 0.45476 | 0.50767 |
White | 0.96532 | 0.97296 | 0.00772 | 0.96377 | 0.97037 | 0.04639 | 0.96118 | 0.97228 | 0.0043 |
These tables compare demographic variables. Gender looks reasonably accurate in both scenarios, so we can leave it alone for now. Race was always going to be difficult given that the breakdown is just 'white' vs 'non-white', and a considerable proportion of this variable is missing, so it may not be reliable.
Age is important here, and something we really need to understand better. The problem is with its distribution and how that is affected by related processes such as mortality. If people die in the model at a quicker rate than in ELSA, the mean age in the FEM output will most likely be lower than the mean age in ELSA. The first thing to check, then, is the mortality model.
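To illustrate the link between mortality and mean age, here is a minimal sketch using a toy cohort and a toy hazard that rises with age. Neither is taken from ELSA or the FEM mortality model; `mean_survivor_age`, `toy_death_prob` and the `scale` knob are hypothetical names made up for this example. It only covers a single wave and ignores ageing and cohort replenishment.

```python
def mean_survivor_age(ages, death_prob, scale=1.0):
    """Expected mean age of survivors after one wave.

    ages: list of ages in the starting cohort
    death_prob: function mapping age -> probability of dying before the next wave
    scale: multiplier applied to the death probability (hypothetical knob)
    """
    # Each person contributes their survival probability as a weight.
    weights = [1.0 - min(1.0, scale * death_prob(a)) for a in ages]
    return sum(a * w for a, w in zip(ages, weights)) / sum(weights)


# Toy cohort: one person at each age from 50 to 90 (illustrative, not ELSA data).
ages = list(range(50, 91))


def toy_death_prob(age):
    # Toy hazard that increases linearly with age (an assumption for illustration).
    return 0.002 * (age - 49)


baseline = mean_survivor_age(ages, toy_death_prob, scale=1.0)
too_few_deaths = mean_survivor_age(ages, toy_death_prob, scale=0.5)
too_many_deaths = mean_survivor_age(ages, toy_death_prob, scale=1.5)

print(f"too few deaths:  {too_few_deaths:.3f}")
print(f"baseline:        {baseline:.3f}")
print(f"too many deaths: {too_many_deaths:.3f}")
```

Because the toy hazard rises with age, scaling mortality up removes more of the oldest people and pulls the survivors' mean age down, while scaling it down does the opposite.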
CV1
variable | fem_mean_wave3 | elsa_mean_wave3 | p_value_wave3 | fem_mean_wave4 | elsa_mean_wave4 | p_value_wave4 | fem_mean_wave5 | elsa_mean_wave5 | p_value_wave5 | fem_mean_wave6 | elsa_mean_wave6 | p_value_wave6 | fem_mean_wave7 | elsa_mean_wave7 | p_value_wave7 | fem_mean_wave8 | elsa_mean_wave8 | p_value_wave8 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Died | 0.02196 | 0.04115 | 0 | 0.02248 | 0.05095 | 0 | 0.02571 | 0.06018 | 0 | 0.03113 | 0.05517 | 0 | 0.0371 | 0 | 0 | 0.04326 | 0 | 0 |
Minimal
variable | fem_mean_wave3 | elsa_mean_wave3 | p_value_wave3 | fem_mean_wave4 | elsa_mean_wave4 | p_value_wave4 | fem_mean_wave5 | elsa_mean_wave5 | p_value_wave5 | fem_mean_wave6 | elsa_mean_wave6 | p_value_wave6 | fem_mean_wave7 | elsa_mean_wave7 | p_value_wave7 | fem_mean_wave8 | elsa_mean_wave8 | p_value_wave8 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Died | 0.04763 | 0.04115 | 0.05145 | 0.05268 | 0.05095 | 0.64978 | 0.05568 | 0.06018 | 0.27368 | 0.06062 | 0.05517 | 0.18228 | 0.06519 | 0 | 0 | 0.07199 | 0 | 0 |
In CV1, the model consistently underestimates mortality: the simulated death rate is well below the ELSA rate in every wave from 3 to 6, which could be causing a lot more errors downstream. The minimal model shows no significant difference in those waves. (Ignore waves 7-8: no deaths have been reported for these waves yet.) Let's see what effect we can have by tweaking the estimation model.
This paper from The Lancet investigated the best predictors of 5-year mortality in UK Biobank participants. Self-reported health was among the most predictive. This variable is available in ELSA, so I'm going to add it and reduce the model down to the bare minimum of chronic diseases and risk behaviours.
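As a quick way to put a number on the gap, the sketch below divides the FEM death rate by the ELSA death rate for waves 3-6, using the values from the "Died" rows above. The variable and function names are just for illustration.

```python
# Death rates copied from the "Died" rows above, waves 3-6 only
# (waves 7-8 are skipped because no ELSA deaths are reported there yet).
waves = [3, 4, 5, 6]
elsa = [0.04115, 0.05095, 0.06018, 0.05517]
fem_cv1 = [0.02196, 0.02248, 0.02571, 0.03113]
fem_minimal = [0.04763, 0.05268, 0.05568, 0.06062]


def ratios(fem, elsa):
    """FEM rate divided by ELSA rate, wave by wave (1.0 means a perfect match)."""
    return [f / e for f, e in zip(fem, elsa)]


cv1_ratios = ratios(fem_cv1, elsa)
minimal_ratios = ratios(fem_minimal, elsa)

for wave, r_cv1, r_min in zip(waves, cv1_ratios, minimal_ratios):
    print(f"wave {wave}: CV1 {r_cv1:.2f}, minimal {r_min:.2f}")
```

The CV1 ratios fall between about 0.43 and 0.56, so CV1 produces roughly half the deaths seen in ELSA, while the minimal model's ratios sit between about 0.93 and 1.16.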
Small improvement in the mortality model after adding self-reported health var:
CV1
variable | fem_mean_wave3 | elsa_mean_wave3 | p_value_wave3 | fem_mean_wave4 | elsa_mean_wave4 | p_value_wave4 | fem_mean_wave5 | elsa_mean_wave5 | p_value_wave5 | fem_mean_wave6 | elsa_mean_wave6 | p_value_wave6 | fem_mean_wave7 | elsa_mean_wave7 | p_value_wave7 | fem_mean_wave8 | elsa_mean_wave8 | p_value_wave8 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Died | 0.02352 | 0.04115 | 0 | 0.02406 | 0.05095 | 0 | 0.02802 | 0.06018 | 0 | 0.03235 | 0.05517 | 0 | 0.03833 | 0 | 0 | 0.0451 | 0 | 0 |
Next, I will try cutting the predictors down to just self-reported health and demographics.
Update
Seen a significant improvement in the mortality model, due to a couple of changes:
variable | fem_mean_wave3 | elsa_mean_wave3 | p_value_wave3 | fem_mean_wave4 | elsa_mean_wave4 | p_value_wave4 | fem_mean_wave5 | elsa_mean_wave5 | p_value_wave5 | fem_mean_wave6 | elsa_mean_wave6 | p_value_wave6 | fem_mean_wave7 | elsa_mean_wave7 | p_value_wave7 | fem_mean_wave8 | elsa_mean_wave8 | p_value_wave8 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Died | 0.03427 | 0.04115 | 0.03607 | 0.04016 | 0.05095 | 0.00405 | 0.04378 | 0.06018 | 6E-05 | 0.04849 | 0.05517 | 0.09864 | 0.05431 | 0 | 0 | 0.05979 | 0 | 0 |
Age at interview | 67.46056 | 67.52273 | 0.70889 | 68.98596 | 69.68942 | 9E-05 | 70.49633 | 71.19435 | 7E-05 | 71.99042 | 72.26346 | 0.11387 | 73.44135 | 73.47485 | 0.847 | 74.86029 | 74.70768 | 0.38166 |
Cancer ever | 0.10529 | 0.08634 | 0.00013 | 0.12412 | 0.0972 | 0 | 0.14332 | 0.11917 | 0.00013 | 0.16537 | 0.12381 | 0 | 0.18708 | 0.1466 | 0 | 0.20995 | 0.16397 | 0 |
Diabetes ever | 0.1134 | 0.0937 | 0.00012 | 0.12937 | 0.11321 | 0.00658 | 0.14451 | 0.12803 | 0.01107 | 0.15961 | 0.13755 | 0.00165 | 0.17455 | 0.15406 | 0.00933 | 0.18565 | 0.15097 | 4E-05 |
Heart disease ever | 0.22072 | 0.17494 | 0 | 0.24355 | 0.1983 | 0 | 0.26794 | 0.22183 | 0 | 0.29371 | 0.22953 | 0 | 0.32054 | 0.25602 | 0 | 0.34867 | 0.28985 | 0 |
Hypertension ever | 0.48354 | 0.4483 | 5E-05 | 0.51649 | 0.48162 | 0.00019 | 0.54569 | 0.50639 | 5E-05 | 0.57324 | 0.50793 | 0 | 0.59931 | 0.53135 | 0 | 0.62452 | 0.5384 | 0 |
Lung disease ever | 0.08997 | 0.07187 | 7E-05 | 0.09918 | 0.07859 | 5E-05 | 0.10748 | 0.08631 | 0.00011 | 0.1182 | 0.08619 | 0 | 0.12758 | 0.09168 | 0 | 0.13645 | 0.08975 | 0 |
Stroke ever | 0.05877 | 0.05037 | 0.02895 | 0.06486 | 0.06373 | 0.8043 | 0.07241 | 0.06743 | 0.30549 | 0.07907 | 0.07311 | 0.25971 | 0.08516 | 0.0833 | 0.75748 | 0.09339 | 0.08899 | 0.50812 |
BMI | 28.10038 | 28.36905 | 0.01215 | 27.98555 | 28.31962 | 0.00399 | 27.89503 | 27.98109 | 0.50058 |
This issue is important but badly organised, so I'm going to close it and open new ones for groups of variables or individual variables.
Currently, T-tests between the CV1 output and ELSA show poor agreement: many variables differ significantly. We need to improve these to have more confidence in the model, which will probably involve changing some transition models and identifying where we may have issues in the data.
The first step in investigating this is to run the minimal models and calculate T-tests between the minimal model output and ELSA. Then we can iteratively change the transition models or sample selections and see what improves or worsens the T-tests.
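A rough sketch of the per-wave comparison is below. It is not the project's actual test code: `welch_t_test` and `compare_by_wave` are hypothetical helpers, the data is a toy example, and the p-value uses a normal approximation that is only reasonable for large samples. In practice `scipy.stats.ttest_ind(a, b, equal_var=False)` would be the usual choice; the stdlib version here just avoids the dependency.

```python
import math
from statistics import mean, variance


def welch_t_test(sample_a, sample_b):
    """Welch's two-sample t statistic with a normal-approximation p-value."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Standard error of the difference in means, allowing unequal variances.
    se = math.sqrt(variance(sample_a) / n_a + variance(sample_b) / n_b)
    t_stat = (mean(sample_a) - mean(sample_b)) / se
    # Two-sided p-value under a standard normal (large-sample assumption).
    p_value = math.erfc(abs(t_stat) / math.sqrt(2))
    return t_stat, p_value


def compare_by_wave(fem, elsa):
    """fem / elsa: dict mapping wave number -> list of values for one variable."""
    results = {}
    for wave in sorted(set(fem) & set(elsa)):
        _, p_value = welch_t_test(fem[wave], elsa[wave])
        results[wave] = {
            "fem_mean": mean(fem[wave]),
            "elsa_mean": mean(elsa[wave]),
            "p_value": p_value,
        }
    return results


# Toy data (not ELSA): identical samples in wave 3, a 5-year shift in wave 5.
toy_elsa = {w: [60 + i % 20 for i in range(200)] for w in (3, 5)}
toy_fem = {
    3: [60 + i % 20 for i in range(200)],
    5: [65 + i % 20 for i in range(200)],
}

results = compare_by_wave(toy_fem, toy_elsa)
for wave, row in results.items():
    print(wave, row)
```

Rerunning the same comparison after each change to a transition model or sample selection gives a like-for-like view of which T-tests improved and which got worse.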