Improve T-tests

ld-archer commented 3 years ago

Currently, T-tests between the CV1 output and ELSA show poor agreement: many variables differ significantly between the two. We need to improve this to have more confidence in the model, which will probably involve changing some transition models and identifying where the data may have issues.

The first step in investigating this is to run the minimal models and calculate T-tests between the minimal model output and ELSA. Then we can iteratively change the transition models or sample selections and see what improves or worsens the T-tests.
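
The per-wave comparison described above could be sketched roughly as follows. This is a minimal, stdlib-only illustration and not the project's actual test script: `welch_test` and `compare_by_wave` are hypothetical names, the data are synthetic, and the p-value uses a normal approximation to the t distribution, which is only reasonable for large samples. With SciPy available, `scipy.stats.ttest_ind(sim, obs, equal_var=False)` would be the usual choice.

```python
import math
import random
from statistics import NormalDist, mean, variance


def welch_test(sim, obs):
    """Return (mean_sim, mean_obs, p_value) for a two-sided Welch-style test.

    The p-value uses a normal approximation to the t distribution,
    so it should only be trusted for large samples.
    """
    m_sim, m_obs = mean(sim), mean(obs)
    se = math.sqrt(variance(sim) / len(sim) + variance(obs) / len(obs))
    if se == 0:
        # Both samples are constant: equal means agree, different means do not.
        return m_sim, m_obs, (1.0 if m_sim == m_obs else 0.0)
    z = (m_sim - m_obs) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return m_sim, m_obs, p_value


def compare_by_wave(sim_by_wave, obs_by_wave):
    """Run the test for every wave present in both inputs."""
    return {
        wave: welch_test(sim_by_wave[wave], obs_by_wave[wave])
        for wave in sorted(sim_by_wave)
        if wave in obs_by_wave
    }


# Synthetic ages for a single wave, purely for illustration.
rng = random.Random(0)
sim_ages = {3: [rng.gauss(67.9, 9.0) for _ in range(2000)]}
obs_ages = {3: [rng.gauss(67.5, 9.0) for _ in range(2000)]}

for wave, (m_sim, m_obs, p) in compare_by_wave(sim_ages, obs_ages).items():
    print(f"wave {wave}: fem_mean={m_sim:.3f} elsa_mean={m_obs:.3f} p_value={p:.5f}")
```

Re-running the same comparison after each change to a transition model or sample selection gives a consistent way to see whether a variable moved closer to ELSA.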

ld-archer commented 3 years ago

CV1

variable	fem_mean_wave3	elsa_mean_wave3	p_value_wave3	fem_mean_wave5	elsa_mean_wave5	p_value_wave5	fem_mean_wave8	elsa_mean_wave8	p_value_wave8
Age at interview	67.9368	67.52273	0.01298	71.30965	71.19435	0.50952	75.96992	74.70768	0
Male	0.46029	0.46478	0.60558	0.45732	0.45324	0.67087	0.45317	0.45476	0.89073
White	0.96646	0.97296	0.02269	0.96597	0.97037	0.18125	0.96462	0.97228	0.04602

Minimal

variable	fem_mean_wave3	elsa_mean_wave3	p_value_wave3	fem_mean_wave5	elsa_mean_wave5	p_value_wave5	fem_mean_wave8	elsa_mean_wave8	p_value_wave8
Age at interview	67.17301	67.52273	0.03573	69.99119	71.19435	0	74.22039	74.70768	0.00525
Male	0.45861	0.46478	0.4789	0.4553	0.45324	0.83081	0.44705	0.45476	0.50767
White	0.96532	0.97296	0.00772	0.96377	0.97037	0.04639	0.96118	0.97228	0.0043

These are the demographic variables being compared. Gender looks reasonably accurate in both scenarios, so we can leave that alone for now. Race was always going to be difficult given that the breakdown is just 'white' vs 'non-white', and a considerable proportion of this variable is missing, so it might not be reliable.

Age is important here, and something we really need to understand better. The problem is with the age distribution and how it is affected by related processes such as mortality. If people die at a quicker rate in the model than in ELSA, the mean age in the FEM output will most likely be lower than the mean age in ELSA. The first thing to check, then, is the mortality model.
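
To make the link between mortality and mean age concrete, here is a toy sketch. Everything in it is an assumption for illustration only: the Gompertz-like hazard in `death_prob`, its coefficients, the two-year wave spacing and the uniform starting ages are invented, and none of it comes from the FEM or ELSA.

```python
import math


def death_prob(age, scale):
    """Hypothetical per-wave death probability that rises with age."""
    return min(1.0, scale * 0.001 * math.exp(0.09 * (age - 50)))


def mean_age_after(ages, scale, waves=3, step=2):
    """Expected mean age of survivors after a number of waves.

    Each person carries a survival weight instead of being randomly
    removed, so the result is deterministic.
    """
    cohort = [(age, 1.0) for age in ages]
    for _ in range(waves):
        cohort = [
            (age + step, weight * (1 - death_prob(age, scale)))
            for age, weight in cohort
        ]
    total_weight = sum(weight for _, weight in cohort)
    return sum(age * weight for age, weight in cohort) / total_weight


ages = list(range(50, 91))  # one person at each age from 50 to 90
baseline = mean_age_after(ages, scale=1.0)
faster = mean_age_after(ages, scale=2.0)  # same hazard shape, doubled
print(f"baseline mortality: mean age {baseline:.2f}")
print(f"doubled mortality:  mean age {faster:.2f}")
```

Because the assumed hazard rises with age, scaling it up removes proportionally more of the oldest people, so the survivors' mean age falls. The same logic runs in reverse: a model that kills too few people should drift towards a higher mean age than ELSA.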

ld-archer commented 3 years ago

CV1

variable	fem_mean_wave3	elsa_mean_wave3	p_value_wave3	fem_mean_wave4	elsa_mean_wave4	p_value_wave4	fem_mean_wave5	elsa_mean_wave5	p_value_wave5	fem_mean_wave6	elsa_mean_wave6	p_value_wave6	fem_mean_wave7	elsa_mean_wave7	p_value_wave7	fem_mean_wave8	elsa_mean_wave8	p_value_wave8
Died	0.02196	0.04115	0	0.02248	0.05095	0	0.02571	0.06018	0	0.03113	0.05517	0	0.0371	0	0	0.04326	0	0

Minimal

variable	fem_mean_wave3	elsa_mean_wave3	p_value_wave3	fem_mean_wave4	elsa_mean_wave4	p_value_wave4	fem_mean_wave5	elsa_mean_wave5	p_value_wave5	fem_mean_wave6	elsa_mean_wave6	p_value_wave6	fem_mean_wave7	elsa_mean_wave7	p_value_wave7	fem_mean_wave8	elsa_mean_wave8	p_value_wave8
Died	0.04763	0.04115	0.05145	0.05268	0.05095	0.64978	0.05568	0.06018	0.27368	0.06062	0.05517	0.18228	0.06519	0	0	0.07199	0	0

In CV1 the model consistently underestimates mortality rates, which could be causing a lot more errors downstream. (Ignore waves 7-8: no deaths have been reported for these waves yet, so the ELSA means are 0.) Let's see what effect we can have by tweaking the estimation model.
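
One quick way to quantify the gap is the ratio of FEM to ELSA death rates per wave. The sketch below copies the `Died` means for waves 3-6 from the two tables above (waves 7-8 are left out because ELSA has no reported deaths there); the variable and function names are just illustrative.

```python
# Mean of the Died indicator per wave, copied from the tables above.
elsa_died = {3: 0.04115, 4: 0.05095, 5: 0.06018, 6: 0.05517}
cv1_died = {3: 0.02196, 4: 0.02248, 5: 0.02571, 6: 0.03113}
minimal_died = {3: 0.04763, 4: 0.05268, 5: 0.05568, 6: 0.06062}


def rate_ratios(fem, elsa):
    """Ratio of simulated to observed death rate for each wave."""
    return {wave: fem[wave] / elsa[wave] for wave in sorted(fem)}


cv1_ratios = rate_ratios(cv1_died, elsa_died)
minimal_ratios = rate_ratios(minimal_died, elsa_died)

for wave in sorted(elsa_died):
    print(
        f"wave {wave}: CV1/ELSA = {cv1_ratios[wave]:.2f}, "
        f"minimal/ELSA = {minimal_ratios[wave]:.2f}"
    )
```

A ratio well below 1 means the model produces too few deaths in that wave, while a ratio close to 1 means the simulated and observed rates roughly agree.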

ld-archer commented 3 years ago

This paper from The Lancet investigated the best predictors of 5-year mortality in UK Biobank participants. Among the most predictive was self-reported health. This is available in ELSA, so I'm going to add this variable and reduce the model to the bare minimum of chronic diseases and risk behaviours.

ld-archer commented 3 years ago

Small improvement in the mortality model after adding the self-reported health variable:

CV1

variable	fem_mean_wave3	elsa_mean_wave3	p_value_wave3	fem_mean_wave4	elsa_mean_wave4	p_value_wave4	fem_mean_wave5	elsa_mean_wave5	p_value_wave5	fem_mean_wave6	elsa_mean_wave6	p_value_wave6	fem_mean_wave7	elsa_mean_wave7	p_value_wave7	fem_mean_wave8	elsa_mean_wave8	p_value_wave8
Died	0.02352	0.04115	0	0.02406	0.05095	0	0.02802	0.06018	0	0.03235	0.05517	0	0.03833	0	0	0.0451	0	0

Next I will try cutting the predictors down to just self-reported health and demographics.

ld-archer commented 3 years ago

Update

I'm seeing a significant improvement in the mortality model, due to a couple of changes:

- Improved the logbmi model MASSIVELY
  - The previous logbmi model was massively overestimating; see below for the current T-tests
  - The improvement came from NOT including l2logbmi as a predictor
- Dropped the mortality model back down to minimal. Self-reported health and heart disease were both good predictors, but I decided it was more important to try to improve some of the chronic disease models first, as they might influence mortality
- Improved mortality models as mentioned above (still WIP)

variable	fem_mean_wave3	elsa_mean_wave3	p_value_wave3	fem_mean_wave4	elsa_mean_wave4	p_value_wave4	fem_mean_wave5	elsa_mean_wave5	p_value_wave5	fem_mean_wave6	elsa_mean_wave6	p_value_wave6	fem_mean_wave7	elsa_mean_wave7	p_value_wave7	fem_mean_wave8	elsa_mean_wave8	p_value_wave8
Died	0.03427	0.04115	0.03607	0.04016	0.05095	0.00405	0.04378	0.06018	6E-05	0.04849	0.05517	0.09864	0.05431	0	0	0.05979	0	0
Age at interview	67.46056	67.52273	0.70889	68.98596	69.68942	9E-05	70.49633	71.19435	7E-05	71.99042	72.26346	0.11387	73.44135	73.47485	0.847	74.86029	74.70768	0.38166
Cancer ever	0.10529	0.08634	0.00013	0.12412	0.0972	0	0.14332	0.11917	0.00013	0.16537	0.12381	0	0.18708	0.1466	0	0.20995	0.16397	0
Diabetes ever	0.1134	0.0937	0.00012	0.12937	0.11321	0.00658	0.14451	0.12803	0.01107	0.15961	0.13755	0.00165	0.17455	0.15406	0.00933	0.18565	0.15097	4E-05
Heart disease ever	0.22072	0.17494	0	0.24355	0.1983	0	0.26794	0.22183	0	0.29371	0.22953	0	0.32054	0.25602	0	0.34867	0.28985	0
Hypertension ever	0.48354	0.4483	5E-05	0.51649	0.48162	0.00019	0.54569	0.50639	5E-05	0.57324	0.50793	0	0.59931	0.53135	0	0.62452	0.5384	0
Lung disease ever	0.08997	0.07187	7E-05	0.09918	0.07859	5E-05	0.10748	0.08631	0.00011	0.1182	0.08619	0	0.12758	0.09168	0	0.13645	0.08975	0
Stroke ever	0.05877	0.05037	0.02895	0.06486	0.06373	0.8043	0.07241	0.06743	0.30549	0.07907	0.07311	0.25971	0.08516	0.0833	0.75748	0.09339	0.08899	0.50812
BMI				28.10038	28.36905	0.01215				27.98555	28.31962	0.00399				27.89503	27.98109	0.50058

ld-archer commented 3 years ago

This issue is important but badly organised, so I'm going to close it and open new ones for groups of variables or individual variables.

ld-archer / E_FEM

Improve T-tests #45