Improve risk behaviours in the model

ld-archer commented 2 years ago

After meeting with Alan Brennan today (3/11/21) we came up with a few ways to improve risk behaviours in the model.

Alcohol

To estimate alcohol consumption properly and use it in e.g. comparison studies, we need to track units of alcohol rather than number of drinks, which is what the harmonised ELSA dataset provides. We can either find a way to calculate/estimate units from the data we have, or take data directly from the questionnaire and add it to our harmonised dataset.

[x] Find if anyone has managed to estimate units
[x] Check out the alcbase variable and see if we can use that in our model (or derive something from it)
[x] See how ELSA already treats units (or derives them) and check their work

Smoking

Smoking is currently handled in a fairly standard way, except for the way we model intensity. The current heavy_smoker var would probably be better replaced with a simple continuous variable for the number of cigarettes smoked per day (which is already included in ELSA as the smokef var). We also need to look into having a variable for the time since quitting, which is important in a number of smoking-related illnesses (probably more so than smoking intensity). For this we will need some kind of mechanism inside the model, but will most likely also need some external data for generating this distribution amongst smokers in the baseline data (for input populations and estimating transitions).

[x] Add smokef to the model alongside heavy_smoker
[ ] Find a way to track the waves since quitting smoking in the model
[ ] Find some external data to impute the years since quit smoking in baseline and transition data.
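
One possible shape for the in-model mechanism is a per-person counter updated at each wave. This is only a sketch: the function names are hypothetical, counting a quit as 0 waves in the wave it is first observed is an assumed convention, and it assumes the counter for people who had already quit at baseline has been imputed beforehand.

```python
def update_waves_since_quit(smoked_prev, smokes_now, prev_counter):
    """Return the waves-since-quit counter for the current wave."""
    if smokes_now:
        return None              # current smokers have no time since quitting
    if smoked_prev:
        return 0                 # quit between the previous wave and this one
    if prev_counter is None:
        return None              # never smoked, or baseline value not imputed
    return prev_counter + 1      # still not smoking, one more wave has passed


def track(statuses, baseline_counter=None):
    """Apply the update across a sequence of per-wave smoking statuses.

    baseline_counter is the (imputed) counter for someone who had already
    quit before the first wave; it is ignored for baseline smokers.
    """
    counters = []
    counter = None if statuses[0] else baseline_counter
    counters.append(counter)
    for smoked_prev, smokes_now in zip(statuses, statuses[1:]):
        counter = update_waves_since_quit(smoked_prev, smokes_now, counter)
        counters.append(counter)
    return counters


# Someone who quits, relapses, then quits again
print(track([True, True, False, False, True, False]))
# Former smoker at baseline with an imputed counter of 3 waves
print(track([False, False], baseline_counter=3))
```

Relapse resets the counter here, which may or may not be the right choice for the illnesses we care about.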

ld-archer commented 2 years ago

alcbase

This variable only exists in wave 0 (i.e. from the HSE data) and hasn't been recorded in ELSA.

Unit estimation

This paper (Iparraguirre 2015) estimates the units drunk based on information from the individual wave data files that is not included in the harmonised data. Specifically, three variables are included from wave 4 onwards:

pints of beer consumed in the past week
glasses of wine consumed in the past week
measures of spirit consumed in the past week

The paper then uses 3 different calculators to work out the number of units from these three measures: NHS, General Lifestyle Survey (GLS), and Drinkaware. I think the NHS guidelines would be a good place to start.
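
As a rough illustration of the conversion, one UK unit is 10 ml of pure alcohol, so units = volume (ml) x ABV (%) / 1000. The serving sizes and strengths in this sketch are assumptions chosen for illustration, not the values used by the NHS, GLS or Drinkaware calculators, and the dictionary keys are hypothetical names rather than ELSA variables.

```python
# (ml per serving, % ABV) for each drink type. Assumed values for illustration only.
ASSUMED_SERVINGS = {
    "beer_pints": (568.0, 4.0),
    "wine_glasses": (175.0, 12.0),
    "spirit_measures": (25.0, 40.0),
}


def units_per_serving(volume_ml, abv_percent):
    """UK units in one serving: 10 ml of pure alcohol is one unit."""
    return volume_ml * abv_percent / 1000.0


def weekly_units(drinks):
    """drinks maps drink type -> number consumed in the past week."""
    total = 0.0
    for drink, count in drinks.items():
        volume_ml, abv_percent = ASSUMED_SERVINGS[drink]
        total += count * units_per_serving(volume_ml, abv_percent)
    return total


example = {"beer_pints": 3, "wine_glasses": 2, "spirit_measures": 1}
print(round(weekly_units(example), 2))
```

Swapping in a different calculator would only mean changing the values in ASSUMED_SERVINGS.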

In practical terms, to use this information we would need another script to run during reshape_long.do, probably before the data is reshaped, as I think it makes more sense that way. We would need to load in the datafiles one by one (which may mean copying the wave 4-9 wave_x_elsa_data_v3.dta files into input_data/; haven't decided on this yet), then use the information for each idauniq to calculate the number of units and add this onto the harmonised wide-format dataset. It makes sense to open a new issue for this problem.
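
The merge step could look roughly like the following, sketched in Python rather than Stata to keep it short. Everything except idauniq is hypothetical: the drink column names, the units_w4 style of column name, the per-serving factors and the example rows are all made up for illustration.

```python
# Assumed units per serving, for illustration only.
ASSUMED_UNITS = {"beer": 2.3, "wine": 2.1, "spirits": 1.0}


def row_units(row):
    """Weekly units for one respondent; None if any drink count is missing."""
    if any(row.get(drink) is None for drink in ASSUMED_UNITS):
        return None
    return sum(ASSUMED_UNITS[drink] * row[drink] for drink in ASSUMED_UNITS)


def add_wave_units(wide, wave_rows, wave):
    """Attach one wave's units to the wide records, matched on idauniq."""
    column = f"units_w{wave}"
    # Start everyone as missing so respondents absent from this wave stay missing
    for record in wide.values():
        record[column] = None
    for row in wave_rows:
        record = wide.get(row["idauniq"])
        if record is not None:   # skip respondents not in the harmonised data
            record[column] = row_units(row)
    return wide


# Made-up wide dataset keyed by idauniq, and made-up wave 4 rows
wide = {101: {"idauniq": 101}, 102: {"idauniq": 102}, 103: {"idauniq": 103}}
wave4 = [
    {"idauniq": 101, "beer": 2, "wine": 0, "spirits": 1},
    {"idauniq": 102, "beer": None, "wine": 3, "spirits": 0},
    {"idauniq": 999, "beer": 1, "wine": 1, "spirits": 1},
]
add_wave_units(wide, wave4, wave=4)
print(wide[101])
```

Doing this before the reshape means each wave simply adds one more column to the wide data.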

ld-archer commented 2 years ago

New idea for this work: instead of calculating units consumed in the past week for each wave and trying to predict that, we will now focus on predicting the number of each individual drink consumed in the past week for each wave using a Poisson regression, and calculate units after prediction. Hopefully this will help to keep the more extreme consumption values and maintain the heaviest drinkers better than predicting units alone. It also has the benefit of allowing us to tailor the drinks models a bit better, as different sets of vars may be better suited to predicting different drinks.

[ ] Remove focus on alcbase and start predicting each individual drink type
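
A minimal sketch of the predict-then-convert idea, using only the standard library. The coefficients, covariate names and per-drink unit factors are all hypothetical; in practice the coefficients would come from a fitted Poisson regression for each drink type.

```python
import math
import random

# Assumed units per serving, for illustration only.
ASSUMED_UNITS_PER_DRINK = {"beer": 2.3, "wine": 2.1, "spirits": 1.0}


def predicted_rate(coefs, covariates):
    """Poisson regression mean: exp of the linear predictor (log link)."""
    return math.exp(sum(coefs[name] * value for name, value in covariates.items()))


def draw_poisson(rate, rng):
    """Draw a Poisson count with Knuth's method (fine for small weekly rates)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_week(models, covariates, rng):
    """Draw a count for each drink type, then convert the counts to units."""
    drinks = {
        drink: draw_poisson(predicted_rate(coefs, covariates), rng)
        for drink, coefs in models.items()
    }
    units = sum(ASSUMED_UNITS_PER_DRINK[drink] * n for drink, n in drinks.items())
    return drinks, units


# Hypothetical coefficients: each drink type can use a different set of vars
models = {
    "beer": {"const": 0.5, "male": 0.8},
    "wine": {"const": 0.3, "age_over_65": -0.2},
    "spirits": {"const": -0.5},
}
# Covariates not used by a given model are simply given no coefficient there
person = {"const": 1.0}
rng = random.Random(0)
drinks, units = simulate_week(
    {d: {"const": c["const"]} for d, c in models.items()}, person, rng
)
print(drinks, units)
```

Because counts are drawn per drink and only converted afterwards, a heavy beer drinker and a heavy wine drinker stay distinct rather than being averaged into one units figure.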

ld-archer / E_FEM