ld-archer / E_FEM

This is the repository for the English version of the Future Elderly Model, originally developed at the Leonard D. Schaeffer Center for Health Policy and Microsimulation.
MIT License

Calculate units of alcohol from wave-specific data #79

Closed ld-archer closed 2 years ago

ld-archer commented 2 years ago

Unit estimation

This paper (Iparraguirre 2015) estimates the units drunk using information from the individual wave data files that is not carried over into the harmonised data. Specifically, three variables are included from wave 4 onwards:

The paper then uses three different calculators to work out the number of units from these three measures: NHS, General Lifestyle Survey (GLS), and drinkaware. I think the NHS guidelines would be a good place to start.
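As a rough illustration of what any of these calculators does: one UK unit is 10 ml of pure alcohol, so units = volume (ml) × ABV (%) / 1000. The sketch below is in Python for readability. The drink names, serving sizes and strengths are illustrative assumptions, not the values used in the paper or the variable names in the ELSA wave files.

```python
# Sketch: convert weekly drink counts to UK alcohol units.
# One UK unit = 10 ml of pure alcohol, so units = ml * ABV% / 1000.

# ASSUMED serving sizes (ml) and strengths (% ABV); replace these with the
# values from whichever calculator (NHS, GLS, drinkaware) is chosen.
ASSUMED_SERVINGS = {
    "beer_pints": (568, 4.0),
    "wine_glasses": (175, 12.0),
    "spirit_measures": (25, 40.0),
}


def units_per_serving(volume_ml, abv_percent):
    """Units of alcohol in one serving."""
    return volume_ml * abv_percent / 1000


def weekly_units(counts):
    """Total units from a dict of drink counts; a None count means missing."""
    total = 0.0
    for drink, n in counts.items():
        if n is None:
            return None  # keep missing rather than assume zero
        volume_ml, abv = ASSUMED_SERVINGS[drink]
        total += n * units_per_serving(volume_ml, abv)
    return total


example = weekly_units({"beer_pints": 3, "wine_glasses": 2, "spirit_measures": 1})
print(example)
```

Returning None when any count is missing is a deliberate choice here, so that non-response is not silently treated as abstention.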

In practical terms, to use this information we would need another script that runs during reshape_long.do, probably before the data is reshaped, as I think it makes more sense that way. We would need to load the data files one by one (which may mean copying the wave 4-9 wave_x_elsa_data_v3.dta files into input_data/; I haven't decided on this yet), then use the information for each idauniq to calculate the number of units and add this onto the harmonised wide-format dataset. It makes sense to open a new issue for this problem.
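The logic of that step can be sketched as follows. The real implementation would live in the Stata scripts (presumably as a merge on idauniq); this is only a Python illustration, and the column name r4units and the example values are hypothetical.

```python
# Sketch: attach wave-specific unit totals to a wide-format dataset keyed by idauniq.
# Column names such as "r4units" are hypothetical.

def add_wave_units(wide_rows, wave_units, wave):
    """Return copies of wide_rows with a new r<wave>units column.

    wide_rows: list of dicts, each with an "idauniq" key.
    wave_units: dict mapping idauniq -> weekly units for this wave.
    Respondents absent from the wave file get None (missing).
    """
    column = f"r{wave}units"
    return [{**row, column: wave_units.get(row["idauniq"])} for row in wide_rows]


wide = [{"idauniq": 101, "ragender": 1}, {"idauniq": 102, "ragender": 2}]
wave4 = {101: 12.5}  # units calculated from the wave 4 file
wide = add_wave_units(wide, wave4, wave=4)
```

Doing this before the reshape keeps one row per idauniq, so each wave simply contributes one extra column.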

_Originally posted by @ld-archer in https://github.com/ld-archer/E_FEM/issues/77#issuecomment-959716849_

ld-archer commented 2 years ago

Another thing to think about - from wave 4 onwards this information is collected for the week preceding the survey, whereas waves 2 and 3 have it only for the heaviest day in the past week. I think the best way to deal with this is therefore to calculate the units over the past week for waves 4+, then impute this variable for waves 1-3, using either hotdecking or multiple imputation in Stata.
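For reference, a minimal sketch of the hotdecking option: each missing value is filled by drawing an observed value from a donor in the same stratum. It is written in Python rather than Stata, and the stratum variable (age_band) and field names are assumptions chosen for illustration only.

```python
import random


def hotdeck_impute(records, target, strata_key, seed=0):
    """Fill missing `target` values by drawing a donor value from the same stratum."""
    rng = random.Random(seed)
    # Collect observed (donor) values per stratum.
    donors = {}
    for rec in records:
        if rec[target] is not None:
            donors.setdefault(rec[strata_key], []).append(rec[target])
    imputed = []
    for rec in records:
        new = dict(rec)  # copy so the input is left untouched
        if new[target] is None and donors.get(new[strata_key]):
            new[target] = rng.choice(donors[new[strata_key]])
        imputed.append(new)
    return imputed


records = [
    {"idauniq": 1, "age_band": "50-59", "units": 10.0},
    {"idauniq": 2, "age_band": "50-59", "units": None},
    {"idauniq": 3, "age_band": "60-69", "units": None},  # no donor in this stratum
]
filled = hotdeck_impute(records, "units", "age_band")
```

A stratum with no donors stays missing, which makes it visible when the strata are too fine for the available data.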

ld-archer commented 2 years ago

Produce some summary stats from the alcbase variable:

Produce regression estimates both with and without l2alcbase as a predictor of alcbase. Look at the coefficients here: I am assuming that the coefficient on l2alcbase will be close to 1, but it would be interesting to check.

Send this stuff to Bryan.

ld-archer commented 2 years ago

Need to do accounting of the alcbase variables in HealthModule.cpp

This means things like making sure drink == 1 if alcbase > 0. Check out the old code for smkint (and remove that at the same time) as well as exstat and adlstat.
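The consistency rule can be sketched like this. The actual accounting belongs in HealthModule.cpp; this Python version only illustrates the logic, and clamping negative predictions to zero is my assumption rather than something already decided.

```python
def reconcile_drinking(alcbase, drink):
    """Return (alcbase, drink) with drink forced to 1 whenever alcbase > 0."""
    # ASSUMPTION: a predicted consumption below zero is treated as zero.
    alcbase = max(alcbase, 0.0)
    if alcbase > 0:
        drink = 1
    # drink is left unchanged when alcbase == 0, since a drinker can
    # report zero units in a given week.
    return alcbase, drink


print(reconcile_drinking(5.0, 0))
```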

ld-archer commented 2 years ago

Some progress since the last update, but also a lot of new questions raised. Additional wave-specific variables with information about alcohol consumption have been included in the process to better categorise those with missing data on drinks. However, this is only truly helpful for identifying abstainers, and it has raised more issues. Those issues being:

Some answers and thoughts:

Abstainers

True abstainers are hard to identify in ELSA, but other variables can help to solve this problem. The scako variable reports how often a respondent has consumed alcohol over the past 12 months, ranging from 'not at all' to 'every day'. Those who answered 'not at all' can be considered true abstainers, whilst decisions have to be made regarding some groups (such as those who drink once or twice a year, up to once or twice a month).

Infrequent drinkers

Using the same variable (scako), we can identify people who report drinking 'once or twice a week' or more, yet reported zero units in the week leading up to the survey. In the current version of alcbase, we treat these people the same as abstainers in terms of alcohol consumption, which is obviously incorrect. We therefore need a cleverer way of handling the infrequent drinkers. One solution is to collect people into groups based on their response to scako, and impute their weekly consumption from the values of others within the group. We could also do this on an annual basis (by multiplying the weekly value by 52), which would help with the really infrequent drinkers (i.e. less than once or twice a month), although we would then have to convert back to a weekly consumption. This is difficult, and would probably introduce bias or error where someone has a heavy day once per month which just happens to fall in the week before the survey, but it could be an improvement as long as we check it thoroughly (potentially using data from Understanding Society or some other survey that reports alcohol consumption).
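A minimal sketch of the group-based idea, under several assumptions: only the 'once or twice a week' and 'every day' groups are treated as frequent drinkers, a reported zero in those groups is replaced by the group mean of the positive reports, and the field name weekly_units is hypothetical. Other choices (a median, a random donor, different groups) would be just as defensible.

```python
from statistics import mean

# ASSUMED set of scako responses counted as frequent drinking.
FREQUENT = {"once or twice a week", "every day"}


def impute_by_frequency_group(people):
    """Replace zero weekly units for frequent drinkers with their scako group's mean."""
    positives = {}
    for p in people:
        if p["weekly_units"] > 0:
            positives.setdefault(p["scako"], []).append(p["weekly_units"])
    group_mean = {group: mean(values) for group, values in positives.items()}

    result = []
    for p in people:
        q = dict(p)
        if q["weekly_units"] == 0 and q["scako"] in FREQUENT and q["scako"] in group_mean:
            q["weekly_units"] = group_mean[q["scako"]]
        result.append(q)
    return result


people = [
    {"scako": "once or twice a week", "weekly_units": 4.0},
    {"scako": "once or twice a week", "weekly_units": 8.0},
    {"scako": "once or twice a week", "weekly_units": 0.0},
    {"scako": "not at all", "weekly_units": 0.0},
]
adjusted = impute_by_frequency_group(people)
```

Abstainers and the less frequent groups are left untouched here, because replacing their zeros with a mean of positive weeks would overstate their consumption.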

Self-completion vs Core

Alcohol consumption is only asked in the self-completion questionnaire from wave 4 onwards (where the good data on this is). We need to check that this is not introducing too much bias, or at least be aware of what bias it does introduce. From very quick checks, response to the self-completion questionnaire is higher for younger age groups, higher education levels, married people, healthier people (srh of Good to Excellent), and non-disabled people (anyadl == 0). We need a good idea of the bias involved, so it would probably be good to make an R Notebook detailing the differences between the core and self-completion samples.

ld-archer commented 2 years ago

Improve Prediction of Alcbase

Prediction of alcbase currently shows some regression to the mean: high-risk drinkers become less common over time until they completely disappear. Including l2alcbase in the transition model is a good start, but there is more to do. The first step would be to turn the prediction of consumption into a two-stage process, starting with whether the simulant drinks alcohol at all, and then how much. Another way to keep hold of the long right tail is to add 'knots' in the prediction of alcbase, similar to the work we did with BMI (BMI has a 'knot' at BMI == 30 to maintain those who are above this point). A good place to start in terms of knot points is the categories of abstainer, moderate, increasingRisk and highRisk, but these are gender-specific values, so we will need to handle that. We could either have each term in the model be an interaction between male and the term (i.e. male * hsless) or, better, have a separate model for each gender (more complicated but cleaner).
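Both ideas can be sketched briefly. The knot values below are placeholders I have assumed for illustration, not the thresholds used in the model, and the function names are hypothetical. The spline function splits a value into piecewise-linear segments so that each segment can carry its own coefficient, and the two-stage function separates "drinks at all" from "how much".

```python
import random

# ASSUMED placeholder knot points (units per week) by gender.
ASSUMED_KNOTS = {"male": [21.0, 50.0], "female": [14.0, 35.0]}


def linear_spline_terms(x, knots):
    """Split x into piecewise-linear segments, one per interval between knots."""
    terms = []
    lower = 0.0
    for k in knots:
        # Portion of x that falls between the previous knot and this one.
        terms.append(min(max(x - lower, 0.0), k - lower))
        lower = k
    terms.append(max(x - lower, 0.0))  # portion above the last knot
    return terms


def two_stage_predict(p_drinks, amount_if_drinker, rng):
    """Stage 1: does the simulant drink at all? Stage 2: how much?"""
    if rng.random() >= p_drinks:
        return 0.0
    return amount_if_drinker


rng = random.Random(42)
lagged_terms = linear_spline_terms(30.0, ASSUMED_KNOTS["male"])
prediction = two_stage_predict(1.0, 12.0, rng)
```

With separate knot lists per gender, the same code serves either the interaction approach or two separate models; the segments always sum back to the original value.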

ld-archer commented 2 years ago

This idea has now been superseded slightly. Instead of calculating alcbase for each wave and predicting this variable, I am going to try to include the variables for consumption of each individual drink type (beer, wine, spirits), and predict a value for each of these independently using Poisson regression. The combined units drunk will then be calculated after prediction. Therefore I'm closing this issue.