ld-archer / E_FEM

This is the repository for the English version of the Future Elderly Model, originally developed at the Leonard D. Schaeffer Center for Health Policy and Microsimulation.
MIT License

Handle issues with infrequent drinkers and estimating consumption #84

Closed ld-archer closed 2 years ago

ld-archer commented 2 years ago

Infrequent drinkers

Using the same variable (scako), we can identify people who report drinking 'once or twice a week' or more but who reported zero units in the week leading up to the survey. In the current version of alcbase, we treat these people the same as abstainers in terms of alcohol consumption, which is clearly incorrect. We therefore need a better way of handling infrequent drinkers.

One solution is to group people by their response to scako and impute their weekly consumption from the values of others within the same group. We could also do this annually (multiplying the weekly value by 52), which would help with the really infrequent drinkers (i.e. less than once or twice a month). We would then have to convert back to a weekly consumption. This is difficult and would probably introduce bias or error where someone has a heavy drinking day once per month that just happens to fall in the week before the survey, but it could be an improvement as long as we check it thoroughly (potentially using data from Understanding Society or some other survey that reports alcohol consumption).
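A minimal sketch of the within-group idea, written in Python rather than the project's Stata purely for illustration. The field names (`scako`, `weekly_units`), the function name, and the sample values are all hypothetical, and drawing a random donor from the same group is only one possible imputation rule, not necessarily the one the model ends up using.

```python
import random
from collections import defaultdict


def impute_zero_weeks(records, drinking_groups, seed=0):
    """Replace zero weekly units with a random donor value from the same scako group.

    records: list of dicts with hypothetical keys 'scako' and 'weekly_units'.
    drinking_groups: scako categories whose zero-unit weeks should be imputed.
    """
    rng = random.Random(seed)

    # Donor pool: non-zero weekly values, collected separately per scako group.
    donors = defaultdict(list)
    for r in records:
        if r["scako"] in drinking_groups and r["weekly_units"] > 0:
            donors[r["scako"]].append(r["weekly_units"])

    imputed = []
    for r in records:
        new = dict(r)
        new["imputed"] = False
        needs_value = r["scako"] in drinking_groups and r["weekly_units"] == 0
        if needs_value and donors[r["scako"]]:
            new["weekly_units"] = rng.choice(donors[r["scako"]])
            new["imputed"] = True
        # Annualise by multiplying the weekly value by 52, as described above.
        new["annual_units"] = new["weekly_units"] * 52
        imputed.append(new)
    return imputed


# Made-up example data, not taken from ELSA.
sample = [
    {"scako": "once or twice a week", "weekly_units": 6.0},
    {"scako": "once or twice a week", "weekly_units": 0.0},
    {"scako": "once or twice a week", "weekly_units": 10.0},
    {"scako": "not at all", "weekly_units": 0.0},
]
result = impute_zero_weeks(sample, {"once or twice a week"})
```

Note that drawing only from non-zero donors will overstate consumption for groups where a zero week is genuinely common, which is one reason the imputed distribution would need checking against another survey.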

_Originally posted by @ld-archer in https://github.com/ld-archer/E_FEM/issues/79#issuecomment-970367200_

ld-archer commented 2 years ago

Excellent illustration of the difference between consumption groups (as defined in scako):

. bys scako: sum r4alcbase_annual

Summary of r4alcbase_annual (shown as r4alcbase_~l in the Stata output) for each scako group; group labels appear truncated, as in the output:

| scako | Obs | Mean | Std. Dev. | Min | Max |
|---|---|---|---|---|---|
| Refused | 75 | 532.896 | 1001.203 | 0 | 5132.4 |
| Not appl | 341 | 748.5255 | 1007.315 | 0 | 7280 |
| Almost e | 1,054 | 1509.332 | 1155.241 | 0 | 14196 |
| Five or | 372 | 1149.591 | 797.7879 | 0 | 6115.2 |
| Three or | 777 | 887.1588 | 687.5993 | 0 | 5824 |
| Once or | 1,371 | 514.7962 | 483.0598 | 0 | 5605.6 |
| Once or | 427 | 273.4567 | 297.0073 | 0 | 3239.6 |
| Once eve | 179 | 186.648 | 266.0044 | 0 | 2797.6 |
| Once or | 222 | 83.03603 | 254.1812 | 0 | 2381.6 |
| Not at a | 471 | 29.38938 | 249.2744 | 0 | 4201.6 |
| . | 1,769 | 820.0803 | 926.4758 | 0 | 8408.399 |

It's hard to see in this format, unfortunately, but the mean annual consumption of alcohol is highest in the group of daily drinkers and falls steadily as the frequency of consumption in the past 12 months decreases. This is obvious in theory, but seeing such a strong relationship in the data makes me more confident about imputing within groups. The same relationship is seen when those with alcbase == 0 are removed from the sample.
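The ordering can be checked directly from the reported means. The sketch below (Python, for illustration only) copies the group means from the Stata output in the order they are printed, from "Almost e" down to "Not at a", and leaves out the Refused, Not appl and missing groups; the helper name is hypothetical.

```python
# Mean annual units per scako group, copied from the Stata output above,
# in printed order (most frequent to least frequent drinkers).
means = [
    ("Almost e", 1509.332),
    ("Five or", 1149.591),
    ("Three or", 887.1588),
    ("Once or", 514.7962),
    ("Once or", 273.4567),
    ("Once eve", 186.648),
    ("Once or", 83.03603),
    ("Not at a", 29.38938),
]


def strictly_decreasing(values):
    """Return True if every value is smaller than the one before it."""
    return all(a > b for a, b in zip(values, values[1:]))


values = [m for _, m in means]
is_monotone = strictly_decreasing(values)

# Ratio of each group's mean to the mean of the next most frequent group.
ratios = [round(b / a, 2) for a, b in zip(values, values[1:])]

print("monotone decreasing:", is_monotone)
print("step ratios:", ratios)
```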

ld-archer commented 2 years ago

Unfortunately, hot-decking in Stata is not possible here because of an error that I cannot fix ("too many variables specified", r(103), despite only passing a single variable to be imputed).

So instead there are a couple of methods that can be employed. The first attempt will be multiple imputation in R using the mice package, which I will carry out and then check the distributions before and after imputation by comparing with similar variables in Understanding Society.

ld-archer commented 2 years ago

This will most likely be dropped as unnecessary, since the focus is switching to predicting individual drinks using Poisson regression. Leaving this open for now, however, as I am not sure of that yet.

ld-archer commented 2 years ago

This work has now been replaced (or superseded may be more accurate). The new methodology predicts alcohol consumption in stages:

1. Predict whether the respondent has drunk alcohol in the previous 12 months (abstainer or not).
2. Predict which category their consumption level falls into (moderate, increasingRisk, highRisk).
3. Predict their level of consumption in units within that category.
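The staged structure can be sketched as a small simulation. This is an illustration in Python, not the model's actual implementation: in the real model each stage would presumably be a fitted regression, whereas here the probabilities are fixed placeholders, and the unit bands in `BANDS`, the function name, and all numeric values are assumptions that do not come from this issue.

```python
import random

# Hypothetical weekly-unit bands for each category (placeholders only).
BANDS = {
    "moderate": (0.1, 14.0),
    "increasingRisk": (14.0, 35.0),
    "highRisk": (35.0, 100.0),
}


def simulate_consumption(p_drinker, category_probs, rng):
    """Simulate one respondent through the three stages."""
    # Stage 1: drinker in the previous 12 months, or abstainer?
    if rng.random() >= p_drinker:
        return {"drinker": False, "category": None, "weekly_units": 0.0}

    # Stage 2: which consumption category does the drinker fall into?
    categories = list(category_probs)
    weights = [category_probs[c] for c in categories]
    category = rng.choices(categories, weights=weights, k=1)[0]

    # Stage 3: level of consumption in units, drawn from within that category.
    low, high = BANDS[category]
    units = rng.uniform(low, high)
    return {"drinker": True, "category": category, "weekly_units": units}


rng = random.Random(42)
probs = {"moderate": 0.7, "increasingRisk": 0.2, "highRisk": 0.1}  # placeholder values
people = [simulate_consumption(0.8, probs, rng) for _ in range(1000)]
```

Splitting the prediction this way means abstainers never receive a non-zero consumption value, and the final unit prediction only has to be accurate within a single category.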

Closing the issue as it is no longer important.