ld-archer / E_FEM

This is the repository for the English version of the Future Elderly Model, originally developed at the Leonard D. Schaeffer Center for Health Policy and Microsimulation.
MIT License

Smoking and Drinking intensity indicator variables #53

Closed ld-archer closed 3 years ago

ld-archer commented 3 years ago

Something that would benefit the smoking and drinking models is an indicator of intensity. We've included this before with the smokef (# cigs) and drink_stat vars, but neither of these worked particularly well. Now might be a good time to give that a try, going to spin this comment off into a new issue.

_Originally posted by @ld-archer in https://github.com/ld-archer/E_FEM/issues/52#issuecomment-734999584_

Smoking
Drinking
ld-archer commented 3 years ago
Smoking intensity
Found this table in this document, points out education link for intensity:

| | Degree | A-level | GCSE (D-G) | No formal qualification |
|---|---|---|---|---|
| No. cigs / day | 8.1 | 10.1 | 11.1 | 14.6 |

This paper reduces number of cigarettes into 3 categories:

The smkint var only needs to be transitioned or estimated if the person is a smoker.

ld-archer commented 3 years ago

smkint

The smokef var reports the number of cigarettes a person smokes per day on average, and records 0 if they are not a smoker. We could use this to create a single variable covering both the smoking binary and smoking intensity. Not sure if that is better, but it would remove the need for smoke_start and smoke_stop. Need to think a bit more about it.

First will try just the categorical smkint var, with 3 levels:

If this doesn't work so well, can try just having a binary for a heavy smoker: if a person smokes more than 20 cigarettes a day, heavy == 1. Then use this to predict in place of smkint. This also has the bonus of being able to report the number and prevalence of heavy smokers.
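A minimal sketch of the heavy-smoker fallback, assuming smokef holds average cigarettes per day (0 for non-smokers) and using the 20-a-day cutoff from the comment. The function names and the choice to leave missing values as None are illustrative assumptions, not code from the repository.

```python
from typing import Optional

HEAVY_CUTOFF = 20  # cigarettes per day, the cutoff mentioned in the comment


def heavy_smoker(smokef: Optional[float]) -> Optional[int]:
    """Return 1 if smokef exceeds the cutoff, 0 otherwise, None if missing."""
    if smokef is None:
        return None  # assumption: a missing intensity stays missing
    return int(smokef > HEAVY_CUTOFF)


def prevalence(flags):
    """Share of heavy smokers among respondents with a non-missing flag."""
    known = [f for f in flags if f is not None]
    return sum(known) / len(known) if known else None


sample = [0, 5, 20, 25, None, 40]  # made-up smokef values for illustration
flags = [heavy_smoker(x) for x in sample]
print(flags, prevalence(flags))
```

Note that non-smokers (smokef == 0) count in the denominator here, so the result is population prevalence rather than prevalence among smokers.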

ld-archer commented 3 years ago

Had to do #54 to get this to work. Now marriage status vars are working correctly, and smkint is getting there. smkint is working reasonably well in the minimal models, but something is causing a large overprediction in the cross-validation.

Smoking Intensity min

| wave | fem_mean | elsa_mean | p_value |
|---|---|---|---|
| 3 | 1.19393 | 1.20932 | 0.05474 |
| 4 | 1.16272 | 1.18807 | 0.00179 |
| 5 | 1.13562 | 1.16572 | 0.00013 |
| 6 | 1.11352 | 1.14912 | 1E-05 |
| 7 | 1.09488 | 1.13386 | 0 |
| 8 | 1.07946 | 1.10897 | 0.00011 |

Smoking Intensity CV

| wave | fem_mean | elsa_mean | p_value |
|---|---|---|---|
| 3 | 2.43276 | 1.20841 | 0 |
| 4 | 2.43729 | 1.18976 | 0 |
| 5 | 2.42055 | 1.16054 | 0 |
| 6 | 2.41251 | 1.1428 | 0 |
| 7 | 2.42127 | 1.12579 | 0 |
| 8 | 2.40997 | 1.09869 | 0 |

The cross-validation means are off by more than 1 level on average in every wave, whereas the minimal-model means are at least fairly close.

ld-archer commented 3 years ago

After speaking with Bryan, it's clear that the non-smokers (smkint == 1) are very different from light smokers. We should definitely keep the smoke_start and smoke_stop models, and not transition smoking/non-smoking using smkint. smkint should then be estimated and applied only on those who smoke, BUT should be assigned for everyone. So non-smokers get smkint == 0?
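A rough sketch of that scheme: every respondent is assigned a value (0 for non-smokers), but only current smokers go into the estimation sample. The `smoker` flag, the function names, and especially the cutoffs between levels are hypothetical, since the thread does not record the actual category boundaries.

```python
from typing import Optional

# Hypothetical cutoffs (cigarettes/day) for the three smoker levels; the
# real category boundaries are not recorded in this thread.
LIGHT_MAX = 9
MODERATE_MAX = 19


def assign_smkint(smoker: int, smokef: Optional[float]) -> Optional[int]:
    """0 for non-smokers, 1-3 for smokers by intensity, None if unknown."""
    if smoker == 0:
        return 0  # assigned for everyone, but never estimated on this group
    if smokef is None:
        return None
    if smokef <= LIGHT_MAX:
        return 1
    if smokef <= MODERATE_MAX:
        return 2
    return 3


def estimation_sample(people):
    """Keep only current smokers, the group smkint would be estimated on."""
    return [p for p in people if p["smoker"] == 1]


people = [{"smoker": 0, "smokef": 0}, {"smoker": 1, "smokef": 12}]
print([assign_smkint(p["smoker"], p["smokef"]) for p in people])
print(len(estimation_sample(people)))
```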

ld-archer commented 3 years ago

Update: smkint works! Sort of... The minimal models predict smkint well until wave 6, where the gap widens too far. The simulated (FEM) mean is smaller than the measured (ELSA) mean.

Smoking Intensity (min)

| wave | fem_mean | elsa_mean | p_value |
|---|---|---|---|
| 3 | 0.20567 | 0.20887 | 0.68866 |
| 4 | 0.17369 | 0.18797 | 0.0778 |
| 5 | 0.14889 | 0.16257 | 0.07925 |
| 6 | 0.12535 | 0.14615 | 0.00735 |
| 7 | 0.10592 | 0.13269 | 0.00066 |
| 8 | 0.08932 | 0.10897 | 0.00972 |

Smoking Intensity (CV)

| wave | fem_mean | elsa_mean | p_value |
|---|---|---|---|
| 3 | 0.32858 | 0.2078 | 0 |
| 4 | 0.2943 | 0.18976 | 0 |
| 5 | 0.27035 | 0.15803 | 0 |
| 6 | 0.25465 | 0.1388 | 0 |
| 7 | 0.23535 | 0.1251 | 0 |
| 8 | 0.21285 | 0.09869 | 0 |
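To make the direction and size of the gaps easier to see, here is a small sketch that computes FEM minus ELSA per wave. The means are copied from the results above; the helper name `gaps` and the layout of the data are just for illustration.

```python
WAVES = [3, 4, 5, 6, 7, 8]

# (fem_mean, elsa_mean) per wave, copied from the results above
MINIMAL = [(0.20567, 0.20887), (0.17369, 0.18797), (0.14889, 0.16257),
           (0.12535, 0.14615), (0.10592, 0.13269), (0.08932, 0.10897)]
CROSS_VAL = [(0.32858, 0.2078), (0.2943, 0.18976), (0.27035, 0.15803),
             (0.25465, 0.1388), (0.23535, 0.1251), (0.21285, 0.09869)]


def gaps(pairs):
    """FEM mean minus ELSA mean for each wave, rounded to 5 decimals."""
    return [round(fem - elsa, 5) for fem, elsa in pairs]


for label, pairs in (("min", MINIMAL), ("CV", CROSS_VAL)):
    for wave, gap in zip(WAVES, gaps(pairs)):
        print(f"{label:>3} wave {wave}: {gap:+.5f}")
```

The minimal-model gaps are negative in every wave (FEM under-predicts), while the cross-validation gaps are positive in every wave (FEM over-predicts).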

This is still a WIP as the CV models are not great. Aside from tweaking the transition models (they are very basic right now, just above minimal), we also could include a couple of other variables just as predictors for this (and maybe drinking intensity) variable. Also, converting any monetary variables (income, pension, wealth) into logs may help too. This needs to happen anyway so may as well do it now.
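A minimal sketch of the log conversion for a monetary variable. How zero or negative amounts (e.g. net debt in wealth) should be treated is not discussed in this thread, so leaving them missing is purely an assumption here, as is the function name.

```python
import math
from typing import Optional


def log_money(value: Optional[float]) -> Optional[float]:
    """Natural log of a monetary amount; None when missing or not positive."""
    if value is None or value <= 0:
        # Assumption: zero or negative amounts are left missing, since the
        # log is undefined there and the thread gives no rule for them.
        return None
    return math.log(value)


print([log_money(v) for v in (25000.0, 0.0, -1200.0, None)])
```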

ld-archer commented 3 years ago

Drinkint

This needs to be informed from a combination of drinkd (days/week drinks alcohol), drinkwn (avg. drinks/week), and drinkn (max drinks/day in last week). drinkn is a poor variable to try to use, as it was only asked for 2 waves (2 & 3), and it asks the MAXIMUM drinks/day, whereas drinkwn asks the average. We therefore can't compare these, or use one to impute the other. What we can do however, is to use these 2 vars in combination with drinkd.

According to the Institute of Alcohol Studies, older people's drinking habits can be described as 'very little, very often'. Also, this release from the ONS says:

Since 2005, teetotalism has increased for those aged 16 to 44 years and fallen for those aged 65 and over

From the same report, definition of binge drinking:

Binge drinking is defined as males who exceeded 8 units of alcohol on their heaviest drinking day, and females who exceeded 6 units on their heaviest drinking day.

Also from that report, proportion of adults who drink by income level:

[Figure 7: Proportion (%) of adults who drank alcohol by income, Great Britain, 2017]

This is pretty interesting! It shows that adults in Britain are more likely to drink as their income increases. I believe this is in contrast to American data, where, if I remember correctly, drinking was more common among low earners. I would need to check that though.

ld-archer commented 3 years ago

More info on drinking to inform the drinkint variable.

From Alcohol Change UK:

Around 20% of the population don’t drink at all – and this figure is increasing among young people in particular. Among those who do drink, patterns of consumption vary enormously:

  • higher earners are more likely to drink than those on lower incomes
  • older people are more likely to drink regularly
  • men are more likely to ‘binge drink’ than women (though this is less the case among the young)

From drinkaware.org:

The Chief Medical Officers' guideline for both men and women states that:

  • To keep health risks from alcohol to a low level it is safest not to drink more than 14 units a week on a regular basis

From the NHS about binge drinking:

Binge drinking usually refers to drinking lots of alcohol in a short space of time or drinking to get drunk. In the UK, binge drinking is drinking more than:

  • 8 units of alcohol in a single session for men
  • 6 units of alcohol in a single session for women

I think this is enough to start.

Plan

As all the ELSA variables are reported in 'drinks' and not units, we will need to map these across. I think a sensible way to do this is to assume 1 drink == 2 units, as most drinks range from 1 to 3 units.

Using all three drinking variables, we will generate the drinkint variable. These are the rules we will follow:

With these indicators (binge drinker, heavy drinker) we can create the drinkint variable. I'm starting to think that it would work better as a binary variable, because of the limited data. First attempt then will be:

heavy_drinker = binge_drinker | heavy_drinker

We might use drinkd_e too, and add for example:

heavy_drink = drinkd_e > 5
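A sketch of how these indicators could be built, under stated assumptions: 1 drink == 2 units as in the plan, the 8/6 unit binge thresholds quoted from the ONS and NHS, and the 14 units/week guideline standing in for the "heavy" rule (the actual rules are not recorded in this thread). Function names and the handling of missing values are hypothetical.

```python
from typing import Optional

UNITS_PER_DRINK = 2                     # mapping proposed in the plan
BINGE_UNITS = {"male": 8, "female": 6}  # heaviest-day thresholds (ONS/NHS)
WEEKLY_UNITS_LIMIT = 14                 # CMO guideline, assumed "heavy" rule


def binge_drinker(sex: str, drinkn: Optional[float]) -> Optional[int]:
    """1 if max drinks/day (drinkn), in units, exceeds the binge threshold."""
    if drinkn is None:
        return None
    return int(drinkn * UNITS_PER_DRINK > BINGE_UNITS[sex])


def over_weekly_limit(drinkwn: Optional[float]) -> Optional[int]:
    """1 if average drinks/week (drinkwn), in units, exceeds 14 units."""
    if drinkwn is None:
        return None
    return int(drinkwn * UNITS_PER_DRINK > WEEKLY_UNITS_LIMIT)


def heavy_drinker(sex, drinkn, drinkwn) -> Optional[int]:
    """OR of the two indicators, ignoring whichever one is missing."""
    flags = [f for f in (binge_drinker(sex, drinkn),
                         over_weekly_limit(drinkwn)) if f is not None]
    return max(flags) if flags else None


print(heavy_drinker("male", 5, None))   # binge rule only
print(heavy_drinker("female", 3, 2))    # neither rule triggered
```

Ignoring a missing indicator matters here because drinkn was only asked in waves 2 and 3, so most person-waves would otherwise have no value at all.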

ld-archer commented 3 years ago

The work done on this branch was merged in PR #60. Leaving the issue open however as more work needs to be done to improve them.

ld-archer commented 3 years ago

Have now replaced the earlier work with much simpler intensity variables.

We now have a binary variable for both smoking and drinking, defined as such:

Smoking

heavy_smoker = More than 15 cigs per day

Drinking

problem_drinker = More than 10 drinks per week OR More than 6 drinks on heaviest day in past week
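The two definitions above can be sketched as follows. This assumes smokef is cigarettes per day, drinkwn is average drinks per week and drinkn is drinks on the heaviest day in the past week, based on the variable descriptions earlier in the thread. The treatment of missing values is an assumption, not the repository's implementation.

```python
from typing import Optional

SMOKE_CUTOFF = 15          # cigarettes per day
WEEKLY_DRINKS_CUTOFF = 10  # drinks per week
DAILY_DRINKS_CUTOFF = 6    # drinks on heaviest day in past week


def heavy_smoker(smokef: Optional[float]) -> Optional[int]:
    """1 if more than 15 cigarettes per day, else 0; None if missing."""
    if smokef is None:
        return None
    return int(smokef > SMOKE_CUTOFF)


def problem_drinker(drinkwn: Optional[float],
                    drinkn: Optional[float]) -> Optional[int]:
    """1 if either drinking rule is met, using whichever inputs are known."""
    checks = []
    if drinkwn is not None:
        checks.append(drinkwn > WEEKLY_DRINKS_CUTOFF)
    if drinkn is not None:
        checks.append(drinkn > DAILY_DRINKS_CUTOFF)
    if not checks:
        return None  # assumption: both inputs missing -> indicator missing
    return int(any(checks))


print(heavy_smoker(16), problem_drinker(11, None), problem_drinker(10, 6))
```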