Smoking and Drinking intensity indicator variables

ld-archer commented 3 years ago

Something that would benefit the smoking and drinking models is an indicator of intensity. We've included this before with the smokef (# cigs) and drink_stat vars, but neither of these worked particularly well. Now might be a good time to try again, so I'm spinning this comment off into a new issue.

_Originally posted by @ld-archer in https://github.com/ld-archer/E_FEM/issues/52#issuecomment-734999584_

Smoking

[x] Find a reference for which values of smokef indicate a heavy or light smoker
[x] Create smkint var (smoke intensity)

Drinking

Need to decide how to collate the drinking variables: drinkd, drinkn, and drinkwn
- Can we use drinkn? It is only available for 2 waves
- Need to create this in reshape_long, then drop the other vars
[ ] Create drinkint var

ld-archer commented 3 years ago

Smoking intensity

Found this table in this document, which points to a link between education and smoking intensity:

Qualification	Degree	A-level	GCSE (D-G)	No formal qualification
No. cigs / day	8.1	10.1	11.1	14.6

This paper groups the number of cigarettes smoked into 3 categories:

High: >20 cigs/day
Medium: 10-19 cigs/day
Low: <10 cigs/day

Seems like a good place to start.

The smkint var should only be transitioned or estimated if the person is a smoker.

ld-archer commented 3 years ago

smkint

[x] reshape_long
[x] transitions (base & min)
- Prediction model for smkint
- smkint predicting other models
[x] sample_selections
[x] Vars.cpp/.h
- Current
- Lag
- Dummies (1,2,3)
- Dummy lags
- probability
[x] measures_subpop
[x] settings.csv
[x] settings.txt (detailed_output_vars)
[x] cross-validation

The smokef var records the average number of cigarettes a person smokes per day, and 0 if they are not a smoker. We could use this to create a single variable capturing both smoking status (binary) and smoking intensity. That would remove the need for the separate smoke_start and smoke_stop models, but I'm not sure it would be better. Need to think a bit more about it.

First I'll try just the categorical smkint var, with 3 levels:

Low: 1-9 cigs/day
Medium: 10-19 cigs/day
Heavy: >20 cigs/day
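A minimal sketch of the 3-level split, in Python purely for illustration (the project's own code isn't shown in this thread). The function name `smkint_category` and the 1/2/3 coding are assumptions, with the coding chosen to match the dummies (1,2,3) in the checklist. Note that the thresholds as written (10-19 for Medium, >20 for Heavy) leave exactly 20 cigs/day in no category; this sketch assumes 20 counts as Heavy.

```python
from typing import Optional


def smkint_category(cigs_per_day: float) -> Optional[int]:
    """Map average cigarettes per day (smokef) to a 3-level smkint code.

    Hypothetical coding: 1 = Low (1-9), 2 = Medium (10-19), 3 = Heavy (20+).
    Assumption: exactly 20 cigs/day is treated as Heavy, since ">20"
    would otherwise leave it in no category.
    """
    if cigs_per_day <= 0:
        return None  # non-smoker: no intensity level at this stage
    if cigs_per_day < 10:
        return 1  # Low
    if cigs_per_day < 20:
        return 2  # Medium
    return 3  # Heavy


print([smkint_category(c) for c in [0, 5, 12, 25]])  # [None, 1, 2, 3]
```

Non-smokers are deliberately left without a level here, since at this point smkint only describes people who smoke.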

If this doesn't work well, we can try just a binary indicator for heavy smoking: if someone smokes more than 20 cigs/day, heavy == 1. We would then use this as a predictor in place of smkint. This also has the bonus of letting us report the number and prevalence of heavy smokers.
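The heavy-smoker fallback amounts to a threshold plus a count. A sketch with hypothetical function names; the prevalence here is an unweighted share, with no survey weighting applied.

```python
from typing import Sequence, Tuple


def is_heavy_smoker(cigs_per_day: float) -> int:
    """1 if the respondent smokes more than 20 cigarettes per day, else 0."""
    return int(cigs_per_day > 20)


def heavy_smoking_summary(smokef_values: Sequence[float]) -> Tuple[int, float]:
    """Return (number, unweighted prevalence) of heavy smokers in a sample."""
    flags = [is_heavy_smoker(c) for c in smokef_values]
    count = sum(flags)
    prevalence = count / len(flags) if flags else 0.0
    return count, prevalence


print(heavy_smoking_summary([0, 5, 21, 30, 20]))  # (2, 0.4)
```

Because the rule is a strict "more than 20", a respondent reporting exactly 20 cigs/day is not counted as heavy.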

ld-archer commented 3 years ago

Had to do #54 first to get this working. The marriage status vars are now working correctly, and smkint is getting there: it works reasonably well in the minimal models, but something is causing a large overprediction in the cross-validation.

Smoking Intensity, minimal (min) vs cross-validation (CV) models:
Wave	FEM mean (min)	ELSA mean (min)	p-value (min)	FEM mean (CV)	ELSA mean (CV)	p-value (CV)
3	1.19393	1.20932	0.05474	2.43276	1.20841	0
4	1.16272	1.18807	0.00179	2.43729	1.18976	0
5	1.13562	1.16572	0.00013	2.42055	1.16054	0
6	1.11352	1.14912	1E-05	2.41251	1.1428	0
7	1.09488	1.13386	0	2.42127	1.12579	0
8	1.07946	1.10897	0.00011	2.40997	1.09869	0

In the cross-validation, the FEM mean is more than 1 level above the ELSA mean in every wave (the t-tests reject equality throughout), whereas the minimal models are at least fairly close.
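To make "more than 1 level" concrete, the per-wave gap (FEM mean minus ELSA mean) can be computed straight from the Smoking Intensity table values. This only redoes the subtraction; it is not part of the model pipeline, and the variable names are illustrative.

```python
# Wave-level means copied from the Smoking Intensity table above (waves 3-8).
waves = [3, 4, 5, 6, 7, 8]
min_fem = [1.19393, 1.16272, 1.13562, 1.11352, 1.09488, 1.07946]
min_elsa = [1.20932, 1.18807, 1.16572, 1.14912, 1.13386, 1.10897]
cv_fem = [2.43276, 2.43729, 2.42055, 2.41251, 2.42127, 2.40997]
cv_elsa = [1.20841, 1.18976, 1.16054, 1.14280, 1.12579, 1.09869]


def gaps(fem, elsa):
    """Simulated (FEM) mean minus observed (ELSA) mean, per wave."""
    return [round(f - e, 5) for f, e in zip(fem, elsa)]


for wave, g_min, g_cv in zip(waves, gaps(min_fem, min_elsa), gaps(cv_fem, cv_elsa)):
    print(f"wave {wave}: min gap {g_min:+.3f}, CV gap {g_cv:+.3f}")
```

Every CV gap falls between roughly +1.22 and +1.31, while the minimal-model gaps are only a few hundredths below zero.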

ld-archer commented 3 years ago

After speaking with Bryan, it's clear that the non-smokers (smkint == 1) are very different from light smokers. We should definitely keep the smoke_start and smoke_stop models, and not transition smoking/non-smoking using smkint. smkint should then be estimated and applied only to those who smoke, BUT should be assigned for everyone. So non-smokers = smkint == 0?

[x] Check everyone has smkint value
[x] Only transition for those who smoke
[x] Make sure Vars.cpp/.h is correct (vars linked correctly)
[x] Do accounting in the HealthModule
- Focus on those who are not smoking rather than those who are smoking. The transition model, not the accounting, should assign the value of smkint for smoke_start'ers
- Make sure non-smokers are smkint == 0
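One way to read the accounting rules above, as a per-person update step. Everything here is hypothetical: `update_smkint` and `predict_smkint` stand in for whatever the HealthModule and the smkint transition model actually do, and passing the lagged smkint into the model is an assumption. The point is the ordering: the accounting forces non-smokers to 0, while every current smoker, including a smoke_start'er whose lag is 0, gets a value from the transition model.

```python
from typing import Callable


def update_smkint(
    smoking_now: bool,
    smkint_lag: int,
    predict_smkint: Callable[[int], int],
) -> int:
    """Next-period smkint for one person (hypothetical sketch).

    Coding: 0 = non-smoker, 1/2/3 = Low/Medium/Heavy.
    `smoking_now` is the outcome of the smoke_start / smoke_stop models;
    `predict_smkint` stands in for the smkint transition model.
    """
    if not smoking_now:
        # Accounting step: anyone not smoking (incl. smoke_stop'ers) gets 0.
        return 0
    # Current smokers, including smoke_start'ers whose lag is 0, get their
    # level from the transition model rather than from the accounting.
    level = predict_smkint(smkint_lag)
    if level not in (1, 2, 3):
        raise ValueError(f"smoker assigned invalid smkint {level}")
    return level
```

The final check mirrors "Check everyone has smkint value": a smoker must always come out of the transition model with one of the three intensity levels.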

ld-archer commented 3 years ago

Update: smkint works! Sort of... The minimal models predict smkint well until wave 6, after which the gap widens too far. The simulated (FEM) mean is smaller than the measured (ELSA) mean.

Smoking Intensity, minimal (min) vs cross-validation (CV) models:
Wave	FEM mean (min)	ELSA mean (min)	p-value (min)	FEM mean (CV)	ELSA mean (CV)	p-value (CV)
3	0.20567	0.20887	0.68866	0.32858	0.2078	0
4	0.17369	0.18797	0.0778	0.2943	0.18976	0
5	0.14889	0.16257	0.07925	0.27035	0.15803	0
6	0.12535	0.14615	0.00735	0.25465	0.1388	0
7	0.10592	0.13269	0.00066	0.23535	0.1251	0
8	0.08932	0.10897	0.00972	0.21285	0.09869	0

This is still a WIP, as the CV models are not great (the FEM mean is well above ELSA in every wave). Aside from tweaking the transition models (which are very basic right now, just above minimal), we could also include a couple of other variables purely as predictors for this variable (and maybe for drinking intensity). Converting the monetary variables (income, pension, wealth) into logs may also help. This needs to happen anyway, so we may as well do it now.

Add vars:
- [x] loneliness
- [x] unemployed
- [ ] job type (see figure 3 in this report)
- [ ] Whether owns home
[x] Convert money vars into logs (as described in #55)

ld-archer commented 3 years ago

Drinkint

This needs to be informed by a combination of drinkd (days/week R drinks alcohol), drinkwn (avg. drinks/week), and drinkn (max drinks/day in the last week). drinkn is a poor variable to use, as it was only asked in 2 waves (2 & 3), and it asks for the MAXIMUM drinks/day, whereas drinkwn asks for the average. We therefore can't compare these, or use one to impute the other (drinkwn is only asked in waves 4-8, so no wave has both). What we can do, however, is use each of these 2 vars in combination with drinkd.

According to the Institute of Alcohol Studies, older people's drinking habits can be described as 'very little, very often'. Also, this release from the ONS says:

Since 2005, teetotalism has increased for those aged 16 to 44 years and fallen for those aged 65 and over

From the same report, definition of binge drinking:

Binge drinking is defined as males who exceeded 8 units of alcohol on their heaviest drinking day, and females who exceeded 6 units on their heaviest drinking day.

Also from that report, the proportion of adults who drink by income level:

Figure 7: Proportion (%) of adults who drank alcohol by income, Great Britain, 2017

This is pretty interesting! It shows that adults in Britain are more likely to drink as their income increases. I believe this contrasts with American data, where, if I remember correctly, drinking was more common among low earners. I would need to check that though.

ld-archer commented 3 years ago

More info on drinking to inform the drinkint variable.

From Alcohol Change UK:

Around 20% of the population don’t drink at all – and this figure is increasing among young people in particular. Among those who do drink, patterns of consumption vary enormously:

higher earners are more likely to drink than those on lower incomes

older people are more likely to drink regularly

men are more likely to ‘binge drink’ than women (though this is less the case among the young)

From drinkaware.org:

The Chief Medical Officers' guideline for both men and women states that:

To keep health risks from alcohol to a low level it is safest not to drink more than 14 units a week on a regular basis

From the NHS about binge drinking:

Binge drinking usually refers to drinking lots of alcohol in a short space of time or drinking to get drunk. In the UK, binge drinking is drinking more than:

8 units of alcohol in a single session for men

6 units of alcohol in a single session for women

I think this is enough to start.

Plan

As all the ELSA variables are reported in 'drinks' rather than units, we will need to map between the two. I think a sensible way to do this is to assume 1 drink == 2 units, as most drinks range from 1 to 3 units.
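Under the 1 drink == 2 units assumption, the unit-based guidelines quoted above convert directly into drink counts, and these match the cut-offs used for drinkn_e and drinkwn_e in the rules that follow. `UNITS_PER_DRINK` and `units_to_drinks` are illustrative names.

```python
UNITS_PER_DRINK = 2  # assumption from the plan: 1 drink == 2 units


def units_to_drinks(units: float) -> float:
    """Convert alcohol units into ELSA-style 'drinks'."""
    return units / UNITS_PER_DRINK


weekly_limit = units_to_drinks(14)  # CMO low-risk guideline, units/week
binge_men = units_to_drinks(8)      # NHS binge threshold for men, units/session
binge_women = units_to_drinks(6)    # NHS binge threshold for women, units/session
print(weekly_limit, binge_men, binge_women)  # 7.0 4.0 3.0
```

Changing `UNITS_PER_DRINK` (say, to 1.5) would shift all three cut-offs at once, which is one way to test how sensitive drinkint is to this assumption.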

Using all three drinking variables, we will generate the drinkint variable. These are the rules we will follow:

r[2-3]drinkn_e
- Number of drinks respondent drank on the day when they drank the most in the previous week
- If drinkn_e > 4 (men) or > 3 (women), we will tag the respondent as a binge drinker
r[4-8]drinkwn_e
- Number of drinks in the last week
- If drinkwn_e > 7, we will tag the respondent as a heavy drinker
drinkd_e
- No. of days R drank alcohol in the past week
- This one is trickier, as much of the information I have found states that older generations drink smaller quantities more frequently
- Because of this, I'm going to do some trial and error to see if we can use this var.

With these indicators (binge drinker, heavy drinker) we can create the drinkint variable. I'm starting to think it would work better as a binary variable, because of the limited data. The first attempt will then be:

heavy_drinker = binge_drinker | heavy_drinker

We might use drinkd_e too, adding for example:

heavy_drink = drinkd_e > 5
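A sketch of this first attempt, assuming missing values are represented as `None` (drinkn_e is only asked in waves 2-3 and drinkwn_e only in waves 4-8, so at most one of them is present in a given wave). The function names are hypothetical, the tentative drinkd_e > 5 rule is switched off by default, and the result is `None` rather than 0 when no drinking measure is available.

```python
from typing import Optional


def binge_drinker(drinkn_e: Optional[float], male: bool) -> Optional[bool]:
    """Most drinks on one day in the past week above 4 (men) or 3 (women)."""
    if drinkn_e is None:  # only asked in waves 2-3
        return None
    return drinkn_e > (4 if male else 3)


def heavy_drinker(drinkwn_e: Optional[float]) -> Optional[bool]:
    """Drinks in the last week above 7."""
    if drinkwn_e is None:  # only asked in waves 4-8
        return None
    return drinkwn_e > 7


def drinkint(
    drinkn_e: Optional[float],
    drinkwn_e: Optional[float],
    male: bool,
    drinkd_e: Optional[float] = None,
    use_drinkd: bool = False,
) -> Optional[int]:
    """Binary intensity: 1 if any available flag is set, None if no data."""
    flags = [binge_drinker(drinkn_e, male), heavy_drinker(drinkwn_e)]
    if use_drinkd and drinkd_e is not None:
        flags.append(drinkd_e > 5)  # tentative frequency rule
    known = [f for f in flags if f is not None]
    if not known:
        return None
    return int(any(known))
```

Keeping the binge and heavy flags as separate functions makes it easy to report each one on its own before OR-ing them into drinkint.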

ld-archer commented 3 years ago

The work done on this branch was merged in PR #60. Leaving the issue open, however, as more work is needed to improve these variables.

ld-archer commented 3 years ago

Have now replaced the earlier work with much simpler intensity variables.

We now have a binary variable for each of smoking and drinking, defined as follows:

Smoking

heavy_smoker = More than 15 cigs per day

Drinking

problem_drinker = More than 10 drinks per week OR More than 6 drinks on heaviest day in past week
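The final definitions reduce to two thresholds. A sketch assuming the inputs are smokef (cigs/day), drinkwn_e (drinks in the past week) and drinkn_e (drinks on the heaviest day); the comment above does not say which ELSA vars feed each rule, so that mapping is an assumption, as are the function signatures. Missing values are `None`.

```python
from typing import Optional


def heavy_smoker(cigs_per_day: float) -> int:
    """1 if more than 15 cigarettes per day, else 0."""
    return int(cigs_per_day > 15)


def problem_drinker(
    drinks_past_week: Optional[float],
    drinks_heaviest_day: Optional[float],
) -> Optional[int]:
    """1 if > 10 drinks/week OR > 6 drinks on the heaviest day.

    Either input may be None (not asked that wave); None if both are.
    """
    checks = []
    if drinks_past_week is not None:
        checks.append(drinks_past_week > 10)
    if drinks_heaviest_day is not None:
        checks.append(drinks_heaviest_day > 6)
    if not checks:
        return None
    return int(any(checks))
```

Both rules use strict inequalities, matching "More than" in the definitions, so exactly 15 cigs/day or exactly 10 drinks/week is not flagged.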
