ld-archer / E_FEM

This is the repository for the English version of the Future Elderly Model, originally developed at the Leonard D. Schaeffer Center for Health Policy and Microsimulation.
MIT License
3 stars 1 forks source link

improve predictors of loneliness #116

Closed ld-archer closed 1 year ago

ld-archer commented 1 year ago

Have an ordinal logistic regression model running in R which I can use as a test bed for the models. The R version of the model is not exactly the same as the stata version (R is MASS::polr - proportional odds logistic regression, whereas stata is oprobit regression. Couldn't find a package to run oprobit logistic regression in R), and the coefficients are not exactly the same but the sign is always correct so I'm going to treat them as analogous.

ld-archer commented 1 year ago

For Reference:

Minimal

STATA:
oprobit
lnly
male                     -0.19875
l2age65l                 -0.00822
l2age6574                 0.00094
l2age75p                  0.02915
cut1                     -0.07045
cut2                      1.18903

R:
Coefficients:
                     Value Std. Error  t value
male1            -0.331415    0.01825 -18.1638
scale(l2age65l)  -0.058227    0.01176  -4.9521
scale(l2age6574)  0.007222    0.01349   0.5355
scale(l2age75p)   0.187344    0.01210  15.4884

Intercepts:
    Value    Std. Error t value 
1|2   0.5964   0.0117    50.9754
2|3   2.9557   0.0217   136.1393

Residual Deviance: 85242.67 
AIC: 85254.67 
(27510 observations deleted due to missingness)

Full

STATA:
oprobit
lnly
male                     -0.15979
white                    -0.43794
hsless                    0.10378
college                  -0.06684
l2age65l                 -0.00674
l2age6574                -0.00700
l2age75p                  0.03091
atotb                    -0.00000
l2employed                0.01459
l2inactive                0.22831
l2physact                 0.04646
cut1                     -0.30069
cut2                      1.02784

R:
Coefficients:
                    Value Std. Error  t value
male1            -0.26757    0.02283 -11.7214
white            -0.75271    0.06838 -11.0081
hsless1           0.15896    0.02718   5.8481
college1         -0.09253    0.03067  -3.0168
scale(l2age65l)  -0.04478    0.01438  -3.1135
scale(l2age6574) -0.05080    0.01865  -2.7235
scale(l2age75p)   0.20107    0.01881  10.6902
scale(atotb)     -0.17677    0.01809  -9.7742
l2employed1       0.02870    0.03356   0.8551
l2inactive1       0.37082    0.04132   8.9740
l2physact1        0.06823    0.03234   2.1100

Intercepts:
    Value    Std. Error t value 
1|2   0.1524   0.0775     1.9664
2|3   2.7118   0.0813    33.3527

Residual Deviance: 55678.44 
AIC: 55704.44 
(43839 observations deleted due to missingness)
ld-archer commented 1 year ago

Massive meta review: Cohen-Mansfield et al (2015)

The list below is predictors AND correlates. This means that some of these variables may not only predict loneliness, but may also/instead be predicted by loneliness. Need to be careful here. Also I am not including variables in this list that I have no chance of adding due to either lack of variables in ELSA or other complications.

Variables:

Shortlist of variables I already have or can easily test:

Now to fit model.

ld-archer commented 1 year ago

Large improvement with all these new variables:

'lnly ~ male + hsless + college + scale(l2age65l) + scale(l2age6574) + scale(l2age75p) + l2married + l2cohab + l2widowed + l2single + l2employed + l2inactive + l2physact + l2anyadl + l2anyiadl + l2srh5 + l2psyche + l2wealth_quintile + l2hhres + l2socyr + l2gcareinhh1w + childless'

Call:
polr(formula = form.lnly.test, data = trans, na.action = na.omit, 
    Hess = TRUE)

Coefficients:
                     Value Std. Error t value
male1             -0.08611    0.03133 -2.7484
hsless1            0.09836    0.03942  2.4952
college1          -0.01999    0.03984 -0.5018
scale(l2age65l)   -0.15162    0.02672 -5.6750
scale(l2age6574)  -0.03915    0.02516 -1.5564
scale(l2age75p)    0.07683    0.02701  2.8446
l2married         -0.83028    1.20333 -0.6900
l2cohab           -0.70330    1.20587 -0.5832
l2widowed          0.23043    1.20393  0.1914
l2single           0.14344    1.20373  0.1192
l2employed1        0.07640    0.04553  1.6781
l2inactive1        0.34027    0.05915  5.7525
l2physact1         0.07717    0.04484  1.7211
l2anyadl           0.37890    0.05251  7.2160
l2anyiadl          0.40776    0.05320  7.6653
l2srh5             0.37322    0.09338  3.9967
l2psyche           0.66532    0.04781 13.9157
l2wealth_quintile -0.06437    0.01244 -5.1753
l2hhres           -0.04139    0.02305 -1.7961
l2socyr           -0.16051    0.03206 -5.0066
l2gcareinhh1w      0.46441    0.06244  7.4375
childless          0.01397    0.04350  0.3211

Intercepts:
    Value   Std. Error t value
1|2  0.3231  1.2061     0.2679
2|3  3.0698  1.2066     2.5442

Residual Deviance: 30768.29 
AIC: 30816.29 
(63058 observations deleted due to missingness)

Looking at the AIC (assessment of model fit), it is a large improvement on the previous full model (30816 vs 55704, smaller is better). The only variables I would question are employed and physact, as I would have thought these would be associated with lower loneliness. The coefficients are small, and we can't see significance here so that might be an explanation. We'll see when I plumb these new variables into the stata models.

NOTE: We remove 63,058 observations out of only 87189 due to missingness. This means we're dropping over 2/3rds of the sample. I'm not that upset because the model is behaving as we would expect. ALSO NOTE: nursing home living (institutional) is the only variable from the shortlist I did not include. There is some nuance here with the sample weights that make this a more complicated thing to target, so I will leave it for now.

ld-archer commented 1 year ago

Stata model:

oprobit
lnly
male                     -0.06266
white                    -0.48712
hsless                    0.06098
college                  -0.02043
l2age65l                 -0.01923
l2age6574                -0.00461
l2age75p                  0.01149
l2single                  0.55615
l2cohab                   0.07403
l2widowed                 0.61232
l2employed                0.04320
l2inactive                0.17956
l2physact                 0.04942
l2anyadl                  0.22227
l2anyiadl                 0.23071
l2srh5                    0.22817
l2psyche                  0.40017
l2wealth_quintile        -0.03957
l2hhres                  -0.02820
l2socyr                  -0.09939
l2gcareinhh1w             0.27532
childless                 0.00672
cut1                     -0.95873
cut2                      0.48867
ld-archer commented 1 year ago

Another version with logatotb instead of wealth quintile because I don't know how to calculate this in the simulation (have tried both vars.txt but there is no way there to do calculations across the entire population, and the syntax is custom and was created by the US team. Also I struggled in the cpp code, think continuous logatotb is fine).

Call:
polr(formula = form.lnly.test, data = trans, na.action = na.omit, 
    Hess = TRUE)

Coefficients:
                     Value Std. Error t value
male1             -0.08784    0.03142 -2.7955
hsless1            0.08700    0.04000  2.1753
college1          -0.04540    0.03902 -1.1635
scale(l2age65l)   -0.11334    0.02312 -4.9015
scale(l2age6574)  -0.04237    0.02501 -1.6940
scale(l2age75p)    0.07805    0.02718  2.8721
l2cohab            0.06804    0.08023  0.8481
l2widowed          1.07978    0.05363 20.1344
l2single           1.01198    0.04838 20.9166
l2employed1        0.10214    0.04533  2.2533
l2inactive1        0.31293    0.06025  5.1941
l2physact1         0.07858    0.04417  1.7791
l2anyadl           0.39813    0.05343  7.4508
l2anyiadl          0.43338    0.05390  8.0405
l2srh5             0.30817    0.09885  3.1174
l2psyche           0.68999    0.04827 14.2958
l2hhres           -0.01792    0.02306 -0.7772
l2socyr           -0.17234    0.03189 -5.4043
l2gcareinhh1w      0.49701    0.06290  7.9016
childless          0.01333    0.04315  0.3089
scale(l2logatotb) -0.04804    0.01951 -2.4626

Intercepts:
    Value   Std. Error t value
1|2  1.4350  0.0841    17.0559
2|3  4.2194  0.0931    45.3157

Residual Deviance: 30654.42 
AIC: 30700.42 
(62868 observations deleted due to missingness)

AIC is slightly better as well.

ld-archer commented 1 year ago

Another small update, found a variable that measures depression instead of all psychiatric problems, which seems to have a stronger relationship with loneliness and improve the AIC of the model. The variable is the Center for Epidemiological Studies - Depression (CESD) scale.

Call:
polr(formula = form.lnly.test, data = trans, na.action = na.omit, 
    Hess = TRUE)

Coefficients:
                      Value Std. Error t value
male1              0.024943   0.032389  0.7701
hsless1            0.005230   0.041110  0.1272
college1          -0.028389   0.040023 -0.7093
scale(l2age65l)   -0.079922   0.023991 -3.3313
scale(l2age6574)  -0.035008   0.025655 -1.3646
scale(l2age75p)    0.076779   0.027756  2.7662
l2cohab            0.081692   0.082882  0.9856
l2widowed          0.952489   0.055100 17.2865
l2single           0.964048   0.049733 19.3846
l2employed1        0.076830   0.046598  1.6488
l2inactive1        0.255990   0.062069  4.1243
l2physact1         0.058054   0.045418  1.2782
l2anyadl           0.238782   0.055245  4.3222
l2anyiadl          0.259813   0.055803  4.6559
l2srh5            -0.148629   0.103428 -1.4370
l2hhres           -0.021835   0.023791 -0.9178
l2socyr           -0.126602   0.032706 -3.8710
l2gcareinhh1w      0.402803   0.064941  6.2025
childless         -0.006838   0.044183 -0.1548
scale(l2logatotb) -0.010991   0.020132 -0.5459
l2cesd             0.384089   0.009495 40.4538

Intercepts:
    Value   Std. Error t value
1|2  1.7403  0.0870    19.9973
2|3  4.7171  0.0976    48.3374

Residual Deviance: 29131.81 
AIC: 29177.81 
(62898 observations deleted due to missingness)