ld-archer / E_FEM

This is the repository for the English version of the Future Elderly Model, originally developed at the Leonard D. Schaeffer Center for Health Policy and Microsimulation.
MIT License

improve predictors of social isolation #117

Closed ld-archer closed 1 year ago

ld-archer commented 1 year ago

A good place to start would be the predictors of loneliness from #116.

However, some of those variables cannot be used because they will be too strongly correlated with the outcome. I'm specifically thinking of socyr, which records whether the respondent takes part in social activities.

ld-archer commented 1 year ago

Original Models:

[1] "Minimal model:"
Call:
polr(formula = form.sociso.min, data = trans.sociso, na.action = na.omit, 
    Hess = TRUE)

Coefficients:
                    Value Std. Error t value
male1            -0.01243   0.013820 -0.8994
scale(l2age65l)   0.02401   0.008589  2.7949
scale(l2age6574)  0.02716   0.010226  2.6564
scale(l2age75p)   0.11449   0.008731 13.1135

Intercepts:
    Value     Std. Error t value  
1|2   -1.1637    0.0106  -109.6257
2|3    0.4543    0.0097    47.0175
3|4    1.8360    0.0124   147.8766
4|5    3.4150    0.0222   153.5984
5|6    5.7912    0.0684    84.6463

Residual Deviance: 199639.28 
AIC: 199657.28 
(13692 observations deleted due to missingness)

[1] "Full model:"
Call:
polr(formula = form.sociso, data = trans.sociso, na.action = na.omit, 
    Hess = TRUE)

Coefficients:
                     Value Std. Error  t value
male1             0.059588    0.01701   3.5023
white             0.583900    0.05070  11.5162
hsless1           0.113583    0.02076   5.4705
college1         -0.109118    0.02227  -4.8999
scale(l2age65l)   0.005164    0.01034   0.4995
scale(l2age6574) -0.035266    0.01409  -2.5034
scale(l2age75p)   0.190671    0.01473  12.9485
scale(atotb)     -0.124004    0.01043 -11.8889
l2employed1      -0.047135    0.02499  -1.8862
l2inactive1      -0.060840    0.03277  -1.8566
l2physact1        0.004839    0.02354   0.2056

Intercepts:
    Value    Std. Error t value 
1|2  -0.6125   0.0579   -10.5740
2|3   1.0344   0.0580    17.8213
3|4   2.4908   0.0591    42.1649
4|5   4.1335   0.0640    64.6348
5|6   6.7321   0.1150    58.5154

Residual Deviance: 134113.41 
AIC: 134145.41 
(36059 observations deleted due to missingness)

The AIC of both models suggests they are pretty poor. For comparison, the minimal loneliness model has an AIC of roughly 90,000 and the final improved model around 30,000, so the starting point for social isolation is quite a lot worse.

[1] "Test model:"
Call:
polr(formula = form.sociso.test, data = trans, na.action = na.omit, 
    Hess = TRUE)

Coefficients:
                      Value Std. Error  t value
male1              0.342027    0.02141  15.9777
hsless1            0.016901    0.02687   0.6290
college1          -0.219189    0.02641  -8.3005
scale(l2age65l)   -0.008506    0.01451  -0.5861
scale(l2age6574)  -0.035257    0.01712  -2.0599
scale(l2age75p)    0.050592    0.01913   2.6451
l2cohab            0.248150    0.04986   4.9767
l2widowed          1.638206    0.03955  41.4230
l2single           1.820052    0.03619  50.2926
l2employed1        0.068507    0.03065   2.2354
l2inactive1        0.061962    0.04252   1.4571
l2physact1         0.102853    0.02925   3.5159
l2anyadl          -0.015019    0.03888  -0.3863
l2anyiadl          0.034461    0.03954   0.8716
l2srh5             0.140513    0.07288   1.9281
l2psyche           0.076192    0.03604   2.1142
l2hhres           -0.131753    0.01523  -8.6534
l2gcareinhh1w      0.221444    0.04533   4.8856
childless         -1.367005    0.03059 -44.6820
scale(l2logatotb) -0.116626    0.01324  -8.8095

Intercepts:
    Value    Std. Error t value 
1|2  -2.1650   0.0566   -38.2580
2|3  -0.1502   0.0556    -2.7007
3|4   1.6512   0.0567    29.1361
4|5   3.5319   0.0645    54.7422
5|6   6.2361   0.1391    44.8437

Residual Deviance: 83509.54 
AIC: 83559.54 
(53417 observations deleted due to missingness)

Vast improvement, but we can probably still do better with help from the literature. I'm also not sure whether the marital status variables should be in here, as being married or cohabiting is one of the elements of the social isolation index.
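To put the marital status coefficients in context, here is a minimal sketch that converts a few of the log-odds coefficients from the test model into odds ratios. It assumes the default logistic link in `polr` (the call above does not set `method`), under which a positive coefficient shifts probability towards higher categories of the index. The dictionary and function names are illustrative only.

```python
import math

# Log-odds coefficients copied from the "Test model" polr output above.
coefs = {
    "l2cohab": 0.248150,
    "l2widowed": 1.638206,
    "l2single": 1.820052,
    "childless": -1.367005,
}


def odds_ratio(beta: float) -> float:
    """Convert a proportional-odds (logit) coefficient into an odds ratio."""
    return math.exp(beta)


odds_ratios = {name: odds_ratio(beta) for name, beta in coefs.items()}

for name, value in odds_ratios.items():
    print(f"{name:10s} OR = {value:.2f}")
```

Odds ratios this far from 1 for widowed and single are what you would expect if marital status is partly built into the outcome, which is the concern raised above.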

ld-archer commented 1 year ago

Another great review paper to work with - Ejiri et al. (2021).

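Variables I can include that I haven't already:

- cesd (depression)
- sight (self-rated eyesight)
- hearing (self-rated hearing)
- ahown (home ownership)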

NOTE: Self-rated eyesight and hearing are both assessed on a 5-level Likert scale, with higher values for worse outcomes (5 = poor).

[1] "Test model:"
Call:
polr(formula = form.sociso.test, data = trans, na.action = na.omit, 
    Hess = TRUE)

Coefficients:
                       Value Std. Error   t value
male1              0.3494362   0.021866  15.98091
hsless1            0.0032698   0.026960   0.12128
college1          -0.1965958   0.026630  -7.38259
scale(l2age65l)   -0.0002969   0.014634  -0.02029
scale(l2age6574)  -0.0337172   0.017173  -1.96342
scale(l2age75p)    0.0518324   0.019224   2.69626
l2cohab            0.2545730   0.050039   5.08746
l2widowed          1.6404089   0.039956  41.05543
l2single           1.8278257   0.036405  50.20851
l2employed1        0.0658869   0.030708   2.14557
l2inactive1        0.0636147   0.042672   1.49080
l2physact1         0.0914745   0.029331   3.11871
l2anyadl          -0.0380546   0.039139  -0.97229
l2anyiadl          0.0024966   0.039864   0.06263
l2srh5             0.0802218   0.073742   1.08787
l2hhres           -0.1339939   0.015285  -8.76612
l2gcareinhh1w      0.2010271   0.045473   4.42080
childless         -1.3839590   0.030697 -45.08501
scale(l2logatotb) -0.2061006   0.018734 -11.00160
l2cesd             0.0348906   0.006751   5.16857
l2sight            0.0326863   0.012002   2.72350
l2hearing          0.0096480   0.010334   0.93359
l2ahown            0.3743588   0.051814   7.22502

Intercepts:
    Value    Std. Error t value 
1|2  -1.7446   0.0776   -22.4773
2|3   0.2758   0.0773     3.5697
3|4   2.0819   0.0782    26.6133
4|5   3.9662   0.0843    47.0717
5|6   6.6666   0.1493    44.6603

Residual Deviance: 83093.42 
AIC: 83149.42 
(53538 observations deleted due to missingness)

Only a very small improvement in AIC (83559.54 to 83149.42), but each variable is an improvement nonetheless.

ld-archer commented 1 year ago

Decided to remove the ADL variables as well as the work status variables. I think the important ADL characteristics will be captured by sight and hearing, and removing them here allows me to use social isolation as a predictor of disability with no statistical baggage. I'm also removing employment status as it's not mentioned in the literature and the t values were relatively small.

Removing these groups of variables has also slightly improved the AIC, which depends on how well the model fits the data (its likelihood) as well as the number of parameters estimated. If two sets of predictors fit the data equally well but one set is larger than the other, the smaller set will have the better (lower) AIC. I think this means we don't lose anything important by removing these variables.

[1] "Test model:"
Call:
polr(formula = form.sociso.test, data = trans, na.action = na.omit, 
    Hess = TRUE)

Coefficients:
                      Value Std. Error  t value
male1              0.350109   0.021610  16.2012
hsless1            0.006486   0.026909   0.2410
college1          -0.200121   0.026569  -7.5322
scale(l2age65l)   -0.010262   0.013950  -0.7356
scale(l2age6574)  -0.047472   0.016121  -2.9447
scale(l2age75p)    0.052477   0.019153   2.7398
l2cohab            0.255329   0.050031   5.1034
l2widowed          1.639766   0.039930  41.0656
l2single           1.830148   0.036371  50.3189
l2physact1         0.071256   0.027129   2.6266
l2srh5             0.069969   0.072283   0.9680
l2hhres           -0.131765   0.015254  -8.6379
l2gcareinhh1w      0.202383   0.045284   4.4692
childless         -1.380075   0.030646 -45.0332
scale(l2logatotb) -0.206340   0.018713 -11.0268
l2cesd             0.034322   0.006655   5.1576
l2sight            0.032706   0.011982   2.7296
l2hearing          0.009218   0.010314   0.8937
l2ahown            0.373749   0.051790   7.2166

Intercepts:
    Value    Std. Error t value 
1|2  -1.7818   0.0756   -23.5728
2|3   0.2382   0.0752     3.1691
3|4   2.0443   0.0762    26.8368
4|5   3.9282   0.0823    47.7063
5|6   6.6286   0.1482    44.7270

Residual Deviance: 83099.77 
AIC: 83147.77 
(53538 observations deleted due to missingness)
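The AIC trade-off can be checked by hand from the two most recent outputs. The sketch below relies on `polr` reporting the residual deviance as minus twice the log-likelihood, so that AIC = deviance + 2k, where k counts the coefficients plus the five intercepts. Both fits dropped 53538 observations, so they appear to use the same sample, which is what makes the comparison meaningful. The labels and function name are made up for illustration.

```python
# Residual deviance and parameter counts read off the two most recent
# "Test model" outputs in this thread (coefficients + 5 intercepts).
models = {
    "with ADL and work status": {"deviance": 83093.42, "n_coef": 23, "n_cut": 5},
    "without ADL and work status": {"deviance": 83099.77, "n_coef": 19, "n_cut": 5},
}


def aic(deviance: float, n_params: int) -> float:
    """AIC = -2 * log-likelihood + 2k, with the deviance standing in for -2 * logLik."""
    return deviance + 2 * n_params


results = {
    name: aic(m["deviance"], m["n_coef"] + m["n_cut"]) for name, m in models.items()
}

for name, value in results.items():
    print(f"{name:30s} AIC = {value:.2f}")
```

Dropping the four variables raises the deviance by 6.35 but saves 8 in the penalty term, which accounts for the reported AIC falling from 83149.42 to 83147.77.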

Stata oprobit estimates (outcome: sociso):
male                      0.21968
white                     0.25673
hsless                    0.00679
college                  -0.10861
l2age65l                 -0.00090
l2age6574                -0.00591
l2age75p                  0.00756
l2single                  1.08313
l2cohab                   0.15157
l2widowed                 0.98442
l2physact                 0.04232
l2srh5                    0.03097
l2logatotb               -0.06477
l2hhres                  -0.06215
l2gcareinhh1w             0.11556
childless                -0.75421
l2cesd                    0.02187
l2sight                   0.02116
l2hearing                 0.00579
l2ahown                   0.21098
cut1                     -1.55090
cut2                     -0.33335
cut3                      0.68473
cut4                      1.65379
cut5                      2.82138
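As a rough illustration of how the Stata cut points translate into category probabilities, the sketch below uses the standard ordered-probit relation P(y <= k) = Phi(cut_k - xb). The two linear predictors are hypothetical: `xb = 0.0` is only a placeholder baseline, not a real respondent, and the second adds the `l2single` coefficient to it. The function name and the category labels 1 to 6 are assumptions.

```python
from statistics import NormalDist

# Cut points copied from the Stata oprobit output above.
cuts = [-1.55090, -0.33335, 0.68473, 1.65379, 2.82138]


def category_probs(xb: float, cuts: list[float]) -> list[float]:
    """Ordered-probit category probabilities from P(y <= k) = Phi(cut_k - xb)."""
    phi = NormalDist().cdf
    cumulative = [phi(c - xb) for c in cuts] + [1.0]
    probs = []
    previous = 0.0
    for value in cumulative:
        probs.append(value - previous)  # P(y = k) = P(y <= k) - P(y <= k - 1)
        previous = value
    return probs


# Hypothetical linear predictors (assumptions, not taken from the data).
baseline = category_probs(0.0, cuts)
single = category_probs(0.0 + 1.08313, cuts)  # baseline plus the l2single coefficient

for k, (p0, p1) in enumerate(zip(baseline, single), start=1):
    print(f"category {k}: baseline {p0:.3f}, single {p1:.3f}")
```

A positive coefficient moves probability mass towards the higher categories, which is how a fitted model like this would drive transitions in the simulation.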

Now test run and look at outcomes again.

ld-archer commented 1 year ago

Outcomes with new social isolation model:

#### Life Years
[1] "No Loneliness"
[1] "Cohort Average Lifeyears:         41.59"
[1] "Intervention Average Lifeyears:   43.68"
[1] "Increase:                         2.085"
[1] "No Social Isolation"
[1] "Cohort Average Lifeyears:         41.59"
[1] "Intervention Average Lifeyears:   42.87"
[1] "Increase:                         1.279"
[1] "No Loneliness OR Social Isolation"
[1] "Cohort Average Lifeyears:         41.59"
[1] "Intervention Average Lifeyears:   44.83"
[1] "Increase:                         3.238"

#### Disability Free Life Years
[1] "No Loneliness"
[1] "Cohort Average Disability Free Lifeyears:         14.79"
[1] "Intervention Average Disability Free Lifeyears:   15.99"
[1] "Increase:                                         1.206"
[1] "No Social Isolation"
[1] "Cohort Average Disability Free Lifeyears:         14.79"
[1] "Intervention Average Disability Free Lifeyears:   15.86"
[1] "Increase:                                         1.071"
[1] "No Loneliness OR Social Isolation"
[1] "Cohort Average Disability Free Lifeyears:         14.79"
[1] "Intervention Average Disability Free Lifeyears:   17.06"
[1] "Increase:                                         2.275"

#### Disease Free Life Years
[1] "No Loneliness"
[1] "Cohort Average Disease Free Lifeyears:         5.151"
[1] "Intervention Average Disease Free Lifeyears:   5.392"
[1] "Increase:                                         0.2411"
[1] "No Social Isolation"
[1] "Cohort Average Disease Free Lifeyears:         5.151"
[1] "Intervention Average Disease Free Lifeyears:   5.312"
[1] "Increase:                                         0.1613"
[1] "No Loneliness OR Social Isolation"
[1] "Cohort Average Disease Free Lifeyears:         5.151"
[1] "Intervention Average Disease Free Lifeyears:   5.552"
[1] "Increase:                                         0.4019"
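A quick arithmetic check on the tables above: the sketch below compares the sum of the two single-intervention gains with the gain from removing both. It only uses the "Increase" values printed above; the dictionary keys and function name are illustrative.

```python
# "Increase" values copied from the outcome tables above.
increases = {
    "life years": {"loneliness": 2.085, "isolation": 1.279, "both": 3.238},
    "disability free life years": {"loneliness": 1.206, "isolation": 1.071, "both": 2.275},
    "disease free life years": {"loneliness": 0.2411, "isolation": 0.1613, "both": 0.4019},
}


def additivity_gap(row: dict[str, float]) -> float:
    """Sum of the single-intervention gains minus the combined gain."""
    return row["loneliness"] + row["isolation"] - row["both"]


gaps = {name: additivity_gap(row) for name, row in increases.items()}

for name, gap in gaps.items():
    print(f"{name}: {gap:+.4f}")
```

The gaps are small relative to the gains themselves, so on these figures the two interventions look close to additive, with the largest shortfall in total life years.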

I'm happy with this model now. Some variables (cesd (depression), sight, hearing, ahown (home ownership)) still need to be made fully dynamic, which I'll track in another issue, but this won't change the model.