Alternative model/variable selection approaches

We are in a somewhat tricky regime for LOOIC-based model selection: the number of subjects is moderate (by no means small, but not enormous either) and the outcome variable is binary. In this setting the LOOIC estimates can have high variance.

We can see a related issue in the initial model outputs: the differences between the models that include sex are small (all within 1.7 elpd units of the best model), so we could pick essentially any model as long as it includes the effect of sex:


Model comparison
                                                                        elpd_diff se_diff
NPI_apathy_present ~ 1 + sex + ethnicity + age_at_diagnosis               0.0       0.0  
NPI_apathy_present ~ 1 + sex + ethnicity + education + age_at_diagnosis  -0.3       1.3  
NPI_apathy_present ~ 1 + sex + ethnicity                                 -0.6       1.8  
NPI_apathy_present ~ 1 + sex + age_at_diagnosis                          -0.7       0.5  
NPI_apathy_present ~ 1 + sex + ethnicity + education                     -0.9       2.2  
NPI_apathy_present ~ 1 + sex + education + age_at_diagnosis              -1.1       1.3  
NPI_apathy_present ~ 1 + sex                                             -1.5       1.9  
NPI_apathy_present ~ 1 + sex + education                                 -1.7       2.3  
NPI_apathy_present ~ 1 + ethnicity + age_at_diagnosis                    -9.2       4.3  
NPI_apathy_present ~ 1 + age_at_diagnosis                                -9.6       4.3  
NPI_apathy_present ~ 1 + ethnicity                                       -9.9       4.7  
NPI_apathy_present ~ 1 + ethnicity + education + age_at_diagnosis        -9.9       4.4  
NPI_apathy_present ~ 1 + education + age_at_diagnosis                   -10.2       4.4  
NPI_apathy_present ~ 1 + ethnicity + education                          -10.4       4.8  
NPI_apathy_present ~ 1                                                  -10.5       4.7  
NPI_apathy_present ~ 1 + education                                      -11.1       4.9  

Winning formula: ‘NPI_apathy_present ~ 1 + sex + ethnicity + age_at_diagnosis’
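To make the "many small differences" point concrete, here is a minimal sketch that divides each elpd_diff by its se_diff, using the numbers copied from the table above. The variable and function names are just illustrative, and the ratio is only a rough yardstick rather than a formal test.

```python
# elpd_diff and se_diff copied from the comparison table, in table order.
# The first eight rows include sex; the last eight do not.
elpd_diff = [0.0, -0.3, -0.6, -0.7, -0.9, -1.1, -1.5, -1.7,
             -9.2, -9.6, -9.9, -9.9, -10.2, -10.4, -10.5, -11.1]
se_diff = [0.0, 1.3, 1.8, 0.5, 2.2, 1.3, 1.9, 2.3,
           4.3, 4.3, 4.7, 4.4, 4.4, 4.8, 4.7, 4.9]
includes_sex = [True] * 8 + [False] * 8


def ratio(diff, se):
    """Size of the elpd difference in units of its standard error."""
    # The reference model has se_diff = 0, so define its ratio as 0.
    return abs(diff) / se if se > 0 else 0.0


ratios = [ratio(d, s) for d, s in zip(elpd_diff, se_diff)]

# Largest ratio among models with sex, smallest among models without it.
max_with_sex = max(r for r, s in zip(ratios, includes_sex) if s)
min_without_sex = min(r for r, s in zip(ratios, includes_sex) if not s)

print(f"with sex:    largest  |elpd_diff| / se_diff = {max_with_sex:.2f}")
print(f"without sex: smallest |elpd_diff| / se_diff = {min_without_sex:.2f}")
```

Every model that includes sex sits within 1.4 standard errors of the top model, while every model without sex is more than 2 standard errors worse, which is why the ranking among the top eight is not something to lean on.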

What Piironen & Vehtari recommend is:

- If predictive performance is the aim, then Bayesian model averaging (BMA) is the best bet, rather than making hard include/exclude decisions.
- Taking the full posterior and projecting it down onto a smaller number of variables post hoc (the projection method) often has lower variance than model comparison. Perhaps one to try?
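As a rough illustration of what averaging would look like here, the sketch below turns the elpd_diff column of the table into pseudo-BMA weights, i.e. weights proportional to exp(elpd). This is an assumption on my part about how one might approximate the weighting: it is not full BMA based on marginal likelihoods, and it ignores the uncertainty captured by se_diff, so treat the numbers as indicative only.

```python
import math

# elpd_diff copied from the comparison table, in table order
# (first eight models include sex, last eight do not).
elpd_diff = [0.0, -0.3, -0.6, -0.7, -0.9, -1.1, -1.5, -1.7,
             -9.2, -9.6, -9.9, -9.9, -10.2, -10.4, -10.5, -11.1]

# Pseudo-BMA weights: w_k proportional to exp(elpd_k).
# Only differences matter, because a constant shift in elpd cancels
# when the weights are normalised, so elpd_diff can be used directly.
unnormalised = [math.exp(d) for d in elpd_diff]
total = sum(unnormalised)
weights = [u / total for u in unnormalised]

# Combined weight of the eight models that include sex.
weight_with_sex = sum(weights[:8])

print(f"weight of top model:       {weights[0]:.3f}")
print(f"total weight, sex models:  {weight_with_sex:.4f}")
```

No single model gets much more than a quarter of the weight, while the models that include sex jointly take essentially all of it. That matches the reading of the table: sex matters, and the choice among the remaining covariates is not well determined.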

See the following for example code:

