nhs-bnssg-analytics / d_and_c

Scoping the possibility of predicting performance from demand and capacity metrics

Reduce importance on previous year's value #36

Closed sebsfox closed 2 weeks ago

sebsfox commented 2 weeks ago

Look at the penalty.factor argument in glmnet() as a way to penalise the "previous year's target variable" predictor more heavily than the other predictor variables.
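To make the idea concrete, here is a minimal sketch of how a per-variable penalty factor changes lasso shrinkage. It is not glmnet itself: it is a small pure-Python coordinate descent on squared-error loss (the models in this thread are logistic), the data are made up, and the names `weighted_lasso` and `soft_threshold` are hypothetical. The one glmnet behaviour it tries to mirror is the documented rescaling of penalty factors so that they sum to the number of variables.

```python
def soft_threshold(value, threshold):
    """Shrink value towards zero by threshold (the lasso update step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def weighted_lasso(X, y, lam, penalty_factor, n_sweeps=200):
    """Coordinate descent for (1/2n) * RSS + lam * sum_j pf_j * |beta_j|."""
    n, p = len(X), len(X[0])
    # Assumption mirroring glmnet's documented behaviour:
    # rescale the factors so that they sum to the number of variables.
    scale = p / sum(penalty_factor)
    pf = [f * scale for f in penalty_factor]
    beta = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            # Correlation of feature j with the residual that excludes feature j.
            rho = 0.0
            for i in range(n):
                fit_without_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - fit_without_j)
            rho /= n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            beta[j] = soft_threshold(rho, lam * pf[j]) / z
    return beta


# Toy, made-up data: column 0 stands in for the lagged target value,
# column 1 for one other predictor. Columns are centred and orthogonal.
X = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
y = [0.8 * a + 0.3 * b for a, b in X]

uniform = weighted_lasso(X, y, lam=0.2, penalty_factor=[1.0, 1.0])
reweighted = weighted_lasso(X, y, lam=0.2, penalty_factor=[1.0, 0.2])
print("uniform:   ", [round(b, 4) for b in uniform])
print("reweighted:", [round(b, 4) for b in reweighted])
```

On this toy data the coefficient on column 0 shrinks when the other column's penalty factor is lowered to 0.2, while the coefficient on column 1 grows. Only the ratio between the factors matters, because of the rescaling step.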

sebsfox commented 2 weeks ago

I have investigated this for the cancer metric, using logistic regression with 3 training years, 2 lagged years and 1 lagged target value year. With a uniform penalty factor across all predictor variables, the MAE on the test set is 0.02144. In this configuration, the lagged target value has an importance of about 0.048, compared with 0.003 for the next most important variable (overnight G&A beds).

If all of the variables except the lagged target variable are given a penalty factor of 0.2, while the lagged target variable keeps a penalty factor of 1, the MAE on the test set becomes 0.02285. The importance of the lagged target variable falls to 0.0329, and the next most important variable is still overnight G&A beds (0.00337).

Repeating the above, but reducing the penalty factor to 0.1 for all of the variables except the lagged target variable, the resulting MAE is 0.04. The lagged target variable becomes very unimportant (0.000071), and the most important variable is now overnight G&A beds (0.00255).

This shows that the penalty.factor argument is doing what it is supposed to, but modelling performance deteriorates when it is used: the MAE becomes worse than that of the next best model (e.g. difference from previous year). It also shows that reducing the penalty on the remaining predictor variables does not have a big effect on their importance, i.e. the relationship between them and this particular outcome is not too important.
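For reference, the comparison being made here can be sketched as below. The numbers are made up purely for illustration and are not the results from this issue. The baseline shown simply carries the previous year's value forward, which is an assumption: the "difference from previous year" model used in the project may be defined differently.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute gap between observed and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Made-up illustrative values (proportions), NOT results from this issue.
actual = [0.70, 0.65, 0.72, 0.68]
previous_year = [0.69, 0.68, 0.70, 0.66]  # naive baseline: carry last year forward
model_predicted = [0.66, 0.70, 0.69, 0.73]  # hypothetical penalised-model output

baseline_mae = mean_absolute_error(actual, previous_year)
model_mae = mean_absolute_error(actual, model_predicted)

# A model is only worth keeping if it beats the naive baseline on the test set.
print("baseline MAE:", round(baseline_mae, 4))
print("model MAE:   ", round(model_mae, 4))
print("model beats baseline:", model_mae < baseline_mae)
```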