Closed sebsfox closed 2 weeks ago
I have investigated this for the cancer metric. With a uniform penalty factor across all metrics, the MAE on the test set is 0.02144 for logistic regression with 3 training years, 2 lagged years and 1 lagged target value year. In this configuration, the lagged target value has an importance of about 0.048, compared with 0.003 for the next most important variable (overnight G&A beds).
If every variable except the lagged target variable is given a penalty factor of 0.2, against a penalty factor of 1 for the lagged target variable, the MAE on the test set becomes 0.02285, the importance of the lagged target variable is 0.0329, and the next most important variable is 0.00337 (overnight G&A beds).
Repeating the above with the penalty factor reduced to 0.1 for all variables except the lagged target variable gives an MAE of 0.04. The lagged target variable becomes very unimportant (0.000071), and the most important variable is now overnight G&A beds (0.00255).
This shows that the `penalty.factor` argument is doing what it is supposed to, but modelling performance deteriorates when it is used (it falls below that of the next best model, e.g. difference from previous year). It also shows that constraining the penalty on the remaining predictor variables does not have a big effect on their importance; that is, the relationship between them and this particular outcome is not very important.
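The comparison above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the project's actual pipeline: the data, the column names (`lagged_target`, `overnight_ga_beds`, `var_c`, `var_d`), the helper `fit_with_penalty()`, the fixed `lambda` of 0.05 and the default Gaussian family are all assumptions made for the example. One detail worth keeping in mind is that `glmnet()` internally rescales `penalty.factor` to sum to the number of variables, so only the relative sizes of the factors matter.

```r
library(glmnet)

set.seed(42)

# Synthetic stand-in data (assumption): the real predictors are not shown here.
n <- 200
x <- matrix(
  rnorm(n * 4),
  nrow = n,
  dimnames = list(NULL, c("lagged_target", "overnight_ga_beds", "var_c", "var_d"))
)
y <- 0.8 * x[, "lagged_target"] + 0.1 * x[, "overnight_ga_beds"] + rnorm(n, sd = 0.1)

train <- 1:150
test <- 151:200

# Hypothetical helper: penalty factor of 1 on the lagged target and
# `other_penalty` on every other predictor; returns test MAE and coefficients.
fit_with_penalty <- function(other_penalty, lambda = 0.05) {
  pf <- ifelse(colnames(x) == "lagged_target", 1, other_penalty)
  fit <- glmnet(x[train, ], y[train], alpha = 1, penalty.factor = pf)
  pred <- predict(fit, newx = x[test, ], s = lambda)
  list(
    mae = mean(abs(y[test] - pred)),
    coef = as.matrix(coef(fit, s = lambda))[-1, 1]  # drop the intercept
  )
}

uniform <- fit_with_penalty(1)    # same penalty on every variable
relaxed <- fit_with_penalty(0.1)  # lagged target penalised 10x more than the rest

print(c(uniform = uniform$mae, relaxed = relaxed$mae))
print(rbind(uniform = uniform$coef, relaxed = relaxed$coef))
```

In this sketch the absolute coefficients stand in for variable importance. Lowering the penalty factor on the other predictors shrinks the lagged target's coefficient at a given `lambda`, which matches the direction of the effect described above.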
Look at `penalty.factor` in `glmnet()` to penalise the "previous year's target variable" as a predictor more than the other predictor variables.