nhs-bnssg-analytics / d_and_c

Scoping the possibility of predicting performance from demand and capacity metrics

Random forest modelling understanding #22

Open sebsfox opened 5 months ago

sebsfox commented 5 months ago
sebsfox commented 5 months ago

This comment is related to item 2 in the checklist above

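The image above shows the distribution of the target variable over time; it increases year on year. Could that make RF inappropriate for predicting future performance if the target keeps increasing? A standard RF regression prediction is an average of target values seen in training, so the model cannot predict beyond the range of the training data.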
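To illustrate the concern about an increasing target, here is a minimal Python sketch. It is not this repo's code: it uses a toy bagged ensemble of one-feature regression trees as a stand-in for a real RF implementation, and the data, trend and function names are all made up for illustration. It trains on a target that rises through 2017 and 2018, then predicts at a later time point.

```python
import random


def avg(values):
    return sum(values) / len(values)


def fit_tree(xs, ys, min_leaf=5):
    """Fit a one-feature regression tree; each leaf stores the mean training target."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(min_leaf, len(pairs) - min_leaf + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical feature values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        ml, mr = avg(left), avg(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, i)
    if best is None:
        return avg(ys)  # too few points (or no valid split): make a leaf
    i = best[1]
    threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
    left_x, left_y = zip(*pairs[:i])
    right_x, right_y = zip(*pairs[i:])
    return (threshold, fit_tree(left_x, left_y, min_leaf), fit_tree(right_x, right_y, min_leaf))


def predict_tree(node, x):
    # Internal nodes are (threshold, left, right) tuples; leaves are plain floats.
    while isinstance(node, tuple):
        threshold, left, right = node
        node = left if x <= threshold else right
    return node


def fit_forest(xs, ys, n_trees=25, seed=0):
    """Bagging only: each tree is fitted to a bootstrap resample of the training data."""
    rng = random.Random(seed)
    n = len(xs)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(fit_tree([xs[i] for i in idx], [ys[i] for i in idx]))
    return forest


def predict_forest(forest, x):
    return avg([predict_tree(tree, x) for tree in forest])


# Made-up data: the target rises by 5 units per year, plus a little noise.
noise = random.Random(42)
train_x = [2017 + 2 * i / 200 for i in range(200)]  # time points in 2017 and 2018
train_y = [50 + 5 * (x - 2017) + noise.gauss(0, 0.5) for x in train_x]

forest = fit_forest(train_x, train_y)
future_x = 2020.0
prediction = predict_forest(forest, future_x)
true_trend = 50 + 5 * (future_x - 2017)

print("largest training target:", round(max(train_y), 2))
print("ensemble prediction:", round(prediction, 2))
print("value implied by the trend:", round(true_trend, 2))
```

Because every leaf is a mean of training targets, the ensemble prediction cannot exceed the largest training target, so it falls short of the value the trend implies. Detrending the target or modelling year-on-year change are possible workarounds, but whether they suit this data is an open question.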

When predicting 2019 data, this chart shows observed versus expected when using 2017 and 2018 input data: image

Using the same data, but splitting it randomly (i.e., ignoring the year of the data), the same chart is as follows: image

For the second scenario, the RF model is underpredicting at high values and overpredicting at low values.
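For reference, the two evaluation set-ups compared above could be sketched as follows. This is a hedged Python illustration rather than the code used in this repo: the records are synthetic, and `split_by_year` and `split_randomly` are hypothetical helper names.

```python
import random

# Synthetic records with a target that rises each year (illustrative only).
rng = random.Random(0)
records = [
    {"year": year, "target": 50 + 5 * (year - 2017) + rng.gauss(0, 1)}
    for year in (2017, 2018, 2019)
    for _ in range(100)
]


def split_by_year(rows, train_years, test_year):
    """Time-based split: train on earlier years, test on a later, unseen year."""
    train = [r for r in rows if r["year"] in train_years]
    test = [r for r in rows if r["year"] == test_year]
    return train, test


def split_randomly(rows, test_fraction=0.25, seed=0):
    """Random split: shuffle all rows and ignore which year they belong to."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def years(rows):
    return sorted({r["year"] for r in rows})


time_train, time_test = split_by_year(records, {2017, 2018}, 2019)
rand_train, rand_test = split_randomly(records)

print("time-based split: train years", years(time_train), "test years", years(time_test))
print("random split:     train years", years(rand_train), "test years", years(rand_test))
```

With the random split, every test year also appears in training, so the model is only asked to interpolate. The time-based split is closer to the real task of predicting a year the model has not seen.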