sebsfox opened 5 months ago
This comment relates to item 2 in the checklist above.
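The image above shows the distribution of the target variable over time: it increases year on year. If that trend continues, could a random forest (RF) be inappropriate for predicting future performance, since an RF's predictions are averages of training target values and so cannot exceed the largest value seen in training?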
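To illustrate the extrapolation concern, here is a toy bagged ensemble of depth-1 regression trees in plain Python. It is a simplified stand-in for a random forest (it omits per-split feature sampling and deeper trees) and is not the model used in this issue. The helper names `fit_stump` and `fit_bagged_stumps` and the training data are hypothetical. Because every leaf predicts a mean of training targets, the ensemble returns the same value for 2019 as for 2050 and never exceeds the largest training target.

```python
import random
from statistics import mean


def fit_stump(xs, ys):
    """Fit a depth-1 regression tree: choose the split threshold that minimises squared error."""
    best = None
    for t in sorted(set(xs))[:-1]:  # thresholds that leave both sides non-empty
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        lm, rm = mean(left), mean(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:  # only one distinct x value in this sample: predict the sample mean
        overall = mean(ys)
        return lambda x: overall
    _, t, lm, rm = best
    # Each leaf predicts a mean of training targets, so output stays within [min(ys), max(ys)].
    return lambda x: lm if x <= t else rm


def fit_bagged_stumps(xs, ys, n_trees=25, seed=0):
    """Average stumps fitted on bootstrap resamples (the bagging step of a random forest)."""
    rng = random.Random(seed)
    n = len(xs)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample with replacement
        trees.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: mean(tree(x) for tree in trees)


# Hypothetical training data: one feature (year), target rising from 2017 to 2018.
train_x = [2017] * 5 + [2018] * 5
train_y = [10, 11, 12, 13, 14, 20, 21, 22, 23, 24]
predict = fit_bagged_stumps(train_x, train_y)

for year in (2017, 2018, 2019, 2050):
    print(year, round(predict(year), 2))
print("largest training target:", max(train_y))
```

If the trend continued, a 2019 target near 30 would be plausible, yet the ensemble's forecast stays at roughly the 2018 level.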
When predicting 2019 data from a model trained on 2017 and 2018 input data, this chart shows observed versus expected values:
Using the same data but splitting it randomly (i.e., ignoring the year of each record), the same chart looks like this:
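The two scenarios differ only in how records are assigned to the training and test sets. A minimal sketch in plain Python, assuming each record carries a `year` field; the field names, the helpers `temporal_split` and `random_split`, and the toy records are all hypothetical.

```python
import random


def temporal_split(records, test_year):
    """Train on every year before `test_year`; test on `test_year` only (e.g. 2017-2018 -> 2019)."""
    train = [r for r in records if r["year"] < test_year]
    test = [r for r in records if r["year"] == test_year]
    return train, test


def random_split(records, test_frac=1 / 3, seed=0):
    """Shuffle all records regardless of year, then hold out a fraction for testing."""
    rows = list(records)
    random.Random(seed).shuffle(rows)
    n_test = round(len(rows) * test_frac)
    return rows[n_test:], rows[:n_test]


# Hypothetical records: four per year, target rising year on year.
records = [
    {"year": year, "target": 10 * (year - 2016) + i}
    for year in (2017, 2018, 2019)
    for i in range(4)
]

t_train, t_test = temporal_split(records, test_year=2019)
r_train, r_test = random_split(records)
print("temporal: train years", sorted({r["year"] for r in t_train}),
      "test years", sorted({r["year"] for r in t_test}))
print("random:   train years", sorted({r["year"] for r in r_train}),
      "test years", sorted({r["year"] for r in r_test}))
```

With the temporal split, the higher 2019 targets never appear in training. With the random split, records from every year can land on both sides, so the model is typically not asked to extrapolate beyond the range it was trained on.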
In the second scenario, the RF model underpredicts at high values and overpredicts at low values.
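This pattern (predictions pulled toward the middle of the target range) is common for averaging models such as random forests, and it can be checked numerically as well as visually. Sort the test records by observed value, cut them into equal-sized bins, and compare the mean error in each bin. A positive mean of (predicted - observed) in the lowest bin and a negative one in the highest bin matches the over/underprediction described above. A minimal sketch in plain Python; the helper name `mean_error_by_bin` and the toy data (predictions shrunk halfway toward the mean) are hypothetical.

```python
from statistics import mean


def mean_error_by_bin(observed, predicted, n_bins=3):
    """Mean of (predicted - observed) within equal-sized bins of the observed values, lowest bin first."""
    if len(observed) < n_bins:
        raise ValueError("need at least one record per bin")
    pairs = sorted(zip(observed, predicted))  # order records by observed value
    size = len(pairs) // n_bins
    errors = []
    for b in range(n_bins):
        # The last bin absorbs any remainder so every record is used.
        chunk = pairs[b * size:(b + 1) * size] if b < n_bins - 1 else pairs[b * size:]
        errors.append(mean(p - o for o, p in chunk))
    return errors


# Hypothetical observed values and predictions shrunk halfway toward their mean (5).
observed = list(range(1, 10))
predicted = [5 + 0.5 * (o - 5) for o in observed]
print(mean_error_by_bin(observed, predicted))
```

Positive in the first bin and negative in the last indicates overprediction at low values and underprediction at high values.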