Open fkiraly opened 8 months ago
Imo a bug, clearly - but not so clear where it is located. To me, it looks like the statsforecast y_pred is identical with the first three years of y_train.
What is odd is that the bug depends on which estimator is used, i.e., the condition seems to be a conjunction of evaluate and StatsForecastAutoARIMA, which makes guessing what is going on hard.
A wild guess: it could be that something produces incorrect indices internally - either the fh, or the adapter estimator. Then, loc or integer based indexing might retrieve - from the data - or request - from a prediction - the wrong values.

FYI @yarnabrina, any better or other guess?
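To illustrate the kind of index mix-up being guessed at here, a minimal sketch in plain pandas (not sktime internals; the data and indices are made up):

```python
import pandas as pd

# hypothetical illustration: if code falls back to positional indexing
# instead of the horizon's absolute index, it silently retrieves
# training values as "predictions"
y_train = pd.Series([1.0, 2.0, 3.0], index=[2000, 2001, 2002])
y_full = pd.concat([y_train, pd.Series([4.0, 5.0, 6.0], index=[2003, 2004, 2005])])

fh_index = [2003, 2004, 2005]   # intended: the forecasting horizon
correct = y_full.loc[fh_index]  # label-based lookup gets the right rows

wrong = y_full.iloc[0:3]        # positional 0..2 returns y_train instead
assert list(wrong) == list(y_train)
```

Either failure mode (wrong index attached to a prediction, or positional lookup against a longer-than-expected frame) would produce exactly the observed symptom of y_pred matching the start of y_train.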
I don't use evaluate myself, so I'm not very familiar with its codebase, but if direct use is working it's unlikely to be in the adapter. I don't have any guess to debug, but I'd try to see if it's statsforecast specific or if it is happening with a few other estimators as well, possibly ones sharing some common tag or similar.

I use CV of course, so it's possibly affecting my work as well. I'll take a detailed look tomorrow to see if I can find any pattern or any other guesses.
@yarnabrina, evaluate is used inside all the forecasting tuners - so if it were broken, then the same problem would impact all tuners.
I'd like to work on this issue! I've found the cause of the problem, will make a fix and PR tomorrow perhaps.
nice!
@Abhay-Lejith, may I kindly ask where we left off with #6029? Did you abandon working on this? Which is fine if it were the case, just want to understand the status.
Yes, I am no longer working on that issue.
did we understand the "deeper reason" for this? Since it looks like you had a fix, but it broke other parts of sktime - it would be good to understand the reasoning and record it here for the next person working on this. You will be credited of course, if your diagnosis leads to the fix.
Sorry, I did not investigate the matter further. The fix was causing other errors because I changed the splitting logic in the evaluate method, and this seemed to be causing problems elsewhere. Since you had told me the issue was a lot deeper and harder to fix, I did not look into it after that.
Can you explain perhaps why you changed things in evaluate?
I modified how X_test is generated by the evaluate method so that it matches the X_test created by temporal_train_test_split. The difference in X_test generated by the two methods was the cause of the different behaviour initially. All the details are in the PR as well.
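To make the described mismatch concrete, a toy sketch of the two notions of X_test (plain pandas; data, index, and split size are made up for illustration):

```python
import pandas as pd

# hypothetical illustration: a temporal_train_test_split-style X_test
# holds only the last test_size rows, while evaluate effectively passed
# all of X (train rows included) to the estimator
X = pd.DataFrame({"exog": range(6)}, index=[2000, 2001, 2002, 2003, 2004, 2005])
test_size = 3

X_train, X_test = X.iloc[:-test_size], X.iloc[-test_size:]  # split-style
X_test_evaluate = X  # what evaluate effectively passed

assert len(X_test) == test_size
assert X_train.index.isin(X_test_evaluate.index).all()  # still contains X_train
```

An estimator that assumes X_test lines up row-for-row with the horizon will therefore read the wrong rows when handed the longer frame.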
Ahhhh, yes, thanks for explaining it again to me.
Summarizing for the next person looking at this: the problem is that in evaluate, the X_test passed to the estimator is actually all X seen, so it does not exactly match the y_test, but also contains X_train. This is not what StatsForecastAutoArima expects.
There are two approaches to fix this:

1. in StatsForecastAutoArima.predict, add subsetting of X to the indices of fh, in _predict, before doing anything else.
2. in evaluate, change the fold generation so the X_test indices always match y_test.

@Abhay-Lejith took approach no.2.
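A minimal sketch of what approach no.1 could look like; the helper name and indices are illustrative, not sktime's actual adapter code:

```python
import pandas as pd

# hypothetical sketch of approach no.1: inside the adapter's _predict,
# restrict the exogenous data X to the forecasting horizon's absolute
# index before handing anything to statsforecast
def subset_X_to_fh(X, fh_absolute):
    if X is None:
        return None
    return X.loc[X.index.isin(fh_absolute)]

# usage: X covers train + test rows, but only the horizon rows survive
X = pd.DataFrame({"exog": range(6)}, index=pd.RangeIndex(6))
X_subset = subset_X_to_fh(X, pd.RangeIndex(3, 6))
assert list(X_subset.index) == [3, 4, 5]
```

The appeal of this approach is locality: it fixes the contract at the adapter boundary without touching the fold generation that other parts of sktime depend on.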
Changing this to be exactly the same (as done manually by @Abhay-Lejith upon correct diagnosis in the PR #6029) is equivalent to removing the TestPlusTrainSplitter, which fixes this bug, but causes other places in the code to break.

I think that's where our investigation got stuck, i.e., why those specific other places in the code break - and that would be the entry point for further work on approach no.2. That is: removing TestPlusTrainSplitter from the evaluate fold generation logic.

@yarnabrina, for approach no.1, we would have to add subsetting in the statsforecast adapter, _predict.
It's a bit difficult to specifically filter with fh, as statsforecast expects a horizon as the predict input, but I can do that.
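One possible shape of that filtering, sketched with toy stand-ins (the function names and integer cutoff are hypothetical, not the adapter's real API): forecast h = max(fh) steps, then subset the result to the requested steps.

```python
import pandas as pd

# hypothetical sketch: a statsforecast-style predict takes an integer
# horizon h, so to honour a relative fh like [2, 4] one can forecast
# h = max(fh) steps and subset the output afterwards
def predict_at_fh(predict_h, fh_relative, cutoff):
    h = max(fh_relative)
    full = predict_h(h)  # forecasts indexed cutoff+1 .. cutoff+h
    return full.loc[[cutoff + step for step in fh_relative]]

# toy stand-in for a fitted model's predict
def fake_predict(h):
    return pd.Series([10.0 * s for s in range(1, h + 1)],
                     index=[100 + s for s in range(1, h + 1)])

preds = predict_at_fh(fake_predict, [2, 4], cutoff=100)
assert list(preds.index) == [102, 104]
```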
However, I am wondering whether it would make sense to do this subsetting in the base class itself or not, in case other forecasters may also be doing such things but it was never captured? Maybe in predict of BaseForecaster? I'm not very confident though, as that may trigger another set of new failures.

Also, after I make the change in the adapter, how do we test whether it's fixed or not? Should we add a test for statsforecast models or for evaluate?
@yarnabrina, my attempt to do this in the base class, conditional on a tag, is here: https://github.com/sktime/sktime/pull/6044
It is incomplete, but you are welcome to have a look at it.
Indeed, I suspect this may be happening elsewhere too, though it is hard to check, since it impacts primarily logic and is not captured by type checks.
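For a rough sense of the tag-conditional base-class idea, a self-contained toy sketch; the class, tag name, and method signatures are illustrative only, not the contents of PR #6044 or sktime's actual BaseForecaster:

```python
import pandas as pd

# hypothetical sketch: the public predict subsets X to the horizon index
# before dispatching to _predict, gated on a (made-up) tag
class BaseForecasterSketch:
    _tags = {"X-needs-fh-subsetting": True}

    def predict(self, fh_absolute, X=None):
        if X is not None and self._tags.get("X-needs-fh-subsetting"):
            X = X.loc[X.index.isin(fh_absolute)]
        return self._predict(fh_absolute, X)

    def _predict(self, fh_absolute, X):
        # toy _predict that just reports which X rows it received
        return list(X.index) if X is not None else []

X = pd.DataFrame({"exog": range(6)}, index=pd.RangeIndex(6))
received = BaseForecasterSketch().predict(pd.RangeIndex(3, 6), X=X)
assert received == [3, 4, 5]
```

Gating on a tag keeps the change opt-in per estimator, which limits the blast radius if some forecasters legitimately consume the full X.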
> Should we add a test for statsforecast models or for evaluate?
Good question - both, perhaps? We can also magic-mock the estimator. The failure case that started this issue might be a good test case, too.
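The magic-mock idea could look roughly like this; a minimal sketch with a MagicMock standing in for a forecaster, not an actual sktime test (the data and call are made up):

```python
import pandas as pd
from unittest.mock import MagicMock

# hypothetical test sketch: a MagicMock estimator records the X it is
# given, so a test can assert that X_test's index matches y_test's index
forecaster = MagicMock()
X = pd.DataFrame({"exog": range(6)}, index=pd.RangeIndex(6))
y_test_index = pd.RangeIndex(3, 6)

# a fixed evaluate should pass only the horizon rows of X to predict
forecaster.predict(fh=[1, 2, 3], X=X.loc[y_test_index])

passed_X = forecaster.predict.call_args.kwargs["X"]
assert passed_X.index.equals(y_test_index)
```

In a real test, evaluate would drive the mocked estimator and the assertion would run on the captured call arguments; the failure case that started this issue could then be added as a regression test on the statsforecast side.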
Bug report from discussion forum.
Discussed in https://github.com/sktime/sktime/discussions/5893