Closed: ggjx22 closed this issue 8 months ago
Some comments ahead:
- using splitter.get_fh() for the fh is unusual, and it seems risky (this is not a general API point) - though I see no concrete problem either
- you could use transfo_pipe * gscv_multiplex - that would have the same effect and allows you to treat the preprocessing as part of a forecaster
- the grid is unnecessarily large - you are changing parameters of naive and stl even if they are not selected. You can specify union grids like this:

multiplex_params_grid = [
    {
        'selected_forecaster': ['naive'],
        'naive__sp': [4, 12],
    },
    {
        'selected_forecaster': ['stl'],
        'stl__seasonal': [7, 13],
    },
]
(this has 4 elements, while your grid has 8, out of which 4 are redundant)
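To sanity-check the element count, you can expand the union grid with sklearn's ParameterGrid (the sktime tuners follow the same param_grid semantics as sklearn's GridSearchCV); a minimal sketch, assuming the 4-element grid above:

from sklearn.model_selection import ParameterGrid

# expands to exactly 4 candidate parameter settings
for params in ParameterGrid(multiplex_params_grid):
    print(params)
# {'naive__sp': 4, 'selected_forecaster': 'naive'}
# {'naive__sp': 12, 'selected_forecaster': 'naive'}
# {'stl__seasonal': 7, 'selected_forecaster': 'stl'}
# {'stl__seasonal': 13, 'selected_forecaster': 'stl'}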
But most predictions made in steps 2 and 3 are the same for the periods in which they coincide. Am I doing the 'update' correctly?
I think so - I am guessing your grid search selects the NaiveForecaster with sp=12. You would expect the predictions to coincide, as they are simply replaying the value 12 months prior.
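A minimal sketch of that replay behaviour, using the sktime airline toy data (illustrative, not the original setup):

from sktime.datasets import load_airline
from sktime.forecasting.naive import NaiveForecaster

y = load_airline()
fc = NaiveForecaster(strategy='last', sp=12)
fc.fit(y)
# each forecast replays the observation 12 months prior, so predictions
# for overlapping periods do not change after a routine update
print(fc.predict(fh=[1, 2, 3]))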
- In step 2, is .update_predict_single the same as doing .update() followed by .predict()?

yes.
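A quick way to convince yourself of the equivalence; a sketch where fitted_forecaster, y_new and fh stand in for your fitted model, the new observations and the horizon:

from copy import deepcopy

fc_a = deepcopy(fitted_forecaster)  # two identical fitted copies
fc_b = deepcopy(fitted_forecaster)

pred_a = fc_a.update_predict_single(y_new, fh=fh)

fc_b.update(y_new)
pred_b = fc_b.predict(fh=fh)

assert pred_a.equals(pred_b)  # the two routes should agree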
- In step 3, is it because there are too few new data to influence loaded_model? Which is why new_pred and pred are almost identical.

possibly, seems plausible.
The reason that some values are exactly identical is likely that your grid search selects the NaiveForecaster, as explained (perhaps you can check?)
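One way to check, assuming the fitted tuner is called gscv_multiplex as in your code:

# which forecaster (and which hyperparameters) did the tuner pick?
print(gscv_multiplex.best_params_)
print(gscv_multiplex.best_forecaster_)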
Hello @fkiraly, thanks for your reply, I appreciate it.
- the grid is unnecessarily large - you are changing parameters of naive and stl even if they are not selected. You can specify union grids like this:

multiplex_params_grid = [
    {
        'selected_forecaster': ['naive'],
        'naive__sp': [4, 12],
    },
    {
        'selected_forecaster': ['stl'],
        'stl__seasonal': [7, 13],
    },
]

(this has 4 elements, while your grid has 8, out of which 4 are redundant)
Thanks for pointing this out. I had to tweak it a little, so it is fine now. I also changed the models, and now pred and new_pred are indeed different.
import numpy as np

from sklearn.ensemble import GradientBoostingRegressor
from sklearn.neighbors import KNeighborsRegressor

from sktime.forecasting.compose import MultiplexForecaster, make_reduction

# build multiplex forecaster
multiplex_frctr = MultiplexForecaster(
    forecasters=[
        ('gbr', make_reduction(GradientBoostingRegressor(random_state=42))),
        ('knnr', make_reduction(KNeighborsRegressor(n_jobs=-1))),
    ]
)

# model hyperparameter grids for the model selection forecaster
multiplex_params_grid = [
    {
        'selected_forecaster': ['gbr'],
        'gbr__estimator__n_estimators': np.arange(50, 200, 50),
    },
    {
        'selected_forecaster': ['knnr'],
        'knnr__estimator__n_neighbors': np.arange(5, 10, 1),
    },
]
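For completeness, a sketch of how this could be wired into the tuner - the splitter settings here are illustrative, not taken from the original workflow:

from sktime.forecasting.model_selection import (
    ExpandingWindowSplitter,
    ForecastingGridSearchCV,
)

# illustrative cv: 36-month initial window, 12-month forecasting horizon
cv = ExpandingWindowSplitter(initial_window=36, fh=np.arange(1, 13))

gscv_multiplex = ForecastingGridSearchCV(
    forecaster=multiplex_frctr,
    cv=cv,
    param_grid=multiplex_params_grid,
)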
possibly, seems plausible. The reason that some values are exactly identical is likely that your grid search selects the NaiveForecaster, as explained (perhaps you can check?)
After changing the models, the values aren't replaying themselves. pred and new_pred produce different results.
from sktime.utils.plotting import plot_series

# compare the 2 sets of predictions made
plot_series(df[target].iloc[:-1], pred, new_y, new_pred, labels=['past', 'pred', 'updated', 'new_pred'])
so, all is fine? Or do you still think there is an issue?
PS: typical ML tabular regressors (especially tree-based ensembles) will not be able to extrapolate, that's why you see the forecasts do not go "above" the values observed in the past. If you want that, you ought to pipeline with something like a Detrender.
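A minimal sketch of such a pipeline; the degree-1 trend forecaster is an illustrative choice:

from sklearn.ensemble import GradientBoostingRegressor
from sktime.forecasting.compose import make_reduction
from sktime.forecasting.trend import PolynomialTrendForecaster
from sktime.transformations.series.detrend import Detrender

# remove a linear trend first, so the tree-based regressor only models the
# residual; the trend component handles the extrapolation beyond past values
pipe = Detrender(PolynomialTrendForecaster(degree=1)) * make_reduction(
    GradientBoostingRegressor(random_state=42)
)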
so, all is fine? Or do you still think there is an issue?
All should be fine now.
PS: typical ML tabular regressors (especially tree-based ensembles) will not be able to extrapolate, that's why you see the forecasts do not go "above" the values observed in the past. If you want that, you ought to pipeline with something like a Detrender.
I have already implemented that in the pipeline for my data, together with TransformIf & Differencer within TransformedTargetForecaster. For this purpose, I'm just using a toy dataset and a simplified pipeline.
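For reference, such a pipeline could look roughly like this - a sketch only, with TransformIf and the actual conditions and lags omitted since they are not shown in this issue:

from sklearn.ensemble import GradientBoostingRegressor
from sktime.forecasting.compose import TransformedTargetForecaster, make_reduction
from sktime.transformations.series.detrend import Detrender
from sktime.transformations.series.difference import Differencer

pipe = TransformedTargetForecaster(
    steps=[
        ('detrend', Detrender()),
        ('diff', Differencer(lags=1)),
        ('forecast', make_reduction(GradientBoostingRegressor(random_state=42))),
    ]
)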
Describe the bug
Hello all, first of all, I want to apologize for posting this long issue. This is more of a Q&A rather than a bug report. I wish to seek clarification about how one can correctly update a trained and saved model with new data and produce new predictions. In this issue, I would like to share how I set up a (simplified) autoML pipeline plus incremental learning steps. I have kept it as short as possible, but I also do not want to miss out any code in case it is needed for context. Basically, I want to verify if I'm approaching this type of problem correctly.
Questions to clarify:
- In step 2, is .update_predict_single the same as doing .update() followed by .predict()?
- In step 3, is it because there are too few new data to influence loaded_model? Which is why new_pred and pred are almost identical.
Please let me know if you need further details or context.
To Reproduce
Step 1: Set up the end-to-end autoML workflow.
Step 2: Assuming I'm satisfied with the hyperparameters and backtest results, I update best_forecaster with unseen, produce predictions (pred), and store pred in my database.
Step 3: I save best_forecaster and load it back when a new month arrives. I simulate new data for the new month. With the new data, I want to update loaded_model and produce a new set of predictions.
So now I thought, ok, this is working, no errors so far. But most predictions made in steps 2 and 3 are the same for the periods in which they coincide. Am I doing the 'update' correctly?
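A sketch of the save/load/update cycle in steps 2-3; pickle is used for illustration, and best_forecaster, new_y and fh refer to the objects from the earlier steps:

import pickle

# persist the fitted forecaster after step 2...
with open('best_forecaster.pkl', 'wb') as f:
    pickle.dump(best_forecaster, f)

# ...and load it back when the new month arrives
with open('best_forecaster.pkl', 'rb') as f:
    loaded_model = pickle.load(f)

# feed in the new observations, then re-forecast
loaded_model.update(new_y, update_params=False)  # True would also re-fit parameters
new_pred = loaded_model.predict(fh=fh)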
From 1961-02 to 1961-12, both pred and new_pred have the same prediction values.
[screenshot: pred from step 2]
[screenshot: new_pred from step 3]
Expected behavior
I expected new_pred to have slightly different prediction values, since it has been 'updated' with new data.
Additional context
NA