Closed indinewton closed 2 years ago
The following minimal example works on main WITHOUT TransformedTargetForecaster:
from sktime.datasets import load_shampoo_sales
from sktime.forecasting.exp_smoothing import ExponentialSmoothing
from sktime.forecasting.naive import NaiveForecaster
from sktime.forecasting.model_selection import ExpandingWindowSplitter, SlidingWindowSplitter
from sktime.forecasting.model_selection import ForecastingGridSearchCV
from sktime.forecasting.compose import TransformedTargetForecaster
from sktime.forecasting.theta import ThetaForecaster
from sktime.transformations.series.impute import Imputer
from sktime.forecasting.structural import UnobservedComponents
y = load_shampoo_sales()
fh = [1,2,3]
forecaster = UnobservedComponents()
cv = ExpandingWindowSplitter(
    initial_window=24,
    step_length=12,
    start_with_window=True,
    fh=[1, 2, 3],
)
gscv = ForecastingGridSearchCV(
    forecaster=forecaster,
    param_grid={
        "transformed": [True, False],
    },
    cv=cv,
    n_jobs=-1,
)
gscv.fit(y)
y_pred = gscv.predict(fh)
The following minimal example does not work on main WITH TransformedTargetForecaster:
from sktime.datasets import load_shampoo_sales
from sktime.forecasting.exp_smoothing import ExponentialSmoothing
from sktime.forecasting.naive import NaiveForecaster
from sktime.forecasting.model_selection import ExpandingWindowSplitter, SlidingWindowSplitter
from sktime.forecasting.model_selection import ForecastingGridSearchCV
from sktime.forecasting.compose import TransformedTargetForecaster
from sktime.forecasting.theta import ThetaForecaster
from sktime.transformations.series.impute import Imputer
from sktime.forecasting.structural import UnobservedComponents
y = load_shampoo_sales()
fh = [1,2,3]
pipe = TransformedTargetForecaster(steps=[
    ("imputer", Imputer()),
    ("forecaster", UnobservedComponents())])
cv = ExpandingWindowSplitter(
    initial_window=24,
    step_length=12,
    start_with_window=True,
    fh=[1, 2, 3],
)
gscv = ForecastingGridSearchCV(
    forecaster=pipe,
    param_grid={
        "forecaster__transformed": [True, False],
    },
    cv=cv,
    n_jobs=-1,
)
gscv.fit(y)
y_pred = gscv.predict(fh)
Thanks @aiwalter for improving the bug description.
@aiwalter This works completely:
y = load_shampoo_sales()
fh = [1,2,3]
pipe = TransformedTargetForecaster(steps=[
    ("imputer", Imputer()),
    ("forecaster", ThetaForecaster())])
pipe.fit(y)
pipe.predict(fh=fh)
Whereas this one only works up to fit; the predict method throws an error (pasted just below the code example):
y = load_shampoo_sales()
fh = [1,2,3]
pipe = TransformedTargetForecaster(steps=[
    ("imputer", Imputer()),
    ("forecaster", UnobservedComponents())])
pipe.fit(y)
# predict throws error as pasted below
pipe.predict(fh=fh)
error report:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
6
7 pipe.fit(y)
----> 8 pipe.predict(fh=fh)
~/work/sktime/sktime/forecasting/base/_base.py in predict(self, fh, X)
327 self._y_mtype_last_seen,
328 store=self._converter_store_y,
--> 329 store_behaviour="freeze",
330 )
331
~/work/sktime/sktime/datatypes/_convert.py in convert_to(obj, to_type, as_scitype, store, store_behaviour)
239
240 # now further narrow down as_scitype by inference from the obj
--> 241 from_type = infer_mtype(obj=obj, as_scitype=as_scitype)
242 as_scitype = mtype_to_scitype(from_type)
243
~/work/sktime/sktime/datatypes/_check.py in mtype(obj, as_scitype, exclude_mtypes)
311
312 if len(res) < 1:
--> 313 raise TypeError("No valid mtype could be identified")
314
315 return res[0]
TypeError: No valid mtype could be identified
same error with ForecastingPipeline
To understand why the check is failing inside the pipeline, I improved the error message on mtype in this PR (would appreciate a review and potential approval): https://github.com/alan-turing-institute/sktime/pull/2606
The result of that exercise gives this, more informative error:
Errors returned are as follows, in format [mtype]: [error message]
pd.DataFrame: <class 'pandas.core.indexes.base.Index'> is not supported for obj,
use one of (<class 'pandas.core.indexes.range.RangeIndex'>,
<class 'pandas.core.indexes.period.PeriodIndex'>,
<class 'pandas.core.indexes.datetimes.DatetimeIndex'>)
or integer index instead.
(etcetera)
This means the prediction that is being produced has an index which is non-conformant with the sktime type specification for the mtype pd.DataFrame, because it is a pd.Index and not one of the four allowed, more specialized index types.
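To make the check above concrete, here is a minimal sketch in plain pandas of the kind of index test the error message describes. The helper name has_supported_index is made up for illustration and is not sktime's actual implementation; it only mirrors the four index kinds listed in the message, and the sample frames are invented.

```python
import pandas as pd
from pandas.api.types import is_integer_dtype


def has_supported_index(obj):
    """Hypothetical helper: True if obj.index is one of the four index kinds in the message."""
    idx = obj.index
    if isinstance(idx, (pd.RangeIndex, pd.PeriodIndex, pd.DatetimeIndex)):
        return True
    # "integer index": a plain index that holds integers
    return bool(is_integer_dtype(idx.dtype))


# a frame indexed by monthly periods, similar to the shampoo sales data
good = pd.DataFrame(
    {"y": [1.0, 2.0, 3.0]},
    index=pd.period_range("1994-01", periods=3, freq="M"),
)

# a frame with no rows and a generic object-dtype pd.Index
bad = pd.DataFrame({"y": []}, index=pd.Index([], dtype="object"))

print(has_supported_index(good))
print(has_supported_index(bad))
```

The second frame fails the test only because of its index type, which is the situation the improved error message reports.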
so, two questions:
where is the pd.Index created? The place the error is raised is at the very end, i.e., just before we would return the output of the TransformedTargetForecaster, i.e., all predicts and transforms internally have been executed and have raised no similar error.
it's an empty pd.DataFrame when it reaches the convert_to the second time
it's coming out of TransformedTargetForecaster._predict
Could it be that the UnobservedComponents predict output creates this index type when it returns the output? That would mean the _predict method of UnobservedComponents may additionally need to be modified on the sktime side, instead of simply calling it through the statsmodels adapter. I also noticed that UC renames the predict output, unlike other forecasters: it renames the output to "predicted_mean".
Actually, this renaming of the predict output by UC also makes it incompatible with ColumnwiseTransformer (CwT), which applies transformations and their inverse transformations by the names of the columns in the y dataframe. So a y_train with name "Number of Air Passengers" is passed (after predict from UC) as pd.Series(name="predicted_mean") to the inverse_transform method of CwT, which then throws an error because "predicted_mean" does not exist in the columns; the error is raised from the _check_columns() function.
@aiwalter FYI - a new bug but I will create a detailed issue later for that, unless it is clear to you from above example already.
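A minimal pandas-only sketch of the naming mismatch described above, and of one possible workaround. It does not call sktime; the variable names and sample values are made up, "predicted_mean" is used as the forecast name, and restoring the name by hand is an assumption about what a fix could look like, not what sktime does.

```python
import pandas as pd

# training series carries the original name
y_train = pd.Series([112.0, 118.0, 132.0], name="Number of Air Passengers")

# forecast as it comes back with the statsmodels-style name
y_pred = pd.Series([0.0, 0.0, 0.0], name="predicted_mean")

# a name-based lookup, similar in spirit to a column check, fails on the renamed forecast
expected_columns = [y_train.name]
print(y_pred.name in expected_columns)

# possible workaround: restore the training name before any name-based step
y_pred_fixed = y_pred.rename(y_train.name)
print(y_pred_fixed.name in expected_columns)
```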
Could it be that the UnobservedComponents predict output creates this index type when it returns the output?
The problem is an empty dataframe, not a wrong index type. Empty dataframes have the pd.Index type, apparently, which is why the check says that.
Suspicion: the problem is probably caused by the name that UnobservedComponents gives the series before it returns it.
This makes it break with a pd.DataFrame input:
import pandas as pd  # other imports as in the examples above

y = pd.DataFrame(load_shampoo_sales())
fh = [1, 2, 3]
f = UnobservedComponents()
f.fit(y)
f.predict(fh=fh)
found the problem!
Combination of two issues:
1. the pd.DataFrame constructor. When called as pd.DataFrame(my_series, columns=["col_name"]), this will produce an empty data frame if my_series has a name (other than "col_name"), otherwise a pd.DataFrame with column name "col_name"
2. UnobservedComponents._predict produces a non-conformant pd.Series as a return, in that it has a name (predicted_mean), as opposed to not having a name
I think we need to do two things:
1. make the conversion from pd.Series to pd.DataFrame work even when the series has a name
2. ... predict, and only for certain scenarios.
PS @indinewton, did you know that instead of
pipe = TransformedTargetForecaster(
    steps=[
        ("imputer", Imputer()),
        ("forecaster", UnobservedComponents()),
    ]
)
you can also write
pipe = Imputer() * UnobservedComponents()
since 0.11.3?
We should probably update the forecasting tutorial about this...
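For reference, the pd.DataFrame constructor behaviour described above can be reproduced with pandas alone. This is a minimal sketch with made-up sample values; only the column name "col_name" and the series name "predicted_mean" come from the discussion.

```python
import pandas as pd

values = [1.0, 2.0, 3.0]

# unnamed series: columns=... simply labels the single column
unnamed = pd.Series(values)
df_unnamed = pd.DataFrame(unnamed, columns=["col_name"])
print(df_unnamed.shape)

# named series: pandas treats the name as the column label and then selects
# the requested columns, so a mismatching name leaves nothing to select
named = pd.Series(values, name="predicted_mean")
df_named = pd.DataFrame(named, columns=["col_name"])
print(df_named.empty)
```

No exception is raised in the second case, which is why the problem only surfaces later, at the mtype check.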
Yeah, I noticed the new __mul__ dunder, although I still have to explore it more. I also noticed that several other base protocols have been overridden for some BaseForecasters, but I will check in the next few days.
Say, @indinewton, now that the conversion issue is fixed in #2607, it seems the UnobservedComponents forecaster produces 0.0 for forecasts on some of the above test cases. Is that expected? Seems pretty odd.
It seems to produce zero in all cases where it previously broke.
But the problem is not with the conversion, I tested that; it's already coming out as zero from the inner _predict.
I would hence assume somewhere within UnobservedComponents there is a potential issue of a similar kind, but it's a different bug.
@juanitorduz, can you help, perhaps?
@fkiraly yeah, in the example above it should produce 0 as forecasts, because no parameters were given to initialise UC. I just ran the same using version 0.11.0 of sktime and it generated this:
1994-01 0.0
1994-02 0.0
1994-03 0.0
Freq: M, Name: predicted_mean, dtype: float64
Note how the name of the series has been changed to "predicted_mean".
Hey! I think I joined the party a bit late. This "issue" (unexpected behaviour) of the pd.DataFrame constructor was a surprise for me as well 🙈 !
With respect to the zero values, this is because of what @indinewton explains in the comment above. Here is an example:
import matplotlib.pyplot as plt
from sktime.datasets import load_shampoo_sales
from sktime.forecasting.structural import UnobservedComponents
y = load_shampoo_sales()
fh = [1,2,3]
forecaster1 = UnobservedComponents()
forecaster1.fit(y)
y_pred1 = forecaster1.predict(fh)
forecaster2 = UnobservedComponents(
    level="local linear trend",
    freq_seasonal=[{"period": 12, "harmonics": 4}],
)
forecaster2.fit(y)
y_pred2 = forecaster2.predict(fh)
fig, ax = plt.subplots(figsize=(12, 6))
y.plot(label="training data", ax=ax)
y_pred1.plot(label="forecaster1", ax=ax)
y_pred2.plot(label="forecaster2", ax=ax)
ax.legend()
This is consistent with the statsmodels implementation:
from statsmodels.tsa.statespace.structural import (
    UnobservedComponents as _UnobservedComponents,
)
model = _UnobservedComponents(endog=y)
result = model.fit()
result.forecast(steps=3)
Well, all good then if this is what it's supposed to do.
We are actually testing for consistency in test_structural, which I missed (kudos, @juanitorduz!), so I think we can close if/once the fix gets merged.
This "issue" (un-expected behaviour) of the pd.DataFrame constructor was a surprise for me as well 🙈 !
It seems so very unexpected.
Would it be worth raising an issue on pandas if that issue does not already exist?
It is the third time in my recollection, at least, that something broke due to that.
Because basically, it means you shouldn't be using the columns arg at all if the data you pass might be a named series.
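One way to avoid the columns argument, sketched here under the assumption that the goal is a single-column frame with a fixed column name: Series.to_frame(name=...) sets the column label regardless of the series name. The sample values are made up, and this is not necessarily how the fix in sktime was implemented.

```python
import pandas as pd

named = pd.Series([1.0, 2.0, 3.0], name="predicted_mean")

# to_frame overrides the series name instead of selecting columns by it
df = named.to_frame(name="col_name")
print(df.columns.tolist(), len(df))
```

The same call works for an unnamed series, so the conversion no longer depends on whether the forecaster attached a name.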
Describe the bug
New version of sktime throws error for UnobservedComponents when wrapped under ForecastingGridSearchCV with TransformedTargetForecaster pipe.
From sktime v0.11.1 onwards, up to the current version of main, UnobservedComponents throws an error during the predict method call when it is piped under TransformedTargetForecaster(). The same construct works with all the other models which I could test, for example ARIMA, AutoETS etc. So the issue is how TransformedTargetForecaster calls predict and how that call cascades to the UnobservedComponents class.
To Reproduce
The code example was taken from the advanced example in the documentation of ForecastingGridSearchCV. The only change was using UnobservedComponents instead of ExponentialSmoothing, and adjusting the param_grid argument of gscv accordingly.
Expected behavior
Additional context
Error report:
Versions
System:
python: 3.7.13 (default, Mar 28 2022, 07:24:34) [Clang 12.0.0 ]
executable: .../miniconda3/envs/chronos_dev2/bin/python
machine: Darwin-21.4.0-x86_64-i386-64bit
Python dependencies:
pip: 21.2.2
setuptools: 58.0.4
sklearn: 1.0.2
sktime: 0.11.3
statsmodels: 0.12.1
numpy: 1.21.5
scipy: 1.7.3
pandas: 1.3.5
matplotlib: 3.5.1
joblib: 1.1.0
numba: 0.55.1
pmdarima: 1.8.5
tsfresh: None