sktime / sktime

A unified framework for machine learning with time series
https://www.sktime.net
BSD 3-Clause "New" or "Revised" License

[BUG] Exogenous variables in ARIMA are not passed into BaggingForecaster #6340

Closed: jiayuanteng closed this issue 4 months ago

jiayuanteng commented 4 months ago

Describe the bug

Hi, I want to flag that exogenous variables are not passed through to ARIMA when it is wrapped in BaggingForecaster. Regardless of whether I call fit with or without exogenous variables, BaggingForecaster always fits ARIMA without them.

To Reproduce

import pandas as pd
from sktime.forecasting.arima import ARIMA
from sktime.datasets import load_airline
from sktime.transformations.bootstrap import STLBootstrapTransformer
from sktime.forecasting.compose import BaggingForecaster

y = load_airline()
y = y.reset_index(drop = False)
y['month'] = y.Period.dt.month

# add monthly dummies
y = pd.concat([y.drop(['month'], axis = 1), pd.get_dummies(y['month'], dtype='int32')], axis = 1)
y.columns = ['period', 'passengers', 'fq_1', 'fq_2',  'fq_3',  'fq_4' , 'fq_5',  'fq_6',  'fq_7',  'fq_8',  'fq_9',  'fq_10',  'fq_11',  'fq_12']

y_train = y[:-24]
# y_test = y[-24:]

transformer = STLBootstrapTransformer(2, sp = 12, random_state = 1234)
arima = ARIMA(order = (1, 0, 0), seasonal_order=(0, 0, 0, 12))
forecaster = BaggingForecaster(transformer, arima)

# X variables are monthly dummies in AR(1)
forecaster.fit(X = y_train[['fq_1', 'fq_2',  'fq_3',  'fq_4' , 'fq_5',  'fq_6',  'fq_7',  'fq_8',  'fq_9',  'fq_10',  'fq_11']], y = y_train['passengers'])

Expected behavior

I tested ARIMA on its own, and the exogenous variables flow through as expected:

arima.fit(X = y_train[['fq_1', 'fq_2',  'fq_3',  'fq_4' , 'fq_5',  'fq_6',  'fq_7',  'fq_8',  'fq_9',  'fq_10',  'fq_11']], y = y_train['passengers'])


Additional context

Versions

sktime == 0.26.0

yarnabrina commented 4 months ago

This seems to be caused by this line, where exogenous variables are ignored.

https://github.com/sktime/sktime/blob/633a766f71629e0c38ad310c62909109a75c0aed/sktime/forecasting/compose/_bagging.py#L196

While it's easy to change, the estimator explicitly notes that it ignores X through tags.

https://github.com/sktime/sktime/blob/633a766f71629e0c38ad310c62909109a75c0aed/sktime/forecasting/compose/_bagging.py#L87

So I'll wait for @fkiraly or @ltsaprounis (original author) to chime in. Also, it'd be a change in behaviour, so would it be considered breaking in case someone is relying on X not being used?
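To make the mechanism concrete, here is a minimal sketch of the difference between a bagging wrapper that drops X and one that forwards it. All class names below are hypothetical toy stand-ins written for illustration; none of this is sktime's actual implementation, and the "bootstrap" is just a deterministic placeholder for a real bootstrap transformer.

```python
# Hypothetical toy classes for illustration only; these are NOT sktime's classes.


class MeanForecaster:
    """Toy inner model that records whether it was given exogenous data."""

    def fit(self, y, X=None):
        self.mean_ = sum(y) / len(y)
        self.saw_X_ = X is not None
        return self


class DroppingBagger:
    """Toy bagging wrapper that fits each inner model on y only, ignoring X."""

    def __init__(self, make_model, n_series=2):
        self.make_model = make_model
        self.n_series = n_series

    def _bootstrap(self, y):
        # Placeholder for a bootstrap transformer: shifted copies of y.
        return [[v + i for v in y] for i in range(self.n_series)]

    def fit(self, y, X=None):
        # X is accepted but never handed to the inner models.
        self.models_ = [self.make_model().fit(y_b) for y_b in self._bootstrap(y)]
        return self


class ForwardingBagger(DroppingBagger):
    """Same wrapper, except that X is forwarded to every inner model."""

    def fit(self, y, X=None):
        self.models_ = [
            self.make_model().fit(y_b, X=X) for y_b in self._bootstrap(y)
        ]
        return self


y = [112, 118, 132, 129]
X = [[1, 0], [0, 1], [1, 0], [0, 1]]

dropped = DroppingBagger(MeanForecaster).fit(y, X=X)
forwarded = ForwardingBagger(MeanForecaster).fit(y, X=X)

print([m.saw_X_ for m in dropped.models_])    # [False, False]
print([m.saw_X_ for m in forwarded.models_])  # [True, True]
```

In the first wrapper the call to fit succeeds without any warning even though X never reaches the inner models, which matches the silent behaviour reported above.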

fkiraly commented 4 months ago

This is one of many issues with the original bagging forecaster.

Strictly speaking, this is not a bug because, as @yarnabrina mentions, the ignores-exogeneous-X tag correctly states that the estimator ignores exogenous data. That runs counter to user expectation, though, so one might still consider it a bug.

It should be fixed by the rework in this PR https://github.com/sktime/sktime/pull/6052, which is scheduled for the next release - 0.28.1, today or tomorrow.

Besides supporting exogenous data, it will also add support for hierarchical and multivariate data, and for creating probabilistic forecasts via bagging.

jiayuanteng commented 4 months ago

Thanks a lot @fkiraly

fkiraly commented 4 months ago

The release is on the way, so you should be able to try out the upgraded version tomorrow at the latest.

jiayuanteng commented 4 months ago

V0.28.1 worked like a charm. Thanks a lot😄