unit8co / darts

A python library for user-friendly forecasting and anomaly detection on time series.
https://unit8co.github.io/darts/
Apache License 2.0

[BUG] Training data did not change at all and now forecasting models are giving error #1451

pgonzalezb4 closed this issue 1 year ago

pgonzalezb4 commented 1 year ago

**Describe the bug**
Hi, I am using a RegressionModel() instance with a sklearn GradientBoostingRegressor as the underlying model for time series forecasting. The data I am training on now is the same data I have always trained this model with, but I am getting this error:

```
File "/alloc/data/model.py", line 43, in fit_model
    model.fit(series = train)
  File "/root/.cache/pypoetry/virtualenvs/app-daZABYny-py3.9/lib/python3.9/site-packages/darts/models/forecasting/regression_model.py", line 476, in fit
    self._fit_model(
  File "/root/.cache/pypoetry/virtualenvs/app-daZABYny-py3.9/lib/python3.9/site-packages/darts/models/forecasting/regression_model.py", line 358, in _fit_model
    training_samples, training_labels = self._create_lagged_data(
  File "/root/.cache/pypoetry/virtualenvs/app-daZABYny-py3.9/lib/python3.9/site-packages/darts/models/forecasting/regression_model.py", line 339, in _create_lagged_data
    training_samples = _add_static_covariates(
  File "/root/.cache/pypoetry/virtualenvs/app-daZABYny-py3.9/lib/python3.9/site-packages/darts/utils/data/tabularization.py", line 221, in _add_static_covariates
    return np.concatenate([features, static_covs], axis=1)
  File "<__array_function__ internals>", line 180, in concatenate
ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 0, the array at index 0 has size 9574 and the array at index 1 has size 9568
```

As a side issue, when I use the xgboost model in the code snippet below, it fails with `ValueError: Unable to build any training samples of the target series at index 0 and the corresponding covariate series; There is no time step for which all required lags are available and are not NaN values.` Yet I am using exactly the same time series with the same lags as with gradient boosting, which does not raise this error.

I strongly suspect this is an issue with the numpy version or with the creation of the lagged data, but I can't pin it down.

**To Reproduce**

```python
from darts.metrics import mae
from darts.models import RegressionModel, XGBModel
from darts.utils.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingRegressor


def fit_model(model: str, series: list, freq: str, kwargs: dict = None) -> dict:
    kwargs = kwargs or {}  # avoid a mutable default argument
    if freq == '1min':
        horizon = 11520
        lag_space = 1440
    elif freq == '15min':
        horizon = 768
        lag_space = 96
    elif freq == '30min':
        horizon = 384
        lag_space = 48
    elif freq == '1h':
        horizon = 192
        lag_space = 24

    train, test = train_test_split(data=series, test_size=horizon, axis=1)

    if model == 'gradientboosting':
        model = RegressionModel(lags=list(range(-horizon, -1, lag_space)),
                                model=GradientBoostingRegressor(loss='absolute_error', **kwargs))
    elif model == 'xgboost':
        model = XGBModel(lags=list(range(-horizon, -1, lag_space)),
                         output_chunk_length=horizon, **kwargs)

    print(f'Using model {model} with freq {freq}, parameters {(horizon, lag_space)} '
          f'and hyperparameters\n{model.model.get_params()}')
    model.fit(series=train)
    print(f'Model {model} with freq {freq} has been trained, '
          f'generating predictions and calculating mae...')
    pred = model.predict(n=horizon, series=train)
    mean_absolute_error = mae(test, pred)
    mean_mae = sum(mean_absolute_error) / len(mean_absolute_error)
    return {'freq': freq, 'min mae': min(mean_absolute_error), 'mean mae': mean_mae,
            'max mae': max(mean_absolute_error), 'pred': pred, 'test': test}
```
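One detail of the reproduction worth spelling out: the lags argument is built with `range(-horizon, -1, lag_space)`, and since `range`'s stop value is exclusive, lag -1 (the most recent observation) is never included. A plain-Python check of the '1h' branch (no darts needed):

```python
# Values from the '1h' branch of fit_model above
horizon, lag_space = 192, 24
lags = list(range(-horizon, -1, lag_space))
print(lags)        # [-192, -168, -144, -120, -96, -72, -48, -24]
print(-1 in lags)  # False: the stop of -1 is exclusive
```

Whether excluding lag -1 is intentional is for the author to decide; if the most recent value should be a feature, the stop would need to be 0.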

**Expected behavior**
I simply expected the model to train on the data I have always used to train it with.

**Additional context**
This didn't happen with darts 0.21.0; it began after updating to 0.23.0.

dennisbader commented 1 year ago

Hi @pgonzalezb4, there is a difference in how you create the models. For XGBModel, you set the output_chunk_length parameter. Can you try using the same for both models? Let us know if the issue still persists.

pgonzalezb4 commented 1 year ago

Hi @dennisbader! Thank you, the problem disappeared after setting the output_chunk_length parameter (why is that?). The problem I have now is the following:

ValueError: Unable to build any training samples of the target series at index 0 and the corresponding covariate series; There is no time step for which all required lags are available and are not NaN values.

This didn't appear before. I am using the same training data with the same lags, and I am making sure that each of my time series has at least 16 days of data (since I need 8 days for the test dataset and 8 days of lags).

madtoinou commented 1 year ago

Hi @pgonzalezb4,

It was probably because the default value of output_chunk_length was not well defined by darts, causing an error in the sanity checks that verify the series contains enough values for a fit() call.

Darts models are unaware of the frequency of your series: even if the actual frequency of the series is 1D, with freq="1h" the model will expect at least 24 values in the series (the lags) as input, plus an additional 192 values to compare against its first prediction of length output_chunk_length. You will need to either resample the target series or find a way to import it with the proper frequency.
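As a rough back-of-the-envelope check (plain Python, no darts; the exact validation darts performs may differ, e.g. when covariate lags are involved), a single training sample needs history as deep as the largest lag plus output_chunk_length target values. For the '1h' settings in the snippet above that comes out to 16 days of hourly data, matching the series length mentioned earlier in the thread:

```python
def min_fit_length(lags, output_chunk_length):
    # deepest lag of history + future values needed for one training sample
    return -min(lags) + output_chunk_length

horizon, lag_space = 192, 24                 # the '1h' branch of fit_model
lags = list(range(-horizon, -1, lag_space))  # deepest lag: -192
print(min_fit_length(lags, horizon))         # 384 hourly points, i.e. 16 days
```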

I tried reproducing your problem with darts 0.25.0 and it worked just fine for a series of length 600 (step=1, IntegerIndexed) and freq=1h.

I am going to close this for now, @pgonzalezb4 feel free to reopen if the problem persists.