unit8co / darts

A python library for user-friendly forecasting and anomaly detection on time series.
https://unit8co.github.io/darts/
Apache License 2.0
8.04k stars, 874 forks

[BUG] val_series parameter in fit method is not working on TFTModel #1849

Closed DevSoftChuck closed 1 year ago

DevSoftChuck commented 1 year ago

Describe the bug

Hello everyone! I hope you're doing great. When I pass validation data to my model's fit method, I get this error: ValueError: The provided validation time series dataset is too short for obtaining even one training point. However, my validation dataset has 6 records, so it is not empty or null. The following images show my validation dataset, asu_val_spi_scaled:

[Two screenshots: Screenshot from 2023-06-21 13-16-07, Screenshot from 2023-06-21 13-25-02]

To Reproduce

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from darts import TimeSeries
from darts import concatenate
from darts.metrics import mse, r2_score, rmse, mae, mape
from darts.dataprocessing.transformers import Scaler, StaticCovariatesTransformer
from sklearn.preprocessing import MinMaxScaler
import torch
import optuna
from darts.models import TFTModel
from darts.utils.likelihood_models import QuantileRegression
import warnings
warnings.filterwarnings("ignore")
import logging
logging.disable(logging.CRITICAL)

data_set = pd.read_csv("monthly_spi_data.csv", delimiter=",", parse_dates=True)
training_cutoff = pd.Timestamp("20221101")

scaler = MinMaxScaler(feature_range=(-1, 1))
spi_transformer = Scaler(scaler)

# Create a TimeSeries, specifying the time and value columns
value_cols=['tempmax', 'tempmin', 'temp', 'feelslikemax',
       'feelslikemin', 'feelslike', 'dew', 'humidity', 'precip', 'precipprob',
       'precipcover', 'windspeed', 'winddir', 'sealevelpressure', 'cloudcover',
       'visibility', 'spi','mes']
# static_covariates= ['name', 'categorical_id', 'mes', 'estacion_del_anio', 'time_idx']
multivariate_asu_series = TimeSeries.from_dataframe(
    data_set[data_set.categorical_id =='ASU'], 
    time_col="datetime", 
    value_cols=value_cols,
    freq='M'
)
asu_train, asu_val = multivariate_asu_series.split_before(training_cutoff)

asu_train_spi_scaled = spi_transformer.fit_transform(asu_train['spi'])
asu_val_spi_scaled = spi_transformer.transform(asu_val['spi'])

past_covariates_transformer = Scaler(scaler)
future_covariates_transformer = Scaler(scaler)

asu_train_past_covariates_scaled = past_covariates_transformer.fit_transform(asu_train.drop_columns(['spi', 'mes']))
asu_val_past_covariates_scaled = past_covariates_transformer.transform(asu_val.drop_columns(['spi', 'mes']))
asu_future_covariates_scaled = future_covariates_transformer.fit_transform(asu_train[['mes']])

from pytorch_lightning.callbacks.early_stopping import EarlyStopping

# stop training when validation loss does not decrease more than 0.05 (`min_delta`) over
# a period of 10 epochs (`patience`)
early_stop_callback = EarlyStopping(
    monitor="val_loss",
    patience=10,
    min_delta=0.05,
    mode='min',
)

pl_trainer_kwargs={"callbacks": [early_stop_callback]}

tft_model = TFTModel(input_chunk_length=60, 
                     output_chunk_length=6, 
                     hidden_size=25, 
                     lstm_layers=8,
                     num_attention_heads=5, 
                     full_attention=False,
                     dropout=0.1,
                     hidden_continuous_size=7, 
                     categorical_embedding_sizes=None, 
                     add_relative_index=True, 
                     batch_size=32,
                     loss_fn=torch.nn.MSELoss(), 
                     # NOTE: `quantiles` is not defined anywhere in this snippet
                     likelihood=QuantileRegression(quantiles=quantiles),
                     random_state=42, 
                     n_epochs=1000,
                     pl_trainer_kwargs=pl_trainer_kwargs,
                     save_checkpoints=True)

tft_model.fit(series=asu_train_spi_scaled,
              past_covariates=asu_train_past_covariates_scaled, 
              future_covariates=asu_future_covariates_scaled, 
              val_series=asu_val_spi_scaled, # HERE THE ERROR
              val_past_covariates=asu_val_past_covariates_scaled,
              val_future_covariates=asu_future_covariates_scaled,
              verbose=True)

Expected behavior

With early stopping enabled, I should be able to train my model with the validation data provided.


madtoinou commented 1 year ago

Hi @DevSoftChuck,

Thank you for sharing a code snippet.

From the parameters used to create your model, I can see that you set input_chunk_length=60, which means the model needs 60 time steps as input in order to produce output_chunk_length=6 values. Hence, the validation set must contain at least 60 + 6 = 66 values.
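
The arithmetic in the reply above can be sketched in a few lines of plain Python. This is only an illustration of the rule as stated there (each sample needs input_chunk_length + output_chunk_length consecutive points); the helper name n_validation_samples is hypothetical and is not part of the darts API.

```python
def n_validation_samples(series_length: int,
                         input_chunk_length: int,
                         output_chunk_length: int) -> int:
    """Count the (input, output) windows that fit in a series.

    Hypothetical helper: assumes one sample needs
    input_chunk_length + output_chunk_length consecutive points,
    and that windows slide forward one step at a time.
    """
    window = input_chunk_length + output_chunk_length
    return max(0, series_length - window + 1)


# Values from the report: a 6-point validation series and a 60/6 model.
print(n_validation_samples(6, 60, 6))   # 0 -> "too short" error
print(n_validation_samples(66, 60, 6))  # 1 -> the minimum usable length
print(n_validation_samples(71, 60, 6))  # 6
```

If the held-out period is shorter than one window, one option is to start the validation series input_chunk_length steps earlier so that it overlaps the end of the training data. Whether that overlap is acceptable depends on the use case.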

DevSoftChuck commented 1 year ago

Hi @madtoinou, thank you so much for your response. That solved my issue.

tRosenflanz commented 6 months ago

Why is this a requirement? Is the target automatically used as an input even though it might be absent from past_covariates?

madtoinou commented 6 months ago

Hi @tRosenflanz,

Can you please open a separate issue to ask your question?