unit8co / darts

A python library for user-friendly forecasting and anomaly detection on time series.
https://unit8co.github.io/darts/
Apache License 2.0

Gridsearch doesn't work with multiple timeseries #1622

Status: Open (opened by pshetty44 1 year ago)

pshetty44 commented 1 year ago

Describe the bug

When calling gridsearch on a neural network model (here NHiTSModel) with multiple time series, we get the error: ValueError: The two TimeSeries sequences must have the same length. Note that in the example, both time series have the same length.

To Reproduce

import pandas as pd
from darts import TimeSeries
from darts.models import (
    NHiTSModel
)
from darts.metrics import rmse
import numpy as np

data = [['item1',  '01-01-2023', 10],
        ['item1',  '01-02-2023', 20],
        ['item1',  '01-03-2023', 30],
        ['item1',  '01-04-2023', 40],
        ['item1',  '01-05-2023', 50],
        ['item1',  '01-06-2023', 60],
        ['item1',  '01-07-2023', 70],
        ['item1',  '01-08-2023', 80],
        ['item1',  '01-09-2023', 90],
        ['item2',  '01-01-2023', 100],
        ['item2',  '01-02-2023', 200],
        ['item2',  '01-03-2023', 300],
        ['item2',  '01-04-2023', 400],
        ['item2',  '01-05-2023', 500],
        ['item2',  '01-06-2023', 600],
        ['item2',  '01-07-2023', 700],
        ['item2',  '01-08-2023', 800],
        ['item2',  '01-09-2023', 900]
       ]
df = pd.DataFrame(data, columns=['item' , 'sale_date', 'units'])
df['sale_date'] = pd.to_datetime(df['sale_date'])

item_list = TimeSeries.from_group_dataframe(df, group_cols='item', value_cols='units', time_col='sale_date')

params = {
    "input_chunk_length": [6],
    "output_chunk_length": [1],
    "num_layers": [1, 2, 3],
}

res = NHiTSModel.gridsearch(
    parameters=params,
    series=item_list,
    metric=rmse,
    reduction=np.mean,
    n_jobs=-1,
    n_random_samples=0.99,
    verbose=True,
    forecast_horizon=1,
)

ValueError: The two TimeSeries sequences must have the same length.

Expected behavior

The grid search should complete successfully and return the best model.


Additional context

Gridsearch works with a single time series.

dennisbader commented 1 year ago

Hi @pshetty44, and thanks for writing. Gridsearch actually only supports a single time series; see also the docs here. I agree that this is not very transparent. We could raise an error in this case; I'll add it to our backlog.
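A minimal sketch of the kind of up-front check the maintainer suggests, assuming multi-series input arrives as a plain Python list or tuple (TimeSeries.from_group_dataframe returns a list). The helper name check_single_series is hypothetical; it is not part of the darts API, and the actual fix in darts may look different.

```python
from typing import Any


def check_single_series(series: Any) -> None:
    """Hypothetical validation helper, not part of darts.

    Raises a descriptive error when a list/tuple of series is passed,
    instead of letting gridsearch fail later with a confusing message.
    """
    # Assumption: multiple series are passed as a list or tuple,
    # e.g. the output of TimeSeries.from_group_dataframe.
    if isinstance(series, (list, tuple)):
        raise ValueError(
            "gridsearch only supports a single TimeSeries; "
            f"got a sequence of {len(series)} series."
        )
```

Failing early with this message would have pointed the reporter straight at the limitation rather than at a length mismatch.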

Gridsearch provides only a very basic hyperparameter search. For anything more sophisticated, we recommend relying on other libraries such as Optuna or Ray Tune. We also provide a user guide for this here.
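Until multi-series support exists, one possible workaround is a hand-rolled exhaustive search: try every parameter combination, compute one error per series, and reduce those errors to a single score (the reduction=np.mean argument in the report plays the same role). The function multi_series_gridsearch and the evaluate callable are hypothetical. For a darts global model, evaluate might build the model from the candidate parameters, fit it once on the list of training series, and score each series with a metric such as rmse; those steps are assumptions and are not shown here.

```python
import itertools
import statistics
from typing import Any, Callable, Dict, List, Optional, Sequence, Tuple


def multi_series_gridsearch(
    parameters: Dict[str, List[Any]],
    series_list: Sequence[Any],
    evaluate: Callable[[Dict[str, Any], Sequence[Any]], List[float]],
    reduction: Callable[[List[float]], float] = statistics.mean,
) -> Tuple[Optional[Dict[str, Any]], float]:
    """Hypothetical exhaustive grid search over several series.

    `evaluate(params, series_list)` must return one error per series
    (lower is better); `reduction` collapses them into a single score.
    """
    keys = list(parameters)
    best_params: Optional[Dict[str, Any]] = None
    best_score = float("inf")
    # Iterate over the Cartesian product of all candidate values.
    for values in itertools.product(*(parameters[k] for k in keys)):
        params = dict(zip(keys, values))
        errors = evaluate(params, series_list)
        score = reduction(errors)
        # Strict "<" keeps the first combination on ties.
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Toy usage: "series" are plain lists, and the "model" predicts a constant level.
def toy_evaluate(params: Dict[str, Any], series_list: Sequence[List[float]]) -> List[float]:
    return [abs(statistics.mean(s) - params["level"]) for s in series_list]


best, score = multi_series_gridsearch({"level": [0, 3, 10]}, [[1, 2, 3], [3, 4, 5]], toy_evaluate)
print(best, score)
```

Passing the whole list to evaluate, rather than one series at a time, lets a global model be trained once per parameter combination instead of once per series.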

pshetty44 commented 1 year ago

Thanks for the update @dennisbader. Adding an appropriate error message would be helpful. You can close this issue.

Eduardo-Vilas-Boas commented 3 months ago

Hello, is this issue still being pursued? I would like to contribute.