RNN Univariate accuracy issue

chelvanai commented 2 years ago

I only sometimes get good accuracy; most of the time the final output graph is not good. I have attached my code below. Can you tell me what the issues are?

import numpy as np
from darts.models.forecasting.forecasting_model import GlobalForecastingModel

from darts.utils.data import InferenceDataset, DualCovariatesInferenceDataset, sequential_dataset
from darts.utils.data.shifted_dataset import GenericShiftedDataset, DualCovariatesShiftedDataset
from darts.utils.timeseries_generation import datetime_attribute_timeseries

from darts.dataprocessing.transformers import Scaler
from darts.metrics import mape
from darts.timeseries import TimeSeries
from darts.models import RNNModel
import matplotlib.pyplot as plt
import pandas as pd

series = np.array( [146376, 147079, 159336, 163669, 170068, 168663, 169890, 170364, 164617, 173655, 171547, 208838, 153221, 150087, 170439, 176456, 182231, 181535, 183682, 183318, 177406, 182737, 187443, 224540, 161349, 162841, 192319, 189569, 194927, 197946, 193355, 202388, 193954, 197956, 202520, 241111, 175344, 172138, 201279, 196039, 210478, 211844, 203411, 214248, 202122, 204044, 212190, 247491, 185019, 192380, 212110, 211718, 226936, 217511, 218111, 226062, 209250, 222663, 223953, 258081, 200389, 197556, 225133, 220329, 234190, 227365, 231521, 235252, 222807, 232251, 228284, 271054, 207853, 203863, 230313, 234503, 245027, 244067, 241431, 240462, 231243, 244234, 240991, 288969, 218126, 220650, 253550, 250783, 262113, 260918, 262051, 265089, 253905, 258040, 264106, 317659, 236422, 250580, 279515, 264417, 283706, 281288, 271146, 283944, 269155, 270899, 276507, 319958, 250746, 247772, 280449, 274925, 296013, 287881, 279098, 294763, 261924, 291596, 287537, 326202, 255598, 253086, 285261, 284747, 300402, 288854, 295433, 307256, 273189, 287540, 290705, 337006, 268328, 259051, 293693, 294251, 312389, 300998, 309923, 317056, 293890, 304036, 301265, 357577, 281460, 282444, 319077, 315191, 328408, 321044, 328000, 326317, 313524, 319726, 324259, 387155, 293261, 295062, 339141, 335632, 345348, 350945, 351827, 355701, 333289, 336134, 343798, 405608, 318546, 314051, 361993, 351667, 373560, 366615, 362203, 375795, 346214, 348796, 356928, 417991, 328877, 323162, 374142, 358535, 391512, 376639, 372354, 388016, 353936, 368681, 377802, 426077, 342697, 343937, 372923, 368923, 397969, 378490, 383686, 382852, 350560, 349884, 335571, 384286, 310269, 299488, 328568, 329866, 347768, 344439, 348106, 353473, 324708, 338630, 339386, 400264, 314640, 311022, 360819, 356460, 365713, 358675, 362027, 362682, 346069, 355212, 365809, 426654, 335608, 337352, 387092, 380754, 391970, 388636, 384600, 394548, 374895, 379364, 391081, 451669, 355058, 372523, 414275, 393035, 418648, 400996, 396020, 
417911, 385597, 399341, 410992, 461994, 375537, 373938, 421638, 408381, 436985, 414701, 422357, 434950, 396199, 415740, 423611, 477205, 383399, 380315, 432806, 431415, 458822, 433152, 443005, 450913, 420871, 437702, 437910, 501232, 397252, 386935, 444110, 438217, 462615, 448229, 457710, 456340, 430917, 444959, 444507, 518253, 400928, 413554, 460093, 450935, 471421])

series = TimeSeries.from_values(series)

transformer = Scaler()
series_transformed = transformer.fit_transform(series)

my_model = RNNModel(
    model="LSTM",
    hidden_dim=20,
    dropout=0,
    batch_size=16,
    n_epochs=300,
    optimizer_kwargs={"lr": 1e-3},
    model_name="Air_RNN",
    log_tensorboard=True,
    training_length=25,
    input_chunk_length=15,
    force_reset=True,
    save_checkpoints=True,
)

my_model.fit_from_dataset(
    DualCovariatesShiftedDataset(series_transformed),
    verbose=True,
)

pred = my_model.predict_from_dataset(input_series_dataset=DualCovariatesInferenceDataset(series_transformed), n=10)

print(transformer.inverse_transform(pred))

series.plot()
transformer.inverse_transform(pred)[0].plot()
plt.show()

This is the result I get most of the time. I think the shifted training data is not taken in order, I mean the values are picked from here and there, and that affects the accuracy. I'm not sure, though. Can you tell me what the issue is?

(Attached image: result)

hrzn commented 2 years ago

Hi, there can be a thousand reasons why a model does not perform well. First: if you want reproducible results from one run to the next, you need to set your random seeds (or set the random_state parameter of the model to some value). Second: have you tried simpler models first (such as naive models, or exponential smoothing)? Third: have you tried to search for good hyper-parameters?
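To make the "try simpler models first" suggestion concrete, here is a minimal sketch of a seasonal naive baseline scored with MAPE, written in plain Python rather than with the darts API. It uses the first 36 values of the series posted above. The season length of 12 is an assumption (the spike at every twelfth value suggests monthly data), and the helper names `seasonal_naive_forecast` and `mape_percent` are hypothetical, introduced only for this illustration.

```python
def seasonal_naive_forecast(history, season_length, horizon):
    """Forecast by repeating the last observed season (assumed length)."""
    if len(history) < season_length:
        raise ValueError("history must cover at least one full season")
    last_season = history[-season_length:]
    return [last_season[i % season_length] for i in range(horizon)]


def mape_percent(actual, predicted):
    """Mean absolute percentage error, in percent."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# First 36 values of the series from the original post.
values = [
    146376, 147079, 159336, 163669, 170068, 168663,
    169890, 170364, 164617, 173655, 171547, 208838,
    153221, 150087, 170439, 176456, 182231, 181535,
    183682, 183318, 177406, 182737, 187443, 224540,
    161349, 162841, 192319, 189569, 194927, 197946,
    193355, 202388, 193954, 197956, 202520, 241111,
]

SEASON_LENGTH = 12  # assumption: monthly data with a yearly pattern

history, actual = values[:24], values[24:]
forecast = seasonal_naive_forecast(history, SEASON_LENGTH, horizon=len(actual))
baseline_score = mape_percent(actual, forecast)
print(f"Seasonal naive MAPE: {baseline_score:.2f}%")
```

If the RNN cannot beat a baseline like this on a held-out part of the series, the problem is more likely in the training setup or hyper-parameters than in the data itself.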

chelvanai commented 2 years ago

Hi @hrzn, the data loader shuffles the training data. Is that right?

https://github.com/unit8co/darts/blob/eda8f94329865df8de692642fdabae97b8e121db/darts/models/forecasting/torch_forecasting_model.py#L880

hrzn commented 2 years ago

Yes, that's correct. The initialization of the network weights is also random.
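As a rough illustration of what this kind of shuffling does, the sketch below builds sliding windows from a toy series and shuffles them in plain Python. It is not darts' or PyTorch's implementation, and the helper names `make_windows` and `shuffled_windows` are hypothetical. It shows two things: shuffling changes the order in which windows are visited, but the values inside each window stay in time order, and a fixed seed makes the shuffled order repeat from run to run.

```python
import random


def make_windows(values, window_length):
    """Cut a series into overlapping, contiguous training windows."""
    last_start = len(values) - window_length
    return [values[i:i + window_length] for i in range(last_start + 1)]


def shuffled_windows(values, window_length, seed=None):
    """Shuffle the order of the windows, not the values inside them."""
    windows = make_windows(values, window_length)
    rng = random.Random(seed)  # seed=None gives a different order each run
    rng.shuffle(windows)
    return windows


toy_series = list(range(10))  # stand-in for the scaled series

run_a = shuffled_windows(toy_series, window_length=4, seed=42)
run_b = shuffled_windows(toy_series, window_length=4, seed=42)

print(run_a[:3])       # windows arrive in shuffled order
print(run_a == run_b)  # same seed, same order
```

Shuffled sample order is therefore normal for mini-batch training and does not scramble the series itself. The run-to-run differences come from the unseeded randomness, which is what setting random seeds or `random_state` addresses.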

unit8co / darts

RNN Univariate accuracy issue #857