mrtn37 commented 1 year ago

The N_Beats model in probabilistic mode requires a Scaler to be applied beforehand. Otherwise the results are implausible compared with the deterministic version.

Since this is not explicitly mentioned in the docs, I believe this behavior is unintended.

To Reproduce

Use a simple N_Beats model without any Scaler beforehand.

from darts.models import NBEATSModel
from darts.utils.likelihood_models import GaussianLikelihood

NBEATSModel(
    input_chunk_length=30,
    output_chunk_length=10,
    generic_architecture=False,
    num_blocks=3,
    num_layers=4,
    layer_widths=512,
    batch_size=32,
    model_name="nbeats_det",
    # set for the probabilistic version, omit for the deterministic one
    likelihood=GaussianLikelihood(),
    force_reset=True,
    random_state=42,
)

Depending on whether the likelihood parameter is defined, the results of the deterministic and the probabilistic version (taking the mean or the median) are not of the same order of magnitude. They seem to become comparable as soon as a scaler is used beforehand.

Expected behavior

I expect the results of the deterministic version to be somewhat similar to those of the probabilistic version.
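To make the comparison above concrete, here is a minimal sketch of how sampled trajectories could be collapsed into a per-step median and compared with a deterministic forecast by order of magnitude. The helper names `median_forecast` and `magnitude_gap` are hypothetical, and the numbers are made up for illustration; they are not output from the models in this issue.

```python
import math
from statistics import median


def median_forecast(samples):
    """Collapse sampled trajectories (one list per sample) into a per-step median."""
    return [median(step_values) for step_values in zip(*samples)]


def magnitude_gap(deterministic, probabilistic):
    """Mean absolute log10 ratio between two forecasts.

    0 means the same order of magnitude, 1 means a factor of 10 apart.
    Assumes all values are non-zero.
    """
    gaps = [
        abs(math.log10(abs(p) / abs(d)))
        for d, p in zip(deterministic, probabilistic)
    ]
    return sum(gaps) / len(gaps)


# Made-up values for illustration only.
det = [410.0, 395.0, 450.0]
samples = [
    [0.40, 0.38, 0.47],
    [0.42, 0.41, 0.44],
    [0.39, 0.40, 0.46],
]

prob = median_forecast(samples)
gap = magnitude_gap(det, prob)
print(prob, gap)
```

A gap close to zero would match the expectation stated above; a gap of several units corresponds to the kind of mismatch described in this report.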

System (please complete the following information):

Python version: 3.8.16
darts version 0.23.1

alexcolpitts96 commented 1 year ago

Scaling is a common preprocessing requirement for time series forecasting. There are some models that do not need processing; however, most deep learning approaches do require it.

I am aware that NBEATS is a model that can achieve good results without preprocessing, but this still requires careful hyperparameter tuning. With a little bit of tuning (and gradient clipping), I was able to get likelihood forecasts working without scaling.
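For readers unfamiliar with gradient clipping, here is a simplified sketch of clipping by global norm in plain Python. As far as I know, this is the kind of clipping PyTorch Lightning applies by default when `gradient_clip_val` is set, but the function below is only an illustration (the name `clip_by_global_norm` is made up, and real implementations work on tensors and add a small epsilon).

```python
import math


def clip_by_global_norm(grads, max_norm):
    """Rescale a flat list of gradient values so their L2 norm is at most max_norm."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm <= max_norm:
        # Already small enough: leave the gradients untouched.
        return list(grads)
    scale = max_norm / total_norm
    return [g * scale for g in grads]


# Large targets (values in the hundreds) can produce large gradients early in training.
clipped = clip_by_global_norm([300.0, -400.0], max_norm=10.0)
print(clipped)
```

The direction of the update is preserved and only its length is capped, which is why clipping can keep training stable on unscaled data without changing what the model is optimizing.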

I would highly recommend using scaling (either local or global), since it makes training and model lifecycle management a lot easier.

import pandas as pd
import matplotlib.pyplot as plt

from darts.dataprocessing.transformers import Scaler
from darts.datasets import AirPassengersDataset
from darts.models import NBEATSModel
from darts.metrics import rmse, mae
from darts.utils.likelihood_models import GaussianLikelihood

# set torch precision to medium
import torch
torch.set_float32_matmul_precision("medium")

# Read data:
series = AirPassengersDataset().load()
#series = Scaler().fit_transform(series)

# Create training and validation sets:
train, val = series.split_after(pd.Timestamp(year=1957, month=12, day=1))

model_args = {
    "input_chunk_length": 24,
    "output_chunk_length": 12,
    "n_epochs": 100,
    "pl_trainer_kwargs": {
        "accelerator": "auto",
        "gradient_clip_val": 10.0,
    },
    "optimizer_kwargs": {
        'lr': 1E-4, 
    },
}

prob_model = NBEATSModel(
    **model_args,
    likelihood=GaussianLikelihood(),
)

prob_model.fit(
    series=train,
    val_series=val,
)

model = NBEATSModel(
    **model_args,
    likelihood=None,
)

model.fit(
    series=train,
    val_series=val,
)

plt.figure()
series.plot(label='actual')
model.predict(n=24).plot(label='prediction')
prob_model.predict(n=24).plot(label='probabilistic prediction')
plt.legend()
plt.savefig('prediction.png')
plt.close()

[Image: prediction]

mrtn37 commented 1 year ago

Thank you for your explanation and for the great example. I was not aware that you could tune the model to make the Gaussian forecast work that way. I will have to read up on what exactly it does :)

I agree in principle that scaling is often sensible and should be done in most cases.

My point is a more formal or technical one. If a scaler is not a hard requirement, the code should run without any errors when no scaling is done. Hence I regard this behavior as a bug. Judging by my 'naive' approach as a user of this library, I did not expect that behavior. Maybe the model could test whether the data is scaled and, if not, issue a warning?
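A check of the kind proposed here could look roughly like the sketch below. It is purely hypothetical: `warn_if_unscaled` is not part of darts, and the threshold of 10 is an arbitrary assumption, since there is no single definition of "scaled" data.

```python
import warnings


def warn_if_unscaled(values, max_abs=10.0):
    """Hypothetical check, not part of darts.

    Emits a UserWarning and returns True when the largest absolute value
    exceeds max_abs. Assumes a non-empty sequence of numbers.
    """
    largest = max(abs(v) for v in values)
    if largest > max_abs:
        warnings.warn(
            f"Largest absolute value is {largest:g}; "
            "consider scaling the series before fitting a probabilistic model.",
            UserWarning,
            stacklevel=2,
        )
        return True
    return False


# Values in the hundreds trigger the warning; values in [0, 1] do not.
warn_if_unscaled([112.0, 118.0, 132.0])
warn_if_unscaled([0.1, 0.5, 0.9])
```

Such a check can be run in user code before calling `fit`, independently of whether the library ever adopts anything like it.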

alexcolpitts96 commented 1 year ago

I think that the idea of darts having models check for data quality goes against the role that darts aims to fit as a tool. This is more of a design philosophy problem. I am not a core developer for darts, but have been using it for academic and professional purposes.

Darts allows for a lot of the tedious boilerplate code associated with timeseries applications to be abstracted. Writing your own scalers, dataloaders, model saving and loading, etc can be pretty tedious. Vanilla torch takes a decent amount of work to go from an idea to a deployable model. Pytorch Lightning helps remove training boilerplate; however, there is still a ton of tooling that darts adds around it. Not only does darts do that with torch-based models, but it also glues together a bunch of other popular timeseries libraries.

The core devs may think differently, but I am not sure a scale check like the one you propose would be in scope for darts. There are many ways to compensate for scaling (or the lack of it), so a warning wouldn't be meaningful. That being said, adding quality checks and modeling your data before feeding it into darts (or other ML tools) will make your life a lot easier in the long run.

mrtn37 commented 1 year ago

Okay, I am fine with that :)

Still, there is a technically feasible combination of parameters and model calls that leads to a 'bad' result. I am not disputing that a more experienced user would avoid it based on their knowledge. However, from a coding perspective, such behavior should not be left to the user to resolve or interpret without any hint of what is going on. If the code accepts such a parameter setting, the results should be within the user's expectations. As a naive example: if I put in values in the range of ~100, I expect output of that order, not of the order of ~0.1. As you put it: this is more of a design philosophy problem 😊

Thank you for your replies, I learned a lot 😊

unit8co / darts

[BUG] N_Beats in probabilistic mode requires a scaler #1788
