unit8co / darts

A python library for user-friendly forecasting and anomaly detection on time series.
https://unit8co.github.io/darts/
Apache License 2.0
7.87k stars · 851 forks

Wonder if there is any way to make a model predict more spread 'dynamic'? explain more in body #2256

Closed Allena101 closed 1 week ago

Allena101 commented 6 months ago

Hello! I have been trying a lot to get a model to predict well on some electricity data. I have tried several models (but not all yet), and TCN is one of the ones that performs the best.

The model achieves a pretty good RMSE and it outperforms the naive models substantially.

If I train and predict on accumulated kWh usage, the results look better (but with a worse RMSE). If I train on the interval sum (which is stationary), it gets a better RMSE but the results look kind of off.

[screenshot: nov_iSum4]

[screenshot: nov_KWh4]

My judgement is that the iSum forecast is better but that it looks worse. Which brings me to my question: can you adjust some parameter so that the model predicts more similarly to the data it trains on? Or perhaps you could just transform the iSum prediction in some way so that it looks more like the typical iSum data.
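One simple post-hoc transform along these lines (assuming iSum is the per-interval kWh sum, which is what the thread suggests): cumulatively summing the interval forecasts reconstructs an accumulated-kWh-style series, and differencing inverts it exactly. A minimal numpy sketch with made-up values:

```python
import numpy as np

# Made-up interval sums (kWh used per period), standing in for an iSum forecast
interval_sums = np.array([1.2, 0.8, 1.5, 2.0, 0.9])

# Cumulative sum turns interval sums into an accumulated-kWh-style series
accumulated = np.cumsum(interval_sums)

# Differencing (with 0 prepended) inverts the transform exactly
recovered = np.diff(accumulated, prepend=0.0)

print(np.allclose(recovered, interval_sums))  # True
```

Note this only changes how the forecast is displayed; the model and its errors are unchanged.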

Allena101 commented 6 months ago

My feeling is that the model is either underfitted or badly suited (or has badly chosen model parameters).

dennisbader commented 6 months ago

Hi @Allena101, about the iSum part:

  • if the variance of iSum cannot be explained by the most recent past of iSum, or any other covariates that you might use, then to the model it looks like noise around iSum4_prediction.
  • The model optimizes for the lowest RMSE loss. So it seems that it achieves the lowest errors when not trying to "randomly" predict the variance, but rather making a conservative prediction.

To increase the spread:

  • add covariates which explain the variance better
  • use a probabilistic version of the model, for example with likelihood=QuantileRegression(), to make quantile predictions.

Allena101 commented 6 months ago

thanks for reading my issue and replying, Dennis!

I did manage to get the QuantileRegression likelihood to work and it seems to have the desired effect!

When I am using the model to predict, the num_samples argument gives many predictions for each time step (which is understandable). I am a bit confused about how I am supposed to utilize it though. In your TFT example you drew many samples and seemingly plotted the minimum and maximum. Is there any meaningful difference between drawing many samples and taking their average compared to not using a likelihood at all?

Also, the TFT guide showed how to train TFT without using a likelihood. This makes me wonder if I can predict 'normally' using a model that has been trained with a likelihood method? Or would I have to take many samples and average them in that case?

Do any of your guides show how to use any other likelihood method besides QuantileRegression? I tried to get Poisson working but could not.

Unrelated to this: which model has generally provided the best score? I know that different models will work better on different kinds of data, but if we are thinking about the typical air passenger and electricity practice datasets, is it N-BEATS?

dennisbader commented 6 months ago

With models that use a likelihood (Darts' regression and torch (neural network) models), you can use predict_likelihood_parameters=True when calling model.predict().

For QuantileRegression it will give you all predicted quantiles including the median q0.50 prediction.

Predicting with num_samples=1 with a model that uses a likelihood will give you one point sampled from the predicted distribution, which is not representative.

Here is another example using the GaussianLikelihood.
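As a rough sketch of what a Gaussian likelihood means (the mu/sigma values below are made up for illustration, not actual Darts output): the model predicts a mean and standard deviation per time step, sampled forecasts are draws from those per-step distributions, and predict_likelihood_parameters=True would return the parameters themselves instead of samples:

```python
import numpy as np

rng = np.random.default_rng(7)

# Made-up per-time-step distribution parameters (3 forecast steps);
# a Gaussian-likelihood model learns to predict values like these
mu = np.array([10.0, 10.5, 11.0])
sigma = np.array([0.5, 0.8, 1.2])

# Drawing num_samples=500 trajectories from the predicted distributions,
# mirroring what sampling-based prediction does under the hood
samples = rng.normal(mu, sigma, size=(500, 3))

# With many samples, per-step statistics recover the parameters
print(np.abs(samples.mean(axis=0) - mu).max() < 0.3)  # True
```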

Some good model candidates are:

Also try once with use_reversible_instance_norm=True at model creation; this can in many cases improve model performance.

Allena101 commented 5 months ago

For QuantileRegression it will give you all predicted quantiles including the median q0.50 prediction.

I don't understand what this means. What does making e.g. a 0.95 quantile prediction even mean?

Allena101 commented 5 months ago

Also try once with use_reversible_instance_norm=True at model creation, this can in many cases improve model performance.

I tried it. Is use_reversible_instance_norm layer normalization?

thanks for the model suggestions! I tried most of them already so it's good to know I am on the right track.

madtoinou commented 1 week ago

Hi @Allena101,

When the model is probabilistic, it tries to learn the parameters of the target distribution (or the quantiles between which the forecasts are likely/expected to lie).
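In concrete terms, a q0.95 prediction is the value the model expects the target to stay below about 95% of the time. Quantile regression is trained with the pinball (quantile) loss, which is minimized in expectation by predicting the true quantile. A small numpy illustration on simulated outcomes (the distribution here is made up):

```python
import numpy as np

rng = np.random.default_rng(0)
# Simulated outcomes, standing in for a model's predictive samples
samples = rng.normal(loc=10.0, scale=2.0, size=10_000)

# A q0.95 prediction: roughly 95% of outcomes fall below it
q95 = np.quantile(samples, 0.95)
print(round(float(np.mean(samples <= q95)), 2))  # 0.95

# Pinball (quantile) loss: the asymmetric loss behind quantile regression;
# over-predictions and under-predictions are penalized with different weights
def pinball_loss(y_true, y_pred, q):
    diff = y_true - y_pred
    return np.mean(np.maximum(q * diff, (q - 1) * diff))

# At q=0.95, predicting the q95 value scores better than predicting the median
print(pinball_loss(samples, q95, 0.95) < pinball_loss(samples, np.median(samples), 0.95))  # True
```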

Using a likelihood is very important, especially when you need to make a decision based on a forecast, because you get an idea of how confident the model is (even if the accuracy is not necessarily better). If the spread of the forecasts is considerable, you might not want to rely on them, whereas having all the forecasts in a very thin interval indicates that you can pretty much consider them a point forecast.

When you generate probabilistic forecasts, it's pretty much always better to use num_samples >> 1 and use the median as the point forecast to avoid outliers.
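This sample-then-take-the-median recipe can be sketched with plain numpy (the heavy-tailed random draws below just stand in for a probabilistic model's predictive samples):

```python
import numpy as np

rng = np.random.default_rng(42)

# Stand-in for predicting 5 steps with num_samples=1000: 1000 sampled
# trajectories from a heavy-tailed (outlier-prone) distribution centered at 10
samples = rng.standard_t(df=2, size=(1000, 5)) + 10.0

one_draw = samples[0]                        # like num_samples=1: one noisy trajectory
point_forecast = np.median(samples, axis=0)  # robust per-step point forecast

# The median across many samples stays close to the center even though
# individual draws can contain large outliers
print(np.abs(point_forecast - 10.0).max() < 0.3)  # True
```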

Closing this for now, feel free to reopen if something is still unclear.