online-ml / river

🌊 Online machine learning in Python
https://riverml.xyz
BSD 3-Clause "New" or "Revised" License
5.07k stars 548 forks

Timeseries forecast evaluation #942

Closed dberardo-com closed 2 years ago

dberardo-com commented 2 years ago

Two matters here:

  1. After introducing the ThresholdFilter in the time_series pipeline, it is now hard to evaluate models on data that contains anomalies. Is it possible to "pass through" the anomaly score (1/0) to the output of model.forecast()? That way one could skip metric.update for those anomalies when looping over the dataset.

  2. For an a-posteriori evaluation of a time_series forecaster, I am trying to replay the dataset with a pre-trained model and see how that model would have performed at "reconstructing" the whole dataset. To do this I would simply need to call model.forecast() using values from the past, but is this possible? How should I call the forecast method? And more in general, what is the difference between horizon and xs in the forecast signature?

dberardo-com commented 2 years ago

So this is what I am trying to achieve here:

[model.forecast(horizon=1, xs=[x])[0] for x, y in gen_dataset()]

gen_dataset() returns the WaterFlow dataset in this case.

But here I get this strange vector:

[104.5113750076123, 104.5113750076123, 104.5113750076123, 104.5113750076123, 104.5113750076123, 104.5113750076123, 104.5113750076123, 104.5113750076123, 104.5113750076123, 104.5113750076123, 104.5113750076123, 104.5113750076123, ...]

with all the same values.

MaxHalford commented 2 years ago

After introducing the ThresholdFilter in the time_series pipeline, it is now hard to evaluate models on data that contains anomalies. Is it possible to "pass through" the anomaly score (1/0) to the output of model.forecast()? That way one could skip metric.update for those anomalies when looping over the dataset.

You should be able to do this by splitting your pipeline into an anomaly detector and a model. Loop through the dataset manually: check whether a sample is anomalous; if so, discard it; if not, learn and update the metric. I can provide an example if you struggle, but I don't have a lot of time right now.
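
For reference, a minimal sketch of such a loop. The detector, threshold and warm-up length are illustrative assumptions, not the pipeline discussed in this issue, and gen_dataset() stands in for your own stream:

from river import anomaly, metrics, time_series

detector = anomaly.HalfSpaceTrees(seed=42)  # assumed detector; in practice feed it features scaled to [0, 1]
forecaster = time_series.HoltWinters(alpha=0.3, beta=0.1, gamma=0.6, seasonality=24)
metric = metrics.SMAPE()
THRESHOLD = 0.8   # assumed cut-off on the anomaly score
WARMUP = 2 * 24   # let the forecaster see a couple of seasonal cycles before scoring it

n_learned = 0
for x, y in gen_dataset():
    score = detector.score_one({'y': y})  # here the target itself is scored
    detector.learn_one({'y': y})
    if score > THRESHOLD:
        continue  # anomalous sample: neither learn from it nor score the forecast
    if n_learned >= WARMUP:
        metric.update(y, forecaster.forecast(horizon=1)[0])
    forecaster.learn_one(y)
    n_learned += 1

print(metric)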

For an a-posteriori evaluation of a time_series forecaster, I am trying to replay the dataset with a pre-trained model and see how that model would have performed at "reconstructing" the whole dataset. To do this I would simply need to call model.forecast() using values from the past, but is this possible? How should I call the forecast method?

This doesn't make sense in online learning, and especially for time series models. You're supposed to interleave learning and inference. You could potentially use a pre-trained regression model, but not a time series model.
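
As a rough sketch of that regression route (the time_features helper, the pipeline and gen_dataset() are assumptions for illustration, not the model from this thread):

from river import compose, linear_model, preprocessing

def time_features(x):
    t = x['Time']
    return {'hour': t.hour, 'weekday': t.weekday(), 'ordinal': t.toordinal()}

reg = (
    compose.FuncTransformer(time_features)
    | preprocessing.StandardScaler()
    | linear_model.LinearRegression()
)

# one online training pass
for x, y in gen_dataset():
    reg.learn_one(x, y)

# a-posteriori reconstruction: the trained regressor can be queried at any past timestamp
reconstruction = [reg.predict_one(x) for x, _ in gen_dataset()]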

And more in general, what is the difference between horizon and xs in the forecast signature?

horizon is the number of time steps ahead.

dberardo-com commented 2 years ago

You should be able to do this by splitting your pipeline into an anomaly detector and a model. Loop through the dataset manually: check whether a sample is anomalous; if so, discard it; if not, learn and update the metric. I can provide an example if you struggle, but I don't have a lot of time right now.

No need for an example. I thought of doing it this way, but just wanted to be sure that there was no simpler way to achieve this. I know now.

This doesn't make sense in online learning, and especially for time series models. You're supposed to interleave learning and inference. You could potentially use a pre-trained regression model, but not a time series model.

I see your point, so I assume that is not possible with time_series.

horizon is the number of time steps ahead.

And what is a time step? Is an hour assumed? Or is this a parameter to initialize the time_series model with?

MaxHalford commented 2 years ago

No need for an example. I thought of doing it this way, but just wanted to be sure that there was no simpler way to achieve this. I know now.

As of now, I would say we need to add a custom function in the evaluate module. It would have the same parameters as progressive_val_score, but would also allow passing an anomaly detector that would ignore anomalies. Actually we could just add a parameter to progressive_val_score. To be determined.

And what is a time step? Is an hour assumed? Or is this a parameter to initialize the time_series model with?

Good question! It's implicit to your problem: we assume a uniform step between samples, so it could be a day, an hour, or even a second, depending on your data.
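
Concretely, with hourly samples one step is one hour, so (a small sketch, reusing the model variable from your snippets):

# hourly data, so one step == one hour
model.forecast(horizon=6)        # 6 hours ahead
model.forecast(horizon=24)       # 1 day ahead
model.forecast(horizon=24 * 30)  # ~1 month ahead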

dberardo-com commented 2 years ago

Good question! It's implicit to your problem: we assume a uniform step between samples, so it could be a day, an hour, or even a second, depending on your data.

I see ... but if I pass an explicit set of features as the xs argument, that would be used instead of horizon, right?

So for example, if I want to predict the next 1.5, 6 and 13 hours, I would do:

model.forecast(
    horizon=3,
    xs=[
        {'Time': now + datetime.timedelta(minutes=90), ...},  # ... = other features
        {'Time': now + datetime.timedelta(hours=6), ...},
        {'Time': now + datetime.timedelta(hours=13), ...},
    ],
)

right?

dberardo-com commented 2 years ago

The reason why I am asking is that I am trying to compare predictions made by the same model 6 hours, 24 hours, and 30 days ahead, and I get strange results. The plot of the 24-hour-ahead forecast looks OK, but the 6-hour-ahead one, which is supposed to look better, does not.

Notice that in my "experiment" I shift the rows in the DataFrame so that the forecast made now is compared to the actual realization of the stochastic variable. This happens at the end of this code:

import datetime

import moment
import pandas as pd
from river import metrics

period = 24  # hourly samples per day

scores = []
counter = 0
dataset = gen_dataset()
total = len(list(dataset))  # total number of samples (not used below)
model = gen_model()
metric = metrics.SMAPE()
# metric = metrics.Rolling(metrics.MAE(), 12)
for x, y in gen_dataset():
    counter = counter + 1
    x = x['Time']  # keep only the timestamp as the feature
    x_orig = x
    y_orig = y
    if y is None: continue

    if counter > 2:
        # 1 day ahead (period = 24 hourly steps)
        horizon = period
        future = [{'Time': x + datetime.timedelta(hours=m)} for m in range(1, horizon + 1)]
        pred = model.forecast(horizon=horizon, xs=future)
        s1 = pred[-1]
        # 6 hours ahead
        horizon = int(period / 4)
        future = [{'Time': x + datetime.timedelta(hours=m)} for m in range(1, horizon + 1)]
        pred = model.forecast(horizon=horizon, xs=future)
        s2 = pred[-1]
        # 30 days ahead
        horizon = period * 30
        future = [{'Time': x + datetime.timedelta(hours=m)} for m in range(1, horizon + 1)]
        pred = model.forecast(horizon=horizon, xs=future)
        s3 = pred[-1]
    else:
        s1 = 0
        s2 = 0
        s3 = 0

    # only the 1-day-ahead forecast feeds the metric
    metric = metric.update(y, s1)
    model.learn_one({'Time': x}, y)
    scores.append([moment.date(x_orig).epoch(), y_orig, metric.get(), s1, s2, s3])

df = pd.DataFrame(scores, columns=["t", "feature", "score", "prediction_1d", "prediction_6h", "prediction_1m"])
df.t = pd.to_datetime(df["t"], unit='s')
df = df.set_index("t", drop=True)
# shift each forecast column forward so it lines up with the realization it predicted
df["prediction_1d"] = df["prediction_1d"].shift(period, freq="h")
df["prediction_6h"] = df["prediction_6h"].shift(int(period / 4), freq="h")
df["prediction_1m"] = df["prediction_1m"].shift(period * 30, freq="h")
# "reconstruct" the whole series with the final, fully trained model
df["reconstruction"] = df.apply(
    lambda row: model.forecast(
        horizon=period,
        xs=[{'Time': row.name + datetime.timedelta(hours=m)} for m in range(1, period + 1)],
    )[-1],
    axis=1,
)
print(metric)
display(df.head(25))
df.plot()

MaxHalford commented 2 years ago

i see ... but if i add an explicit set of features as the xs argument, that would be taken instead of horizon right?

That's model-dependent. In the case of SNARIMAX, an error is raised if the length of xs and the value of horizon aren't equal. For HoltWinters, xs isn't used at all. I don't know which model you're using.
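
For illustration, a minimal SNARIMAX sketch (hyperparameters and features are arbitrary); when xs is given it must contain one feature dict per forecasted step:

from river import time_series

model = time_series.SNARIMAX(p=2, d=0, q=2, m=24, sp=1, sq=1)

# ... learn from the stream with model.learn_one(y, x) ...

horizon = 3
xs = [{'hour': h} for h in (1, 2, 3)]           # one feature dict per future step
preds = model.forecast(horizon=horizon, xs=xs)  # raises if len(xs) != horizon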

I don't have time to dive into your code right now, too much stuff going on, sorry 😬

dberardo-com commented 2 years ago

That's model-dependent. In the case of SNARIMAX, an error is raised if the length of xs and the value of horizon aren't equal. For HoltWinters, xs isn't used at all. I don't know which model you're using.

I see ... I was using HoltWinters from the example you provided with the WaterFlow dataset. In that case I will redo the experiment above with the SNARIMAX model and see if I get the same strange behavior.

I don't have time to dive into your code right now, too much stuff going on, sorry 😬

Sure, I will post an update in the next few days, after having changed the model to SNARIMAX.

If it works after that, I could make a documentation page with the experiment.

dberardo-com commented 2 years ago

I was setting up the SNARIMAX test and wanted to run a grid search on the hyperparameters. To do this I used the model_selection package, but I ran into this validation issue. Here I can see that the SMAPE metric works against SNARIMAX:

[screenshot omitted]

But when calling it from a model evaluator, I get an exception because of this function returning False:

[screenshot omitted]


Complete trace:

[screenshot omitted]

MaxHalford commented 2 years ago

Indeed, the model_selection module doesn't work well with the time_series module. It's not obvious how to compare two time series models that make forecasts over a horizon. How would you select the best one? With an average metric over the horizon? Food for thought.

dberardo-com commented 2 years ago

Right, so this is what I am doing now, a dirty grid search:

[screenshot omitted]

and then take the best row in the data frame on average:

import itertools

import pandas as pd
from river import metrics, time_series

dataset = gen_dataset()  # note: iterated once per combination below, so it must be re-iterable (not a one-shot generator)
metric = metrics.SMAPE()

period = 24  # 24 samples per day

def gen_model(**kwargs):
    # base hyperparameters, overridden by whatever the grid search passes in
    return get_transform | gen_thresholder() | gen_forecaster(**dict(m=12, sp=3, sq=6) | kwargs)
    # return get_transform | gen_forecaster(**dict(m=12, sp=3, sq=6) | kwargs)

model = gen_model()

# hyperparameter grid
keys, values = (
    ["m", "p", "sp"],
    [[12, 24], [0, 2], [0, 1, 2]],
)

combinations = list(map(lambda x: dict(zip(keys, x)), itertools.product(*values)))
print(list(combinations))

# one progressive evaluation per combination; .get() returns one SMAPE value per horizon step
results = [time_series.evaluate(
    dataset,
    gen_model(**i),
    metric,
    horizon=12
).get() for i in combinations]
df = pd.DataFrame(results, index=list(map(str, combinations)))
display(df.T)
# pick the combination with the lowest SMAPE averaged over the 12-step horizon
best_combination = eval(df.mean(axis=1).idxmin())
print("Best combination", best_combination)
df.T.plot()

emreguenduez commented 2 years ago

Hey @dberardo-com, would you mind sharing the code and the pipeline of your model? I am trying to predict an hourly time series using SNARIMAX in river as well, and the results I am getting are not quite satisfactory compared to yours. It would be great if I could take a look at yours. Thanks!

dberardo-com commented 2 years ago

Hi @emreguenduez, I have actually managed to get even better results. I would like to write a blog post on how to achieve that, especially because I would like to start a discussion with practitioners on how to improve the model and what its drawbacks could be.

I would like to first get this PR done: https://github.com/online-ml/river/pull/949 and then move on to the blog post (next weekend might be a possibility).

@MaxHalford the PR above is still valid, right? It didn't become obsolete due to the recent PR on generic rolling, etc.?

emreguenduez commented 2 years ago

@dberardo-com that's great! Could you send me your blog so I can check it there when it's published, or mention me when you write/publish the blog post? Looking forward to it!

MaxHalford commented 2 years ago

@MaxHalford the PR above is still valid, right? It didn't become obsolete due to the recent PR on generic rolling, etc.?

I'll take a look soon, but I think you're still good :)

dberardo-com commented 2 years ago

Will try to make the needed doc adjustments ASAP.

MaxHalford commented 2 years ago

I've written down some thoughts here.

MaxHalford commented 2 years ago

Closing this for housecleaning purposes. Feel free to reopen (another issue) if you feel it's necessary.