unit8co / darts

A Python library for user-friendly forecasting and anomaly detection on time series.
https://unit8co.github.io/darts/
Apache License 2.0

[QUESTION] NaN handling in model.fit() #2543

Closed: SafetyMary closed this issue 2 months ago

SafetyMary commented 2 months ago

I am facing the following error when using the XGBoost model, since I have NaN values in my target TimeSeries object:

Check failed: valid: Label contains NaN, infinity or a value too large

I have seen multiple solutions suggested here.

However, I would much rather have Darts' model.fit() ignore any data slices containing NaN during training.

For example, for the time series [1, 2, 3, NaN, 5, 6, 7] and a model with lags=2, I would like the following behaviour:

data slice 1: [1, 2] >>> 3
data slice 2: [2, 3] >>> NaN (ignore this during .fit())
data slice 3: [3, NaN] >>> 5 (ignore this during .fit())
data slice 4: [NaN, 5] >>> 6 (ignore this during .fit())
data slice 5: [5, 6] >>> 7
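The slicing described above can be reproduced with a short plain-Python sketch. It is only an illustration of the desired behaviour, not how Darts tabularizes series internally; the helper name `lagged_slices` is hypothetical. Each window of `lags` inputs plus its label is flagged for removal if any of those values is NaN.

```python
import math

def lagged_slices(series, lags):
    """Hypothetical helper: yield (inputs, label, keep) for every lagged window.

    `keep` is False when the inputs or the label contain NaN, i.e. the
    window should be ignored during fitting.
    """
    for t in range(lags, len(series)):
        inputs = series[t - lags:t]
        label = series[t]
        keep = not any(math.isnan(v) for v in [*inputs, label])
        yield inputs, label, keep

series = [1, 2, 3, float("nan"), 5, 6, 7]
for i, (inputs, label, keep) in enumerate(lagged_slices(series, lags=2), start=1):
    note = "" if keep else "  (ignore during .fit())"
    print(f"data slice {i}: {inputs} >>> {label}{note}")
```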

Does Darts currently support the above?

madtoinou commented 2 months ago

Hi @SafetyMary,

This is possible thanks to the sample_weight parameter of fit(): the steps corresponding to NaN values are assigned a weight of 0. You can find an example in the quickstart.

In your example, the sample weights array would be [1, 0, 0, 0, 1] (it will automatically be normalized). You will have to generate it manually, but since you seem to be using output_chunk_length=1, it should be straightforward.
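A minimal NumPy sketch of generating those weights, assuming output_chunk_length=1 and a single-component target. The helper `nan_mask_weights` is hypothetical. It returns one weight per time step of the target: the step t carries the weight of the sample whose label is at t, which is 0 if the window t-lags..t contains NaN (np.isfinite also catches infinity, which the XGBoost error mentions). The first `lags` steps never act as labels, so they get 0 as well.

```python
import numpy as np

def nan_mask_weights(values, lags):
    """Hypothetical helper: per-time-step weights for a lagged model
    with output_chunk_length=1.

    weights[t] belongs to the sample with label values[t] and inputs
    values[t - lags:t]; it is 1.0 if that whole window is finite, else 0.0.
    """
    values = np.asarray(values, dtype=float)
    weights = np.zeros(len(values))
    for t in range(lags, len(values)):
        window = values[t - lags:t + 1]
        weights[t] = float(np.isfinite(window).all())
    return weights

target = np.array([1, 2, 3, np.nan, 5, 6, 7], dtype=float)
weights = nan_mask_weights(target, lags=2)
print(weights[2:])  # weights of the five training samples → [1. 0. 0. 0. 1.]
```

To pass this to Darts, one option (assuming a Darts version whose fit() accepts sample_weight as a TimeSeries; check the docs of your installed version) is to wrap the array on the target's time axis with TimeSeries.from_times_and_values(series.time_index, weights).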

SafetyMary commented 2 months ago

I have applied your solution and it works fine.

Would it be possible to have Darts handle this automatically via arguments, rather than having users generate sample weights manually? E.g. ignore a data slice if the target or feature slice contains nulls. Generating sample weights is certainly a non-trivial process when there are nulls in multiple past/future/static covariates.
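Until something like this exists in Darts, the covariate case can be sketched by describing every input as an array plus the time offsets (relative to the label step t) that a sample reads from it. This is a hypothetical helper, not a Darts feature, and it assumes output_chunk_length=1 and that all covariate arrays are already aligned on the target's time axis (real covariates may have different extents and would need slicing first). Static covariates are left out: being constant per series, a NaN there would affect every sample of that series.

```python
import numpy as np

def nan_free_weights(n_steps, inputs):
    """Hypothetical helper: 1.0 for each label step t whose whole feature window is finite.

    `inputs` is a list of (values, offsets) pairs. Each `values` array has shape
    (n_steps,) or (n_steps, n_components) and shares the target's time axis;
    `offsets` (non-empty) are the steps, relative to t, read from that array.
    Windows that would reach outside the series get weight 0.0.
    """
    arrays = [(np.asarray(v, dtype=float).reshape(n_steps, -1), list(o)) for v, o in inputs]
    weights = np.zeros(n_steps)
    for t in range(n_steps):
        ok = True
        for values, offsets in arrays:
            idx = [t + o for o in offsets]
            # Bounds are checked first so negative indices never wrap around.
            if min(idx) < 0 or max(idx) >= n_steps or not np.isfinite(values[idx]).all():
                ok = False
                break
        weights[t] = 1.0 if ok else 0.0
    return weights

# Target with lags=2: reads t-2, t-1 and the label at t.
target = np.array([1, 2, 3, np.nan, 5, 6, 7, 8, 9], dtype=float)
# Two-component past covariate read at t-1 (roughly lags_past_covariates=1), one value missing.
past_cov = np.column_stack([np.arange(9.0), np.arange(9.0) * 10])
past_cov[6, 1] = np.nan

weights_target = nan_free_weights(9, [(target, range(-2, 1))])
weights_all = nan_free_weights(9, [(target, range(-2, 1)), (past_cov, [-1])])
print(weights_target)
print(weights_all)
```

Expressing every input as offsets keeps target, past and future covariates in one code path; future covariates would simply use offsets of 0 or greater.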

Will close the issue for now.