unit8co / darts

A python library for user-friendly forecasting and anomaly detection on time series.
https://unit8co.github.io/darts/
Apache License 2.0

Exclusion Zones in Time Series Dataset for Selective Training #2129

Closed: patrickfleith closed this issue 3 months ago

patrickfleith commented 11 months ago

Is your feature request related to a current problem? Please describe. As a data scientist working in the space industry, we face time series where some segments are "missing", sometimes several months in a row (out of a 10-year dataset), or where different mission phases contain anomalous periods (there was a problem), and we would like the forecasting model not to learn from those abnormal segments.

Describe proposed solution I will soon try to implement a variation of the TrainingDataset which filters out anomalous periods, so that it is usable from the fit_from_dataset() routine; a rough sketch is below. I will share my code in this issue if it works well.
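Roughly, something along these lines (a minimal sketch, not working code: the ExclusionFilteredDataset class and the keep predicate are hypothetical names, and it assumes darts' PastCovariatesTrainingDataset base class and fit_from_dataset()):

```python
# Hypothetical sketch of a training dataset that drops "anomalous" samples.
# Assumes darts' PastCovariatesTrainingDataset interface; the class name and
# the `keep` predicate are illustrative, not part of the library.
import numpy as np
from darts.utils.data import PastCovariatesTrainingDataset


class ExclusionFilteredDataset(PastCovariatesTrainingDataset):
    def __init__(self, base_dataset, keep):
        super().__init__()
        self._base = base_dataset
        # Precompute which sample indices the predicate accepts.
        self._indices = [i for i in range(len(base_dataset)) if keep(base_dataset[i])]

    def __len__(self):
        return len(self._indices)

    def __getitem__(self, idx):
        return self._base[self._indices[idx]]


def no_nan(sample):
    # Keep only samples whose arrays contain no NaNs (missing or anomalous
    # values would have been masked to NaN beforehand).
    return not any(np.isnan(arr).any() for arr in sample if arr is not None)


# Usage (assuming `train_ds` is an existing darts training dataset and
# `model` is a TorchForecastingModel):
# model.fit_from_dataset(ExclusionFilteredDataset(train_ds, keep=no_nan))
```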

Describe potential alternatives A possible alternative could be labels that describe the "normal" and "anomalous" portions, so that darts automatically splits the series in two? I also have to investigate how this could work.

Additional context By the way, our time series are so long that we would also benefit from a stride or uniform sampling option, as already mentioned in #1348.

Unless somebody wants to look into this or can provide feedback on a better approach than the one proposed, please feel free to assign it to me. I'll keep you posted.

dennisbader commented 11 months ago

Hi @patrickfleith. You could also convert your single TimeSeries with the anomalous/missing time frames into multiple series, each containing a non-anomalous, contiguous time window.

Since the TorchForecastingModels (TFMs) are global models, they can train and predict on multiple time series. Simply pass the sequence (e.g. a list) of series to fit() and predict().
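For example, a minimal sketch of this workflow (the file name, exclusion-zone dates, and model settings are made up; it assumes darts' extract_subseries utility):

```python
# Sketch: mask exclusion zones as NaN, split into contiguous sub-series,
# then train a single global model on the resulting list.
import numpy as np
import pandas as pd
from darts import TimeSeries
from darts.models import NBEATSModel
from darts.utils.missing_values import extract_subseries

# Hypothetical telemetry file with a timestamp and a value column.
df = pd.read_csv("telemetry.csv", parse_dates=["timestamp"])
series = TimeSeries.from_dataframe(df, time_col="timestamp", value_cols="value")

# Mask anomalous periods (made-up dates) with NaN so they become gaps.
pdf = series.pd_dataframe()
zones = [("2016-03-01", "2016-05-15"), ("2019-11-01", "2019-12-01")]
for start, end in zones:
    pdf.loc[start:end] = np.nan
masked = TimeSeries.from_dataframe(pdf)

# Split on the NaN gaps into contiguous, non-anomalous sub-series.
subseries = extract_subseries(masked)

# Global model: trains across all sub-series at once.
model = NBEATSModel(input_chunk_length=168, output_chunk_length=24)
model.fit(subseries)
```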

Regarding sampling: one way to limit the number of samples is to limit the number of batches for training and validation through PyTorch Lightning (PL). The PL Trainer flags of interest are limit_train_batches and limit_val_batches. You can pass them in the pl_trainer_kwargs dict at model creation.
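For example (the model choice and numbers here are illustrative):

```python
# Illustrative: cap the number of batches per epoch via PyTorch Lightning.
from darts.models import NBEATSModel

model = NBEATSModel(
    input_chunk_length=168,
    output_chunk_length=24,
    pl_trainer_kwargs={
        "limit_train_batches": 0.25,  # use 25% of the training batches per epoch
        "limit_val_batches": 100,     # or an int: at most 100 validation batches
    },
)
```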

patrickfleith commented 10 months ago

Thanks, I think that's a great approach. I'll explore it and publish a link to a notebook for demonstration once I implement it.

The issue can be closed, or you can wait until I publish that notebook.