unit8co / darts

A python library for user-friendly forecasting and anomaly detection on time series.
https://unit8co.github.io/darts/
Apache License 2.0

Union function to find the intersection of time series #2042

Open Tristanjmeyers opened 1 year ago

Tristanjmeyers commented 1 year ago

Hello! I think it would be incredibly valuable to have a function that finds the intersection of a list of time series. This would be something akin to xarray's "align" function, which finds the intersection across a list of variables.

I commonly find that I have to manually convert my TimeSeries back to an xarray DataArray, run an align, then convert back to a TimeSeries. For instance:

import xarray as xr
from darts.timeseries import TimeSeries

ts1 = TimeSeries.from_csv('file1.csv', time_col='time')
ts2 = TimeSeries.from_csv('file2.csv', time_col='time')
ts3 = TimeSeries.from_csv('file3.csv', time_col='time')

ts1_, ts2_, ts3_ = xr.align(ts1.data_array(), ts2.data_array(), ts3.data_array(), exclude='component')

ts1 = TimeSeries(ts1_)
ts2 = TimeSeries(ts2_)
ts3 = TimeSeries(ts3_)

So something like:

ts1 = TimeSeries.from_csv('file1.csv', time_col='time')
ts2 = TimeSeries.from_csv('file2.csv', time_col='time')
ts3 = TimeSeries.from_csv('file3.csv', time_col='time')

ts1, ts2, ts3 = TimeSeries.intersection(ts1, ts2, ts3)

If the intersection doesn't exist, it could raise a warning like: Warning: there are no overlapping times between the time series.

Another option would be for "stack" to take an argument that performs this intersection:

full_ts = ts1.stack(ts2, intersection = True).stack(ts3, intersection = True)

And finally, if this could be added in the pipeline API, that would be excellent!

Apologies if these features already exist! I am new to the package, but I couldn't find an example of something like this in the API. I am loving it so far though!

madtoinou commented 1 year ago

Hi @Tristanjmeyers,

This feature is already implemented in TimeSeries.slice_intersect() (doc); you'll have to call it several times if you have more than two series.
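For more than two series, the repeated calls could be wrapped in a small helper. The sketch below is only an illustration: `intersect_all` is a hypothetical name, not part of darts, and it assumes each item exposes `slice_intersect(other)` the way `TimeSeries` does and that the series share a frequency.

```python
from functools import reduce
from typing import List, Sequence


def intersect_all(series: Sequence) -> List:
    """Trim every series to the time span shared by all of them.

    Hypothetical helper, not part of darts: it only assumes each item
    exposes `slice_intersect(other)`, as darts' TimeSeries does.
    """
    if not series:
        return []
    # Fold the pairwise intersection over the list to get the common span.
    common = reduce(lambda acc, ts: acc.slice_intersect(ts), series)
    # Slice each original series against that common span.
    return [ts.slice_intersect(common) for ts in series]


# Usage with darts (assumes ts1, ts2, ts3 are existing TimeSeries objects):
# ts1, ts2, ts3 = intersect_all([ts1, ts2, ts3])
```

This needs roughly twice as many `slice_intersect` calls as there are series; what happens when the series do not overlap at all depends on how `slice_intersect` handles that case, which this sketch does not check.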

I like the idea of offering this feature as a data transformer so that it can be included in a Pipeline. WDYT @dennisbader?