unit8co / darts

A python library for user-friendly forecasting and anomaly detection on time series.
https://unit8co.github.io/darts/
Apache License 2.0

Allow for irregularly sampled data #1571

Open bigskapinsky opened 1 year ago

bigskapinsky commented 1 year ago

Is your feature request related to a current problem? Please describe.

Not all time series are regularly sampled. In event-based time series, for example, each sample can occur at a random point in time. Such series can be modeled with several different approaches, and I think this (already great) library could better cover any and all time series problems we throw at it if it allowed these types of series to exist.

Describe proposed solution

Create a new data type (like IrregularTimeSeries) that would have some of the same functionality as the TimeSeries object, but without a "frequency" component. Of course, many prediction algorithms can't work with this format; however, anomaly detection or sequence classification can still very much work. A method to resample it into a regular TimeSeries would also be nice:

import pandas as pd

df = pd.read_csv("machine_data.csv")
"""
This would look something like this:
timestamp, pressure, temperature, current
'2023-01-01 10:14:23', 0.25, 12.3, 0.5
'2023-01-01 10:15:52', 0.26, 11.2, 0.3
'2023-01-01 10:21:01', 0.75, 19.5, 1.2
'2023-01-01 10:32:29', 0.12, 12.5, 0.2
'2023-01-01 10:40:39', 0.58, 15.9, 0.8
...
"""
# IrregularTimeSeries is the proposed new type; it does not exist in Darts yet.
i_ts = IrregularTimeSeries.from_dataframe(
    df, time_col="timestamp", value_cols=["pressure", "temperature", "current"]
)
ts = i_ts.to_regular_timeseries(frequency="15m", method="avg", fillna=0)  # or something like this...
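A minimal sketch of what the proposed wrapper and its conversion method might do, written with plain pandas so it runs without Darts. Everything here is hypothetical: the class name follows the proposal, `to_regular_dataframe` is an illustrative stand-in for the proposed `to_regular_timeseries`, and the proposal's `"avg"` is spelled `"mean"` because that is the pandas aggregation name. The sample rows are the five shown in the proposal.

```python
import pandas as pd


class IrregularTimeSeries:
    """Hypothetical wrapper for irregularly sampled data (not a Darts class)."""

    def __init__(self, df: pd.DataFrame):
        # Only a sorted DatetimeIndex is required; no frequency is stored.
        if not isinstance(df.index, pd.DatetimeIndex):
            raise TypeError("index must be a DatetimeIndex")
        self._df = df.sort_index()

    @classmethod
    def from_dataframe(cls, df, time_col, value_cols):
        # Use the parsed time column as the index and keep only the value columns.
        indexed = df.set_index(pd.to_datetime(df[time_col]))
        return cls(indexed[list(value_cols)])

    def to_regular_dataframe(self, freq="15min", method="mean", fillna=0.0):
        # Bin samples into fixed-width intervals, aggregate each bin,
        # then fill bins that received no samples.
        return self._df.resample(freq).agg(method).fillna(fillna)


# The rows from the example above.
df = pd.DataFrame({
    "timestamp": ["2023-01-01 10:14:23", "2023-01-01 10:15:52",
                  "2023-01-01 10:21:01", "2023-01-01 10:32:29",
                  "2023-01-01 10:40:39"],
    "pressure": [0.25, 0.26, 0.75, 0.12, 0.58],
    "temperature": [12.3, 11.2, 19.5, 12.5, 15.9],
    "current": [0.5, 0.3, 1.2, 0.2, 0.8],
})
i_ts = IrregularTimeSeries.from_dataframe(
    df, "timestamp", ["pressure", "temperature", "current"]
)
regular = i_ts.to_regular_dataframe(freq="15min")
print(regular)
```

No 15-minute bin is empty for these five rows, so `fillna` has no effect here; with real gaps it decides what a model sees for quiet periods. If Darts is installed, the resulting frame could be passed to `TimeSeries.from_dataframe`, which uses the DataFrame index as the time axis when `time_col` is not given.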

Describe potential alternatives

The obvious choice is to resample or group by time interval to get a regular time series; however, this doesn't always preserve the data's subtleties for ML tasks. Sometimes data comes in bursts, or the measurements show no activity at night, on weekends, on holidays, etc. Crushing these...
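To make the trade-off concrete, here is a small pandas sketch with made-up timestamps (not the reporter's data): a burst of four events followed by a quiet hour, resampled into 15-minute bins. Keeping a per-bin event count next to the mean is shown only as one possible mitigation, not as something proposed in the thread.

```python
import pandas as pd

# Made-up event times: a burst of four events, then nothing for over an hour.
times = pd.to_datetime([
    "2023-01-01 10:01:00", "2023-01-01 10:02:30",
    "2023-01-01 10:03:10", "2023-01-01 10:04:45",
    "2023-01-01 11:20:00",
])
events = pd.Series([0.2, 0.9, 0.4, 0.7, 0.3], index=times)

bins = events.resample("15min")
summary = pd.DataFrame({
    "mean": bins.mean().fillna(0.0),  # empty bins become zeros
    "count": bins.count(),            # number of raw events per bin
})
print(summary)
```

Four of the six regular steps carry a filled-in zero, and the burst's four distinct readings collapse into a single value; the count column keeps the burst size but not the order or spacing of events inside the bin.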

Additional context

To give you an example, this is the kind of data I'm working with: I'm trying to classify anomalous activity in a machine. Each data point is a measurement of the machine taken when it creates a product. The red area is a labeled period during which the machine is misbehaving.

[image]

I would love to try using Darts for this, but resampling the data to evenly spaced samples either removes too much information or introduces too many zeros for my model to detect much accurately...

I might try coding this feature myself... we'll see what I can get up to. 😄

madtoinou commented 1 year ago

Hi @bigskapinsky!

Thank you for this detailed feature request. I think such a feature would be a great addition to Darts, especially since the Anomaly Detection module was released. WDYT, @dennisbader?

I'll add this to our backlog, and I'm curious to see what you come up with! Don't hesitate to open a draft PR if you would like feedback or want to discuss with other contributors.

PS: There might be overlap with #1346, which would then have to cover RangeIndex, DatetimeIndex and EventBasedIndex?
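As a rough illustration of the three index kinds mentioned here, the hypothetical helper below tells them apart with pandas alone: a `RangeIndex`, a `DatetimeIndex` whose frequency `pd.infer_freq` can recover, and a `DatetimeIndex` with no recoverable frequency, which is what an event-based index would hold. It is only a sketch, not Darts' own validation logic.

```python
import pandas as pd


def index_kind(index: pd.Index) -> str:
    """Hypothetical helper: classify an index as range, regular or irregular."""
    if isinstance(index, pd.RangeIndex):
        return "range"
    if isinstance(index, pd.DatetimeIndex):
        # infer_freq needs at least 3 timestamps and returns None when the
        # spacing does not match any pandas frequency.
        if len(index) >= 3 and pd.infer_freq(index) is not None:
            return "regular datetime"
        return "irregular datetime"
    return "other"


print(index_kind(pd.RangeIndex(10)))
print(index_kind(pd.date_range("2023-01-01", periods=5, freq="D")))
print(index_kind(pd.DatetimeIndex(["2023-01-01 10:14:23",
                                   "2023-01-01 10:15:52",
                                   "2023-01-01 10:21:01"])))
```

Treating indexes that are too short to infer a frequency as irregular is a design choice of this sketch; a real implementation might instead ask the user for an explicit frequency.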

hrzn commented 1 year ago

+1 on this suggestion. We had in the back of our minds the possibility of supporting such non-regular series at some point. Personally I think it'd be a great addition, even if it's not high priority right now. Operating on these series directly would call for a different kind of model (e.g. from the field of point-process modelling and other such approaches for event-based data). For instance, all the current forecasting models in Darts (and most of the "traditional time series" and ML-on-time-series literature) assume that the series are regularly spaced. There could be some immediate value in starting with an IrregularTimeSeries type and allowing easy conversion to TimeSeries; we would of course be happy to receive contributions in this direction, @bigskapinsky :)
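For a flavour of the point-process direction mentioned here, the sketch below fits the simplest such model, a homogeneous Poisson process, directly on raw event times with no resampling: the maximum-likelihood rate is the number of events divided by the length of the observation window. It then scores a window by the probability of seeing that few events, which would flag a machine that goes unusually quiet. This is an illustrative choice, not something proposed in the thread; real event data with bursts and night-time gaps would usually need a time-varying intensity.

```python
import math
from datetime import datetime, timedelta


def poisson_rate(event_times, start, end):
    """MLE of a homogeneous Poisson intensity (events per second) on [start, end]."""
    n = sum(1 for t in event_times if start <= t <= end)
    return n / (end - start).total_seconds()


def prob_at_most(count, rate, window_seconds):
    """P(N <= count) for N ~ Poisson(rate * window_seconds)."""
    mu = rate * window_seconds
    return sum(math.exp(-mu) * mu ** k / math.factorial(k) for k in range(count + 1))


# Made-up "normal" behaviour: one event per minute for an hour.
t0 = datetime(2023, 1, 1, 10, 0)
normal = [t0 + timedelta(minutes=i) for i in range(60)]
rate = poisson_rate(normal, t0, t0 + timedelta(hours=1))

# How surprising is a 10-minute window with no events at all?
print(rate * 600, prob_at_most(0, rate, 600))
```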

madtoinou commented 1 year ago

(Just realized that this is probably a duplicate of #1491, linking it so that we can easily keep track of both conversations)

bigskapinsky commented 1 year ago

Alright! I'm glad my suggestion caught your attention. @madtoinou, the issues you linked to are indeed in the same vein; I see that I'm not alone in this predicament (even though, as @hrzn mentioned, most time series literature assumes evenly spaced time steps).

I'll see what I can come up with in the coming weeks!

tuomijal commented 1 year ago

Hi @bigskapinsky and others,

I am also interested in this feature. Have you been able to work on this @bigskapinsky? Should there be a new parent class for both TimeSeries and IrregularTimeSeries objects to inherit from?
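One possible shape for such a hierarchy, sketched with hypothetical class names (`BaseSeries`, `RegularSeries`, `IrregularSeries`) so they do not clash with Darts' real `TimeSeries`: the parent holds everything that does not depend on spacing (the time index, the values, the gaps between samples), and only the subclasses decide what `freq` means. This is an assumption about a possible design, not an existing Darts structure.

```python
from abc import ABC, abstractmethod

import numpy as np
import pandas as pd


class BaseSeries(ABC):
    """Hypothetical common parent: behaviour that does not depend on spacing."""

    def __init__(self, df: pd.DataFrame):
        if not isinstance(df.index, pd.DatetimeIndex):
            raise TypeError("expected a DatetimeIndex")
        self._df = df.sort_index()

    def __len__(self) -> int:
        return len(self._df)

    @property
    def time_index(self) -> pd.DatetimeIndex:
        return self._df.index

    def values(self) -> np.ndarray:
        return self._df.to_numpy()

    def gaps(self) -> pd.TimedeltaIndex:
        # Time between consecutive samples; constant only for regular series.
        return self.time_index[1:] - self.time_index[:-1]

    @property
    @abstractmethod
    def freq(self):
        """Frequency string for regular series, None for irregular ones."""


class RegularSeries(BaseSeries):
    def __init__(self, df: pd.DataFrame):
        super().__init__(df)
        # Refuse to build a regular series if no frequency can be inferred.
        inferred = pd.infer_freq(self.time_index) if len(self) >= 3 else None
        if inferred is None:
            raise ValueError("index is not regularly spaced")
        self._freq = inferred

    @property
    def freq(self):
        return self._freq


class IrregularSeries(BaseSeries):
    @property
    def freq(self):
        return None


days = pd.date_range("2023-01-01", periods=4, freq="D")
regular = RegularSeries(pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0]}, index=days))
events = pd.DatetimeIndex(["2023-01-01 10:14:23", "2023-01-01 10:15:52",
                           "2023-01-01 10:21:01"])
irregular = IrregularSeries(pd.DataFrame({"x": [0.25, 0.26, 0.75]}, index=events))
print(regular.freq, irregular.freq, list(irregular.gaps().total_seconds()))
```

Code that only needs timestamps and values (plotting, slicing, anomaly scoring) could then accept `BaseSeries`, while forecasting models would keep requiring the regular subclass.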

It seems the create_lagged_data function would also need some modifications to work with irregular data?
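A rough sketch of one way lagged features could be built for irregular data, using plain NumPy. `lagged_features_by_event` is a hypothetical helper, not Darts' `create_lagged_data`, whose real signature is not reproduced here. Lags are counted in events rather than time steps, and each lagged value is paired with its age relative to the target event, so the spacing information is not thrown away.

```python
import numpy as np


def lagged_features_by_event(times_s, values, n_lags):
    """Hypothetical helper: build (X, y) from the previous n_lags events.

    Each row of X holds the last n_lags values, followed by how many seconds
    before the target event each of those values was observed.
    """
    times_s = np.asarray(times_s, dtype=float)
    values = np.asarray(values, dtype=float)
    rows, targets = [], []
    for i in range(n_lags, len(values)):
        past_values = values[i - n_lags:i]
        ages = times_s[i] - times_s[i - n_lags:i]  # age of each lagged event
        rows.append(np.concatenate([past_values, ages]))
        targets.append(values[i])
    return np.array(rows), np.array(targets)


# Event times in seconds since some origin, with uneven spacing.
X, y = lagged_features_by_event([0, 10, 15, 60], [1.0, 2.0, 3.0, 4.0], n_lags=2)
print(X)
print(y)
```

Using the target's own timestamp to compute the ages assumes it is known when the features are built. That holds when classifying or scoring events that have already been observed, but not when forecasting when the next event will happen.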