unit8co / darts

A python library for user-friendly forecasting and anomaly detection on time series.
https://unit8co.github.io/darts/
Apache License 2.0

Can Darts work with inconsistent time frequency? #1838

Open nogamosc opened 1 year ago

nogamosc commented 1 year ago

I'm trying to use Darts to solve a classification problem. I have a dataset with thousands of patients, and for each patient I have many sequential tests. Each test records the same large set of values (blood pressure etc.), but the tests were performed at arbitrary times. Every patient is labeled as Sick or Not Sick.

I've been working for a long time now trying to understand how to convert my data to a TimeSeries object correctly and use Darts to solve this.

According to this answer, Darts doesn't support time series data with inconsistent frequencies. Is this still the case? In my data there is no fixed interval between a patient's tests. I couldn't figure this out from the docs.

Can anyone help?

madtoinou commented 1 year ago

Hi @nogamosc,

If the frequency is irregular within one TimeSeries, then darts does not support it. I think that this is what you're describing: a multivariate ts with irregular spacing between the timestamps. #1571 tracks this feature request.
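To make "irregular within one TimeSeries" concrete, here is a minimal sketch in plain Python (no Darts or pandas calls) that checks whether a list of timestamps is evenly spaced. The helper name `spacing_is_regular` and the sample dates are hypothetical, and the check is a simplification: calendar-based frequencies such as month-end count as regular in pandas even though their gaps in days differ.

```python
from datetime import datetime, timedelta


def spacing_is_regular(timestamps):
    """Return True if all consecutive timestamps are separated by the same gap."""
    ordered = sorted(timestamps)
    # Collect the distinct gaps between neighbouring timestamps.
    gaps = {later - earlier for earlier, later in zip(ordered, ordered[1:])}
    return len(gaps) <= 1


# Hypothetical test dates for one patient: gaps of 3, 11 and 40 days.
start = datetime(2023, 1, 2)
irregular = [start + timedelta(days=d) for d in (0, 3, 14, 54)]

# A strictly daily schedule, for comparison.
daily = [start + timedelta(days=d) for d in range(4)]

print(spacing_is_regular(irregular))  # False
print(spacing_is_regular(daily))      # True
```

Data like the first list is the case described above as unsupported; data like the second has a single, constant step.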

If the frequency is different across recordings, you can use meta-learning (training on multiple univariate series).

A possible workaround would be to convert the time indexes to RangeIndex instead of DatetimeIndex, but the ts would then be treated as regularly spaced, losing a critical piece of information.
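A rough sketch of that workaround using pandas only, under stated assumptions: the column names, the sample values, and the idea of keeping the gap since the previous test as an extra feature (`days_since_prev`) are illustrative and not something Darts prescribes. Whether a given model can make good use of that column is a separate question.

```python
import pandas as pd

# Hypothetical, irregularly timed tests for a single patient.
df = pd.DataFrame(
    {
        "time": pd.to_datetime(
            ["2023-01-02", "2023-01-05", "2023-01-16", "2023-02-25"]
        ),
        "blood_pressure": [120.0, 135.0, 128.0, 140.0],
    }
)

# Keep the gap since the previous test as an ordinary feature column,
# so the timing information is not discarded entirely.
df["days_since_prev"] = df["time"].diff().dt.days.fillna(0).astype(int)

# Drop the timestamps; what remains is indexed by a RangeIndex (0, 1, 2, ...).
reindexed = df.drop(columns="time").reset_index(drop=True)

print(type(reindexed.index).__name__)         # RangeIndex
print(reindexed["days_since_prev"].tolist())  # [0, 3, 11, 40]
```

The integer index makes the series look evenly spaced, so the real spacing survives only in the added column.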

As a side note, darts does not contain a ts classification module at the moment so you would have to implement your own model.

nogamosc commented 1 year ago

Thanks for the answer, I'll follow the issue you mentioned for future work :) Also, I think it would be nice to add this constraint to the docs. It may seem obvious, but nowhere in the docs does it say that the ts must have a regular frequency.

madtoinou commented 1 year ago

It's written in the documentation, in the comment at the top of TimeSeries.py and on the associated documentation page:

"_Have a well defined frequency (date offset aliases <https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases> for DateTimeIndex, and step size for RangeIndex)_".

But I would agree that "well defined" does not directly mean "constant/consistent" frequency for everyone. Any suggestions for better wording?