salesforce / Merlion

Merlion: A Machine Learning Framework for Time Series Intelligence
BSD 3-Clause "New" or "Revised" License

More robust detection of time series granularity. #135

Closed aadyotb closed 1 year ago

aadyotb commented 1 year ago

Previously, we detected the granularity of a time series as the GCD of all timedeltas found in it (assuming pandas couldn't infer the granularity on its own). However, this fails for time series with missing data whose sampling granularity isn't a fixed number of seconds. For example, a monthly time series would be resampled to a daily granularity, because months are of inconsistent length.
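To make the failure mode concrete, here is a minimal sketch of the GCD approach applied to a monthly series with one missing month. It only illustrates the idea and is not Merlion's actual code; the variable names are made up for the example.

```python
import math
from functools import reduce

import pandas as pd

# Monthly timestamps (1st of each month in 2021), with June removed
# to simulate missing data.
index = pd.date_range("2021-01-01", periods=12, freq="MS").delete(5)

# Timedeltas between consecutive timestamps, in whole seconds.
deltas = pd.Series(index).diff().dropna()
delta_seconds = [int(d.total_seconds()) for d in deltas]

# GCD of all observed timedeltas. Months last 28, 30 or 31 days,
# so the GCD collapses to a single day.
gcd_granularity = pd.Timedelta(seconds=reduce(math.gcd, delta_seconds))
print(gcd_granularity)
```

Because 28, 30 and 31 days share no common divisor larger than one day, the GCD-based granularity is daily even though the data are monthly.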

This PR uses the most commonly observed timedelta instead of the GCD, and it also checks whether a k-month granularity is a better fit for the time series than an n-day granularity. To handle non-fixed granularities (e.g. months or years), we now also keep track of an offset. The general approach is to resample the time series using the detected granularity, and then add whatever offset is needed to recover the original timestamps. This is necessary because resampling at a monthly granularity in pandas only returns timestamps on specific days of the month (typically the first or last). Maintaining the offset allows us to support more general time series.
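The approach described above could be sketched roughly as follows. This is a simplified illustration, not the code in this PR: `sketch_granularity` is a hypothetical helper, the "better fit" check is reduced to "every timestamp sits at the same offset from the start of its month", and the offset is assumed to be a fixed `pd.Timedelta`.

```python
import pandas as pd


def sketch_granularity(index: pd.DatetimeIndex):
    """Hypothetical helper: return (granularity, offset) for a DatetimeIndex."""
    # Most commonly observed timedelta (used instead of the GCD).
    deltas = pd.Series(index).diff().dropna()
    common_delta = deltas.mode().iloc[0]

    # Candidate k-month granularity: most common step measured in months.
    months = pd.Series(index.year * 12 + index.month)
    k = int(months.diff().dropna().mode().iloc[0])

    # Offset of each timestamp from the start of its own month.
    month_starts = index.to_period("M").to_timestamp()
    offsets = index - month_starts

    # Simplified fit check (an assumption of this sketch): prefer k months
    # when all timestamps share one offset from the start of the month.
    if k >= 1 and offsets.nunique() == 1:
        return pd.offsets.MonthBegin(k), offsets[0]
    return common_delta, pd.Timedelta(0)


# Monthly data observed on the 15th of each month, with June missing.
observed = (
    pd.date_range("2021-01-01", periods=12, freq="MS") + pd.Timedelta(days=14)
).delete(5)

granularity, offset = sketch_granularity(observed)

# Resample on the detected granularity (month starts), then add the offset
# back to recover timestamps on the 15th.
first_month_start = observed[0].normalize().replace(day=1)
full_index = pd.date_range(first_month_start, observed[-1], freq=granularity) + offset
print(full_index)
```

Without the offset, the regenerated index would fall on the first of each month and none of the original timestamps would line up with it.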