unit8co / darts

A Python library for user-friendly forecasting and anomaly detection on time series.
https://unit8co.github.io/darts/
Apache License 2.0

Extend `TimeSeries.resample()` to be more like pandas `resample()` #699

Status: Open, opened by vfilter 2 years ago

vfilter commented 2 years ago

Describe the bug

Hi, I'm trying to downsample a high-frequency Darts time series to daily/monthly/yearly mean/sum/min/max, etc. However, the behavior differs from pandas, so my only current option is to downsample before converting to a TimeSeries object.

To Reproduce

import pandas as pd
import numpy as np
from darts import TimeSeries
from darts.datasets import MonthlyMilkDataset, MonthlyMilkIncompleteDataset

series = MonthlyMilkDataset().load()

series.resample('Y').sum()

Results in

component
Pounds per cow    10094.0
dtype: float64

Expected behavior

I expected a time series with the sum for each year, rather than a single sum over the whole series. That is the default behavior of the pandas resample method, and I'm having a hard time understanding why this doesn't work in Darts.
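To see how a single number like the one above can come out of `resample()` followed by `sum()`, here is a minimal pandas-only sketch on hypothetical toy data (24 monthly values, 1 through 24). It assumes, as a simplification, that Darts' `resample()` behaves like a reindex onto the new yearly grid with forward filling. Summing that reindexed series collapses everything into one value, whereas pandas' `resample().sum()` returns one aggregated value per year.

```python
import pandas as pd

# Hypothetical toy data: 24 monthly observations with values 1.0 .. 24.0
idx = pd.date_range("2000-01-01", periods=24, freq="MS")
df = pd.DataFrame({"value": range(1, 25)}, index=idx, dtype=float)

# Reindexing (assumed approximation of Darts' resample()): one row per new
# yearly timestamp, taking the last known value at or before it ("pad")
yearly_idx = pd.date_range(idx[0], idx[-1], freq="YS")
reindexed = df.reindex(yearly_idx, method="pad")

# Aggregating (pandas' resample() + sum()): one total per calendar year
aggregated = df.resample("YS").sum()

print(reindexed["value"].tolist())   # → [1.0, 13.0]
print(reindexed["value"].sum())      # → 14.0, one number for the whole series
print(aggregated["value"].tolist())  # → [78.0, 222.0]
```

The reindexed version only samples one value per year, so summing it says nothing about the yearly totals.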


dennisbader commented 2 years ago

Hi @vfilter, and thanks for writing. Our resample() method works differently from pandas' resample(). From our docs:

Resample creates a reindexed time series with a given frequency.
The method is used to fill holes in reindexed TimeSeries, by default 'pad'.
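A reindexing resample is mainly useful when the new frequency is finer than the original one, which leaves holes to fill. The following minimal pandas-only sketch, on hypothetical quarterly data, shows how the fill method changes the result. `DataFrame.asfreq()` is used here as a stand-in to illustrate the idea; it is not a description of how Darts implements `resample()`.

```python
import pandas as pd

# Hypothetical quarterly data: two observations, three months apart
df = pd.DataFrame(
    {"value": [1.0, 2.0]},
    index=pd.to_datetime(["2000-01-01", "2000-04-01"]),
)

# Reindex onto a month-start grid; the new rows (Feb, Mar) are holes
unfilled = df.asfreq("MS")                    # holes left as NaN
padded = df.asfreq("MS", method="pad")        # forward fill ('pad', the Darts default)
backfilled = df.asfreq("MS", method="bfill")  # backward fill

print(padded["value"].tolist())      # → [1.0, 1.0, 1.0, 2.0]
print(backfilled["value"].tolist())  # → [1.0, 2.0, 2.0, 2.0]
```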

It's true that it would be nice to extend our method to behave more like pandas' resample(). I will add this to our backlog.

You can achieve the desired results with:

TimeSeries.from_dataframe(series.pd_dataframe().resample('Y').sum())
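The same workaround can cover several of the statistics mentioned in the issue (mean, min, max) at once. A minimal pandas-only sketch: `agg()` with a list of functions returns MultiIndex columns, so they are flattened into plain strings before the frame would be handed to `TimeSeries.from_dataframe()`, on the assumption that Darts expects plain string component names. The helper name `aggregate_frame` and the `<component>_<agg>` naming scheme are hypothetical.

```python
import pandas as pd


def aggregate_frame(df: pd.DataFrame, freq: str, aggs: list) -> pd.DataFrame:
    """Downsample `df` with several aggregations and flatten the column names.

    Hypothetical helper: the output uses "<component>_<agg>" column names so
    that each aggregation becomes an ordinary component.
    """
    out = df.resample(freq).agg(aggs)
    # With a list of functions, agg() yields MultiIndex columns such as
    # ("value", "mean"); join each pair into a single string
    out.columns = [f"{comp}_{agg}" for comp, agg in out.columns]
    return out


# Usage on hypothetical toy data: 24 monthly values, 1.0 .. 24.0
idx = pd.date_range("2000-01-01", periods=24, freq="MS")
df = pd.DataFrame({"value": range(1, 25)}, index=idx, dtype=float)
yearly = aggregate_frame(df, "YS", ["mean", "max"])
print(yearly)
```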

dennisbader commented 2 years ago

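Pandas' resample() returns a *IndexResampler object (for example, a DatetimeIndexResampler) rather than a DataFrame. We could either extend resample() or implement a new method. For example, we could add support for *IndexResampler to our statistics methods, such as sum(), which would then convert the result back into a TimeSeries object.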
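The stats-method idea could look roughly like the following pandas-only sketch: a wrapper keeps the pandas resampler and passes each aggregated DataFrame through a conversion function. The class `AggregatingResampler` and its `rebuild` parameter are hypothetical and not part of Darts. In Darts, `rebuild` would presumably be something like `TimeSeries.from_dataframe`; here an identity function stands in so the sketch runs without Darts.

```python
from typing import Callable

import pandas as pd


class AggregatingResampler:
    """Hypothetical wrapper: aggregate with pandas, then convert the result."""

    def __init__(self, df: pd.DataFrame, freq: str, rebuild: Callable):
        # df.resample() returns a resampler object (e.g. DatetimeIndexResampler),
        # not a DataFrame; nothing is computed until an aggregation is called
        self._resampler = df.resample(freq)
        self._rebuild = rebuild

    def agg(self, func):
        # Run any pandas aggregation and convert the resulting DataFrame back
        return self._rebuild(self._resampler.agg(func))

    def sum(self):
        return self.agg("sum")

    def mean(self):
        return self.agg("mean")


# Usage with hypothetical data and an identity "rebuild" standing in for Darts
idx = pd.date_range("2000-01-01", periods=24, freq="MS")
df = pd.DataFrame({"value": range(1, 25)}, index=idx, dtype=float)
yearly_sum = AggregatingResampler(df, "YS", rebuild=lambda frame: frame).sum()
print(yearly_sum["value"].tolist())  # → [78.0, 222.0]
```

Keeping the conversion in one place means every pandas aggregation (`sum`, `mean`, `min`, `max`, or a custom `agg`) returns the same type without repeating the round-trip code.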

ahgraber commented 1 year ago

Following this issue with interest! Aggregating resampling is critical to many of our workflows. Currently, we're using this function to retain the TimeSeries metadata attributes (static covariates and hierarchy), but it feels a bit hacky:

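from darts import TimeSeries

def ts_resample(ts, freq="Y", agg="sum"):
    # Aggregate with pandas, then rebuild a TimeSeries from the result
    tmp = TimeSeries.from_dataframe(ts.pd_dataframe().resample(freq).agg(agg))
    # Re-attach the metadata that the DataFrame round trip drops
    tmp = tmp.with_static_covariates(ts.static_covariates)
    tmp = tmp.with_hierarchy(ts.hierarchy)
    return tmp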