xKDR / TSFrames.jl

Timeseries in Julia
MIT License

Implement Method for Resampling #46

Open ValentinKaisermayer opened 1 year ago

ValentinKaisermayer commented 1 year ago

I think this is one of the most important methods for time series data: being able to interpolate and aggregate.

I like the interface of Grafana, i.e. being able to specify not only the interpolation or aggregation method but both at the same time.

This is useful if you have measurement data at, e.g., roughly 5 min intervals with some holes in it and want a clean vector with an equidistant sample time of 15 min. Where there is good data it has to be aggregated, and where there are holes it has to be interpolated.

For interpolation, common methods would be

And for aggregation

chiraganand commented 1 year ago

For interpolation we are planning to integrate Impute.jl as it already works on a DataFrame and provides various interpolation methods as well as filtering of missing values.

chiraganand commented 1 year ago

Right now one can do: ts = ts.coredata |> Impute.locf() |> TS to carry forward last observations for missing data. Ideally, it would be good to have something like: impute(ts, :colname, LOCF).
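
A minimal sketch of what "carry forward the last observation" means, written in plain Julia without Impute.jl or TSFrames.jl. The name locf_sketch is hypothetical and is not an API of either package; it only illustrates the idea on a single column.

```julia
# Hypothetical helper (not part of Impute.jl or TSFrames.jl):
# replace each missing value with the most recent non-missing one.
# Leading missings have nothing to carry forward and stay missing.
function locf_sketch(col::AbstractVector)
    out = similar(col, Union{Missing, eltype(col)})
    last_seen = missing
    for i in eachindex(col)
        if !ismissing(col[i])
            last_seen = col[i]
        end
        out[i] = last_seen
    end
    return out
end

raw = [missing, 1.0, missing, missing, 4.0, missing]
filled = locf_sketch(raw)   # [missing, 1.0, 1.0, 1.0, 4.0, 4.0]
```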

ValentinKaisermayer commented 1 year ago

Nice package! However, that is not really the functionality I was referring to. I meant changing the time base of the data. Impute.jl seems to only calculate missing values.

A use-case would be measurement data, but not regularly sampled. For many methods, e.g. time series forecasting via AR, ARIMA, ..., the data needs to be sampled at regular intervals.

If you want I'll make a PR with such a method.

retime(ts, timestamps; upsample=:previous, downsample=:mean)

Notice that this will, in most cases, change the length of the object and hence cannot be done in-place.
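
A rough sketch of the behaviour described above, using only the Dates and Statistics standard libraries. Everything here is an assumption made for illustration: the name retime_sketch, working on plain vectors instead of a TS object, passing the aggregation as a function rather than a symbol, hard-coding the "previous" fill for empty buckets, and treating the last bucket as open-ended.

```julia
using Dates, Statistics

# Hypothetical sketch, not the TSFrames.jl API.
# Bucket k collects the samples with grid[k] <= t < grid[k+1]; the last bucket is open-ended.
# Non-empty buckets are aggregated; empty buckets take the previous observation, if any.
function retime_sketch(stamps, vals, grid; downsample = mean)
    @assert issorted(stamps) && issorted(grid)
    out = Vector{Union{Missing, Float64}}(missing, length(grid))
    for k in eachindex(grid)
        lo = grid[k]
        hi = k < lastindex(grid) ? grid[k + 1] : nothing
        in_bucket = findall(t -> t >= lo && (hi === nothing || t < hi), stamps)
        if !isempty(in_bucket)
            out[k] = downsample(vals[in_bucket])   # good data: aggregate
        else
            j = findlast(t -> t < lo, stamps)      # hole: fall back to previous sample
            out[k] = j === nothing ? missing : vals[j]
        end
    end
    return out
end

t0 = DateTime(2024, 1, 1)
obs_times = t0 .+ Minute.([0, 5, 11, 50, 55])   # roughly 5 min data with a hole
obs_vals  = [1.0, 2.0, 3.0, 10.0, 20.0]
grid      = t0 .+ Minute.(0:15:45)              # equidistant 15 min target

resampled = retime_sketch(obs_times, obs_vals, grid)   # [2.0, 3.0, 3.0, 15.0]
```

The output has one value per target timestamp, which is why the result is a new object rather than an in-place modification.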

chiraganand commented 1 year ago

So, the apply() method is built for doing frequency conversions as well. We earlier had a specific function for frequency conversions, similar to to.freq() from zoo/xts of the R world, but then we realised we don't need it: given the way Julia works, it was possible to do it with apply() itself. See: apply(ts, Dates.Minute(15), x -> mean(skipmissing(x)), last)

I do think it could be a good option to provide a frequency conversion method just for end-user convenience.

Though apply() doesn't currently provide a way to upsample, downsampling works. For upsampling, do you think using functionality from Impute.jl and integrating it with apply() would solve your use case?

chiraganand commented 1 year ago

In fact, I think a good implementation of frequency conversion would be to have a function to compute endpoints, see: https://rdrr.io/cran/xts/man/endpoints.html. The function outputs a vector which can then be used in a frequency conversion function as well as in apply(), as you mention in #43.
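
To make the endpoints idea concrete, here is a small sketch in plain Julia using only the Dates standard library. It assumes the xts-style meaning of an endpoint (the position of the last observation in each period); endpoints_sketch is a hypothetical name and is not the function that TSFrames.jl ships.

```julia
using Dates

# Hypothetical sketch: positions of the last observation in each `period`-wide bucket.
function endpoints_sketch(stamps::AbstractVector{DateTime}, period::Period)
    @assert issorted(stamps)
    buckets = map(s -> floor(s, period), stamps)   # bucket label per observation
    # A position is an endpoint if it is the last one overall
    # or if the next observation falls into a different bucket.
    return [i for i in eachindex(stamps)
            if i == lastindex(stamps) || buckets[i] != buckets[i + 1]]
end

start = DateTime(2024, 1, 1)
stamps = start .+ Minute.([0, 5, 11, 16, 29, 31])
ends = endpoints_sketch(stamps, Minute(15))   # [3, 5, 6]
```

A frequency conversion can then slice the data between consecutive endpoints and aggregate each slice.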

ValentinKaisermayer commented 1 year ago

I would suggest two options:

As a general note:

chiraganand commented 1 year ago

A method for upsample() was pushed as part of #38. Currently, it only supports adding missing for in-between missing data. The code is here: src/upsample.jl.

femtotrader commented 1 year ago

You may be interested in this: https://github.com/femtotrader/TimeSeriesResampler.jl

ValentinKaisermayer commented 1 year ago

You may be interested in this: https://github.com/femtotrader/TimeSeriesResampler.jl

It seems to be unmaintained, as is the TimeSeries.jl package.

femtotrader commented 1 year ago

yep it's just to give an API idea

chiraganand commented 1 year ago

If you want I'll make a PR with such a method.

@ValentinKaisermayer I had missed this sentence earlier. Please do submit a PR if you can! :)

retime(ts, timestamps; upsample=:previous, downsample=:mean)
retime(ts, Dates.Minute(15),...)

I do like both these methods, though I prefer having only the second one (a Dates.Period as the second argument). Do we think users would want to supply their own timestamp (:Index) values to do the sampling? I would assume most users would just specify the period they want to resample the object to, without caring how the package computes the timestamps.

chiraganand commented 1 year ago

Also, I would prefer the name resample() over retime(), only because more people might end up googling for "how to resample timeseries in Julia".

ValentinKaisermayer commented 1 year ago

I would like to have both, so that the user has full control over whether they get a regular or an irregular TS back.

ParadaCarleton commented 1 year ago

I think you're looking for MessyTimeSeries.jl.

femtotrader commented 2 months ago

I wonder if offsets such as MonthEnd, YearEnd, BusinessMonthBegin... are implemented for resampling timeseries.

femtotrader commented 2 months ago

Resample example looks odd to me

The resampling example should be done in 2 steps:
Close price -> weekly resample with OHLC (i.e. taking first, max, min, last) -> Open High Low Close
Volume -> weekly resample with sum as aggregate function -> Volume

chiraganand commented 2 months ago

I wonder if offsets such as MonthEnd, YearEnd, BusinessMonthBegin... are implemented for resampling timeseries.

You can do this by providing a function to endpoints():

julia> endpoints(ts, i -> lastdayofmonth.(i), 1)

chiraganand commented 2 months ago

Resample example looks odd to me

The resampling example should be done in 2 steps:
Close price -> weekly resample with OHLC (i.e. taking first, max, min, last) -> Open High Low Close
Volume -> weekly resample with sum as aggregate function -> Volume

Where is this example you are referring to?

femtotrader commented 2 months ago

https://github.com/xKDR/TSFrames.jl?tab=readme-ov-file#frequency-conversion

chiraganand commented 2 months ago

Resample example looks odd to me

The resampling example should be done in 2 steps:
Close price -> weekly resample with OHLC (i.e. taking first, max, min, last) -> Open High Low Close
Volume -> weekly resample with sum as aggregate function -> Volume

I think this is what xts to.period() also does; its OHLC parameter is set to TRUE by default. Yes, this is missing from to_period() and the related functions in TSFrames and should be incorporated. PRs are welcome. :)

chiraganand commented 2 months ago

Till then, apply() allows one to provide a function to aggregate values over a period.
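
For illustration, the two-step weekly resampling discussed above could look roughly like this in plain Julia (Dates standard library only). This is a sketch under assumptions, not the TSFrames.jl API: weekly_ohlcv is a hypothetical name, it works on plain vectors, and weeks are labelled by their Monday.

```julia
using Dates

# Hypothetical sketch: weekly OHLC from a close-price series, weekly sum for volume.
function weekly_ohlcv(dates::AbstractVector{Date}, price::AbstractVector{<:Real},
                      volume::AbstractVector{<:Real})
    @assert issorted(dates)
    weeks = map(firstdayofweek, dates)   # Monday of the week each date belongs to
    rows = NamedTuple[]
    for w in unique(weeks)
        idx = findall(==(w), weeks)
        px = price[idx]
        push!(rows, (week = w,
                     open = first(px), high = maximum(px),
                     low = minimum(px), close = last(px),
                     volume = sum(volume[idx])))
    end
    return rows
end

# 2024-01-01 is a Monday: five trading days, then two days of the following week.
dates  = [Date(2024, 1, 1) + Day(i) for i in (0, 1, 2, 3, 4, 7, 8)]
price  = [10.0, 12.0, 9.0, 11.0, 13.0, 14.0, 12.5]
volume = [100, 200, 150, 50, 100, 300, 200]

bars = weekly_ohlcv(dates, price, volume)
# bars[1]: open 10.0, high 13.0, low 9.0, close 13.0, volume 600
# bars[2]: open 14.0, high 14.0, low 12.5, close 12.5, volume 500
```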