predict-idlab / tsdownsample

High-performance time series downsampling algorithms for visualization
MIT License

tsdownsample


Extremely fast time series downsampling 📈 for visualization, written in Rust.

Features ✨

Install

pip install tsdownsample

Usage

from tsdownsample import MinMaxLTTBDownsampler
import numpy as np

# Create a time series
y = np.random.randn(10_000_000)
x = np.arange(len(y))

# Downsample to 1000 points (assuming constant sampling rate)
s_ds = MinMaxLTTBDownsampler().downsample(y, n_out=1000)

# Select downsampled data
downsampled_y = y[s_ds]

# Downsample to 1000 points using the (possibly irregularly spaced) x-data
s_ds = MinMaxLTTBDownsampler().downsample(x, y, n_out=1000)

# Select downsampled data
downsampled_x = x[s_ds]
downsampled_y = y[s_ds]

Downsampling algorithms & API

Downsampling API 📑

Each downsampling algorithm is implemented as a class with a downsample method, which has the following signature:

downsample([x], y, n_out, **kwargs) -> ndarray[uint64]

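Arguments:

  1. x is optional; x and y are both positional arguments.
  2. n_out is a mandatory keyword argument that defines the number of output indices*.
  3. **kwargs are optional keyword arguments (see the table below).

Returns: an ndarray[uint64] of indices that can be used to index the original data.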

*When there are gaps in the time series, fewer than n_out indices may be returned.
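The calling convention and the gap caveat can be illustrated with a minimal pure-Python sketch. minmax_by_x_sketch is a hypothetical name, not part of tsdownsample; it assumes n_out // 2 equal-width bins over the x-range (the library's exact binning may differ) and returns a plain list of ints rather than an ndarray[uint64].

```python
def minmax_by_x_sketch(*args, n_out):
    """Hypothetical sketch of the downsample([x], y, n_out) calling convention.

    Accepts (y,) or (x, y). Splits the x-range into n_out // 2 equal-width
    bins (an assumption; the library's binning may differ) and returns the
    indices of the min and max y-value in every non-empty bin.
    """
    if len(args) == 1:
        (y,) = args
        x = list(range(len(y)))  # constant sampling rate: x is the index
    elif len(args) == 2:
        x, y = args
    else:
        raise TypeError("expected downsample([x], y, n_out=...)")
    n_bins = n_out // 2
    if n_bins < 1:
        raise ValueError("n_out must be >= 2")
    x0, width = x[0], (x[-1] - x[0]) / n_bins
    bins = {}  # bin number -> indices that fall into it
    for i, xi in enumerate(x):
        # Clamp so that the last point (xi == x[-1]) lands in the final bin.
        b = min(int((xi - x0) / width), n_bins - 1) if width > 0 else 0
        bins.setdefault(b, []).append(i)
    selected = set()
    for idx in bins.values():  # empty bins never show up here
        selected.add(min(idx, key=lambda i: y[i]))
        selected.add(max(idx, key=lambda i: y[i]))
    return sorted(selected)


y = [3, 1, 4, 1, 5, 9, 2, 6]
x = [0, 1, 2, 3, 20, 21, 22, 23]  # large gap between x=3 and x=20
print(minmax_by_x_sketch(y, n_out=8))     # [0, 1, 2, 3, 4, 5, 6, 7]
print(minmax_by_x_sketch(x, y, n_out=8))  # [1, 2, 5, 6]
```

With the gap, the two middle x-bins are empty, so only 4 of the requested 8 indices come back, which is the situation the footnote above describes.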

Downsampling algorithms 📈

The following downsampling algorithms (classes) are implemented:

| Downsampler | Description | **kwargs |
| --- | --- | --- |
| MinMaxDownsampler | selects the min and max value in each bin | parallel |
| M4Downsampler | selects the min, max, first and last value in each bin | parallel |
| LTTBDownsampler | performs the Largest Triangle Three Buckets algorithm | parallel |
| MinMaxLTTBDownsampler | (new two-step algorithm 🎉) first selects n_out * minmax_ratio min and max values, then further reduces these to n_out values using the Largest Triangle Three Buckets algorithm | parallel, minmax_ratio* |

*The default value for minmax_ratio is 4, which has been shown empirically to be a good default. More details are given in the paper: https://arxiv.org/abs/2305.00332
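To make the table concrete, here is a minimal pure-Python sketch of M4, LTTB and the two-step MinMaxLTTB idea. It is not the library's Rust implementation: index_bins, m4_sketch, lttb_sketch and minmax_lttb_sketch are hypothetical names, bins are assumed to be equal-size index ranges (constant sampling rate), and always keeping both endpoints during the MinMax preselection is an assumption of this sketch.

```python
import math


def index_bins(n, n_bins):
    """Split range(n) into n_bins contiguous, (nearly) equal-size index bins."""
    return [range(b * n // n_bins, (b + 1) * n // n_bins) for b in range(n_bins)]


def m4_sketch(y, n_out):
    """M4: keep the first, min, max and last index of each of n_out // 4 bins."""
    out = set()
    for idx in index_bins(len(y), n_out // 4):
        if idx:  # skip empty bins (only possible when len(y) < n_out // 4)
            out |= {idx[0], min(idx, key=lambda i: y[i]),
                    max(idx, key=lambda i: y[i]), idx[-1]}
    return sorted(out)


def lttb_sketch(x, y, n_out):
    """Largest Triangle Three Buckets; returns indices into x and y."""
    n = len(y)
    if n_out >= n:
        return list(range(n))
    if n_out < 3:
        raise ValueError("LTTB needs n_out >= 3")
    # n_out - 2 buckets for the inner points; integer math keeps the edges exact.
    bounds = [1 + k * (n - 2) // (n_out - 2) for k in range(n_out - 1)]
    selected = [0]  # the first point is always kept
    for i in range(n_out - 2):
        prev = selected[-1]
        # Average point of the next bucket (just the last point for the final bucket).
        lo, hi = (bounds[i + 1], bounds[i + 2]) if i + 2 < len(bounds) else (n - 1, n)
        cx, cy = sum(x[lo:hi]) / (hi - lo), sum(y[lo:hi]) / (hi - lo)

        def area2(j):
            # Twice the area of the triangle (prev, j, average point).
            return abs((x[prev] - cx) * (y[j] - y[prev])
                       - (x[prev] - x[j]) * (cy - y[prev]))

        selected.append(max(range(bounds[i], bounds[i + 1]), key=area2))
    selected.append(n - 1)  # the last point is always kept
    return selected


def minmax_lttb_sketch(x, y, n_out, minmax_ratio=4):
    """Two-step sketch: MinMax preselection, then LTTB on the preselected points."""
    n = len(y)
    if n_out * minmax_ratio >= n:
        return lttb_sketch(x, y, n_out)
    pre = {0, n - 1}  # assumption of this sketch: always keep both endpoints
    # n_out * minmax_ratio values = one min and one max per bin.
    for idx in index_bins(n, n_out * minmax_ratio // 2):
        pre |= {min(idx, key=lambda i: y[i]), max(idx, key=lambda i: y[i])}
    pre = sorted(pre)
    sub = lttb_sketch([x[i] for i in pre], [y[i] for i in pre], n_out)
    return [pre[j] for j in sub]  # map back to indices of the original data


print(m4_sketch([5, 2, 8, 3, 7, 1, 9, 4, 6, 0, 2, 3], n_out=8))  # [0, 2, 5, 6, 9, 11]
print(lttb_sketch([0, 1, 2, 3, 4], [0, 1, 10, 2, 0], n_out=3))   # [0, 2, 4]

x = list(range(1000))
y = [math.sin(i / 25) for i in range(1000)]
y[500] = 100.0  # a single spike
idx = minmax_lttb_sketch(x, y, n_out=10)
print(len(idx), idx[0], idx[-1], 500 in idx)  # 10 0 999 True
```

In this sketch the spike survives because it is the max of its MinMax bin and then forms by far the largest triangle in its LTTB bucket; LTTB only has to scan roughly n_out * minmax_ratio preselected points instead of all 1000, which is the intuition behind the two-step design.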

Handling NaNs

This library supports two NaN policies:

  1. Omit NaNs: NaNs are ignored during downsampling.
  2. Return NaNs: if a bin contains at least one NaN, the index of the first NaN in that bin is returned.

| Omit NaNs | Return NaNs |
| --- | --- |
| MinMaxDownsampler | NaNMinMaxDownsampler |
| M4Downsampler | NaNM4Downsampler |
| MinMaxLTTBDownsampler | NaNMinMaxLTTBDownsampler |
| LTTBDownsampler | |

Note that NaNs are not supported for x-data.
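The two policies can be contrasted on a single MinMax bin with the minimal sketch below. The function names are hypothetical, the bin is passed as a range of indices, and two edge cases are assumptions of the sketch rather than documented library behaviour: a NaN-containing bin yields just the one NaN index, and an all-NaN bin yields nothing under the omit policy.

```python
import math


def bin_minmax_omit_nan(y, idx):
    """'Omit NaNs' policy: ignore NaNs when picking the min and max index."""
    valid = [i for i in idx if not math.isnan(y[i])]
    if not valid:
        return []  # assumption: an all-NaN bin contributes no indices
    return sorted({min(valid, key=lambda i: y[i]), max(valid, key=lambda i: y[i])})


def bin_minmax_return_nan(y, idx):
    """'Return NaNs' policy: a bin with any NaN yields the index of its first NaN."""
    for i in idx:
        if math.isnan(y[i]):
            return [i]  # assumption: only this single index is returned
    return sorted({min(idx, key=lambda i: y[i]), max(idx, key=lambda i: y[i])})


y = [1.0, float("nan"), 5.0, -2.0]
print(bin_minmax_omit_nan(y, range(4)))    # [2, 3]
print(bin_minmax_return_nan(y, range(4)))  # [1]
```

An explicit policy is needed because every comparison with NaN is False, so a naive min/max gives position-dependent results: in Python, min([float("nan"), 1.0]) returns nan while min([1.0, float("nan")]) returns 1.0.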

Limitations & assumptions 🚨

The library assumes that:

  1. x-data is (non-strictly) monotonically increasing (i.e., sorted);
  2. x-data contains no NaNs.
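Whether and how the library validates these assumptions is not stated here, so a cheap pre-flight check on x-data can help. A minimal sketch; check_x and sort_by_x are hypothetical helpers, not part of tsdownsample.

```python
import math


def check_x(x):
    """Hypothetical pre-flight check for the two x-data assumptions."""
    if any(math.isnan(v) for v in x):
        raise ValueError("x-data contains NaNs")
    # Non-strictly increasing: equal neighbours are allowed, decreases are not.
    if any(b < a for a, b in zip(x, x[1:])):
        raise ValueError("x-data is not sorted in increasing order")


def sort_by_x(x, y):
    """Hypothetical fix-up: reorder (x, y) pairs so that x is increasing."""
    order = sorted(range(len(x)), key=lambda i: x[i])  # stable for equal x
    return [x[i] for i in order], [y[i] for i in order]


xs, ys = sort_by_x([3, 1, 2, 2], [30, 10, 20, 21])
check_x(xs)  # passes: repeated x-values are allowed
print(xs, ys)  # [1, 2, 2, 3] [10, 20, 21, 30]
```

Sorting reorders x and y together, so the indices returned by a downsampler then refer to the sorted arrays, not the original ones.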

👤 Jeroen Van Der Donckt