tsdownsample

Extremely fast time series downsampling 📈 for visualization, written in Rust.

Features ✨

Fast: written in rust with PyO3 bindings
- leverages optimized argminmax - which is SIMD accelerated with runtime feature detection
- scales linearly with the number of data points
- multithreaded with Rayon (in Rust)
  
  Why we do not use Python multiprocessing
  Citing the PyO3 docs on parallelism:
  
  CPython has the infamous Global Interpreter Lock, which prevents several threads from executing Python bytecode in parallel. This makes threading in Python a bad fit for CPU-bound tasks and often forces developers to accept the overhead of multiprocessing.
  In Rust - which is a compiled language - there is no GIL, so CPU-bound tasks can be parallelized (with Rayon) with little to no overhead.
Efficient: memory efficient
- works on views of the data (no copies)
- no intermediate data structures are created
Flexible: works on any type of data
- supported datatypes are
- for x: f32, f64, i16, i32, i64, u16, u32, u64, datetime64, timedelta64
- for y: f16, f32, f64, i8, i16, i32, i64, u8, u16, u32, u64, datetime64, timedelta64, bool
  
  !! 🚀 f16 argminmax is 200-300x faster than numpy
  In contrast with all other data types above, f16 is *not* hardware supported (i.e., no instructions for f16) by most modern CPUs!!
  🐌 Programming languages facilitate support for this datatype by either (i) upcasting to f32 or (ii) using a software implementation.
  💡 As for argminmax, only comparisons are needed - and thus no arithmetic operations - creating a symmetrical ordinal mapping from f16 to i16 is sufficient. This mapping allows to use the hardware supported scalar and SIMD i16 instructions - while not producing any memory overhead 🎉
  More details are described in argminmax PR #1.
Easy to use: simple & flexible API

Install

pip install tsdownsample

Usage

from tsdownsample import MinMaxLTTBDownsampler
import numpy as np

# Create a time series
y = np.random.randn(10_000_000)
x = np.arange(len(y))

# Downsample to 1000 points (assuming constant sampling rate)
s_ds = MinMaxLTTBDownsampler().downsample(y, n_out=1000)

# Select downsampled data
downsampled_y = y[s_ds]

# Downsample to 1000 points using the (possible irregularly spaced) x-data
s_ds = MinMaxLTTBDownsampler().downsample(x, y, n_out=1000)

# Select downsampled data
downsampled_x = x[s_ds]
downsampled_y = y[s_ds]

Downsampling algorithms & API

Downsampling API 📑

Each downsampling algorithm is implemented as a class that implements a downsample method. The signature of the downsample method:

downsample([x], y, n_out, **kwargs) -> ndarray[uint64]

Arguments:

x is optional
x and y are both positional arguments
n_out is a mandatory keyword argument that defines the number of output values^*
**kwargs are optional keyword arguments (see table below):
- parallel: whether to use multi-threading (default: False)
  ❗ The max number of threads can be configured with the TSDOWNSAMPLE_MAX_THREADS ENV var (e.g. os.environ["TSDOWNSAMPLE_MAX_THREADS"] = "4")
- ...

Returns: a ndarray[uint64] of indices that can be used to index the original data.

^*When there are gaps in the time series, fewer than n_out indices may be returned.

Downsampling algorithms 📈

The following downsampling algorithms (classes) are implemented:

Downsampler	Description	`**kwargs`
`MinMaxDownsampler`	selects the min and max value in each bin	`parallel`
`M4Downsampler`	selects the min, max, first and last value in each bin	`parallel`
`LTTBDownsampler`	performs the Largest Triangle Three Buckets algorithm	`parallel`
`MinMaxLTTBDownsampler`	(new two-step algorithm 🎉) first selects `n_out` * `minmax_ratio` min and max values, then further reduces these to `n_out` values using the Largest Triangle Three Buckets algorithm	`parallel`, `minmax_ratio`^*

^*Default value for minmax_ratio is 4, which is empirically proven to be a good default. More details here: https://arxiv.org/abs/2305.00332

Handling NaNs

This library supports two NaN-policies:

Omit NaNs (NaNs are ignored during downsampling).
Return index of first NaN once there is at least one present in the bin of the considered data.

Omit `NaN`s	Return `NaN`s
`MinMaxDownsampler`	`NaNMinMaxDownsampler`
`M4Downsampler`	`NaNM4Downsampler`
`MinMaxLTTBDownsampler`	`NaNMinMaxLTTBDownsampler`
`LTTBDownsampler`

Note that NaNs are not supported for x-data.

Limitations & assumptions 🚨

Assumes;

x-data is (non-strictly) monotonic increasing (i.e., sorted)
no NaNs in x-data

👤 Jeroen Van Der Donckt

predict-idlab / tsdownsample

readme