predict-idlab / plotly-resampler

Visualize large time series data with plotly.py
https://predict-idlab.github.io/plotly-resampler/latest
MIT License

Difference between tsdownsample and resampler #247

Closed firmai closed 9 months ago

firmai commented 1 year ago

Just looking at the two projects https://github.com/predict-idlab/tsdownsample/ and https://github.com/predict-idlab/plotly-resampler/

The documentation doesn't make the following very clear: why would one not just downsample once and create a static graphic, rather than use the resampler? Is the resampler doing it dynamically, so that every time you filter the dataset it provides the n preselected samples, thus maintaining high fidelity all the way down?

Also, if one doesn't select the number of samples, is there a good approximate default that kicks in for n? From the documentation it is hard for me to understand the default behaviour: how many samples are selected, and by which algorithm? Is it LTTB?

And thanks for the software!

Best, Derek

jonasvdd commented 1 year ago

Hi @firmai,

I totally agree that we could put more effort into describing the inner workings of plotly-resampler (in the online docs) and our default data aggregation parameters.

In the meantime, if time permits, I suggest skimming these papers:

Q: why would one not just downsample once and create a static graphic, rather than use the resampler? Is the resampler doing it dynamically, so that every time you filter the dataset it provides the n preselected samples, thus maintaining high fidelity all the way down?

A: It is exactly what you think it is! :) plotly-resampler uses user-graph-interaction callbacks to resample (i.e., perform time series data aggregation on) the interacted regions. (see the attached figure from the plotly-resampler paper)
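To make the dynamic part concrete, here is a minimal pure-Python sketch of the control flow behind such a callback (the names `on_zoom` and `minmax_aggregate` are illustrative, not plotly-resampler's actual API): on every zoom/relayout event, only the visible slice of the raw data is re-aggregated down to $n_{out}$ points, so fidelity increases as you zoom in.

```python
def minmax_aggregate(y, n_out):
    """Keep the min and max of each bin -- a stand-in for the real aggregators."""
    n_bins = max(n_out // 2, 1)
    bin_size = max(len(y) // n_bins, 1)
    out = []
    for i in range(0, len(y), bin_size):
        chunk = y[i:i + bin_size]
        out.append(min(chunk))
        out.append(max(chunk))
    return out[:n_out]

def on_zoom(y, view_start, view_end, n_out=1000):
    """Called on every relayout event: slice the raw data to the visible
    range, then aggregate just that slice down to n_out points."""
    visible = y[view_start:view_end]
    if len(visible) <= n_out:  # zoomed in far enough: show the raw data
        return visible
    return minmax_aggregate(visible, n_out)

raw = [i % 97 for i in range(1_000_000)]
full_view = on_zoom(raw, 0, len(raw))    # coarse overview: 1000 points
zoomed = on_zoom(raw, 500_000, 500_500)  # zoomed in: the raw 500 points
```

In the real library this slicing-then-aggregating happens server-side in a Dash callback, and the aggregation itself is delegated to the optimized tsdownsample routines; the sketch only shows the control flow.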

Q: Also, if one doesn't select the number of samples, is there a good approximate default that kicks in for n? From the documentation it is hard to understand the default behaviour: how many samples are selected, and by which algorithm? Is it LTTB?

A: excellent question! We investigated these variables in this paper (https://arxiv.org/pdf/2304.00900.pdf). The default aggregator is MinMaxLTTB; you can think of it as a parallelizable variant of LTTB. The default number of selected samples for each aggregation, $n_{out}$, is set to 1000. However, an optimal $n_{out}$ highly depends on (i) browser zoom level, (ii) graph canvas width, and (iii) line width. As determining these parameters (dynamically, since browser parameters can change over time) requires a lot of back-end and front-end logic, we have not (yet) put an automatic $n_{out}$ mode on our roadmap.
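For intuition on why MinMaxLTTB parallelizes well, here is a toy pure-Python sketch of the two-step idea (an illustration of the algorithm from the paper, not tsdownsample's optimized implementation; the `ratio=4` preselection factor mirrors the paper's default): first preselect min/max extrema per bin, which is embarrassingly parallel across bins, then run plain LTTB only on those few candidates.

```python
import math

def lttb(x, y, n_out):
    """Plain Largest-Triangle-Three-Buckets: returns the indices of the
    n_out selected points. The first and last points are always kept."""
    n = len(x)
    if n_out >= n:
        return list(range(n))
    selected = [0]
    bucket = (n - 2) / (n_out - 2)
    a = 0  # index of the previously selected point
    for i in range(n_out - 2):
        start = int(i * bucket) + 1
        end = int((i + 1) * bucket) + 1
        # average point of the *next* bucket (used as the third triangle vertex)
        nstart, nend = end, min(int((i + 2) * bucket) + 1, n)
        avg_x = sum(x[nstart:nend]) / (nend - nstart)
        avg_y = sum(y[nstart:nend]) / (nend - nstart)
        # pick the point in the current bucket that forms the largest triangle
        best, best_area = start, -1.0
        for j in range(start, end):
            area = abs((x[a] - avg_x) * (y[j] - y[a])
                       - (x[a] - x[j]) * (avg_y - y[a]))
            if area > best_area:
                best_area, best = area, j
        selected.append(best)
        a = best
    selected.append(n - 1)
    return selected

def minmax_lttb(x, y, n_out, ratio=4):
    """MinMaxLTTB sketch: preselect ~ratio * n_out min/max candidates per bin
    (each bin is independent, hence parallelizable), then LTTB the candidates."""
    n = len(x)
    if n <= ratio * n_out:
        return lttb(x, y, n_out)
    n_bins = ratio * n_out // 2
    bin_size = n // n_bins
    cand = {0, n - 1}
    for s in range(0, n, bin_size):
        chunk = range(s, min(s + bin_size, n))
        cand.add(min(chunk, key=lambda j: y[j]))
        cand.add(max(chunk, key=lambda j: y[j]))
    cand = sorted(cand)
    cx = [x[j] for j in cand]
    cy = [y[j] for j in cand]
    return [cand[j] for j in lttb(cx, cy, n_out)]

x = list(range(100_000))
y = [math.sin(v / 250.0) for v in x]
idx = minmax_lttb(x, y, n_out=1000)  # 1000 indices into the raw series
```

The min/max preselection guarantees that vertical extrema survive, while the LTTB pass on the small candidate set keeps the visual shape of the line.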

Additionally, one should also bear in mind that increasing $n_{out}$ will increase the network payload size and the front-end (re)rendering time, which may affect interactivity snappiness.
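As a rough back-of-the-envelope example of that payload scaling (assuming uncompressed float64 x and y values; actual payloads depend on serialization and compression):

```python
# The network payload grows linearly with n_out:
# each shown sample carries one x and one y value.
n_out = 1000
bytes_per_sample = 8 + 8  # float64 x + float64 y, uncompressed
payload_bytes = n_out * bytes_per_sample
print(payload_bytes)  # 16000 -> roughly 16 kB per trace per update
```

Doubling $n_{out}$ therefore doubles both the bytes sent to the browser and the number of points plotly has to (re)render.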

(see this README for more accessible info w.r.t. visual representativeness: https://github.com/predict-idlab/ts-datapoint-selection-vis/blob/main/details/vis_representativity.md)

I hope this clarifies some stuff! And of course, thank you for taking a great interest in our research/software!

Kind regards, Jonas