Open hoxbro opened 1 year ago
Hi! I'm glad to hear that you like the speed of tsdownsample :rocket: (I even think there might stil bel some more room for improving the runtime speed of tsdownsample)
Thank you for pointing out the discrepancy in the results between the LTTB implementation in tsdownsample and plotly_resampler. I apologize for the lack of documentation on this point.
To clarify, the MinMaxLTTB algorithm is a new heuristic for the LTTB algorithm, which first performs MinMax downsampling on the data before running the more expensive LTTB algorithm on the reduced data. The MinMaxLTTB downsampler takes an optional keyword argument called minmax_ratio
, which defines the ratio of the number of data points that the array will be downsampled to in the first step (i.e., n_out * minmax_ratio). The default value for minmax_ratio
is 30, which has been emperically proven to be a good default value (more on this in our upcoming paper).
Given that you have n_out
set to 1_000, you would expect to see some runtime overhead at 30_000 (since for shorter arrays, the MinMax downsampling is omitted).
As an illustration, here is some modified code where minmax_ratio
is set to 60:
import numpy as np
from plotly_resampler.aggregation.algorithms.lttb_py import LTTB_core_py
from tsdownsample import MinMaxLTTBDownsampler
np.random.seed(1)
ns = [i for i in np.arange(1, 100, 2) * 1000]
n_out = 1000
for n in ns:
x, y = np.arange(n), np.random.randn(n).cumsum()
py = LTTB_core_py.downsample(x, y, n_out)
ru = MinMaxLTTBDownsampler().downsample(x, y, n_out=n_out, minmax_ratio=60)
print(f"Matches with {n:>5}: {(py == ru).sum() / 10}%")
The percentages above show the agreement of MinMaxLTTB with LTTB (which is also a local heuristic).
I hope this helps to clarify the behavior you observed. I'm happy to hear what you think of the MinMaxLTTBDownsampler :eyes:
P.S.: To use the parallel version of the tsdownsample algorithms (you should pass the parallel
=True to the downsample method) - once again my apologies for the lacking documentation.
@Hoxbro Would it be possible to share your benchmark code? @jvdd I have tested your implementation against a C based implementation as well as my personal numba based implementation of the native LTTB algorithm. So far I have not noticed a general performance superiority (I am not using the parallel version). In fact your implementation only seems to be faster for a very large number of samples (>10^7) and a rather "small" number of buckets (<10000).
Oddly I have also observed sub-linear scaling of your implementation in case of >10^6 data points: a 10x increase in the number of data points lead to only a 3x increase in time. Whereas my implementation scales linearly.
Hi @muendlein,
I really appreciate other people taking the time to benchmark the code! :blush:
I have created an issue concerning the benchmarking: #21
When reporting benchmark results that compare with other implementations, it is extremely important to make sure that we are comparing apples to apples;
x
?
x
(otherwise there is no point in providing the x
to M4 or MinMax downsampler)y
(and x
)n_out
?Perhaps you (@muendlein) could also share your benchmarks (with your C & numba implementation) as well?
Regarding your observations:
I believe this library shines for large arrays (millions to billions of datapoints) with an n_out
that is not too large. (Imo having a large n_out
does not make a lot of sense as you are downsampling for visualizing the data - which is always limited by the pixelwidth of your screen).
My takeaways from your remarks
n_out
results in more overhead => (already observed this when using x
& improved this for the parallel implementation, #15)P.S. you are always welcome to contribute further to this library, pinpointing where the code is slower than other implementations is already a significant contribution :rocket:
(P.P.S. I observed the sublinear scaling as well, but need more time confirm this with proper benchmarks)
I have been playing around with your implementation of lttb and trying to compare it to other implementations (numpy and numba) and have been impressed by the speed.
But I have noticed that the algorithm does not give the same results as your other implementation in
plotly_resampler
. It seems to fail around 30,000 points. Do you know if this is expected?Looking at the above log-log plot, some extra overhead is added, around 30,000. So I'm thinking it could be because of some race condition when parallelizing the algorithm.