Closed tommedema closed 1 year ago
k = None
is indeed faster than k = len(series)-1
, but this is intended as the assumption is that k is small. When using k, one needs to keep track of the k best instances, which incurs some overhead. This is more expensive in this case. When using larger series, the cost of DTW should start dominating and saving on DTW computations should make k!=None
faster again. For small values for k, it should also be faster.
Thus, in case you know you will need large values of k and the series are short, it's probably better to set k to None and filter afterwards.Great, thanks @wannesm
Any idea on what caused the performance hit since 26c3962? If it is still 2x faster than on master I may stay on that commit for a while. :)
When adding more unit tests for subsequence search I found another silly bug which caused all values to be saved twice. The difference between None
and len(series)-1
is now almost zero.
The original regression since 26c3962 was mainly because the final sort operation was erroneously executed after each distance calculation. I was trying to be too fast, sorry for confusion.
Nice one! 🙌
On Tue, Nov 1, 2022 at 02:50 wannesm @.***> wrote:
When adding more unit tests for subsequence search I found another silly bug which caused all values to be saved twice. The difference between None and len(series)-1 is now almost zero.
— Reply to this email directly, view it on GitHub https://github.com/wannesm/dtaidistance/issues/185#issuecomment-1298284316, or unsubscribe https://github.com/notifications/unsubscribe-auth/AACRAOPNX4RMNLMW4EEMVX3WGDRVBANCNFSM6AAAAAARMAKZIA . You are receiving this because you modified the open/close state.Message ID: @.***>
Recently I was surprised that my scripts seem to be running significantly slower (very noticeable), and so I did some digging.
Consider this standalone test case:
When running this on commit 26c3962 with
k = None
:711 µs ± 4.7 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
When running this on commit 26c3962 with
k = len(series)-1
(note that there seems to be a bug where settingk = len(series)
results in anOutOfBoundError
:1.61 ms ± 4.85 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
When running this on master with
k = len(series)-1
(I cannot test None here due to issue #184)8.79 ms ± 1.97 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
This is 5.5x slower than on 26c3962.
There seem to be 3 issues here:
k = None
is much faster thank = len(series)-1
, which seems counterintuitivek = len(series)
results in anOutOfBoundError
(k = len(series)-1
works)I wonder if some of the attempts to lower RAM usage at #183 have caused these issues. Of course, speed is more important than RAM usage for most use cases. Especially if I can use float32.
For now I am going to revert to 26c3962 to avoid issue #185 and #184