titu1994 / dtw-numba

Implementation of Dynamic Time Warping algorithm with speed improvements based on Numba.
MIT License
16 stars 1 forks source link

Does this support Numba GPU utilization? #2

Open skoeb opened 5 years ago

skoeb commented 5 years ago

Numba supports the utilization of CUDA GPUs, does this package have any capability to support these?

titu1994 commented 5 years ago

No it doesn't.

The reason for this is DTW is almost equivalent to a dynamic programming problem in the simplest brute force setting, and CUDA kernels require independence in their operations.

There are papers that parallelize DTW using prefix notations to compute approximate forms of parallel Cuda optimized DTW, which is exceptionally fast, but my objective was to have an exact DTW implementation that is almost at C levels of speed post-compilation and can be parallelized in multiprocess environment to speed up batch processing.

Technically, with it's current optimizations, we were able to run exact DTW on Starlight Curves dataset (1000 train samples, 8236 test samples, 1024 timesteps univariate dataset from UCR) in around an hour. This entails computing a 1000x8236 distance matrix with 1024x1024 comparisons for each of those 8 million comparisons using the DTW algorithm, which is pretty fast on a 16 core machine.