Open · m-r-munroe opened this issue 3 years ago
@m-r-munroe, that's interesting. Which implementation of DTW do you (or others) use? A particular library?
pyts is decent.
https://github.com/johannfaouzi/pyts
sktime is also solid. https://github.com/alan-turing-institute/sktime
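For reference, the classic DTW recurrence that libraries like pyts and sktime implement (following Berndt & Clifford, 1994) can be sketched in a few lines of NumPy. This is a minimal illustration, not any library's actual code; the function name and squared-difference local cost are my own choices here:

```python
import numpy as np

def dtw_distance(x, y):
    """Classic dynamic time warping via dynamic programming.

    Fills an (n+1) x (m+1) cumulative-cost matrix where each cell
    adds the local squared difference to the cheapest of the three
    allowed predecessor moves (match, insertion, deletion).
    """
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            D[i, j] = cost + min(D[i - 1, j],      # insertion
                                 D[i, j - 1],      # deletion
                                 D[i - 1, j - 1])  # match
    return float(np.sqrt(D[n, m]))
```

Note that DTW tolerates local stretching: `dtw_distance([0, 0, 1, 1], [0, 1, 1])` is 0 because the warping path can align both leading zeros of the first series to the single zero of the second, whereas a Euclidean distance would not even be defined for series of unequal length. The O(n·m) cost of filling this matrix is exactly why the speedups discussed here matter.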
Along with random forests, you should have run a grid search over gradient-boosted machines. They typically need around 10x more trees in the ensemble, and because the trees are built sequentially (each one depends on the previous), they take much longer to run. They used to be the winningest algorithm on Kaggle.
https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingClassifier.html
https://www.kaggle.com/msjgriffiths/r-what-algorithms-are-most-successful-on-kaggle/notebook
https://bradleyboehmke.github.io/HOML/gbm.html
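A grid search over a gradient-boosted machine with scikit-learn might look like the sketch below. The dataset, grid values, and cross-validation settings are placeholders chosen to run quickly, not a recommendation for real benchmarking:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data; swap in the real benchmark features/labels.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Small illustrative grid; a real search would cover more of
# n_estimators, learning_rate, max_depth, subsample, etc.
grid = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid={
        "n_estimators": [50, 100],
        "learning_rate": [0.05, 0.1],
    },
    cv=3,
)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```

Because boosting builds trees sequentially, each grid point is slower to fit than a comparable random forest, which is why this search is noticeably more expensive than the random-forest one already in the benchmark.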
Describe the issue
This is a gap in your benchmarking. I think you should look at the "time-series bake-off" papers for others like it. DTW is a pain to implement, but it is a very well-known and widely used method.
The speedup you can get there is going to speak to signal analysts in several areas.
References for DTW:
Berndt, D. J., & Clifford, J. (1994, July). Using dynamic time warping to find patterns in time series. In KDD workshop (Vol. 10, No. 16, pp. 359-370).