Open · m-r-munroe opened this issue 3 years ago
@m-r-munroe, that's interesting. Which implementation of DTW do you (or others) use? A particular library?
pyts is decent.
https://github.com/johannfaouzi/pyts
sktime is also solid. https://github.com/alan-turing-institute/sktime
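For reference, the classic DTW recurrence that libraries like pyts and sktime implement (following Berndt & Clifford, 1994) can be sketched in a few lines of NumPy. This is a minimal illustration, not any library's actual code; the function name and squared-difference local cost are my own choices here:

```python
import numpy as np

def dtw_distance(x, y):
    """Classic dynamic time warping via dynamic programming.

    Fills an (n+1) x (m+1) cumulative-cost matrix where each cell
    adds the local squared difference to the cheapest of the three
    allowed predecessor moves (match, insertion, deletion).
    """
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            D[i, j] = cost + min(D[i - 1, j],      # insertion
                                 D[i, j - 1],      # deletion
                                 D[i - 1, j - 1])  # match
    return float(np.sqrt(D[n, m]))
```

Note that DTW tolerates local stretching: `dtw_distance([0, 0, 1, 1], [0, 1, 1])` is 0 because the warping path can align both leading zeros of the first series to the single zero of the second, whereas a Euclidean distance would not even be defined for series of unequal length. The O(n·m) cost of filling this matrix is exactly why the speedups discussed here matter.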
Along with random forests, you should have run a grid search over gradient-boosted machines. They typically need around 10x more trees in the ensemble, and because the trees are built sequentially (each one depends on the previous), they take much longer to run. They used to be the winningest algorithm on Kaggle.
https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingClassifier.html
https://www.kaggle.com/msjgriffiths/r-what-algorithms-are-most-successful-on-kaggle/notebook
https://bradleyboehmke.github.io/HOML/gbm.html
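A grid search over a gradient-boosted machine with scikit-learn might look like the sketch below. The dataset, grid values, and cross-validation settings are placeholders chosen to run quickly, not a recommendation for real benchmarking:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data; swap in the real benchmark features/labels.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Small illustrative grid; a real search would cover more of
# n_estimators, learning_rate, max_depth, subsample, etc.
grid = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid={
        "n_estimators": [50, 100],
        "learning_rate": [0.05, 0.1],
    },
    cv=3,
)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```

Because boosting builds trees sequentially, each grid point is slower to fit than a comparable random forest, which is why this search is noticeably more expensive than the random-forest one already in the benchmark.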
Describe the issue
This is a gap in your benchmarking. I think you should look at the "time-series bake-off" papers for others like it. DTW is a pain to implement, but it is a very well-known and widely used method.
The speedup you can get there is going to speak to signal analysts in several areas.
References for DTW:
Berndt, D. J., & Clifford, J. (1994, July). Using dynamic time warping to find patterns in time series. In KDD workshop (Vol. 10, No. 16, pp. 359-370).