time-series-machine-learning / tsml-java

Java time series machine learning tools in a Weka compatible toolkit
GNU General Public License v3.0

DTW Variants #174

Open TonyBagnall opened 5 years ago

TonyBagnall commented 5 years ago

We currently have the following DTW 1NN variants in classifiers.distance_based:
1) FastDTW: Cheng's SDM paper version, extends EnhancedAbstractClassifier (EAC)
2) DTW_kNN: very old version that extends my kNN classifier, which adapts IBk to make setting the distance function easier. Not an EAC
3) SlowDTW_1NN: standard implementation that extends EAC and does full window search
4) FastDTW_1NN: my faster window search with a few optimizations

I propose we deprecate (2), as it's no longer needed, and remove (1), as it is both broken and redundant given FastEE now has all the Fast variants, but keep the other two for basic benchmarking, possibly refactoring to avoid confusion. Any thoughts on any of this?

goastler commented 5 years ago

I've made a base KNN which handles distance measures and contracting. It is currently under the TrainAccEstimate interface design; I have yet to switch it to EAC. The KNN provides _KNN easily though, so it can replace any slow implementations, i.e. 2 and 3 I guess.

TonyBagnall commented 5 years ago

I might leave 3) but rename it; it is the simplest implementation of DTW and quite good for newbies to see.
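
For readers new to DTW, here is a minimal sketch of a plain dynamic-programming DTW distance with an optional warping window. It is not the code of any class named in this thread: the class name `DtwSketch`, the squared pointwise cost, and the choice to return the accumulated cost without a square root are all assumptions made for illustration.

```java
/** Hypothetical illustration class; not part of tsml-java. */
final class DtwSketch {
    private DtwSketch() {}

    /**
     * DTW distance between two univariate series.
     * Assumptions: squared pointwise cost, no final square root, and
     * `window` is the maximum allowed |i - j| along the warping path.
     */
    static double distance(double[] a, double[] b, int window) {
        int n = a.length;
        int m = b.length;
        if (n == 0 || m == 0) {
            throw new IllegalArgumentException("series must be non-empty");
        }
        if (window < 0) {
            throw new IllegalArgumentException("window must be non-negative");
        }
        // The band must be at least |n - m| wide or the final cell is unreachable;
        // capping at max(n, m) avoids integer overflow in i + w below.
        int w = Math.min(Math.max(window, Math.abs(n - m)), Math.max(n, m));

        // cost[i][j] = cheapest alignment of a[0..i-1] with b[0..j-1]
        double[][] cost = new double[n + 1][m + 1];
        for (int i = 0; i <= n; i++) {
            for (int j = 0; j <= m; j++) {
                cost[i][j] = Double.POSITIVE_INFINITY;
            }
        }
        cost[0][0] = 0.0;

        for (int i = 1; i <= n; i++) {
            int jStart = Math.max(1, i - w);
            int jEnd = Math.min(m, i + w);
            for (int j = jStart; j <= jEnd; j++) {
                double diff = a[i - 1] - b[j - 1];
                double best = Math.min(cost[i - 1][j - 1],
                        Math.min(cost[i - 1][j], cost[i][j - 1]));
                cost[i][j] = diff * diff + best;
            }
        }
        return cost[n][m];
    }

    /** Full-window DTW: no constraint on the warping path. */
    static double distance(double[] a, double[] b) {
        return distance(a, b, Math.max(a.length, b.length));
    }
}
```

With a window of 0 and equal-length series, this reduces to the sum of squared differences; widening the window lets the path warp further from the diagonal.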

jasonlines commented 5 years ago

Yeah the DTW/KNN situation is a mess and has always been a source of confusion for me!

To get on top of it, I propose that we have one base KNN classifier which supports different distance measures, internal param setting, etc. in the distance-based classification hierarchy (this should be the one that George has made for EE for consistency as it will have a simple, sktime-like interface).
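
As a rough sketch of that idea (not George's actual KNN and not an existing tsml-java API), a single nearest-neighbour classifier could take its distance measure as a constructor argument. The names `DistanceMeasure` and `NearestNeighbourSketch` are hypothetical, the `fit`/`predict` naming is only meant to echo the sktime-like style, and the example is limited to k = 1 with no parameter tuning or contracting.

```java
/** Hypothetical pluggable distance; not an interface from tsml-java or Weka. */
interface DistanceMeasure {
    double distance(double[] a, double[] b);

    /** Simple default measure for equal-length series, for illustration only. */
    static DistanceMeasure squaredEuclidean() {
        return (a, b) -> {
            if (a.length != b.length) {
                throw new IllegalArgumentException("series lengths differ");
            }
            double sum = 0.0;
            for (int i = 0; i < a.length; i++) {
                double diff = a[i] - b[i];
                sum += diff * diff;
            }
            return sum;
        };
    }
}

/** Hypothetical 1-NN classifier that works with any DistanceMeasure. */
final class NearestNeighbourSketch {
    private final DistanceMeasure measure;
    private double[][] trainSeries;
    private int[] trainLabels;

    NearestNeighbourSketch(DistanceMeasure measure) {
        this.measure = measure;
    }

    /** Stores the training data; 1-NN has no other training step. */
    void fit(double[][] series, int[] labels) {
        if (series.length == 0 || series.length != labels.length) {
            throw new IllegalArgumentException("need one label per training series");
        }
        this.trainSeries = series;
        this.trainLabels = labels;
    }

    /** Returns the label of the closest training series (first one wins on ties). */
    int predict(double[] query) {
        if (trainSeries == null) {
            throw new IllegalStateException("call fit before predict");
        }
        int bestLabel = trainLabels[0];
        double bestDistance = Double.POSITIVE_INFINITY;
        for (int i = 0; i < trainSeries.length; i++) {
            double d = measure.distance(query, trainSeries[i]);
            if (d < bestDistance) {
                bestDistance = d;
                bestLabel = trainLabels[i];
            }
        }
        return bestLabel;
    }
}
```

Under this design, switching measure means passing a different `DistanceMeasure` (a DTW implementation could be supplied the same way as a lambda or method reference) rather than switching base classifier.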

To avoid the ambiguity and confusion, I'd propose having measure-specific enhancements/heuristics in a different package (perhaps like a distance-based contrib) as there's definitely a place for implementations like 4), but I think it's confusing if you have to use different base classifiers for different distance measures. That way we can create a logical separation between the core functionality and enhanced stuff (and maybe in time have a wiki page to explain the difference!)

Any thoughts?