Open candalfigomoro opened 4 years ago
I agree that DTW would be a useful metric to have. It is not trivial to implement; Kyle had a good solution that works around the issues with getting a pure numba implementation working. It is defintiely on my TODO list, but admittedly is currently not that high. If you feel like you could write a decent DTW routine that is numba compilable I would be very happy to accept a PR.
@lmcinnes Do you see any particular issue in kyle's implementation (apart from the comment about the dtw function naming)?
An "issue" I can see is that DTW could be used with multiple distances (not just euclidean distance), so it's like DTW should be a sort of wrapper for other distances.
I believe this is something you can work around with some numba fanciness. In particular it is possible for distance functions to take (optional) parameters. Since numba supports passing numba jitted functions as parameters you could, in principle, pass in the underlying distance function as an optional parameter.
An added bonus may be looking at using numba stencil operators for some of the cumulative distance computations.
https://github.com/lmcinnes/umap/blob/b17f0ab04e1b455e179ff14087a4a6f584fe1966/umap/layouts.py#L199
This is how you could dynamically wrap functions to be numba compatible.
I think Dynamic Time Warping (DTW) distance would be a nice addition when we deal with time series.
While searching for UMAP + DTW on Google I found this implementation: https://gist.github.com/kylemcdonald/76b6f18fb4026e01196282b59bd31e7e
I didn't try it (I don't know if it correctly works), but I think it could be useful to add DTW to built-in distances in UMAP.