wannesm / dtaidistance

Time series distances: Dynamic Time Warping (fast DTW implementation in C)

using calculated distance matrix to draw dendrogram #166

Closed Ne-oL closed 2 years ago

Ne-oL commented 2 years ago

Hi, I have been looking for a while but couldn't find a way to create a linkage matrix and dendrogram from an already calculated distance matrix. I used the scipy package to calculate the linkage and draw the dendrogram, but the result is nowhere near as nice or informative as the one produced by the dtaidistance library, especially regarding series length, since my dataset contains time series of variable length.

I'm using a custom function to calculate the distance matrix. It contains a nested loop that fills the matrix cell by cell. It is parallelized, so it takes a few hours at most. I'm looking for one of two things:

  1. Is it possible to use an already calculated matrix to draw the dendrogram?
  2. Is it possible to customize dtw.distance_matrix_fast() (for example, by adding if statements based on the calculated pairs) for time series of variable length?
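
For readers who want to picture the kind of nested-loop computation described above, here is a minimal sketch. It assumes a plain, unoptimized DTW as the pairwise distance; `dtw_distance` and `pairwise_matrix` are hypothetical names for illustration, not the poster's actual function and not the dtaidistance C implementation.

```python
import numpy as np

def dtw_distance(a, b):
    """Plain O(len(a) * len(b)) DTW with squared point-wise cost (illustrative only)."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(np.sqrt(cost[n, m]))

def pairwise_matrix(series, dist=dtw_distance):
    """Fill a symmetric distance matrix cell by cell; series may differ in length."""
    k = len(series)
    dm = np.zeros((k, k))
    for i in range(k):
        for j in range(i + 1, k):
            # Any per-pair custom logic (extra conditions, penalties) would go here.
            dm[i, j] = dm[j, i] = dist(series[i], series[j])
    return dm

# Toy variable-length series (made-up data)
series = [np.array([0.0, 1.0, 2.0]),
          np.array([0.0, 1.0, 1.0, 2.0]),
          np.array([5.0, 5.0])]
dm = pairwise_matrix(series)
```

Because the matrix is symmetric, only the upper triangle is computed and then mirrored, which halves the work in the nested loop.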
wannesm commented 2 years ago

You should be able to inherit from BaseTree: https://github.com/wannesm/dtaidistance/blob/6948940f138994cdf9b56eb587d9bca26ab67e85/dtaidistance/clustering/hierarchical.py#L139

By filling in the self.series and self.linkage variables you can plot the graphs. The series variable is simply a list (or matrix) of time series. The linkage variable follows the Scipy linkage data structure. You can thus easily generate the linkage data structure from a distance matrix using the following function: https://docs.scipy.org/doc/scipy/reference/generated/scipy.cluster.hierarchy.linkage.html

You can also look at LinkageTree which is following a similar approach.
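
As a rough illustration of the linkage step mentioned above, the following sketch builds a SciPy linkage matrix from a precomputed square distance matrix. The distance values and the choice of `method="complete"` are assumptions for the example. Note that `scipy.cluster.hierarchy.linkage` treats a 2-D array as raw observation vectors, so a square distance matrix should first be converted to condensed form with `squareform`.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

# Made-up symmetric distance matrix with a zero diagonal (4 time series)
dm = np.array([
    [0.0, 1.0, 4.0, 5.0],
    [1.0, 0.0, 3.0, 6.0],
    [4.0, 3.0, 0.0, 2.0],
    [5.0, 6.0, 2.0, 0.0],
])

# Convert the square matrix to the condensed (upper-triangle) vector
condensed = squareform(dm, checks=True)

# Each row of the result: [cluster_a, cluster_b, merge_distance, cluster_size]
linkage_matrix = linkage(condensed, method="complete")
```

If the precomputed matrix only has its upper triangle filled, it would need to be made symmetric (for example by mirroring the upper triangle) before calling `squareform`.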

Ne-oL commented 2 years ago

Would it be possible to expose this option in the future as a function attached to the plot() function, where you provide the time series dataset and the linkage matrix and it produces the plot?

Ne-oL commented 2 years ago

This is an update for anyone who comes here in the future. @wannesm's suggestion above (inherit from BaseTree and fill in the series and linkage variables) worked, but there was one issue. It initially failed with this error: AttributeError: 'list' object has no attribute 'get_max_min_y'

After some experimenting, I found that the cause was the time series. When inheriting from the class, the time series list is used as is, whereas the original code appears to convert it to a SeriesContainer object. After converting the time series to a SeriesContainer object, the dendrogram was plotted with no other issues. Here is a snippet for future readers:

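from dtaidistance import clustering, util

# timeseries: list of time series; linkage_matrix: Scipy-style linkage (both computed beforehand)
class dendoTree(clustering.hierarchical.BaseTree):
    def __init__(self, series, linkage, **kwargs):
        super().__init__(**kwargs)
        self.linkage = linkage
        # plot() calls SeriesContainer methods such as get_max_min_y, so wrap the plain list
        self.series = util.SeriesContainer(series)

model1 = dendoTree(timeseries, linkage_matrix)
model1.plot()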

Hope it helps.