lbehnke / hierarchical-clustering-java

Implementation of an agglomerative hierarchical clustering algorithm in Java. Different linkage approaches are supported.

Added a pdist-like approach to save memory #2

Closed asiviero closed 10 years ago

asiviero commented 10 years ago

Hey @lbehnke, thanks for this implementation. It worked nicely on small datasets, but it ran into heap-space issues with larger ones. I'm not sure this commit solves the entire problem, but it reduces the size of distances from n*n to n*(n-1)/2, since distances is a symmetric matrix whose main diagonal is irrelevant.

It follows the strategy used by MATLAB's pdist function, and I also kept distances as a single-row matrix so I wouldn't need to change or add method signatures in the ClusteringAlgorithm interface.
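
For readers unfamiliar with the pdist layout, here is a minimal sketch of the indexing idea. It is not the code from this pull request: the class `CondensedDistances` and its methods are hypothetical names, and it stores the values in a plain `double[]` for brevity rather than the single-row matrix described above. It assumes 0-based indices and the row-major upper-triangle order (0,1), (0,2), ..., (1,2), ..., which corresponds to the order pdist uses.

```java
// Hypothetical helper for illustration only; not part of the library or this PR.
final class CondensedDistances {
    private CondensedDistances() {}

    /** Number of stored entries for n observations: n*(n-1)/2. */
    static int size(int n) {
        return n * (n - 1) / 2;
    }

    /**
     * Position of the pair (i, j), i != j, in the condensed array.
     * The pair is reordered so that i < j, because d(i, j) == d(j, i).
     */
    static int index(int i, int j, int n) {
        if (i < 0 || j < 0 || i >= n || j >= n) {
            throw new IndexOutOfBoundsException("indices must be in [0, n)");
        }
        if (i == j) {
            throw new IllegalArgumentException("the diagonal is not stored");
        }
        if (i > j) {
            int tmp = i;
            i = j;
            j = tmp;
        }
        // Rows 0..i-1 contribute (n-1) + (n-2) + ... + (n-i) entries,
        // which equals i*n - i*(i+1)/2; then offset within row i.
        return i * n - i * (i + 1) / 2 + (j - i - 1);
    }

    /** Copies the strict upper triangle of a square symmetric matrix. */
    static double[] fromSquare(double[][] square) {
        int n = square.length;
        double[] condensed = new double[size(n)];
        int k = 0;
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                condensed[k++] = square[i][j];
            }
        }
        return condensed;
    }

    /** Looks up d(i, j); the diagonal is treated as 0 without being stored. */
    static double get(double[] condensed, int i, int j, int n) {
        return i == j ? 0.0 : condensed[index(i, j, n)];
    }
}
```

With n = 4 the sketch stores 6 values instead of 16. Note that `int` arithmetic in `size` and `index` would overflow for very large n, so a real implementation might need `long` intermediates.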

lbehnke commented 10 years ago

Thanks @asiviero for your contribution!