I profiled the code and found that
99% of the time was being spent in
data.remove(remove);
in DistanceMap.remove(ClusterPair link).
If we simply comment that line out and do not worry about removing items from the priority queue while the agglomeration is happening, things run much faster. I don't think it's that urgent to do the remove, because the entire DistanceMap will be reclaimed once the agglomeration is done.
There are still memory issues for large N, but they only appear at a much larger N than before this change.
Clustering of 2048 nodes took over 16 minutes before this change, but only 10 seconds after.
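
A possible middle ground, if skipping the remove entirely turns out to leave stale pairs in the queue, is lazy deletion: flag the entry as removed in constant time and skip flagged entries when polling. java.util.PriorityQueue documents remove(Object) as linear time, which would fit the profile above. The sketch below is only an illustration under assumptions: LazyDistanceMap, Item, byKey, and the String keys are hypothetical stand-ins, not the library's actual DistanceMap or ClusterPair.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical stand-in for DistanceMap; names are illustrative, not the library's.
class LazyDistanceMap {
    // One queue entry; "removed" marks entries that were logically deleted.
    static final class Item implements Comparable<Item> {
        final String key;
        final double distance;
        boolean removed = false;

        Item(String key, double distance) {
            this.key = key;
            this.distance = distance;
        }

        @Override
        public int compareTo(Item other) {
            return Double.compare(distance, other.distance);
        }
    }

    private final PriorityQueue<Item> data = new PriorityQueue<>();
    private final Map<String, Item> byKey = new HashMap<>();

    // Adds a pair unless one with the same key is already live.
    boolean add(String key, double distance) {
        if (byKey.containsKey(key)) {
            return false;
        }
        Item item = new Item(key, distance);
        byKey.put(key, item);
        data.add(item);
        return true;
    }

    // O(1): flag the entry instead of calling data.remove(item), which scans the queue.
    boolean remove(String key) {
        Item item = byKey.remove(key);
        if (item == null) {
            return false;
        }
        item.removed = true;
        return true;
    }

    // Returns the key of the closest live pair, or null if none remain.
    String removeFirst() {
        Item head = data.poll();
        while (head != null && head.removed) {
            head = data.poll(); // discard entries that were flagged earlier
        }
        if (head == null) {
            return null;
        }
        byKey.remove(head.key);
        return head.key;
    }
}
```

Flagged entries still occupy memory until they are polled, so this would not help with the memory issues for large N; it only avoids the linear scan on each remove.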