lbehnke / hierarchical-clustering-java

Implementation of an agglomerative hierarchical clustering algorithm in Java. Different linkage approaches are supported.
141 stars, 79 forks

Bug Report about the Java file AverageLinkageStrategy #20

Closed: dlee992 closed this issue 2 years ago

dlee992 commented 8 years ago

I think I found a bug in your implementation of the average linkage strategy. The README links to a Wikipedia page to illustrate average linkage, but your code doesn't follow the method described there for computing the distance from a newly merged cluster to each of the remaining clusters.

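You should use the formula for $d_{(A \cup B),X}$ given on the wiki page (https://en.wikipedia.org/wiki/UPGMA), which weights each cluster's distance to X by the size of that cluster:

$$d_{(A \cup B),X} = \frac{|A| \cdot d_{A,X} + |B| \cdot d_{B,X}}{|A| + |B|}$$

Instead, you use the oversimplified $d_{(A \cup B),X} = (d_{A,X} + d_{B,X})/2$, which ignores the sizes of A and B.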
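To make the difference concrete, here is a minimal sketch of the two update rules side by side. The class and method names (`AverageLinkageUpdate`, `sizeWeighted`, `simpleMean`) are hypothetical and are not the library's `AverageLinkageStrategy` API, and the input distances are illustrative. The size-weighted rule is the UPGMA update; the unweighted mean is the update rule used by WPGMA. With |A| = 2 and |B| = 1 the two already disagree.

```java
// Hypothetical helper for illustration only; not part of hierarchical-clustering-java.
public final class AverageLinkageUpdate {

    private AverageLinkageUpdate() {}

    /** UPGMA: d(A∪B, X) = (|A| * d(A,X) + |B| * d(B,X)) / (|A| + |B|). */
    public static double sizeWeighted(double dAX, int sizeA, double dBX, int sizeB) {
        return (sizeA * dAX + sizeB * dBX) / (sizeA + sizeB);
    }

    /** Unweighted mean of the two distances; cluster sizes are ignored (WPGMA). */
    public static double simpleMean(double dAX, double dBX) {
        return (dAX + dBX) / 2.0;
    }

    public static void main(String[] args) {
        // Illustrative values: A has 2 members, B has 1 member.
        double dAX = 25.5;
        double dBX = 39.0;
        System.out.println(sizeWeighted(dAX, 2, dBX, 1)); // prints 30.0
        System.out.println(simpleMean(dAX, dBX));          // prints 32.25
    }
}
```

The two rules only coincide in general when A and B have the same number of members, so the error grows as clusters become unbalanced.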

In fact, your implementation can't even reproduce the worked example on the wiki page. Please check this so that it doesn't mislead other people.
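One way to check this is a small standalone run of the algorithm. The following hypothetical sketch (it does not use the library's API) performs a naive UPGMA on a 5×5 distance matrix and records the distance at which each merge happens, with a flag to switch between the size-weighted update and the simple mean. The matrix is assumed to be the one from the worked example on the UPGMA wiki page; verify the values against the page before relying on them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical standalone sketch; does not use the hierarchical-clustering-java API.
public final class UpgmaExample {

    /** Naive O(n^3) agglomerative clustering; returns the distance of each merge in order. */
    public static List<Double> mergeHeights(double[][] input, boolean weightBySize) {
        int n = input.length;
        double[][] d = new double[n][n];
        for (int i = 0; i < n; i++) {
            d[i] = input[i].clone(); // work on a copy so the caller's matrix is untouched
        }
        int[] size = new int[n];
        boolean[] active = new boolean[n];
        for (int i = 0; i < n; i++) {
            size[i] = 1;
            active[i] = true;
        }
        List<Double> heights = new ArrayList<>();
        for (int step = 0; step < n - 1; step++) {
            // Find the closest pair of still-active clusters.
            int bi = -1, bj = -1;
            double best = Double.POSITIVE_INFINITY;
            for (int i = 0; i < n; i++) {
                if (!active[i]) continue;
                for (int j = i + 1; j < n; j++) {
                    if (active[j] && d[i][j] < best) {
                        best = d[i][j];
                        bi = i;
                        bj = j;
                    }
                }
            }
            heights.add(best);
            // Merge bj into bi, then update the distance from the merged cluster to every other one.
            for (int k = 0; k < n; k++) {
                if (!active[k] || k == bi || k == bj) continue;
                double updated = weightBySize
                        ? (size[bi] * d[bi][k] + size[bj] * d[bj][k]) / (size[bi] + size[bj])
                        : (d[bi][k] + d[bj][k]) / 2.0;
                d[bi][k] = updated;
                d[k][bi] = updated;
            }
            size[bi] += size[bj];
            active[bj] = false;
        }
        return heights;
    }

    public static void main(String[] args) {
        // Assumed to match the wiki example (elements a, b, c, d, e); verify against the page.
        double[][] m = {
            {0, 17, 21, 31, 23},
            {17, 0, 30, 34, 21},
            {21, 30, 0, 28, 39},
            {31, 34, 28, 0, 43},
            {23, 21, 39, 43, 0},
        };
        System.out.println(mergeHeights(m, true));  // prints [17.0, 22.0, 28.0, 33.0]
        System.out.println(mergeHeights(m, false)); // prints [17.0, 22.0, 28.0, 35.0]
    }
}
```

The first three merges agree, but the final merge happens at 33 with size weighting and at 35 with the simple mean, so an assertion on that last value would make a compact regression test.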

Looking forward to your reply.

lbehnke commented 8 years ago

Thanks for your feedback. At first sight I think you are right that the current approach is too simple. I'll take a thorough look at it ASAP. Of course, you are more than welcome to contribute a fix or provide a unit test. In the meantime I'll remove the misleading Wikipedia links.