lex8erna / UPGMApy

A basic implementation of the UPGMA (Unweighted Pair Group Method with Arithmetic Mean) clustering algorithm in Python.

update distance scheme #1

Open FeiYao-Edinburgh opened 4 years ago

FeiYao-Edinburgh commented 4 years ago

Hi there,

I have gone through both the UPGMA wiki and Figure 2 of Dave Thomas, and I feel they are somewhat contradictory in how they update the distance matrix. For instance, in Figure 2 of Dave Thomas, when (B,F) and G are combined, the UPGMA wiki scheme gives dist(((B,F),G),E) = (dist((B,F),E)*2 + dist(G,E)*1)/(2+1) = (dist(B,E)+dist(F,E)+dist(G,E))/3. That would give a value of 33 instead of 31.8, which is (35.5+28)/2 = 31.75 rounded. You might like to check how your code handles this. Personally, I feel the scheme in the UPGMA wiki is more appropriate given the name "unweighted", whereas the example in Dave Thomas is likely WPGMA.
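To make the difference concrete, here is a minimal sketch of the two update rules applied to the two distances quoted above (35.5 for dist((B,F),E) and 28 for dist(G,E)). The function names `upgma_update` and `wpgma_update` are hypothetical and are not taken from this repository.

```python
def upgma_update(d_ik, d_jk, size_i, size_j):
    """Size-weighted mean, so every original leaf contributes equally."""
    return (d_ik * size_i + d_jk * size_j) / (size_i + size_j)


def wpgma_update(d_ik, d_jk):
    """Plain mean of the two cluster distances, ignoring cluster sizes."""
    return (d_ik + d_jk) / 2


d_bf_e = 35.5  # dist((B,F), E), cluster (B,F) has 2 leaves
d_g_e = 28.0   # dist(G, E), cluster G has 1 leaf

upgma = upgma_update(d_bf_e, d_g_e, size_i=2, size_j=1)
wpgma = wpgma_update(d_bf_e, d_g_e)

print(upgma)  # 33.0
print(wpgma)  # 31.75
```

The only difference between the two rules is whether the cluster sizes enter the average.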

matigekunstintelligentie commented 3 years ago

You are correct, just found out the same thing!

For the following distance matrix (lower triangle), UPGMA and WPGMA give different clusterings:

0
1 0
2 3 0
4 5 3.5 0
8 9 4.5 4.1 0
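One way to check this is the small sketch below, which runs both update rules on the five-point example. It is written under assumptions: the numbers are read as a lower-triangular distance matrix with a zero diagonal, the points are labelled 0 to 4, and the `cluster` function is a hypothetical helper, not this repository's implementation.

```python
from itertools import combinations


def cluster(lower, weighted=False):
    """Average-linkage clustering on a lower-triangular distance matrix.

    weighted=False uses the UPGMA update, weighted=True the WPGMA update.
    Returns the merges in order as (sorted leaf indices, merge distance).
    """
    clusters = [frozenset([i]) for i in range(len(lower))]
    # Distances are keyed by the unordered pair of clusters.
    dist = {}
    for i in range(len(lower)):
        for j in range(i):
            dist[frozenset((clusters[i], clusters[j]))] = lower[i][j]

    merges = []
    while len(clusters) > 1:
        # Pick the closest pair of current clusters.
        a, b = min(combinations(clusters, 2), key=lambda p: dist[frozenset(p)])
        merged = a | b
        merges.append((sorted(merged), dist[frozenset((a, b))]))
        clusters = [c for c in clusters if c not in (a, b)]
        for c in clusters:
            d_ac = dist[frozenset((a, c))]
            d_bc = dist[frozenset((b, c))]
            if weighted:
                new = (d_ac + d_bc) / 2  # WPGMA: ignore cluster sizes
            else:
                # UPGMA: weight by the number of leaves in each cluster
                new = (len(a) * d_ac + len(b) * d_bc) / (len(a) + len(b))
            dist[frozenset((merged, c))] = new
        clusters.append(merged)
    return merges


# Assumed reading of the example: lower triangle including the zero diagonal.
matrix = [
    [0],
    [1, 0],
    [2, 3, 0],
    [4, 5, 3.5, 0],
    [8, 9, 4.5, 4.1, 0],
]

upgma_merges = cluster(matrix, weighted=False)
wpgma_merges = cluster(matrix, weighted=True)

print(upgma_merges[2])  # ([3, 4], 4.1)
print(wpgma_merges[2])  # ([0, 1, 2, 3], 4.0)
```

Both variants first merge points 0 and 1, then add point 2. At the third step UPGMA joins points 3 and 4, while WPGMA attaches point 3 to the existing cluster, so the resulting trees differ.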