VedAustin closed this issue 7 years ago.
Oh, I agree, I really should make this clearer (noting it down for a potential 2nd edition).
I think you are referring to this one, right?
We can read this as follows ...
Let's say we have n samples with indices i in {0, ..., n-1}, where n=5 in this case.
Sorry for being overly complicated with the "letters" here, but I hope it helps to generalize the explanation :). If we have n samples, the algorithm performs n-1 merges, and each merge creates one new cluster. So, in this case, our linkage matrix has 5-1=4 rows, one per merge (the clusters).
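As a quick check of the row count, here is a minimal sketch using SciPy's `scipy.cluster.hierarchy.linkage`, which returns a matrix in the same four-column layout (two merged indices, distance, number of items). The random 5-sample dataset is hypothetical and not the data from the book:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Hypothetical toy data: n = 5 samples with 3 features each
rng = np.random.default_rng(seed=123)
X = rng.random((5, 3)) * 10

# Complete-linkage clustering on Euclidean distances;
# each row of Z records one merge: [label 1, label 2, distance, no. of items]
Z = linkage(X, method="complete", metric="euclidean")

n = X.shape[0]
print(Z.shape)  # (4, 4): n - 1 = 4 merges, 4 columns each
```

The last row always merges everything, so its item count equals n.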
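Now, the numbers 0-7 in the first two columns are the indices of the samples/clusters being merged. The numbers 0-4 are the indices of singleton clusters (our initial n sample indices). The indices 5-7 are non-singleton clusters that were created by earlier merges (the last merge creates cluster 8, which contains all samples and is never merged again, so it does not show up in the table). Let's walk through it step by step: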
I hope this makes sense and doesn't sound too complicated. Let me know :)
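To see how the indices chain together, the sketch below (again assuming SciPy and the same hypothetical random data) replays each row of the linkage matrix and tracks which original samples end up in each new cluster. New clusters get the indices n, n+1, ... in the order they are created:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Hypothetical toy data: 5 samples with 3 features each
rng = np.random.default_rng(seed=123)
X = rng.random((5, 3)) * 10
Z = linkage(X, method="complete", metric="euclidean")

n = X.shape[0]
# Indices 0..n-1 are the singleton clusters (one per sample)
members = {i: {i} for i in range(n)}

for step, (a, b, dist, count) in enumerate(Z):
    a, b = int(a), int(b)
    new_id = n + step  # each merge creates cluster n, n+1, ...
    members[new_id] = members[a] | members[b]
    print(f"cluster {step + 1}: merge {a} + {b} -> {new_id} "
          f"(samples {sorted(members[new_id])}, distance {dist:.3f}, "
          f"{int(count)} items)")
```

The "no. of items" column is just the size of the merged set, and the largest index that ever appears in the label columns is 2n-3 (here 7).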
Yep, thank you very much, that makes sense! A quick question: when you are using 'complete' linkage, do the most dissimilar items have the lowest distance in the distance matrix? Also, when you create a cluster with the new index i=5, how did you decide to merge it with i=3? In other words, what is the output of creating a cluster?
Glad to hear that it helped!
A quick question: when you are using 'complete' linkage, do the most dissimilar items have the lowest distance in the distance matrix?
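Yes :). More precisely: complete linkage measures the distance between two clusters by their most dissimilar members (the largest pairwise distance), and in each step the two clusters with the lowest such distance are merged.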
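Here is a tiny illustration of that rule, assuming SciPy's `scipy.spatial.distance.cdist` for the pairwise distances. The three 1-D clusters `A`, `B`, `C` and the helper `complete_linkage_distance` are hypothetical and only for demonstration:

```python
import numpy as np
from scipy.spatial.distance import cdist

def complete_linkage_distance(A, B):
    """Hypothetical helper: distance between clusters A and B under
    complete linkage, i.e. the distance of their farthest-apart pair."""
    return cdist(A, B, metric="euclidean").max()

# Hypothetical 1-D toy clusters (one point per row)
A = np.array([[0.0], [1.0]])
B = np.array([[4.0], [9.0]])
C = np.array([[3.0]])

print(complete_linkage_distance(A, B))  # 9.0 (|0 - 9|)
print(complete_linkage_distance(A, C))  # 3.0 (|0 - 3|)
print(complete_linkage_distance(B, C))  # 6.0 (|9 - 3|)
```

A and C have the lowest complete-linkage distance (3.0), so they would be merged first, even though B and C contain the closest individual pair (4 and 3, distance 1.0). Merging by the closest pair instead would be single linkage.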
Also, when you create a cluster with the new index i=5, how did you decide to merge it with i=3?
You basically follow the three steps above. Note that a cluster can consist of a single sample: at the very beginning, you have n clusters, where n is the number of samples in your dataset.
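To make the "output of creating a cluster" concrete: each merge appends one row to the linkage matrix, and the new cluster gets the next free index (n, n+1, ...). That is how a new cluster such as index 5 can appear in a later row next to a sample such as 3. The sketch below is a deliberately naive, didactic O(n^3) loop (a hypothetical `naive_complete_linkage`, not necessarily the exact procedure from the book) that builds such a matrix and compares it against SciPy's `linkage` on hypothetical random data:

```python
import numpy as np
from scipy.spatial.distance import cdist
from scipy.cluster.hierarchy import linkage

def naive_complete_linkage(X):
    """Didactic complete-linkage clustering returning a SciPy-style
    linkage matrix: [label 1, label 2, distance, no. of items]."""
    n = X.shape[0]
    D = cdist(X, X)  # pairwise Euclidean distances between samples
    clusters = {i: [i] for i in range(n)}  # start with n singleton clusters
    rows = []
    for step in range(n - 1):
        best = None
        ids = sorted(clusters)
        for i, a in enumerate(ids):
            for b in ids[i + 1:]:
                # complete linkage: largest distance between the two clusters
                d = D[np.ix_(clusters[a], clusters[b])].max()
                # keep the pair of clusters with the lowest such distance
                if best is None or d < best[2]:
                    best = (a, b, d)
        a, b, d = best
        # the merged cluster gets the next free index n + step
        clusters[n + step] = clusters.pop(a) + clusters.pop(b)
        rows.append([a, b, d, len(clusters[n + step])])
    return np.array(rows, dtype=float)

rng = np.random.default_rng(seed=123)
X = rng.random((5, 3)) * 10
Z_naive = naive_complete_linkage(X)
Z_scipy = linkage(X, method="complete")
print(np.allclose(Z_naive[:, 2:], Z_scipy[:, 2:]))  # distances and sizes agree
```

Only the distance and size columns are compared, since the label ordering within a row is a convention that could differ between implementations.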
Great! Thank you, Sebastian! All clear now.
I understood the concept of complete linkage. However, in the example you provided, I did not understand the values in the table with the columns 'row label 1', 'row label 2', etc.