This is a stand-alone version of RSAT matrix-clustering. This version is faster and simplified compared to the original RSAT matrix-clustering but the graphical output is still under development.
MIT License
8
stars
3
forks
source link
Threshold evaluation backwards for complete and single linkages? #5
Thank you for the very useful software. The decision on whether to split cluster nodes is made in the function nodes.summary.stats in R/Tree_partition_utils.R. For this, a set of summary statistics is calculated for each node using a summary function that is selected at the beginning of the file according to the chosen linkage method:
## Return the function that will be applied: mean (average linkage), min (single linkage),
## or max (complete linkage)
function.to.apply <- function(parameters = NULL){
  switch(params.list$linkage_method,
         average = mean,
         single = min,
         complete = max)
}
However, the statistics are calculated on correlations, not distances. For "complete" linkage, taking the maximum correlation therefore means that the minimum distance is being used, and vice versa for "single" linkage. I think the mapping should be reversed. The practical consequence is that, with "complete" linkage, the resulting clusters contain very obvious subclusters that are not split no matter how much I change the thresholds.
Therefore, I would propose changing these lines as follows:
## Return the function that will be applied to correlations: mean (average linkage),
## max (single linkage, i.e. minimum distance), or min (complete linkage, i.e. maximum distance)
function.to.apply <- function(parameters = NULL){
  switch(params.list$linkage_method,
         average = mean,
         single = max,
         complete = min)
}
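To illustrate the reasoning, here is a minimal sketch on a toy node with two obvious subclusters. It is not the package's code: the matrix values, the definition of distance as 1 - correlation, the threshold of 0.5, the use of all pairwise correlations within the node, and the helper keep.merged are all assumptions made for the example.

```r
## Toy correlation matrix for four motifs (hypothetical values):
## (A, B) and (C, D) are two tight subclusters, weakly correlated with each other.
motifs <- c("A", "B", "C", "D")
cor.mat <- matrix(c(1.0, 0.9, 0.2, 0.3,
                    0.9, 1.0, 0.1, 0.2,
                    0.2, 0.1, 1.0, 0.8,
                    0.3, 0.2, 0.8, 1.0),
                  nrow = 4, byrow = TRUE,
                  dimnames = list(motifs, motifs))

## Assumption: distance is defined as 1 - correlation.
dist.mat <- 1 - cor.mat

## All pairwise values among the members of the node (upper triangle only).
node.cor  <- cor.mat[upper.tri(cor.mat)]
node.dist <- dist.mat[upper.tri(dist.mat)]

## Complete linkage uses the maximum distance, which is the minimum correlation.
complete.as.cor <- 1 - max(node.dist)
## Single linkage uses the minimum distance, which is the maximum correlation.
single.as.cor <- 1 - min(node.dist)

## Hypothetical decision rule: keep the node merged when the summary
## correlation reaches the threshold, split it otherwise.
keep.merged <- function(summary.fun, threshold = 0.5) {
  summary.fun(node.cor) >= threshold
}

merged.with.max <- keep.merged(max)  # TRUE: node is never split
merged.with.min <- keep.merged(min)  # FALSE: node is split
```

Under these assumptions, summarising the node with max reports a correlation of 0.9 and keeps the two subclusters together at any reasonable threshold, which matches the behaviour I am seeing with "complete" linkage, whereas min reports 0.1 and splits the node.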