jaimicore / matrix-clustering_stand-alone

This is a stand-alone version of RSAT matrix-clustering. This version is faster and simplified compared to the original RSAT matrix-clustering but the graphical output is still under development.
MIT License

Threshold evaluation backwards for complete and single linkages? #5

Closed castillohair closed 8 months ago

castillohair commented 8 months ago

Thank you for the very useful software. In the function nodes.summary.stats in R/Tree_partition_utils.R, the decision on whether to split cluster nodes is made. For this, a set of summary statistics is calculated for each node using a summary function that is chosen at the beginning of the file according to the selected linkage method:

## Return the function that will be applied: mean (average linkage), min (single linkage),
## or max (complete linkage)
function.to.apply <- function(parameters = NULL){

  switch(params.list$linkage_method,
         average  = mean,
         single   = min,
         complete = max)
}

However, the statistics are calculated in terms of correlations, not distances. Therefore, for "complete" linkage, using the maximum correlation means that the minimum distance is being used, and the other way around for "single" linkage. I think the two should be swapped. The practical consequence is that with "complete" linkage the resulting clusters contain very obvious subclusters that are not split, no matter how much I change the thresholds.
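To make the argument concrete, here is a minimal sketch in R. The correlation values, the 1 - r conversion to distance, the threshold, and the rule "split when the summary correlation is below the threshold" are all assumptions chosen for illustration. They are not taken from the package.

```r
# Hypothetical pairwise correlations between the motifs of two child clusters
# (made-up values, for illustration only).
cor.values <- c(0.95, 0.80, 0.40)

# Assumed conversion from correlation to distance: d = 1 - r.
dist.values <- 1 - cor.values

# Complete linkage is defined on the largest distance, i.e. the smallest correlation.
complete.cor <- min(cor.values)
# Single linkage is defined on the smallest distance, i.e. the largest correlation.
single.cor <- max(cor.values)

# The extremes swap when moving between the two scales.
stopifnot(isTRUE(all.equal(1 - complete.cor, max(dist.values))))
stopifnot(isTRUE(all.equal(1 - single.cor, min(dist.values))))

# Hypothetical correlation threshold; assume a node is split when its
# summary correlation falls below it.
cor.threshold <- 0.6
split.with.max <- max(cor.values) < cor.threshold  # summary used before the change
split.with.min <- min(cor.values) < cor.threshold  # summary proposed for "complete"
```

Under these assumptions, summarising with max reports the best-matching pair, so the node is kept even though one pair correlates poorly. Summarising with min exposes that weak pair and the node is split.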

Therefore, I would propose changing these lines as follows:

## Return the function that will be applied: mean (average linkage), max (single linkage),
## or min (complete linkage)
function.to.apply <- function(parameters = NULL){

  switch(params.list$linkage_method,
         average  = mean,
         single   = max,
         complete = min)
}

jaimicore commented 8 months ago

Hi,

thanks for the correction.

This change is now added in the commit f20166a.