jokergoo / ComplexHeatmap-reference

Documentation of ComplexHeatmap package
http://jokergoo.github.io/ComplexHeatmap-reference/book
Creative Commons Zero v1.0 Universal
22 stars 24 forks source link

`get_correlated_variable_rows` in chapter 14.3 has dead code #10

Open krassowski opened 2 years ago

krassowski commented 2 years ago

In https://jokergoo.github.io/ComplexHeatmap-reference/book/more-examples.html#visualize-cell-heterogeneity-from-single-cell-rnaseq in the function get_correlated_variable_genes:

get_correlated_variable_genes = function(mat, n = nrow(mat), cor_cutoff = 0, n_cutoff = 0) {
    ind = order(apply(mat, 1, function(x) {
            q = quantile(x, c(0.1, 0.9))
            x = x[x < q[1] & x > q[2]]
            var(x)/mean(x)
        }), decreasing = TRUE)[1:n]
    mat2 = mat[ind, , drop = FALSE]
    dt = cor(t(mat2), method = "spearman")
    diag(dt) = 0
    dt[abs(dt) < cor_cutoff] = 0
    dt[dt < 0] = -1
    dt[dt > 0] = 1

    i = colSums(abs(dt)) > n_cutoff

    mat3 = mat2[i, ,drop = FALSE]
    return(mat3)
}

The fragment that filters by quantile does not seem to do anything. q[1] is the lower quantile, q[2] is the upper quantile of x, thus x < q[1] & x > q[2]] is always FALSE. This means that x gets length of 0, and the entire expression:

apply(mat, 1, function(x) {
  q = quantile(x, c(0.1, 0.9))
  x = x[x < q[1] & x > q[2]]
  var(x)/mean(x)
})

returns a named vector with "NA" only. In consequence ind is equivalent to 1:n.

What was meant probably was x = x[x > q[1] & x < q[2]], but if the example works without this step, it could just be removed.

If we correct the direction to align with the likely intent:

Before (note: 721 genes) After correcting directions (note: 693 genes)
image image

After removing the filtering code:

get_correlated_variable_genes = function(mat, n = nrow(mat), cor_cutoff = 0, n_cutoff = 0) {
    dt = cor(t(mat), method = "spearman")
    diag(dt) = 0
    dt[abs(dt) < cor_cutoff] = 0
    dt[dt < 0] = -1
    dt[dt > 0] = 1

    i = colSums(abs(dt)) > n_cutoff

    mat3 = mat[i, ,drop = FALSE]
    return(mat3)
}

image

jokergoo commented 2 years ago

You are right. That was a mistake. It is interesting that order() on a vector of all NA values still returns a meaningful ordering.

> order(c(NA, NA, NA))
[1] 1 2 3

I will adjust the code.

Thanks!