Open christopher-hardy opened 4 years ago
Hi! I also noticed differences between presto::wilcoxauc and Seurat::FindMarkers in the reported logFC and avg_log2FC respectively. I was wondering if you could clarify this a bit more, and whether one or the other calculation would be better?
Hey @ilyakorsunsky I think there is a -
where there should be a /
on this line:
See the example below for details.
Here is a complete example you can try running on your own:
Generate random data:
library(magrittr)
set.seed(42)
x1 <- rnorm(10) + 3
x2 <- rnorm(10) + 3
This is the fold-change:
mean(x1) / mean(x2)
#> [1] 1.25057
And the log fold-change:
log(mean(x1) / mean(x2))
#> [1] 0.2235997
OK so now let’s take a closer look at the presto::wilcoxauc()
function to see if we are getting the correct value for log fold-change:
X <- matrix(c(x1, x2), nrow = 1)
y <- factor(rep(c(1, 2), each = 10))
presto::wilcoxauc(X, y)
#> feature group avgExpr logFC statistic auc pval padj pct_in
#> 1 Feature1 1 3.547297 0.7107535 66 0.66 0.2413216 0.2413216 100
#> 2 Feature1 2 2.836543 -0.7107535 34 0.34 0.2413216 0.2413216 100
#> pct_out
#> 1 100
#> 2 100
OK, we wanted 0.22
but we got 0.71
, so maybe something is not right.
Let's go through each line of the presto::wilcoxauc()
function, one line at a time, and see where we might be going wrong:
### Auxiliary Statistics (AvgExpr, PctIn, LFC, etc)
group_sums <- presto::sumGroups(X, y, 1)
group_sums
#> [,1]
#> [1,] 35.47297
#> [2,] 28.36543
group_nnz <- presto::nnzeroGroups(X, y, 1)
group_nnz
#> [,1]
#> [1,] 10
#> [2,] 10
group_pct <- sweep(group_nnz, 1, as.numeric(table(y)), "/") %>% t()
group_pct
#> [,1] [,2]
#> [1,] 1 1
group_pct_out <- -group_nnz %>%
sweep(2, colSums(group_nnz) , "+") %>%
sweep(1, as.numeric(length(y) - table(y)), "/") %>% t()
group_pct_out
#> [,1] [,2]
#> [1,] 1 1
group_means <- sweep(group_sums, 1, as.numeric(table(y)), "/") %>% t()
group_means
#> [,1] [,2]
#> [1,] 3.547297 2.836543
cs <- colSums(group_sums)
cs
#> [1] 63.8384
gs <- as.numeric(table(y))
gs
#> [1] 10 10
So far so good.
But this is giving us 0.71
instead of 0.22
:
lfc <- Reduce(cbind, lapply(seq_len(length(levels(y))), function(g) {
group_means[, g] - ((cs - group_sums[g, ]) / (length(y) - gs[g]))
}))
lfc
#> init
#> [1,] 0.7107535 -0.7107535
After we change -
to /
and add a %>% log()
at the end, we get the correct numbers:
lfc <- Reduce(cbind, lapply(seq_len(length(levels(y))), function(g) {
group_means[, g] / ((cs - group_sums[g, ]) / (length(y) - gs[g]))
})) %>% log()
lfc
#> init
#> [1,] 0.2235997 -0.2235997
I also am getting different results from FindAllMarkers and presto's function (presto:::wilcoxauc.Seurat). I am running presto as shown below:
markers_rna <- presto:::wilcoxauc.Seurat(X = seur_obj,
group_by = 'wsnn_res.0.8', assay = 'data',
seurat_assay = 'SCT')
Hey @ilyakorsunsky I think there is a
-
where there should be a/
on this line:See the example below for details.
Example
Here is a complete example you can try running on your own:
Generate random data:
library(magrittr) set.seed(42) x1 <- rnorm(10) + 3 x2 <- rnorm(10) + 3
This is the fold-change:
mean(x1) / mean(x2) #> [1] 1.25057
And the log fold-change:
log(mean(x1) / mean(x2)) #> [1] 0.2235997
OK so now let’s take a closer look at the
presto::wilcoxauc()
function to see if we are getting the correct value for log fold-change:X <- matrix(c(x1, x2), nrow = 1) y <- factor(rep(c(1, 2), each = 10)) presto::wilcoxauc(X, y) #> feature group avgExpr logFC statistic auc pval padj pct_in #> 1 Feature1 1 3.547297 0.7107535 66 0.66 0.2413216 0.2413216 100 #> 2 Feature1 2 2.836543 -0.7107535 34 0.34 0.2413216 0.2413216 100 #> pct_out #> 1 100 #> 2 100
OK, we wanted
0.22
but we got0.71
, so maybe something is not right.Let's go through each line of the
presto::wilcoxauc()
function, one line at a time, and see where we might be going wrong:### Auxiliary Statistics (AvgExpr, PctIn, LFC, etc) group_sums <- presto::sumGroups(X, y, 1) group_sums #> [,1] #> [1,] 35.47297 #> [2,] 28.36543 group_nnz <- presto::nnzeroGroups(X, y, 1) group_nnz #> [,1] #> [1,] 10 #> [2,] 10 group_pct <- sweep(group_nnz, 1, as.numeric(table(y)), "/") %>% t() group_pct #> [,1] [,2] #> [1,] 1 1 group_pct_out <- -group_nnz %>% sweep(2, colSums(group_nnz) , "+") %>% sweep(1, as.numeric(length(y) - table(y)), "/") %>% t() group_pct_out #> [,1] [,2] #> [1,] 1 1 group_means <- sweep(group_sums, 1, as.numeric(table(y)), "/") %>% t() group_means #> [,1] [,2] #> [1,] 3.547297 2.836543 cs <- colSums(group_sums) cs #> [1] 63.8384 gs <- as.numeric(table(y)) gs #> [1] 10 10
So far so good.
But this is giving us
0.71
instead of0.22
:lfc <- Reduce(cbind, lapply(seq_len(length(levels(y))), function(g) { group_means[, g] - ((cs - group_sums[g, ]) / (length(y) - gs[g])) })) lfc #> init #> [1,] 0.7107535 -0.7107535
After we change
-
to/
and add a%>% log()
at the end, we get the correct numbers:lfc <- Reduce(cbind, lapply(seq_len(length(levels(y))), function(g) { group_means[, g] / ((cs - group_sums[g, ]) / (length(y) - gs[g])) })) %>% log() lfc #> init #> [1,] 0.2235997 -0.2235997
I tried this modification, in addition to that, I added 0.1 to the count matrix to avoid Inf. Then the returned logFC seems to be consistent with the default logFC. So I guess the purpose of original logFC was to avoid Inf?
Hi,
Thank you for your work- it is really remarkable.
Not sure if this is a bug or a feature, but when doing a comparison between presto::wilcoxauc and Seurat::FindMarkers I noticed a difference in the LogFC calculation. When calculating the LogFC from a Seurat Data Object, the values are untransformed prior to calculating the mean, then re-log-transformed on the mean values- plus a pseudo count of default 1 to avoid Inf LogFCs (Line 551 @ https://github.com/satijalab/seurat/blob/master/R/differential_expression.R). This difference causes changes in the avgExpr and logFC when compared to the Seurat output. I was wondering if you could clarify whether you believe your approach is correct, or if this a potential issue with an edge case when the data are previously log transformed?
Note: If helpful I am able to replicate the values in Seurat with a couple slight modifications (see lines 1, 7 and 11 below).
Thanks!
Chris