Closed Carldeboer closed 8 years ago
Hi Carl,
It's difficult for me to get a sense of the problem without the data. But have you looked at the cell model fits when you provide the group labels versus when you omit the group labels? You can do this by setting save.model.plots = TRUE
.
When groups
are provided, error models are built using only cells in the same group, thereby only making pairwise comparisons between cells of that group to derive the expected gene expression magnitudes and drop-out rates. Otherwise, all cells are pooled together. So my best guess without seeing your data is that the derived error models are substantially different for some reason if you're seeing downstream differences in the differential expression analysis.
Also, have you tried looking at your Igj gene in more detail via: scde.test.gene.expression.difference("Igj", models = o.ifm, counts = cd, prior = o.prior)
for either error models?
Best, Jean
My data are here: http://www.broadinstitute.org/~cgdeboer/20151210_plasma_cell_DE_comparison_justReq.RData
You should be able to load that and run the code above directly.
I have not looked at the resulting error models. I agree that this is likely where the issue lies.
Hi Carl,
I've been able to replicate your problem and determine the cause.
The source of the problem was that the group labels you provided in the first part of your code is not in the same order as the error models. Note the difference in the 3 following ways of running scde.expression.difference
:
# groups = NULL so taken from error model
combinedDE <- scde.expression.difference(combinedErrModel, combinedOMsMat, combinedPrior, n.cores = 10, verbose = 1)
# groups in the wrong order
combinedDE2 <- scde.expression.difference(combinedErrModel, combinedOMsMat, combinedPrior, groups = combinedIsWeirdo, n.cores = 10, verbose = 1)
# groups in right order
combinedDE3 <- scde.expression.difference(combinedErrModel, combinedOMsMat, combinedPrior, groups = combinedIsWeirdo[rownames(combinedErrModel)], n.cores = 10, verbose = 1)
Note that the results for combinedDE
and combinedDE3
are quite similar to your combinedDENoG
as expected.
> combinedDE[c("Igj","Actb","Rplp0"),];
lb mle ub ce Z cZ
Igj -12.27471 -12.27471 -11.96784 -11.96784 -7.160847 -6.447212
Actb -12.27471 -12.27471 -11.75304 -11.75304 -7.160809 -6.447212
Rplp0 11.59960 12.27471 12.27471 11.59960 7.160813 6.447212
> combinedDE2[c("Igj","Actb","Rplp0"),];
lb mle ub ce Z cZ
Igj -1.22747122 0.000000 0.0000000 0.0000000 0.000000 0.0000000
Actb -6.38285035 -2.025328 -0.7364827 -0.7364827 -2.556040 -1.2923053
Rplp0 -0.03068678 0.000000 6.4135371 0.0000000 1.942183 0.7596449
> combinedDE3[c("Igj","Actb","Rplp0"),];
lb mle ub ce Z cZ
Igj -12.27471 -12.27471 -11.96784 -11.96784 -7.160847 -6.447212
Actb -12.27471 -12.27471 -11.75304 -11.75304 -7.160809 -6.447212
Rplp0 11.59960 12.27471 12.27471 11.59960 7.160813 6.447212
I will try to update the script to automatically orient the group labels in the same order as the error models when possible to prevent such errors in the future.
Best, Jean
Hi I noticed that when comparing cell types where one group does not express a gene at all (0 counts) and the other expresses it, this gene often fails to come up as differentially expressed in spite of its (often large) difference in expression. I noticed this when one of the most DE genes in my set was not coming up as significant (Igj), so I made this example where I blank out one half for each of two very highly expressed genes (Rplp0 and Actb). In all three cases, when the same grouping is used for the error models as the DE test, the significance is much smaller than when the grouping is not provided to the error model. In the case where the error model knows about the groups, these "binary" genes are not even near the top of the DE gene list. In fact, for Igj, it is not DE at all (FC=0, Z=0) despite it being highly expressed in one group (variably so) and 0 in the other).
Here is the output: