plger / scDblFinder

Methods for detecting doublets in single-cell sequencing data
https://plger.github.io/scDblFinder/
GNU General Public License v3.0

Ambient RNA Removal #93

Closed: meaksu closed this issue 8 months ago

meaksu commented 8 months ago

Is it alright to run ambient RNA removal programs before running scDblFinder? I run SoupX when loading the sample, which turns the raw counts into non-integer values. However, even after setting the parameter that rounds to the nearest integer, starting from 19,000 cells I get around 2,000 predicted doublets with SoupX and around 3,000 without it. Do you know what could account for this large difference?

Thanks

plger commented 8 months ago

Hi,

This isn't so surprising: droplets with a lot of contamination can easily look like doublets, because they contain RNA from other cell types. Normally this isn't an issue: whether a droplet is a doublet or heavily contaminated, either way you want to remove it, or at least 'clean' it. And obviously, if those cells, or the counts related to this exogenous RNA, are removed, they won't be called as doublets anymore.

To be honest, I don't know which is optimal: decontamination first or doublet calling first. There is a possibility that a decontamination package sees an actual doublet as contamination and attempts to clean it. It will necessarily do so imperfectly (because while the contamination is a mixture of all cells, a doublet isn't), but perhaps well enough that the doublet can no longer be detected accurately. This would be an argument for running doublet calling first. However, it's also possible that decontamination, because it makes the cells cleaner, makes the doublet detection task easier. At the moment I don't have serious evidence to specifically recommend one of the two options.
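To make the distinction between the two cases concrete, here is a toy sketch in Python. It is not scDblFinder or SoupX code: the three expression profiles, the 30% contamination fraction, the equal-abundance assumption for the ambient profile, and all function names are made up for illustration. It only shows that ambient RNA spreads a small amount of signal over the markers of every cell type, while a doublet adds a large amount of signal for exactly one other cell type.

```python
# Toy profiles over six marker genes (hypothetical numbers).
# Genes 0-1 mark type A, genes 2-3 mark type B, genes 4-5 mark type C.
PROFILES = {
    "A": [50, 40, 0, 0, 0, 0],
    "B": [0, 0, 60, 30, 0, 0],
    "C": [0, 0, 0, 0, 45, 45],
}


def soup_profile(profiles):
    """Ambient profile: gene-wise average over all cell types
    (assumption: the types are equally abundant)."""
    n = len(profiles)
    return [sum(column) / n for column in zip(*profiles.values())]


def contaminated(cell, soup, rho):
    """Droplet in which a fraction rho of the counts is ambient RNA.
    The total count of the droplet is kept unchanged."""
    total = sum(cell)
    soup_total = sum(soup)
    return [(1 - rho) * c + rho * total * s / soup_total
            for c, s in zip(cell, soup)]


def doublet(cell1, cell2):
    """Artificial doublet: gene-wise sum of two cells."""
    return [a + b for a, b in zip(cell1, cell2)]


def foreign_fraction(droplet, own_genes):
    """Fraction of counts that fall outside the droplet's own marker genes."""
    total = sum(droplet)
    own = sum(droplet[g] for g in own_genes)
    return (total - own) / total


soup = soup_profile(PROFILES)
dirty_a = contaminated(PROFILES["A"], soup, rho=0.3)
doublet_ab = doublet(PROFILES["A"], PROFILES["B"])

# Both droplets carry counts for non-A markers, which is why heavy
# contamination can resemble a doublet.
print("contaminated A:", [round(x, 2) for x in dirty_a])
print("A+B doublet:   ", doublet_ab)
print("foreign fraction (contaminated):", round(foreign_fraction(dirty_a, [0, 1]), 3))
print("foreign fraction (doublet):     ", round(foreign_fraction(doublet_ab, [0, 1]), 3))
```

In this toy setting the contaminated droplet shows low counts on the markers of both B and C, whereas the doublet shows high counts on B markers and none on C markers. Real data is far noisier, so this says nothing about which order of the two steps works better in practice.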

meaksu commented 8 months ago

Thanks! One more question: is it safe to use SoupX in particular before using scDblFinder? Since it adds decimals to the raw counts, I don't know if it will mess up the algorithm.

plger commented 8 months ago

That shouldn't be a problem; it will simply be a little heavier on memory.

plger commented 8 months ago

I've now discussed this in the vignette, so I'm closing the issue. Thanks for bringing it up.