plger / scDblFinder

Methods for detecting doublets in single-cell sequencing data
https://plger.github.io/scDblFinder/
GNU General Public License v3.0

Removing genes from count matrix before doublet detection #104

Closed: kiwipeel closed this issue 1 month ago

kiwipeel commented 7 months ago

Hi,

I want to remove some rRNA contamination-related genes from my scRNA-seq count matrix. Should I run scDblFinder before or after removing those genes? I know that filtering cells isn't recommended before running scDblFinder, but I couldn't find any information about gene filtering.

plger commented 7 months ago

Hi, what do you mean exactly by "rRNA contamination-related genes"? Whether it's better to do it before or after chiefly depends on what proportion of the variability in those genes you expect to be between cell types, as opposed to within cell types. But scDblFinder should be fairly robust to whether or not some features are removed...

Pierre-Luc

kiwipeel commented 5 months ago

@plger

For instance, Gm42418 is an lncRNA gene associated with rRNA contamination. When I looked at the UMAP plot, I noticed that some cell clusters express this gene highly. In this case, wouldn't it be more appropriate to keep this gene for doublet detection, since there seems to be variation between cell types?

plger commented 5 months ago

Ok, I can't say what's best. The fact that it varies across your clusters points towards cell-type differences that could be useful in detecting doublets, unless you think the contamination itself drives the clustering (i.e. if there aren't big differences between clusters except for rRNA-associated genes). If instead there are reasons to think that the variation is chiefly technical, it's probably better to remove the gene. But again, I'd assume the difference is relatively minor in the end...
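
For readers who want a rough number for the between- versus within-cluster variability discussed in this thread, here is a minimal sketch. It is not part of scDblFinder and not something suggested by the maintainer. The function name `between_cluster_variance_fraction` and the toy values are made up for illustration, and the inputs are assumed to be per-cell (log-)expression values for one gene plus a cluster label per cell. It computes the share of the gene's total sum of squares that is explained by cluster means.

```python
from collections import defaultdict


def between_cluster_variance_fraction(expression, clusters):
    """Return SS_between / SS_total for one gene.

    0 means all variability is within clusters; 1 means all of it is
    between clusters. (Hypothetical helper, for illustration only.)
    """
    if len(expression) != len(clusters):
        raise ValueError("expression and clusters must have the same length")
    if not expression:
        raise ValueError("need at least one cell")

    grand_mean = sum(expression) / len(expression)
    ss_total = sum((x - grand_mean) ** 2 for x in expression)
    if ss_total == 0:
        return 0.0  # constant gene: no variability to attribute

    # Group the expression values by cluster label
    by_cluster = defaultdict(list)
    for x, c in zip(expression, clusters):
        by_cluster[c].append(x)

    # Between-cluster sum of squares: cluster size times the squared
    # deviation of the cluster mean from the grand mean
    ss_between = sum(
        len(vals) * (sum(vals) / len(vals) - grand_mean) ** 2
        for vals in by_cluster.values()
    )
    return ss_between / ss_total


# Toy, made-up values for one gene across six cells in two clusters
expr = [5.0, 5.2, 4.8, 0.1, 0.0, 0.2]
labels = ["A", "A", "A", "B", "B", "B"]
frac = between_cluster_variance_fraction(expr, labels)
print(f"between-cluster fraction: {frac:.3f}")
```

A fraction close to 1 says the gene mostly separates clusters, and a fraction close to 0 says it mostly varies within them. As noted above, a high value is only informative if the clusters are not themselves driven by the contamination, so it is worth checking whether the clustering holds up without the rRNA-associated genes.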