swolock / scrublet

Detect doublets in single-cell RNA-seq data
MIT License

potential cross validation of predicted doublets #15

Open MichaelPeibo opened 4 years ago

MichaelPeibo commented 4 years ago

Hi, @swolock

Thanks for developing this package.

I wonder, is there any way to cross-validate the predicted doublets? That would give us a sense of whether it is appropriate to call those cells doublets.

swolock commented 4 years ago

Hi @MichaelPeibo,

Except for experimental methods for doublet detection (e.g. cell hashing), I don't know of any truly independent ways to validate the predicted doublets. That said, if you know good marker genes for the various cell types in your data set, we expect the Scrublet-predicted doublets to co-express marker genes of multiple cell types (for examples, see Figs 4G and 6D in the Scrublet paper). Furthermore, predicted doublets should, on average, have more total molecules (UMIs) detected than singlets, but this isn't always true (e.g., if one cell type has much less mRNA than others).
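A minimal sketch of these two sanity checks, assuming a dense cells-by-genes matrix of raw UMI counts and a boolean array of doublet calls (for example, the `predicted_doublets` array returned by `scrub_doublets()`). The function name, the toy data, and the "both markers detected" criterion are illustrative assumptions; this is not the code behind the Marker Gene Co-Expression Score in the paper figures.

```python
import numpy as np

def umi_and_coexpression_check(counts, predicted_doublets, marker_a, marker_b):
    """Summarize two sanity checks on predicted doublets.

    counts: cells x genes array of raw UMI counts (dense; convert sparse input first).
    predicted_doublets: boolean array with one entry per cell.
    marker_a, marker_b: column indices of marker genes for two different
        cell types (hypothetical choice, pick markers you trust).
    """
    counts = np.asarray(counts)
    is_doublet = np.asarray(predicted_doublets, dtype=bool)

    # Check 1: total UMIs per cell, compared between doublets and singlets.
    total_umis = counts.sum(axis=1)

    # Check 2: a cell "co-expresses" the markers if both are detected at all.
    coexpr = (counts[:, marker_a] > 0) & (counts[:, marker_b] > 0)

    return {
        "median_umis_doublets": float(np.median(total_umis[is_doublet])),
        "median_umis_singlets": float(np.median(total_umis[~is_doublet])),
        "frac_coexpr_doublets": float(coexpr[is_doublet].mean()),
        "frac_coexpr_singlets": float(coexpr[~is_doublet].mean()),
    }

# Toy data: gene 0 marks type A, gene 1 marks type B, gene 2 is shared.
counts = np.array([
    [5, 0, 1],  # type A singlet
    [4, 0, 2],  # type A singlet
    [0, 6, 1],  # type B singlet
    [0, 3, 2],  # type B singlet
    [5, 4, 3],  # A+B doublet
    [6, 5, 1],  # A+B doublet
])
predicted = np.array([False, False, False, False, True, True])

summary = umi_and_coexpression_check(counts, predicted, marker_a=0, marker_b=1)
for key, value in summary.items():
    print(f"{key}: {value}")
```

On real data, expect the differences to be much less clean than in this toy example, and keep in mind the caveat above about cell types with low mRNA content.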

You could also try running other computational doublet prediction tools (DoubletFinder, DoubletDecon, DoubletDetection, and others that have appeared more recently). The ones I listed are mostly quite similar to Scrublet under the hood (less true for DoubletDecon), but they may give you additional confidence in the doublet predictions.
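To quantify how well two tools agree, one option is to compare their per-cell calls directly. The sketch below assumes you have already exported each tool's calls as a boolean array in the same cell order; the function name and toy arrays are hypothetical, and no tool-specific API is used.

```python
import numpy as np

def compare_doublet_calls(calls_a, calls_b):
    """Compare two boolean doublet-call vectors over the same cells."""
    a = np.asarray(calls_a, dtype=bool)
    b = np.asarray(calls_b, dtype=bool)
    if a.shape != b.shape:
        raise ValueError("Both call vectors must cover the same cells, in the same order.")

    n_both = int(np.sum(a & b))    # cells called doublets by both tools
    n_either = int(np.sum(a | b))  # cells called doublets by at least one tool

    return {
        "n_a": int(a.sum()),
        "n_b": int(b.sum()),
        "n_both": n_both,
        # Jaccard index: shared calls divided by the union of calls.
        "jaccard": n_both / n_either if n_either else float("nan"),
    }

# Toy example with six cells.
tool_a_calls = [True, True, False, False, True, False]
tool_b_calls = [True, False, False, False, True, True]

agreement = compare_doublet_calls(tool_a_calls, tool_b_calls)
print(agreement)
```

A low Jaccard index does not say which tool is right; it only flags that the calls depend on the method or its parameters (such as the expected doublet rate).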

MichaelPeibo commented 4 years ago

Hi @swolock, thanks for the reply. I checked the counts of the predicted doublets, shown below: (image)

I also tried DoubletFinder, but for the same dataset the two tools have few predicted doublets in common. The pK chosen in DoubletFinder also seems to give a higher doublet rate.

I see that Fig. 4G shows a Marker Gene Co-Expression Score and a Hybrid Doublet Score. Would you mind adding reproducible code for these to the GitHub repository?