plger / scDblFinder

Methods for detecting doublets in single-cell sequencing data
https://plger.github.io/scDblFinder/
GNU General Public License v3.0

scDblFinder - known doublets #70

Closed maryellenlynall closed 1 year ago

maryellenlynall commented 1 year ago

Hi, thanks for a great package. When I run scDblFinder on a SingleCellExperiment object with the arguments knownDoublets= and knownUse="discard", the output sce$scDblFinder.class calls some of the known doublets as singlets. The help for scDblFinder seems to state that with the option "discard", the known doublets, while not used for training, should still be called as doublets, so I'm not sure why this is happening. I can of course just add those known doublets back in as doublets manually, but wondered if there was an issue with the scDblFinder code here?

plger commented 1 year ago

Hi, where in the documentation do you see this written? Perhaps the passage needs to be clarified, but with the 'discard' mode there is no enforcement that scDblFinder will call the known doublets as doublets. The simplest scenario is if the known doublets are homotypic (i.e. formed by two cells of the same type), and hence transcriptionally indistinguishable from singlets.
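To see how many known doublets end up called as singlets, one can cross-tabulate the known labels against the predicted class. The sketch below uses a made-up data frame as a stand-in for `colData(sce)`; the column name `known` and all values are assumptions for illustration, and only `scDblFinder.score` and `scDblFinder.class` are names taken from this thread.

```r
# Toy stand-in for colData(sce) after running scDblFinder (values are made up).
cd <- data.frame(
  scDblFinder.score = c(0.95, 0.10, 0.80, 0.05, 0.20, 0.99),
  scDblFinder.class = factor(
    c("doublet", "singlet", "doublet", "singlet", "singlet", "doublet"),
    levels = c("singlet", "doublet")  # level order is an assumption
  ),
  known = c(TRUE, TRUE, FALSE, FALSE, TRUE, FALSE)  # hypothetical column of known doublets
)

# Rows: known doublet or not; columns: class predicted by the model.
tab <- table(known = cd$known, class = cd$scDblFinder.class)
print(tab)

# Indices of known doublets that were nevertheless called singlets
# (for example, homotypic doublets that look like singlets).
missed <- which(cd$known & cd$scDblFinder.class == "singlet")
```

On real data, the same two lines would be applied to the columns of `colData(sce)` instead of the toy data frame.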

maryellenlynall commented 1 year ago

Hi, it's in the help when I do ?scDblFinder: "'discard' (they are discarded for the purpose of training, but counted as positive)". To me this implies they are called as positive. No problem if that isn't the behaviour, but it would help to clarify that sentence.

plger commented 1 year ago

Ok thanks, you're right that's indeed very misleading. They were counted for the purpose of calculating the threshold, but not assigned as doublets unless also predicted to be. I've now enforced that they are marked as doublets in scDblFinder.class, while leaving the scDblFinder.score untouched, so that one can still distinguish those that are predicted from those that wouldn't be. I think this will make sense for most users. I'll be pushing to Bioc devel once the checks have passed, and until then you can install from github.
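The effect described here (known doublets marked as doublets in the class, score left untouched) can also be applied by hand after running the package. This is a minimal sketch on made-up data, not the package's implementation; the columns `known` and `class.predicted` are hypothetical names introduced for illustration.

```r
# Toy stand-in for colData(sce) after running scDblFinder (values are made up).
cd <- data.frame(
  scDblFinder.score = c(0.95, 0.10, 0.80, 0.05, 0.20, 0.99),
  scDblFinder.class = factor(
    c("doublet", "singlet", "doublet", "singlet", "singlet", "doublet"),
    levels = c("singlet", "doublet")  # level order is an assumption
  ),
  known = c(TRUE, TRUE, FALSE, FALSE, TRUE, FALSE)  # hypothetical column of known doublets
)
score_before <- cd$scDblFinder.score

# Keep a copy of the model's own call so that predicted and forced
# doublets remain distinguishable after the override.
cd$class.predicted <- cd$scDblFinder.class

# Force every known doublet to "doublet"; the score column is not modified.
cd$scDblFinder.class[cd$known] <- "doublet"

# Known doublets that the model itself had called singlets.
forced <- cd$known & cd$class.predicted == "singlet"
```

Keeping the original call in a separate column avoids needing the threshold to tell which known doublets were actually predicted as doublets.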

maryellenlynall commented 1 year ago

Thanks for clarifying. I think it might be helpful to change the help documentation rather than the function's behaviour, so that people's existing scripts don't start working differently.

maryellenlynall commented 1 year ago

I think that's particularly true given that the threshold, which is printed to the console, isn't included by default in the output object, so it's difficult to determine which known doublets are predicted by scDblFinder from the doublet score alone. I think the behaviour you had before was helpful; it's just the help that needed clarifying.

plger commented 1 year ago

Changed back and updated doc