paulbrodersen closed this issue 4 years ago
Hi @paulbrodersen, thanks for the kind words and for your interest in improving the automated doublet calling (and my apologies for the severely delayed response – life has moved on a bit, but I would like to maintain and potentially improve Scrublet as long as it's useful).
scikit-image and didn't have better luck with any of them. I also briefly thought about trying to incorporate the expected doublet rate more directly into the threshold setting (in theory, this is an upper bound on the detected doublet fraction), but never implemented anything. If you have other ideas, I would be excited to chat about them and possibly help with trying them out.
Hi @swolock,
No apologies needed. I also struggle to maintain all the code that I have released into the wild.
Re 1: I have another data set coming up shortly. I will revisit the doublet threshold calling then and if I come up with something, I will let you know. Might be a little while though.
Re 2: Thanks for the pointer to demuxlet. I had not come across that paper yet (I am still pretty new to RNA-seq).
I will close the issue for now to keep your issue tracker clean, and reference it if I make a PR.
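For anyone picking this up later, the idea mentioned in this thread of treating the expected doublet rate as an upper bound on the detected doublet fraction could look roughly like the sketch below. This is not code from Scrublet; the function name `cap_threshold_by_expected_rate`, its parameters, and the random scores are all hypothetical and only illustrate the idea.

```python
import numpy as np

def cap_threshold_by_expected_rate(scores_obs, threshold, expected_doublet_rate):
    """Hypothetical helper: raise `threshold` if it would call more than
    `expected_doublet_rate` of the observed cells as doublets."""
    # Score above which roughly `expected_doublet_rate` of observed cells lie.
    floor = np.quantile(scores_obs, 1.0 - expected_doublet_rate)
    # Never lower an already strict threshold; only raise a permissive one.
    return float(max(threshold, floor))

# Made-up observed doublet scores, for illustration only.
rng = np.random.default_rng(0)
scores_obs = rng.uniform(0.0, 1.0, size=1000)

permissive = 0.10   # stand-in for an automatically chosen, permissive threshold
expected_rate = 0.06

capped = cap_threshold_by_expected_rate(scores_obs, permissive, expected_rate)
frac_before = float(np.mean(scores_obs > permissive))
frac_after = float(np.mean(scores_obs > capped))
print(f"called before cap: {frac_before:.3f}, after cap: {frac_after:.3f}")
```

The cap only ever raises the threshold, so a threshold that is already stricter than the expected rate implies is left untouched.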
Hi @swolock, beautiful package: everything just runs out of the box, the paper and code are very readable, and crucially, the doublet scores look pretty good (samples with high scores cluster in distinct regions of the UMAP manifold). However, your automated way of setting the threshold based on the scores of simulated samples seems to be fairly permissive, in my own data and in others. Furthermore, in the paper (at least the arxiv preprint), you seem to choose the threshold by eye yourself.
Personally, it looks to me like using scikit-image's `threshold_minimum` isn't doing you any favours, and I wonder if there are other ways that might be better. Before I start trying a bunch of stuff, I wondered if you would be willing to share what approaches you have tried so far, and/or if you had any test data sets in a readily available format that you find particularly useful for testing any other approach. In particular, are there any data sets for which you have independent confirmation of doublets and for which scikit-image's function fails severely?
If I do come up with anything useful, I will make a PR, scout's honour.
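To make the question concrete, here is a minimal sketch of the kind of comparison I have in mind: apply `threshold_minimum` and, as one possible alternative, `threshold_otsu` (both from `skimage.filters`) to the doublet scores of simulated doublets. The scores below are a synthetic bimodal mixture that I made up as a stand-in; they are not Scrublet output, and the mixture parameters are arbitrary assumptions.

```python
import numpy as np
from skimage.filters import threshold_minimum, threshold_otsu

# Synthetic stand-in for the doublet scores of simulated doublets:
# a low-scoring mode and a high-scoring mode (parameters are arbitrary).
rng = np.random.default_rng(0)
scores_sim = np.concatenate([
    rng.normal(0.15, 0.04, size=3000),  # low-scoring simulated doublets
    rng.normal(0.60, 0.08, size=2000),  # high-scoring simulated doublets
])

# threshold_minimum smooths the histogram until only two maxima remain and
# returns the minimum between them; it raises RuntimeError if that fails.
t_min = float(threshold_minimum(scores_sim))

# Otsu's method instead picks the split that maximises between-class variance.
t_otsu = float(threshold_otsu(scores_sim))

print(f"threshold_minimum: {t_min:.3f}")
print(f"threshold_otsu:    {t_otsu:.3f}")
```

On real data the two modes are rarely this well separated, which is exactly where I would expect the methods to disagree, so a data set with independently confirmed doublets would be needed to say which threshold is better.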