Open jeremymsimon opened 5 months ago
I think this issue can be better handled during downstream analysis. As you said, these barcodes can be easily identified, but the filtering cutoff (like entropy) might be tunable. It would be more efficient to find the appropriate cutoffs in the dataframe rather than re-running Chromap.
Hi @haowenz, I tend to not force use of the 10X barcode include-list since it is possible there could be valuable information/real cells, but I noticed that some scATACseq data processed via `chromap` has identified some "cells" that manage to pass all QC, and escape doublet discrimination (!), but whose barcode sequence is something like `GGGGGGGGGGGGTGGG` or a similar highly-G-rich/low-complexity sequence. This isn't a bug per se of `chromap`, and can obviously be fixed by forcing the identified cells to be contained within the include-list, however I do wonder whether barcodes like this could perhaps be flagged if they have exceedingly low entropy? There were about 200 such "cells" out of ~70,000 in the dataset I'm currently working with, so it's rare but it was frequent enough such that these cells formed their own cluster in my data. Curious to hear your thoughts! Thanks!
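For anyone who wants to try the entropy idea downstream, here is a minimal sketch in plain Python. It is not part of `chromap`; the function names `barcode_entropy` and `flag_low_complexity` and the `min_entropy=1.0` cutoff are assumptions made up for illustration, and the cutoff would need tuning per dataset.

```python
import math
from collections import Counter


def barcode_entropy(barcode: str) -> float:
    """Shannon entropy (in bits) of the base composition of one barcode.

    Hypothetical helper: 0 for a homopolymer, 2 for a barcode that uses
    A, C, G and T equally often.
    """
    counts = Counter(barcode)
    n = len(barcode)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def flag_low_complexity(barcodes, min_entropy=1.0):
    """Return the barcodes whose entropy falls below min_entropy.

    min_entropy=1.0 is an arbitrary illustrative cutoff, not a recommendation.
    """
    return [bc for bc in barcodes if barcode_entropy(bc) < min_entropy]


if __name__ == "__main__":
    example = ["GGGGGGGGGGGGTGGG", "ACGTACGTACGTACGT"]
    for bc in example:
        print(bc, round(barcode_entropy(bc), 3))
    print("flagged:", flag_low_complexity(example))
```

One caveat: this only looks at single-base composition, so a dinucleotide repeat such as `ACACACACACACACAC` scores exactly 1 bit and would slip past a cutoff of 1.0. A k-mer based score might be needed to catch those.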