jon-xu / scSplit

Genotype-free demultiplexing of pooled single-cell RNA-Seq, using a hidden state model for identifying genetically distinct samples within a mixed population.
MIT License

Requirement for umi tools deduplication #17

Closed dn-ra closed 2 years ago

dn-ra commented 2 years ago

Hi Jon,

I have a query about step c) of the protocol (using umi_tools to collapse PCR duplicates). Is this necessary if I am using the BAM file that has come off Cell Ranger? I am seeing some strange behavior with umi_tools, so I'm wondering whether I can skip it. I have a 10x 5' GEX library of some 500 million reads.

For background, the issues I'm having are: 1) umi_tools extract is not removing the content of R1 when it appends the UMI onto the read name of R2. 2) If I grep for a particular barcode+UMI combination in the amended R2 file, the transcripts corresponding to it are not very similar, which makes me wonder whether an error has happened somewhere.

I appreciate that these points are better put to the umi_tools authors; I just thought I'd give you some context.

Thanks, Dan

jon-xu commented 2 years ago

Hi Dan,

It will have limited impact on downstream tasks. But if you want, you can run umi_tools dedup directly on the BAM file, rather than starting from the beginning of the umi_tools pipeline.

Cheers, Jon
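
For readers following this suggestion, here is a minimal sketch of running umi_tools dedup directly on a Cell Ranger BAM. It assumes the BAM is coordinate-sorted and indexed and carries the corrected cell barcode and UMI in the `CB` and `UB` tags, as Cell Ranger output normally does. The file names and the helper `build_dedup_command` are hypothetical, and this is not necessarily the exact command the scSplit author had in mind.

```python
import shlex


def build_dedup_command(in_bam, out_bam):
    """Build a umi_tools dedup command that reads UMIs from BAM tags.

    Assumes Cell Ranger-style tags: CB (cell barcode) and UB (UMI).
    """
    return [
        "umi_tools", "dedup",
        "--extract-umi-method=tag",  # take the UMI from a BAM tag, not the read name
        "--umi-tag=UB",              # assumed: corrected UMI tag written by Cell Ranger
        "--cell-tag=CB",             # assumed: corrected cell barcode tag
        "--per-cell",                # deduplicate within each cell barcode separately
        "-I", in_bam,
        "-S", out_bam,
    ]


if __name__ == "__main__":
    # Hypothetical file names; substitute your own paths.
    cmd = build_dedup_command("possorted_genome_bam.bam", "dedup.bam")
    # Print the command for inspection; run it yourself once the paths are right.
    print(shlex.join(cmd))
```

Because the UMI is read from the tags, the umi_tools extract step and the FASTQ files are not needed at all.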

dn-ra commented 2 years ago

Thanks Jon, that's really handy advice. Many thanks.