10X barcoding of long reads

ktpolanski commented 2 months ago

Hello,

It would appear that there are some protocols out there that combine 10X barcoding with long reads. Their analysis workflow has them pre-filter the reads to get rid of failed UMIs, then process everything via MiXCR, and apply the UMI information in post-processing by ditching UMIs where reads were assigned to different clonotypes.

Given TRUST4's built-in support for CB+UMI information, it could form the basis of a more elegant way to process this data. I expect it to be fairly straightforward: prior issues mention TRUST4 working well with PacBio data, and I can prepare split reads with the CB+UMI information pulled out and stored separately.

Just a couple quick questions:

Do I need to parameterise the call differently to what I'd do for normal 10X to account for the long read nature of the data?
Should I bother stripping out poly-Ts from the reads if present?
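
For reference, the read preparation I have in mind could be sketched roughly as below. This is a minimal sketch under loud assumptions, not a tested pipeline: the layout (adapter, then cell barcode, then UMI, then an optional poly-T stretch, then cDNA), the 16 bp and 12 bp lengths, the 10 bp poly-T threshold, and the name `split_read` are all hypothetical and depend on the chemistry actually used. It also relies on exact adapter matching, which is too strict for error-prone long reads.

```python
import re

# Assumed read layout: <adapter><cell barcode><UMI><poly-T><cDNA>.
# All lengths below are assumptions; adjust them to the actual chemistry.
CB_LEN = 16
UMI_LEN = 12
MIN_POLY_T = 10  # shortest run of T treated as a poly-T stretch (assumed)


def split_read(seq, adapter, strip_poly_t=True):
    """Return (cb, umi, insert), or None if the read cannot be split.

    Exact adapter matching only; real long reads would need
    error-tolerant matching instead.
    """
    start = seq.find(adapter)
    if start == -1:
        return None  # adapter not found

    pos = start + len(adapter)
    cb = seq[pos:pos + CB_LEN]
    umi = seq[pos + CB_LEN:pos + CB_LEN + UMI_LEN]
    insert = seq[pos + CB_LEN + UMI_LEN:]

    # Reject reads that end before a full barcode and UMI are present.
    if len(cb) < CB_LEN or len(umi) < UMI_LEN:
        return None

    if strip_poly_t:
        # Remove a poly-T run only if it sits directly after the UMI.
        match = re.match(r"T{%d,}" % MIN_POLY_T, insert)
        if match:
            insert = insert[match.end():]

    return cb, umi, insert
```

The barcode, UMI and remaining insert would then be written to three separate files with matching record order, so the CB+UMI information stays paired with each read.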

Thanks a lot and sorry for the trouble!

mourisl commented 2 months ago

TRUST4 has not been optimized for long reads yet. It can handle the longer read lengths, but the results may be suboptimal. For example, if there are too many indel sequencing errors, TRUST4 may not handle them well.

Since long reads may contain a lot of sequence before the V gene and after the C gene (like the poly-A tails you mentioned), you could add the option "--repseq", which aggressively trims the sequence outside the VDJ region of a read.
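
As a rough illustration, such a call on pre-split reads might be assembled as in the sketch below. The helper name `trust4_command` and every file name are hypothetical, and the flags should be checked against the usage text of the installed `run-trust4` before relying on them.

```python
def trust4_command(reads, barcodes, umis, coord_fa, ref_fa, prefix, threads=4):
    """Build an argument list for run-trust4 on pre-split long reads.

    Assumes the barcode and UMI files hold one record per read,
    in the same order as the read file.
    """
    return [
        "run-trust4",
        "-f", coord_fa,         # V/D/J/C gene coordinate and sequence FASTA
        "--ref", ref_fa,        # detailed V/D/J/C reference, e.g. from IMGT
        "-u", reads,            # single-end reads with CB+UMI already removed
        "--barcode", barcodes,  # file of cell barcodes
        "--UMI", umis,          # file of UMIs
        "--repseq",             # the option suggested above
        "-t", str(threads),
        "-o", prefix,           # prefix of the output files
    ]
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Keeping the arguments as a list rather than a single string avoids shell quoting problems with unusual file names.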

Hope this helps.

ktpolanski commented 2 months ago

That is actually super convenient, just in case some unforeseen garbage sneaks into the reads outside of the adapters I expect and will be actively looking for. Thanks for the heads up!

liulab-dfci / TRUST4

10X barcoding of long reads #279