liulab-dfci / TRUST4

TCR and BCR assembly from RNA-seq data
MIT License

10X barcoding of long reads #279

Closed ktpolanski closed 2 months ago

ktpolanski commented 2 months ago

Hello,

It would appear that there are some protocols out there that combine 10X barcoding with long reads. Their analysis workflow has them pre-filter the reads to get rid of failed UMIs, then process everything via MiXCR, and apply the UMI information in post-processing by ditching UMIs where reads were assigned to different clonotypes.
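For illustration, the post-processing step described above (dropping UMIs whose reads were assigned to different clonotypes) could be sketched roughly as follows. This is only a minimal sketch, not the protocol's actual code: the function name `consistent_umis`, the tuple layout, and the example barcodes and clonotype labels are all hypothetical.

```python
from collections import defaultdict

def consistent_umis(assignments):
    """Keep only (cell barcode, UMI) pairs whose reads agree on one clonotype.

    `assignments` is an iterable of (cell_barcode, umi, clonotype_id) tuples,
    one per read. This layout is an assumption made for illustration.
    """
    clonotypes = defaultdict(set)
    for cell_barcode, umi, clonotype_id in assignments:
        clonotypes[(cell_barcode, umi)].add(clonotype_id)
    # A UMI seen with more than one clonotype is discarded entirely.
    return {
        key: next(iter(ids))
        for key, ids in clonotypes.items()
        if len(ids) == 1
    }

# Made-up per-read assignments: the (AAAC, GGA) UMI is ambiguous.
reads = [
    ("AAAC", "TTG", "clone1"),
    ("AAAC", "TTG", "clone1"),
    ("AAAC", "GGA", "clone1"),
    ("AAAC", "GGA", "clone2"),
    ("CCGT", "TTG", "clone3"),
]
kept = consistent_umis(reads)
```

The UMI is keyed together with its cell barcode, since the same UMI sequence can legitimately occur in different cells.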

Given TRUST4's innate support of CB+UMI information, it would form the basis of a more elegant processing of the data. I fully expect that it should be quite straightforward - prior issues mention TRUST4 working well with PacBio data, and I can prepare split reads with CB+UMI information pulled out and stored separately.
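As a rough idea of what pulling CB+UMI out of a read could look like, here is a minimal sketch. It assumes the adapter has already been located and removed, so that each read starts with a fixed-length cell barcode followed by a fixed-length UMI. The lengths (16 and 10 bases) and the function name `split_cb_umi` are assumptions for illustration and would depend on the chemistry and protocol.

```python
def split_cb_umi(seq, qual, cb_len=16, umi_len=10):
    """Split a read into barcode, UMI and insert, each as (sequence, quality).

    Assumes the read starts with the cell barcode immediately followed by
    the UMI. The default lengths are assumptions, not protocol facts.
    Returns None when the read is too short or seq/qual lengths disagree.
    """
    prefix = cb_len + umi_len
    if len(seq) <= prefix or len(seq) != len(qual):
        return None
    return {
        "barcode": (seq[:cb_len], qual[:cb_len]),
        "umi": (seq[cb_len:prefix], qual[cb_len:prefix]),
        "insert": (seq[prefix:], qual[prefix:]),
    }

# Made-up read: 16 bases of barcode, 10 of UMI, then the transcript insert.
example_seq = "A" * 16 + "C" * 10 + "GATTACA"
example_qual = "I" * len(example_seq)
parts = split_cb_umi(example_seq, example_qual)
```

Each of the three parts could then be written to its own FASTQ file under the same read name, so the barcode, UMI and insert records stay in matching order.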

Just a couple quick questions:

Thanks a lot and sorry for the trouble!

mourisl commented 2 months ago

TRUST4 has not been optimized for long reads yet. It accepts long read lengths, but the results may be suboptimal. For example, if there are too many indel sequencing errors, TRUST4 may not handle them well.

Since long reads may contain a lot of sequence before the V gene and after the C gene (like the poly-A tails you found), you could add the option "--repseq", which aggressively trims sequence outside the VDJ region of a read.

Hope this helps.

ktpolanski commented 2 months ago

That is actually super convenient, just in case some unforeseen garbage sneaks into the reads outside of the adapters I expect and will be actively looking for. Thanks for the heads up!