milaboratory / mixcr

MiXCR is an ultimate software platform for analysis of Next-Generation Sequencing (NGS) data for immune profiling.
https://mixcr.com

Non-CDR3 sequences, Clontech TCR kit #533

Closed rspreafico closed 4 years ago

rspreafico commented 5 years ago

Hi there, I was evaluating whether MiXCR could be used to analyze Clontech RACE TCR data (suggested to run with MiSeq 2x300). MiXCR is reported by Clontech as the tool they used to analyze their pilot datasets. Upon enquiry, Clontech reports that they run MiXCR with default parameters, which means assembling clones using CDR3.

However, given that the Clontech protocol sequences the full variable region, I was trying to understand whether MiXCR could be used to get more information than that. I thought that setting the --region-of-interest parameter of analyze amplicon to VDJTranscript, or at least VDJRegion, could do it, but either setting almost abolishes alignment rates. As a fallback, maybe one could set CDR1+CDR2+CDR3, but this does not seem to go through with the --region-of-interest parameter.

I would be very interested in MiXCR if it could be established that it can make use of the full sequence information to capture non-CDR3 mutations as well in Clontech TCR data. Have you tried MiXCR with that type of data? Thank you for your feedback.

dbolotin commented 5 years ago

Because the actual boundaries of the sequencing reads in the target library differ from molecule to molecule, contig assembly should work better in this case.

Just add --contig-assembly to the mixcr analyze amplicon command.
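A minimal sketch of what that command might look like. Only --contig-assembly comes from this thread. The file names and output prefix are hypothetical, and the library-prep flags (--species, --starting-material, --5-end, --3-end, --adapters) are assumptions for a human 5'RACE TCR library run with MiXCR 3.x, so adjust them to your kit and version. The script prints the command and runs it only if mixcr and the input files are present.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical input files and output prefix; replace with your own.
R1="sample_R1.fastq.gz"
R2="sample_R2.fastq.gz"
OUT="sample_tcr"

# The library-prep values below are assumptions for a 5'RACE human TCR
# library; they were not confirmed in this thread.
cmd=(
  mixcr analyze amplicon
  --species hs
  --starting-material rna
  --5-end no-v-primers
  --3-end c-primers
  --adapters no-adapters
  --contig-assembly   # the option suggested above
  "$R1" "$R2" "$OUT"
)

# Print the command for review.
printf '%s ' "${cmd[@]}"; echo

# Run it only when mixcr is installed and both input files exist.
if command -v mixcr >/dev/null 2>&1 && [[ -f "$R1" && -f "$R2" ]]; then
  "${cmd[@]}"
fi
```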

You can also use --impute-germline-on-export to see both partially and fully covered gene features. Without this option, only gene features that are fully covered by the assembled contig are output. For example, if CDR1 happens to sit at the end of a sequencing read and its last two letters are not covered, then without the --impute-germline-on-export option CDR1 is skipped entirely for that clonotype.
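The coverage rule described above can be illustrated with a toy check. This is not MiXCR code: the function name and all coordinates are hypothetical, and it only models the "fully covered or skipped" behavior that applies without --impute-germline-on-export.

```shell
#!/usr/bin/env bash
# Toy model only: positions are hypothetical 0-based, end-exclusive
# coordinates, not values taken from MiXCR.

# Succeeds when the feature [f_start, f_end) lies entirely inside the
# covered range [c_start, c_end).
fully_covered() {
  local f_start=$1 f_end=$2 c_start=$3 c_end=$4
  (( f_start >= c_start && f_end <= c_end ))
}

# Hypothetical CDR1 at 78..114; the contig covers 0..112, so the last
# two letters of CDR1 are missing.
if fully_covered 78 114 0 112; then
  cdr1_status="exported"
else
  cdr1_status="skipped"
fi
echo "CDR1: $cdr1_status"
```

In this model a feature that is short by even two letters is dropped, which matches the CDR1 example in the comment above.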