uubram / RTCR

A pipeline for complete and accurate recovery of TCR repertoires from high throughput sequencing data.
GNU General Public License v3.0

TRBD d_call not matched and CDR3AA differents from mixcr #25

Open pengbo233 opened 11 months ago

pengbo233 commented 11 months ago

Hello, thank you very much for providing the RTCR code. It helps with replacing mixcr for analysis. However, I have noticed two issues:

Issue 1: The nucleotide sequences for CDR3 in RTCR and mixcr are identical, but the corresponding amino acid sequences are different.

Use RTCR to analyze the CDR3 sequences.
TGTGCCAGCAGTTTATGTCGACTAGCGGCACCTACGAGCAGTACTTC
CASSLCRLAAPTSST

Use mixcr to analyze the CDR3 sequences.
TGTGCCAGCAGTTTATGTCGACTAGCGGCACCTACGAGCAGTACTTC
CASSLCRL_GTYEQYF

Compare the nucleotide and amino acid sequences obtained from RTCR and mixcr.

Any other relevant information about the data and analysis methods used.

Issue 2: TRBD1 and TRBD2 in HomoSapiens are present in the immune_receptor_reference.tsv.gz database, but are not detected during the alignment.

thanks

uubram commented 10 months ago

Dear pengbo233,

RTCR directly converts the CDR3 nucleotide sequence to amino acid sequence, codon by codon. In this process it does not change frame when reaching the J-segment. In your example, apparently the J-segment is out-of-frame. RTCR reflects this by setting the "vj_in_frame" field to false ("F").
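To make the codon-by-codon translation concrete, here is a minimal Python sketch. It is not RTCR's own code: the function name `translate_codon_by_codon` and the way the codon table is built are assumptions made for illustration. It reads the CDR3 from its first nucleotide in steps of three and never shifts frame, which reproduces the amino acid sequence RTCR reported for the example above.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate_codon_by_codon(nt):
    """Translate from the first base in steps of three (hypothetical helper).

    Trailing bases that do not form a complete codon are ignored.
    Codons with unexpected characters are rendered as 'X'.
    """
    nt = nt.upper()
    usable = len(nt) - len(nt) % 3
    return "".join(CODON_TABLE.get(nt[i:i + 3], "X") for i in range(0, usable, 3))


cdr3_nt = "TGTGCCAGCAGTTTATGTCGACTAGCGGCACCTACGAGCAGTACTTC"
print(translate_codon_by_codon(cdr3_nt))  # CASSLCRLAAPTSST
print(len(cdr3_nt) % 3)                   # 2 bases left over
```

The example CDR3 is 47 nucleotides long, which is not a multiple of three, so the V end and the J end cannot both be in the same reading frame. Whether RTCR derives "vj_in_frame" from the length in exactly this way is not stated in this thread.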

As to RTCR not detecting TRBD, the software does not attempt to do this for two reasons. First, because TRBD sequences tend to be very short, alignments tend to be inaccurate. Second, TRBD identification is not required for the accurate retrieval of TCR sequences from high-throughput sequencing data.

Please let me know if the above solves the issues you raised.

Best wishes, Bram

pengbo233 commented 10 months ago

Use RTCR to analyze the CDR3 sequences.
TGTGCCAGCAGTTTATGTCGACTAGCGGCACCTACGAGCAGTACTTC
CASSLCRLAAPTSST

Use mixcr to analyze the CDR3 sequences.
TGTGCCAGCAGTTTATGTCGACTAGCGGCACCTACGAGCAGTACTTC
CASSLCRL_GTYEQYF

Compare the nucleotide and amino acid sequences obtained from RTCR and mixcr.

Thank you very much. Do you know what kind of adjustment produces the different amino acid sequences? This kind of amino acid sequence ends with F or W, which meets the standard for a complete CDR3, so I find it interesting. How is the frame judged? Does a CDR3 have one fixed result? Thanks a lot.

uubram commented 10 months ago

The amino acid sequence resulting from translation depends on the direction of translation and on where translation starts in the nucleotide sequence; the starting position is called the reading frame. You can put your example nucleotide sequence into Expasy to see how different reading frames lead to different amino acid sequences.
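If you would rather check this locally, the following Python sketch translates the example CDR3 in the three forward reading frames. It is only an illustration: the helper names `translate` and `forward_frames` are made up here, and reverse-strand frames are left out for brevity.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(nt):
    """Translate complete codons only; leftover bases at the end are dropped."""
    usable = len(nt) - len(nt) % 3
    return "".join(CODON_TABLE.get(nt[i:i + 3], "X") for i in range(0, usable, 3))


def forward_frames(nt):
    """Return the translations starting at offsets 0, 1 and 2 (hypothetical helper)."""
    nt = nt.upper()
    return [translate(nt[offset:]) for offset in range(3)]


cdr3_nt = "TGTGCCAGCAGTTTATGTCGACTAGCGGCACCTACGAGCAGTACTTC"
for offset, protein in enumerate(forward_frames(cdr3_nt)):
    print(offset, protein)
```

The frame starting at offset 0 gives the sequence RTCR reports. The frame starting at offset 2 ends in GTYEQYF, the same tail mixcr shows, and the frame starting at offset 1 runs into a stop codon (printed as `*`).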

Looking only at the CDR3, in your example the amino acid sequence you'd get biologically is what RTCR is showing. As you mentioned, it is a 'non-standard' CDR3 and (likely) would not lead to the production of a functional TCR. You'd need to switch reading frame during translation (unlikely to happen biologically) to get what you called a 'standard' CDR3 as shown by mixcr (the change of reading frame was indicated by mixcr with the "_" character).
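The sketch below shows one way to arrive at a string like the one mixcr printed. It is not MiXCR's implementation: the function `translate_from_both_ends` is hypothetical, and where it places the "_" (the middle of the sequence) is an assumption chosen because it matches the example in this thread. The idea is to translate the 5' part forwards from the first base, translate the 3' part in the frame that ends on the last base, and mark the leftover bases in between with "_".

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(nt):
    """Translate a sequence whose length is a multiple of three."""
    return "".join(CODON_TABLE.get(nt[i:i + 3], "X") for i in range(0, len(nt), 3))


def translate_from_both_ends(nt):
    """Translate with a frame switch in the middle (illustrative assumption).

    In-frame sequences are translated normally. Otherwise the left half is
    read from the first base, the right half is read so that it ends on the
    last base, and the leftover one or two bases become "_".
    """
    nt = nt.upper()
    n_codons, leftover = divmod(len(nt), 3)
    if leftover == 0:
        return translate(nt)
    left = (n_codons + 1) // 2   # codons read from the 5' end
    right = n_codons - left      # codons read from the 3' end
    head = nt[:3 * left]
    tail = nt[len(nt) - 3 * right:]
    return translate(head) + "_" + translate(tail)


cdr3_nt = "TGTGCCAGCAGTTTATGTCGACTAGCGGCACCTACGAGCAGTACTTC"
print(translate_from_both_ends(cdr3_nt))  # CASSLCRL_GTYEQYF
```

For the example, 8 codons are read from the left, 7 from the right, and the two bases in between (GC) are represented by "_". This is a display convention, not a prediction of the protein that would be made.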

I hope the above answered your question.

Best wishes, Bram