liulab-dfci / TRUST4

TCR and BCR assembly from RNA-seq data
MIT License

full-length amino acid sequence #243

Open ljy-sys opened 5 months ago

ljy-sys commented 5 months ago

CDR3序列.xlsx (CDR3 sequences)

Thanks to the developers for this tool, which makes it much easier for us to analyze immune repertoire data. I used trust-smartseq.pl to analyze the VDJ information in Smart-seq data. The attached file contains a CDR3 sequence that I am interested in, but I am not sure how to obtain the full-length amino acid sequences of TCRα and TCRβ.

mourisl commented 5 months ago

Do you mean you also need the amino acid sequence including the V and J genes? In that case, you should use the AIRR output format and translate the sequence_alignment column into amino acids, provided the sequence is full-length.
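As a rough illustration of the "if the sequence is full-length" condition, the sketch below keeps only AIRR rows flagged as complete. It assumes the _airr.tsv file carries the AIRR `complete_vdj` column with values such as `T`/`F`; check the header of your own file first. The function name `full_length_rows` is hypothetical.

```python
import csv
import io


def full_length_rows(handle):
    """Yield AIRR rows whose complete_vdj flag is true.

    Assumption: the file has a complete_vdj column holding 'T'/'F'
    (or 'true'/'false'). Rows without the column are skipped.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        flag = (row.get("complete_vdj") or "").strip().lower()
        if flag in {"t", "true"}:
            yield row


# Tiny in-memory example standing in for a real _airr.tsv file.
example = io.StringIO(
    "sequence_id\tcomplete_vdj\n"
    "contig_1\tT\n"
    "contig_2\tF\n"
)
kept_ids = [row["sequence_id"] for row in full_length_rows(example)]
print(kept_ids)
```

For a real file, the `io.StringIO` object would be replaced by `open(path, newline="")`.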

ljy-sys commented 5 months ago

Oh, yes, that's right. I'll have a try. Thank you very much.

ljy-sys commented 5 months ago

For Smart-seq3 data, how can I obtain the full amino acid sequences of all TCRα and TCRβ chains, rather than only the CDR3 sequences? I have already obtained full-length TCR amino acid sequences by aligning the CDR3 sequences against a database, but I want to check whether they are consistent with our own alignment results. That is why I would like to obtain the TCR amino acid sequences for each cell directly.

mourisl commented 5 months ago

The smartseq.pl wrapper should also output the _airr.tsv file. Its sequence_alignment column shows the part of the underlying full sequence that aligns to the IMGT database, and the germline_alignment column holds the matching germline sequence. You may need to write your own script to translate the sequence_alignment column into amino acids.
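A minimal sketch of such a translation script is shown here; it is not part of TRUST4. It assumes the _airr.tsv file has `sequence_id`, `sequence_alignment`, and `junction_aa` columns, that alignment gaps (if any) are written as `-` or `.`, and that the standard genetic code applies. Because a partial alignment may not start on a codon boundary, the sketch picks the reading frame whose translation contains `junction_aa`, and falls back to frame 0 (reported as `None`) when no frame matches. The names `translate`, `pick_frame`, and `translate_airr` are hypothetical.

```python
import csv
import io

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(nt, frame=0):
    """Translate nucleotides to amino acids; unknown codons become 'X'."""
    nt = nt.replace("-", "").replace(".", "").upper()  # drop gap characters
    codons = (nt[i:i + 3] for i in range(frame, len(nt) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def pick_frame(nt, junction_aa):
    """Return the frame (0-2) whose translation contains junction_aa, else None."""
    if not junction_aa:
        return None
    for frame in range(3):
        if junction_aa in translate(nt, frame):
            return frame
    return None


def translate_airr(handle):
    """Yield (sequence_id, frame, amino_acids) for each row of an AIRR TSV.

    frame is None when junction_aa was not found in any frame; in that
    case the translation uses frame 0 and should be checked by hand.
    """
    for row in csv.DictReader(handle, delimiter="\t"):
        nt = row["sequence_alignment"]
        frame = pick_frame(nt, row.get("junction_aa") or "")
        yield row["sequence_id"], frame, translate(nt, frame or 0)


# Toy record (not real TCR data) standing in for a real _airr.tsv file.
example = io.StringIO(
    "sequence_id\tsequence_alignment\tjunction_aa\n"
    "contig_1\tGAATGTGTGCCAGCAGTTTCGGG\tCASSF\n"
)
results = list(translate_airr(example))
print(results)
```

Stop codons appear as `*` in the output, which makes out-of-frame or non-productive records easy to spot before comparing against database-derived sequences.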

ljy-sys commented 5 months ago

Good, I have now obtained the full-length amino acid sequence. However, I am not sure whether the "sequence" column or the "sequence_alignment" column of the AIRR file should be translated into amino acids. I hope you can help.

mourisl commented 5 months ago

The "sequence" column holds the underlying assembled contig, which includes regions outside the V gene. The "sequence_alignment" column is the portion that can be aligned to the IMGT database, so it is the more appropriate one to use.
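To see the difference between the two columns for a given record, one can locate the aligned portion inside the contig. The sketch below assumes that `sequence_alignment`, once any `-` or `.` gap characters are removed, occurs verbatim inside `sequence`; this may not hold for every record (for example if the alignment contains insertions or deletions), in which case the function returns `None`. The name `aligned_span` is hypothetical.

```python
def aligned_span(sequence, sequence_alignment):
    """Return (start, end) of the ungapped alignment within the contig, or None."""
    ungapped = sequence_alignment.replace("-", "").replace(".", "").upper()
    if not ungapped:
        return None
    start = sequence.upper().find(ungapped)
    if start < 0:
        return None
    return start, start + len(ungapped)


# Toy strings: two flanking bases on each side of the aligned region.
span = aligned_span("ccATGGCCtt", "ATG-GCC")
print(span)
```

Bases before `start` and after `end` are the flanking regions of the contig that fall outside the aligned V(D)J region.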

ljy-sys commented 5 months ago

OK, thank you very much for your reply. It is very helpful to me. Good luck!