kescull / immunopeptidogenomics

Tools for harnessing RNA-seq data to discover cryptic peptides in the immunopeptidome by mass spectrometry
MIT License

Related to triple_translate.c #1

mguaita closed 1 month ago

mguaita commented 1 year ago

Dear Scull KE,

I would like to know whether the triple_translate.c programme takes the transcript strand into account during translation. I would also like to confirm that the programme is designed to translate the DNA sequences of all transcripts obtained with GFFRead (that is, the input for translation is DNA sequences, not the RNA sequences of the transcriptome).

Thank you very much for making the code available.

Maria G.

kescull commented 1 year ago

Dear Maria,

Thanks for your interest; I hope triple_translate can help with what you want to do.

The page about gffread says "gffread can also be used to generate a FASTA file with the DNA sequences for all transcripts in a GFF file", which is how I got the transcript sequences for input to triple_translate. So yes, although they are transcript sequences, they are written in DNA code.

The strand is usually noted in the header, but the sequence is written out 5' to 3' regardless of whether it uses the forward or reverse strand of the reference genome. That is, the transcripts are written out as specified in the GTF file you gave to gffread, and they go in the 'right direction' for translation. Because of that, I wrote triple_translate to translate DNA sequences in 3 frames (not 6) and only in the direction of the transcript sequences.

It doesn't translate sequences that didn't have the strand specified in the header, because I wasn't sure of their direction, but I saw very few of those. The log file/standard output will tell you which transcripts were not translated for this reason.

I hope that helps,

Kate
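For readers who want to see the idea in code, here is a minimal sketch of forward-only, three-frame translation of a DNA-coded transcript. It is not taken from triple_translate.c: the function names (`base_index`, `translate_frame`, `three_frame_translate`), the use of the standard genetic code, the `X` placeholder for ambiguous bases, and passing the strand in as a single character (`'+'`, `'-'`, or anything else for "unknown") are all assumptions made for illustration. Parsing the strand out of the FASTA header is left out, since the exact header layout depends on how gffread was run.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Standard genetic code, indexed by 16*b1 + 4*b2 + b3 with T=0, C=1, A=2, G=3.
 * (Assumption: the standard table; the real tool may differ.) */
static const char CODON_TABLE[] =
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG";

/* Map a DNA base to its table index, or -1 for anything else (e.g. N). */
static int base_index(char c)
{
    switch (toupper((unsigned char)c)) {
    case 'T': return 0;
    case 'C': return 1;
    case 'A': return 2;
    case 'G': return 3;
    default:  return -1;
    }
}

/* Translate dna in one forward frame (0, 1 or 2).
 * out must have room for strlen(dna) / 3 + 1 bytes.
 * Returns the number of residues written (excluding the terminator). */
static size_t translate_frame(const char *dna, int frame, char *out)
{
    assert(frame >= 0 && frame <= 2);
    size_t len = strlen(dna);
    size_t n = 0;
    for (size_t i = (size_t)frame; i + 3 <= len; i += 3) {
        int b1 = base_index(dna[i]);
        int b2 = base_index(dna[i + 1]);
        int b3 = base_index(dna[i + 2]);
        /* Hypothetical choice: emit 'X' for codons with ambiguous bases. */
        out[n++] = (b1 < 0 || b2 < 0 || b3 < 0)
                       ? 'X'
                       : CODON_TABLE[16 * b1 + 4 * b2 + b3];
    }
    out[n] = '\0';
    return n;
}

/* Translate in three forward frames only. The sequence is assumed to be
 * written 5' to 3' already, so '+' and '-' transcripts are handled the
 * same way and no reverse complement is taken.
 * Returns 1 on success, 0 if the strand is unknown and the transcript
 * is skipped. */
int three_frame_translate(const char *dna, char strand, char *out[3])
{
    if (strand != '+' && strand != '-')
        return 0; /* direction unknown: do not translate */
    for (int frame = 0; frame < 3; frame++)
        translate_frame(dna, frame, out[frame]);
    return 1;
}
```

A caller would supply three output buffers, each at least `strlen(dna) / 3 + 1` bytes, and report the transcript as untranslated whenever the function returns 0.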

mguaita commented 1 year ago

Hello Kate,

Thank you very much for your time and the fast reply.

I really appreciate the detailed explanation; it does help indeed.

Maria