smithlabcode / ribotricer

A tool for accurately detecting actively translating ORFs from Ribo-seq data
http://doi.org/djv4
GNU General Public License v3.0

Internal ORFs #103

Open annebresciani opened 2 years ago

annebresciani commented 2 years ago

I was wondering why ribotricer does not have an ORF category called "internal". I was trying to compare results from ribotricer and RiboCode, and I identified an internal ORF in RiboCode that does not seem to be in the ribotricer index at all. Both are based on the same human reference, GRCh38 from Ensembl version 104.

I looked into the ribotricer code, and I can see that there is an ORF type called "internal", but you do not append it (prepare_orfs.py, lines 258 and 340). Can you help me understand the reasoning for this? Perhaps I am just misunderstanding the code.

I have an example of an internal ORF in transcript ENST00000675536 that is part of RiboCode's index but not ribotricer's (for that transcript). The amino acid sequence it translates to is found in other transcripts, so it is not that the ORF cannot be identified as translating; rather, the annotation for this transcript is missing. I don't know if it is a bug or intended, but I would really like to understand the reasoning. Many ORFs are found in that transcript, so why not that one? In ribotricer it has the coordinates 89646069_89649410_396 (ENSG00000131165).
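For anyone wanting to reproduce this kind of check, here is a minimal sketch of how one might look up an ORF in a tab-separated index by transcript and coordinate suffix. The column names (`ORF_ID`, `ORF_type`, `transcript_id`), the function `find_orfs`, and the toy rows (including the placeholder transcript `TX_OTHER`) are assumptions for illustration only, not taken from an actual ribotricer index.

```python
import csv
import io


def find_orfs(index_file, transcript_id, coord_suffix):
    """Return rows for `transcript_id` whose ORF_ID ends with `coord_suffix`.

    Assumes a tab-separated index with `ORF_ID` and `transcript_id` columns
    (hypothetical layout; adjust to the real header of your index file).
    """
    reader = csv.DictReader(index_file, delimiter="\t")
    return [
        row
        for row in reader
        if row["transcript_id"] == transcript_id
        and row["ORF_ID"].endswith(coord_suffix)
    ]


# Toy index: the ORF is listed for a placeholder transcript (TX_OTHER),
# but not for ENST00000675536, mirroring the situation described above.
INDEX_TSV = (
    "ORF_ID\tORF_type\ttranscript_id\n"
    "TX_OTHER_89646069_89649410_396\tannotated\tTX_OTHER\n"
)

SUFFIX = "89646069_89649410_396"
hits_target = find_orfs(io.StringIO(INDEX_TSV), "ENST00000675536", SUFFIX)
hits_other = find_orfs(io.StringIO(INDEX_TSV), "TX_OTHER", SUFFIX)
```

With a real index, `io.StringIO(INDEX_TSV)` would be replaced by an open file handle.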

Thank you in advance! I look forward to your answer. Kind regards, Anne

saketkc commented 2 years ago

They were excluded because there were so many of them. We should make them available via a flag. For now, you can convert the RiboCode index to a ribotricer-compatible one. I'll keep this open until we fix it.

annebresciani commented 2 years ago

Thank you very much for your swift reply. I am happy to read that I did not misunderstand. For now, I think we will be okay with how it is, but it would definitely be good to have it as an option in the future. Another idea could be an option to collapse identical ORFs into one row when the same ORF is identified from several different transcripts. I am currently doing this downstream, but it might be useful for others as well.
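To illustrate the collapsing idea, here is a rough sketch of one way it could be done downstream. It treats two ORFs as identical when they share chromosome, strand, and coordinates; that definition, the column names (`chrom`, `strand`, `coordinate`, `transcript_id`, `ORF_type`), the function `collapse_identical_orfs`, and the toy rows are all assumptions for illustration, not ribotricer's own behavior.

```python
def collapse_identical_orfs(rows):
    """Merge rows that share chrom, strand and coordinate into a single row.

    Each merged row keeps the list of transcripts and ORF types it was
    seen with, in first-seen order. Column names are hypothetical.
    """
    merged = {}
    for row in rows:
        key = (row["chrom"], row["strand"], row["coordinate"])
        entry = merged.setdefault(
            key,
            {
                "chrom": row["chrom"],
                "strand": row["strand"],
                "coordinate": row["coordinate"],
                "transcript_ids": [],
                "orf_types": [],
            },
        )
        if row["transcript_id"] not in entry["transcript_ids"]:
            entry["transcript_ids"].append(row["transcript_id"])
        if row["ORF_type"] not in entry["orf_types"]:
            entry["orf_types"].append(row["ORF_type"])
    # dicts preserve insertion order, so output follows first appearance
    return list(merged.values())


# Toy rows with placeholder values: the first two describe the same ORF
# on two different transcripts.
toy_rows = [
    {"chrom": "chrA", "strand": "+", "coordinate": "100-399",
     "transcript_id": "TX1", "ORF_type": "annotated"},
    {"chrom": "chrA", "strand": "+", "coordinate": "100-399",
     "transcript_id": "TX2", "ORF_type": "annotated"},
    {"chrom": "chrA", "strand": "-", "coordinate": "500-799",
     "transcript_id": "TX3", "ORF_type": "uORF"},
]

collapsed = collapse_identical_orfs(toy_rows)
```

Keeping the transcript IDs as a list means the per-transcript information is still available after merging.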

Best, Anne

P.S. Out of curiosity, are there any plans to develop the tool further?