i2bc / ORFmine

ORFmine is an open-source tool for identifying and analyzing all Open Reading Frames (ORFs) in genomic data, focusing on their sequences, structures, evolution and translation activities.
https://i2bc.github.io/ORFmine/
MIT License

Add different codon tables usage #3

Open nchenche opened 2 years ago

nchenche commented 2 years ago

During the orftrack annotation process, ORFs are first defined from stop codon to stop codon. The stop codons, like all other codons, are hardcoded and come from the standard genetic code. This is problematic if the species (or an organelle, such as the human mitochondrion) uses a different genetic code.
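
To make the dependency on the stop codon set concrete, here is a minimal sketch of stop-to-stop segmentation in a single reading frame. It is not orftrack's actual code: the function name `stop_to_stop_orfs`, its parameters and the choice to ignore stretches at the sequence edges are assumptions made for illustration.

```python
# Hypothetical illustration, not taken from the orftrack code base.
STANDARD_STOPS = frozenset({"TAA", "TAG", "TGA"})                # NCBI table 1
VERTEBRATE_MITO_STOPS = frozenset({"TAA", "TAG", "AGA", "AGG"})  # NCBI table 2


def stop_to_stop_orfs(seq, frame=0, stop_codons=STANDARD_STOPS):
    """Return 0-based, half-open (start, end) spans lying between two
    consecutive in-frame stop codons (the stops themselves are excluded)."""
    seq = seq.upper()
    orfs = []
    prev_stop = None  # position of the previous in-frame stop codon, if any
    for pos in range(frame, len(seq) - 2, 3):
        if seq[pos:pos + 3] in stop_codons:
            # Emit the stretch only if it is bounded by a stop on both
            # sides and contains at least one codon.
            if prev_stop is not None and pos > prev_stop + 3:
                orfs.append((prev_stop + 3, pos))
            prev_stop = pos
    return orfs


seq = "TAAATGAGACCCTAG"  # codons: TAA ATG AGA CCC TAG
print(stop_to_stop_orfs(seq))                                     # [(3, 12)]
print(stop_to_stop_orfs(seq, stop_codons=VERTEBRATE_MITO_STOPS))  # [(3, 6), (9, 12)]
```

With the vertebrate mitochondrial code, AGA is a stop codon, so the same sequence yields two shorter ORFs instead of one.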

The hardcoded genetic code used to define stop codons (among others) needs to be made configurable so that users can choose the codon table that fits their needs. A parameter could be used to set the desired codon table.
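
A possible shape for such a parameter, sketched with `argparse`. The option name `--codon-table`, the helper names and the small lookup dictionary are all hypothetical and only cover a few NCBI tables; nothing here reflects an existing orftrack option.

```python
import argparse

# NCBI genetic code id -> DNA stop codons (illustrative subset only).
STOP_CODONS_BY_TABLE = {
    1: frozenset({"TAA", "TAG", "TGA"}),         # Standard
    2: frozenset({"TAA", "TAG", "AGA", "AGG"}),  # Vertebrate Mitochondrial
    4: frozenset({"TAA", "TAG"}),                # Mold/Protozoan Mitochondrial, Mycoplasma
    11: frozenset({"TAA", "TAG", "TGA"}),        # Bacterial, Archaeal, Plant Plastid
}


def build_parser():
    """Build a parser exposing a hypothetical --codon-table option."""
    parser = argparse.ArgumentParser(description="Codon table selection sketch")
    parser.add_argument(
        "--codon-table",
        type=int,
        default=1,
        choices=sorted(STOP_CODONS_BY_TABLE),
        help="NCBI genetic code id used to define stop codons (default: 1)",
    )
    return parser


def stop_codons_from_args(argv=None):
    """Parse argv and return the stop codon set of the selected table."""
    args = build_parser().parse_args(argv)
    return STOP_CODONS_BY_TABLE[args.codon_table]


print(sorted(stop_codons_from_args([])))                      # ['TAA', 'TAG', 'TGA']
print(sorted(stop_codons_from_args(["--codon-table", "2"])))  # ['AGA', 'AGG', 'TAA', 'TAG']
```

Defaulting to table 1 would keep the current behaviour for users who do not pass the option.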

(https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi?chapter=tgencodes)

Information from the Biopython package:

import Bio.Data.CodonTable as ct
for k, v in ct.generic_by_id.items():
    print(k, v)

# k is the id of the genetic code table (same id as in the NCBI link above)
# v is an instance of the NCBICodonTable class with the following attributes
# (example values shown for table 33):
v.id  # -> 33
v.names  # -> ['Cephalodiscidae Mitochondrial', None]
v.forward_table  # -> {'TTT': 'F', 'UUU': 'F', ...}
v.back_table  # -> {'K': 'AGG', 'N': 'AAU', ...}
v.start_codons  # -> ['TTG', 'UUG', 'CTG', 'CUG', 'ATG', 'AUG', 'GTG', 'GUG']
v.stop_codons   # -> ['TAG', 'UAG']
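
Building on the attributes above, a small helper could return the stop codons for a given table id. This is only a sketch: the function name `dna_stop_codons` is hypothetical, and it uses Biopython's `unambiguous_dna_by_id` rather than `generic_by_id` so that only DNA (T-based) codons are returned.

```python
from Bio.Data import CodonTable


def dna_stop_codons(table_id=1):
    """Return the DNA stop codons of an NCBI genetic code as a frozenset.

    Hypothetical helper; table_id is the NCBI genetic code id
    (1 = standard, 2 = vertebrate mitochondrial, ...).
    """
    try:
        table = CodonTable.unambiguous_dna_by_id[table_id]
    except KeyError:
        raise ValueError(f"Unknown NCBI genetic code id: {table_id}") from None
    return frozenset(table.stop_codons)


print(sorted(dna_stop_codons(1)))  # ['TAA', 'TAG', 'TGA']
print(sorted(dna_stop_codons(2)))  # ['AGA', 'AGG', 'TAA', 'TAG']
```

Relying on Biopython for the lookup would avoid maintaining a copy of the NCBI tables by hand, at the cost of an extra dependency.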