openvar / variantValidator

Public repository for VariantValidator project
GNU Affero General Public License v3.0

Partially or completely support the SELENON gene. #154

Open ifokkema opened 4 years ago

ifokkema commented 4 years ago

Adapted from NM_020451.2:

The SELENON gene encodes a selenoprotein, containing the rare amino acid selenocysteine (Sec). Sec is encoded by the UGA codon. The 3' UTRs of selenoprotein mRNAs contain a conserved stem-loop structure, designated the Sec insertion sequence (SECIS) element, that is necessary for the recognition of UGA as a Sec codon, rather than as a stop signal. A second stop-codon redefinition element (SRE) adjacent to the UGA codon has been identified in this gene (PMID:15791204). SRE is a phylogenetically conserved stem-loop structure that stimulates readthrough at the UGA codon, and augments the Sec insertion efficiency by SECIS.

An example of where this currently fails is NC_000001.10:g.26140612_26140626delinsT. Strangely enough, all endpoints return a p.= prediction (not even a p.(=)). Obviously, it's not easy to recognize selenoproteins, but perhaps some steps can be made.

Peter-J-Freeman commented 4 years ago

To remind myself, I think that p.(=) should always be used instead of p.=. We have thought about this before, and I think it does make sense to implement this change. I will do this now.

There is an alternate translation Alphabet used by Biopython that deals with this issue in theory. Shouldn't need a hack. I'm going to leave this for now until after the next stable release and start putting ideas together here for us to discuss.
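
As a starting point for that discussion, here is a minimal sketch of the recoding idea in plain Python. It is not Biopython's API and not VariantValidator code: the function name `translate_cds`, the `sec_codons` parameter and the toy sequence are all made up for illustration. It assumes the codon numbers at which UGA (TGA in the DNA) should be read as Sec are already known, for example from the transcript annotation, and it uses the one-letter code U for selenocysteine.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons in TCAG order:
# TTT, TTC, TTA, TTG, TCT, ... GGG
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_cds(cds, sec_codons=frozenset()):
    """Translate a coding DNA sequence up to and including the first stop.

    sec_codons is a hypothetical input: the 1-based codon numbers at which
    an in-frame TGA is read as selenocysteine (U) instead of a stop.
    """
    protein = []
    usable_length = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    for start in range(0, usable_length, 3):
        codon = cds[start:start + 3].upper()
        codon_number = start // 3 + 1
        if codon == "TGA" and codon_number in sec_codons:
            protein.append("U")  # recoded UGA: insert Sec and keep going
            continue
        amino_acid = CODON_TABLE[codon]
        protein.append(amino_acid)
        if amino_acid == "*":
            break  # genuine stop codon ends translation
    return "".join(protein)


# Toy sequence (not SELENON): ATG TGA GGC TGA
toy_cds = "ATGTGAGGCTGA"
standard = translate_cds(toy_cds)
recoded = translate_cds(toy_cds, sec_codons={2})
```

With the standard table the toy sequence terminates at the first TGA, while marking codon 2 as a Sec codon lets translation continue to the real stop. Any protein-level prediction for a selenoprotein would need to be made against the recoded translation, which is the part that still has to be designed.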