openvax / pyensembl

Python interface to access reference genome features (such as genes, transcripts, and exons) from Ensembl
Apache License 2.0

Add check .start_codon_complete #230

Closed: scottdbrown closed 4 years ago

scottdbrown commented 4 years ago

Using Ensembl release 69, I ran into transcripts whose annotated start_codon spans only two positions (for example, ENST00000543092).

Because of this, calling .complete() on such a transcript failed: self.coding_sequence raised an error while determining the _codon_positions: ValueError: Expected 3 positions for start_codon of ENST00000543092 but got 2
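The failure mode can be illustrated with a small sketch. It assumes the start_codon annotation is a single contiguous, 1-based inclusive interval [start, end] (real annotations can split a codon across exons). `codon_positions` is a hypothetical helper, not pyensembl's internal method, and the coordinates are made up. It only mirrors the error message quoted above.

```python
def codon_positions(transcript_id, feature, start, end):
    """Return the 1-based genomic positions covered by a codon annotation.

    Hypothetical helper: assumes the codon is one contiguous interval
    [start, end], inclusive on both ends.
    """
    positions = list(range(start, end + 1))
    if len(positions) != 3:
        # Same wording as the error reported in this PR.
        raise ValueError(
            "Expected 3 positions for %s of %s but got %d"
            % (feature, transcript_id, len(positions))
        )
    return positions


# A normal start codon covers three bases (made-up coordinates).
print(codon_positions("ENST_EXAMPLE", "start_codon", 1000, 1002))

# A two-base start_codon annotation triggers the ValueError.
try:
    codon_positions("ENST00000543092", "start_codon", 1000, 1001)
except ValueError as err:
    print(err)
```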

It seems that more recent releases of Ensembl do not have this issue (they no longer list a start_codon for these cases).

To keep Ensembl 69 usable, I added a .start_codon_complete property that catches the ValueError raised by _codon_positions and returns False, and added self.start_codon_complete as an extra check in .complete().
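The guard described above can be sketched with a heavily simplified, hypothetical stand-in class. `SketchTranscript` is not pyensembl's Transcript, and the fields modelled here (`start_codon`, `stop_codon` and `coding_length` as plain values) are assumptions for illustration. The key point is ordering. `complete()` checks `start_codon_complete` before touching anything that computes codon positions, so `and` short-circuits instead of letting the ValueError escape.

```python
class SketchTranscript:
    """Hypothetical, simplified stand-in for a transcript (not pyensembl's class)."""

    def __init__(self, transcript_id, start_codon, stop_codon, coding_length):
        self.transcript_id = transcript_id
        self.start_codon = start_codon  # (start, end) interval, or None if absent
        self.stop_codon = stop_codon    # (start, end) interval, or None if absent
        self.coding_length = coding_length

    def _codon_positions(self, feature):
        # Assumes the codon is one contiguous inclusive interval.
        start, end = getattr(self, feature)
        positions = list(range(start, end + 1))
        if len(positions) != 3:
            raise ValueError(
                "Expected 3 positions for %s of %s but got %d"
                % (feature, self.transcript_id, len(positions))
            )
        return positions

    @property
    def start_codon_complete(self):
        """False when the start_codon annotation does not span 3 positions."""
        try:
            self._codon_positions("start_codon")
        except ValueError:
            return False
        return True

    @property
    def coding_sequence_length(self):
        # Stand-in for the path that raised in the PR: it needs codon positions.
        self._codon_positions("start_codon")
        return self.coding_length

    def complete(self):
        # start_codon_complete is evaluated before coding_sequence_length,
        # so a two-base start codon returns False instead of raising.
        return (
            self.start_codon is not None
            and self.start_codon_complete
            and self.stop_codon is not None
            and self.coding_sequence_length % 3 == 0
        )


ok = SketchTranscript("T_OK", (1000, 1002), (2000, 2002), 300)
bad = SketchTranscript("ENST00000543092", (1000, 1001), (2000, 2002), 300)
print(ok.complete(), bad.complete())
```

Placing the new check inside the existing boolean chain keeps `complete()` a pure predicate. Callers filtering transcripts get `False` rather than having to wrap each call in `try/except`.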

Fixes #231

coveralls commented 4 years ago


Coverage decreased (-0.1%) to 79.326% when pulling 9725a6949b14cd98d6a14b4f1f96df64065c87bc on scottdbrown:patch-1 into 6885722094a5c2eb79559cc90b2992f99f393d62 on openvax:master.

iskandr commented 4 years ago

This looks great, OK to merge. Thanks!