mpdunne / orthofiller

OrthoFiller: Identifying missing annotations for evolutionarily conserved genes.
GNU General Public License v3.0

Sequences not multiple of 3: poor stderr information #6

Open MatteoSchiavinato opened 7 years ago

MatteoSchiavinato commented 7 years ago

With one of the file sets I'm using in the analysis, I get this error:

/software/python/Python2.7/lib/python2.7/site-packages/Bio/Seq.py:2071: BiopythonWarning: Partial codon, len(sequence) not a multiple of three. Explicitly trim the sequence or add trailing N before translation. This may become an error in future.

I checked my files, and many of the coding regions in the GTF file are not a multiple of 3 in length when I look at the exon features, because these may contain UTRs, whereas they are when I look at the CDS features. How does the program actually handle this? Does it read the exons, join them, and then estimate the coding region by searching for start and stop codons?
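For anyone who wants to run the same check on their own GTF, here is a minimal sketch of summing CDS lengths per transcript and listing those whose total is not a multiple of 3. It assumes a tab-separated GTF with a `transcript_id "..."` attribute in column 9; the function names `cds_lengths` and `partial_codon_transcripts` are made up for illustration and are not part of OrthoFiller.

```python
from collections import defaultdict
import re


def cds_lengths(gtf_lines):
    """Sum the lengths of CDS features per transcript_id (hypothetical helper)."""
    totals = defaultdict(int)
    for line in gtf_lines:
        # Skip blank lines and comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Only CDS features count; exon features may include UTR.
        if len(fields) < 9 or fields[2] != "CDS":
            continue
        start, end = int(fields[3]), int(fields[4])
        match = re.search(r'transcript_id "([^"]+)"', fields[8])
        if match:
            # GTF coordinates are 1-based and inclusive.
            totals[match.group(1)] += end - start + 1
    return dict(totals)


def partial_codon_transcripts(gtf_lines):
    """Return transcript IDs whose summed CDS length is not a multiple of 3."""
    return sorted(t for t, n in cds_lengths(gtf_lines).items() if n % 3 != 0)


# Made-up example records, not taken from a real annotation.
example = [
    'chr1\tsrc\texon\t1\t100\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
    'chr1\tsrc\tCDS\t11\t40\t.\t+\t0\tgene_id "g1"; transcript_id "t1";',
    'chr1\tsrc\tCDS\t201\t220\t.\t+\t0\tgene_id "g2"; transcript_id "t2";',
]
print(partial_codon_transcripts(example))
```

With a real file, the same function could be called as `partial_codon_transcripts(open("annotation.gtf"))`, where the file name is a placeholder.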

Also, I think it would be helpful for the end user to see, in the standard error, the name of the gene that triggered the Biopython warning, so they can check it immediately and perhaps grep it out of the file.

xonq commented 4 years ago

Specifically, this warning comes from Biopython. I'm not sure much can be done without altering the Biopython code.
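One possible approach that leaves Biopython untouched, offered as a sketch rather than as how OrthoFiller works: `BiopythonWarning` is an ordinary Python `Warning` subclass, so the caller can record warnings around each translation with the standard `warnings` module and print them with the gene name attached. In the code below, `translate_stub` is a hypothetical stand-in for the real Biopython translate call (used so the example runs without Biopython installed), and `translate_with_gene_name` is an illustrative name.

```python
import sys
import warnings


def translate_stub(seq):
    """Hypothetical stand-in for a Biopython translate call.

    It emits a warning for partial codons and returns the trimmed sequence.
    """
    if len(seq) % 3 != 0:
        warnings.warn("Partial codon, len(sequence) not a multiple of three.")
    return seq[: len(seq) - len(seq) % 3]


def translate_with_gene_name(gene_name, seq, translate=translate_stub, log=sys.stderr):
    """Run translate(seq) and re-emit any warnings prefixed with gene_name."""
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning, even ones Python would normally show only once.
        warnings.simplefilter("always")
        result = translate(seq)
    messages = [str(w.message) for w in caught]
    for message in messages:
        log.write("%s: %s\n" % (gene_name, message))
    return result, messages


# "geneA" and the sequence are made-up illustration values.
result, messages = translate_with_gene_name("geneA", "ATGAA")
```

Each line written to standard error then starts with the gene name, so it can be found with grep.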