jorvis / biocode

Bioinformatics code libraries and scripts
MIT License

bioannotation.py - check for properly-formed EC numbers #27

Closed jorvis closed 10 years ago

jorvis commented 10 years ago

There are sources in public HMM and BLAST libraries which assert malformed EC numbers, such as "1.2.1.n2". Given how these are used, I think the proper thing to do is to warn when the user attempts to add a malformed EC number, but not throw an exception.
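
A minimal sketch of the warn-but-don't-raise behavior proposed here. The function name `add_ec_number`, the strict pattern `STRICT_EC`, and the choice to still store the value after warning are all assumptions made for illustration; none of them come from bioannotation.py.

```python
import re
import warnings

# Hypothetical strict pattern: four dot-separated fields, each either digits or '-'.
# This is NOT the pattern used in bioannotation.py.
STRICT_EC = re.compile(r'^(\d+|-)\.(\d+|-)\.(\d+|-)\.(\d+|-)$')

def add_ec_number(ec_numbers, value):
    """Append value to ec_numbers, warning (not raising) if it looks malformed."""
    if STRICT_EC.match(value) is None:
        # Emit a warning so the caller sees the problem, but keep going.
        warnings.warn("EC number looks malformed: {0}".format(value))
    # Assumption: the value is stored even when the warning fires.
    ec_numbers.append(value)
    return ec_numbers
```

Using `warnings.warn` leaves the decision to the caller, who can silence the message or escalate it to an error with the standard `warnings` filters.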

mchibucos commented 10 years ago

My suggestion is to fall back to 1.2.1, or give the option to do that, rather than automatically discarding the number. It is less granular, but still a valid EC number.
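
A rough sketch of that fallback, assuming a hypothetical helper `truncate_ec` (not part of biocode) that keeps only the leading fields made up entirely of digits:

```python
def truncate_ec(value):
    """Return the leading run of purely numeric fields of an EC number.

    Hypothetical helper for illustration only.
    """
    kept = []
    for field in value.split('.'):
        # Stop at the first field that is not purely digits (e.g. 'n2' or '-').
        if not field.isdigit():
            break
        kept.append(field)
    return '.'.join(kept)
```

With this sketch, `truncate_ec("1.2.1.n2")` gives `"1.2.1"`, while a fully numeric value such as `"1.2.1.3"` is returned unchanged.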

jorvis commented 10 years ago

I checked the code here, and the constructor for bioannotation.ECAnnotation objects already had a check for the format of the passed EC number:

re_pattern = re.compile('(((([0-9\-]+)\.[0-9\-]+)\.[0-9\-]+)\.[a-z0-9\-]+)')

In that regex I noticed that I was already specifically allowing alphabetic characters in the last position, but didn't remember why. So I looked into the release notes at ExPASy and found this:

ENZYME now includes entries with preliminary EC numbers. Preliminary EC numbers include an 'n' as part of the fourth (serial) digit (e.g. EC 3.5.1.n3).

Therefore, the current implementation is correct, and no changes need to be made.
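
To make that concrete, here is a small check of the pattern quoted above against preliminary EC numbers. The pattern is copied as-is, only written as a raw string. The helper name `is_accepted` and the use of `match()` are assumptions, since the thread does not show how the constructor applies the pattern.

```python
import re

# Same pattern as quoted above, written as a raw string so the backslashes
# reach the regex engine unchanged.
re_pattern = re.compile(r'(((([0-9\-]+)\.[0-9\-]+)\.[0-9\-]+)\.[a-z0-9\-]+)')

def is_accepted(ec):
    """Hypothetical helper: True if the pattern matches at the start of ec."""
    return re_pattern.match(ec) is not None

# Preliminary numbers with 'n' in the fourth field are accepted.
print(is_accepted("1.2.1.n2"))  # True
print(is_accepted("3.5.1.n3"))  # True
# A three-field value is rejected because the pattern requires four fields.
print(is_accepted("1.2.1"))     # False

# The nested groups also capture the shorter prefixes of a match.
m = re_pattern.match("1.2.1.n2")
print(m.group(2))               # 1.2.1
```

Because the groups are nested, `group(2)` already holds the three-field prefix, so the less granular form suggested earlier in the thread can be read from the same match.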