opencobra / memote

memote – the genome-scale metabolic model test suite
https://memote.readthedocs.io/
Apache License 2.0

Verify annotation IDs! #315

Open ChristianLieven opened 6 years ago

ChristianLieven commented 6 years ago

Problem description

Namespaces and identifiers may change over time or become deprecated. Warn the user about this possibility, since an outdated ID can still pass the annotation tests (false positives).

FYI, here is an example of a KEGG ID that was given in a previous model (iMK1208) but is now obsolete. By matching reaction IDs in MetaCyc we could still find the new KEGG compound ID. Of course Memote isn't expected to check this kind of thing, but it is good to be aware of this when you write some documentation.

  • Eduard Kerkhoven

image

This would fall right through, since both match the kegg.compound regex. A test that parses identifiers.org could verify the existence of each identifier. It could be skipped by default, and users who know they can spare the time could activate it specifically.
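A minimal sketch of why the regex check alone cannot catch this, and how an opt-in existence check might sit on top of it. The pattern `^C\d+$` is the one the MIRIAM registry lists for kegg.compound. The function names and the `lookup` callable are hypothetical and are not memote's API.

```python
import re

# Pattern listed for kegg.compound in the MIRIAM registry.
KEGG_COMPOUND = re.compile(r"^C\d+$")


def matches_pattern(identifier):
    """Syntactic check only: says nothing about whether the ID still exists."""
    return KEGG_COMPOUND.match(identifier) is not None


def find_unknown_ids(identifiers, lookup=None):
    """Return IDs that fail the pattern or, if `lookup` is given, the lookup.

    `lookup` is a hypothetical callable (for example a wrapper around an
    online registry query) that returns True when the ID is known. It
    defaults to None so the slow existence check is skipped unless the
    user asks for it.
    """
    failed = []
    for identifier in identifiers:
        if not matches_pattern(identifier):
            failed.append(identifier)
        elif lookup is not None and not lookup(identifier):
            failed.append(identifier)
    return failed
```

Without a `lookup`, any well-formed ID passes, including an obsolete one. Passing a `lookup` is what turns the syntactic check into an existence check.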

gregmedlock commented 5 years ago

Looks like the most efficient method to do this would be the /identifiers/validate/{id} service described here: http://identifiers.org/restws

I'm not familiar enough with REST/associated internet protocols to know whether users validating models that contain thousands to tens of thousands of identifiers might overwhelm their servers...
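One way to keep the load down, sketched under assumptions: validate each distinct identifier only once and pause between requests. The `validate` callable stands in for a single remote check and is hypothetical. The default `delay` is an arbitrary value, not a limit published by identifiers.org.

```python
import time


def validate_throttled(identifiers, validate, delay=0.5, sleep=time.sleep):
    """Validate each distinct identifier once, pausing between requests.

    `validate` is a hypothetical callable that performs one remote check
    and returns True or False. `delay` is the pause in seconds between
    consecutive requests. `sleep` is injectable so the pause can be
    replaced in tests.
    """
    results = {}
    # dict.fromkeys drops duplicates while preserving the original order.
    unique = list(dict.fromkeys(identifiers))
    for index, identifier in enumerate(unique):
        if index > 0:
            sleep(delay)  # no pause is needed before the first request
        results[identifier] = validate(identifier)
    return results
```

Since many reactions and metabolites in a model share annotations, deduplicating first may cut the request count noticeably before any throttling applies.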

ChristianLieven commented 5 years ago

I'm not too familiar with this either. Would it be a good idea to reach out to the maintainers of identifiers.org about the implications of hitting them with quite a lot of requests?

draeger commented 5 years ago

An alternative solution could be to do only one online check, namely to compare the MIRIAM XML file's date at http://identifiers.org/download/ to one that MeMoTe could ship (or contain). If the online version is newer, download it. Afterwards, all identifier schemas can be validated against the local XML file's content. This may be faster and save a lot of resources or online requests.
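A rough sketch of that flow, with the registry reduced to a plain dict holding a date and a namespace-to-pattern mapping. That representation, the function names, and the `fetch_remote_date` and `download` callables are assumptions for illustration. Parsing the actual MIRIAM XML file is left out.

```python
import re
from datetime import date


def refresh_registry(local, fetch_remote_date, download):
    """Return the registry to validate against.

    `local` is a dict with a 'date' (datetime.date) and 'patterns'
    (namespace -> regex string). `fetch_remote_date` and `download` are
    hypothetical callables: the first is the single online check, the
    second is only called when the online copy is newer.
    """
    if fetch_remote_date() > local["date"]:
        return download()
    return local


def validate_locally(annotations, patterns):
    """Return the (namespace, identifier) pairs that fail offline checks.

    A pair fails when its namespace is missing from the registry or its
    identifier does not match that namespace's pattern.
    """
    invalid = []
    for namespace, identifier in annotations:
        pattern = patterns.get(namespace)
        if pattern is None or re.match(pattern, identifier) is None:
            invalid.append((namespace, identifier))
    return invalid
```

Note that this only validates namespaces and identifier patterns against an up-to-date registry. It would not by itself detect a well-formed ID that the source database has since retired.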