opencobra / memote

memote – the genome-scale metabolic model test suite
https://memote.readthedocs.io/
Apache License 2.0

Test metabolite identifiers #62

Closed: Midnighter closed this issue 7 years ago

Midnighter commented 7 years ago

Use the ID mapper to test metabolite IDs. Possibly suggest IDs (BiGG) and additional cross-references.

ChristianLieven commented 7 years ago

What do you mean here? Which ID mapper did you have in mind and how is the BiGG ID check supposed to be different from the one already implemented?

Midnighter commented 7 years ago

We're working on an ID mapper tool for DD-DeCaF that might be leveraged. I think it's a good idea to provide additional IDs in models for easy cross-referencing, although this might be a stand-alone tool and outside the scope of memote. We should ensure, however, that the IDs are inside some known namespace, maybe with a preference of BiGG > ChEBI > MNX > KEGG or something like that.
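A minimal sketch of how such a preference order could be applied when picking one identifier out of a metabolite's cross-references. Everything here is an assumption for illustration: the function name, the namespace keys (written in identifiers.org style), and the plain-dict annotation layout are hypothetical and not part of memote or the ID mapper.

```python
# Hypothetical preference order, following the suggestion
# BiGG > ChEBI > MNX > KEGG from the comment above.
NAMESPACE_PREFERENCE = [
    "bigg.metabolite",
    "chebi",
    "metanetx.chemical",
    "kegg.compound",
]


def preferred_identifier(annotation):
    """Return (namespace, identifier) for the most preferred namespace present.

    `annotation` is assumed to be a plain dict that maps a namespace key to
    either one identifier string or a list of identifier strings.
    Returns None if none of the preferred namespaces is annotated.
    """
    for namespace in NAMESPACE_PREFERENCE:
        value = annotation.get(namespace)
        if not value:
            continue
        # Accept a single identifier or take the first of several.
        identifier = value if isinstance(value, str) else value[0]
        return namespace, identifier
    return None


# Illustrative annotation for D-glucose without a BiGG cross-reference.
example = {"kegg.compound": "C00031", "chebi": ["CHEBI:17634", "CHEBI:4167"]}
print(preferred_identifier(example))
```

With no BiGG entry present, the ChEBI identifier wins over the KEGG one because it comes earlier in the list.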

ChristianLieven commented 7 years ago

I totally agree. A simple check could be the presence and correctness of a MIRIAM-style annotation for each reaction and compound. Checking for presence is simple. Checking for correctness could be done by identifying the main namespace of the model's current IDs, mapping the annotation IDs (from other namespaces) back into that main namespace, and comparing the result with the current IDs.
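The presence and correctness checks described here could be sketched roughly as follows. The cross-reference table `TO_MAIN`, the function name, and the result labels are all hypothetical; in practice the table would have to come from an ID mapper or a database dump, and the main namespace is simply assumed to be BiGG with compartment suffixes already stripped.

```python
# Hypothetical cross-reference table:
# (namespace, identifier) -> identifier in the model's main namespace.
TO_MAIN = {
    ("kegg.compound", "C00031"): "glc__D",
    ("chebi", "CHEBI:17634"): "glc__D",
    ("kegg.compound", "C00002"): "atp",
}


def check_annotation(main_id, annotation):
    """Classify the annotation of one metabolite.

    Returns one of:
      "missing"      - no annotation at all (presence check fails)
      "unverifiable" - annotated, but nothing could be mapped back
      "consistent"   - every mapped cross-reference points to main_id
      "conflicting"  - at least one cross-reference maps elsewhere
    """
    if not annotation:
        return "missing"
    mapped = set()
    for namespace, ids in annotation.items():
        # Accept a single identifier or a list of identifiers.
        for identifier in ([ids] if isinstance(ids, str) else ids):
            target = TO_MAIN.get((namespace, identifier))
            if target is not None:
                mapped.add(target)
    if not mapped:
        return "unverifiable"
    return "consistent" if mapped == {main_id} else "conflicting"


print(check_annotation("glc__D", {"kegg.compound": "C00031"}))
print(check_annotation("glc__D", {"kegg.compound": "C00002"}))
```

The presence check needs only the model itself, while the correctness check is only as good as the mapping table behind it.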

ChristianLieven commented 7 years ago

I think this has mostly been covered already in #103, which contained tests that work out which identifiers belong to which database namespace and whether they can be matched by MIRIAM regex patterns. One could envision more elaborate tests that go beyond the mere formatting or syntax of identifiers. However, I think that would increase the testing time quite a lot, as it would require either querying the corresponding databases over the web or keeping a local, continuously updated copy of the list of identifiers from the databases of interest.

So, for example, an additional test could be: "Does a given metabolite or reaction identifier exist in BiGG or SEED or MetaNetX?" versus what we already do: "Does a given metabolite or reaction ID look like those that could be in BiGG or SEED or MetaNetX, and does the associated reaction/metabolite conform to the syntactic rules of the namespace?"

Personally, I'd like to focus on the latter, as that question is far cheaper to answer than querying any of the databases for each metabolite and reaction.
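To make the distinction concrete, a purely syntactic check of the kind described above might look like the sketch below. The patterns are simplified illustrations in the spirit of the MIRIAM registry and are not copied from memote's tests in #103; the authoritative patterns should be taken from the registry itself. The function name is hypothetical.

```python
import re

# Simplified, illustrative identifier patterns per namespace (assumption:
# the real registry patterns may be broader or stricter than these).
PATTERNS = {
    "chebi": re.compile(r"^CHEBI:\d+$"),
    "kegg.compound": re.compile(r"^C\d+$"),
    "metanetx.chemical": re.compile(r"^MNXM\d+$"),
    "seed.compound": re.compile(r"^cpd\d+$"),
}


def matching_namespaces(identifier):
    """Return the namespaces whose pattern the identifier matches.

    A match only means the identifier *looks like* it belongs to that
    namespace. It does not show that the entry exists in the database.
    """
    return [ns for ns, pattern in PATTERNS.items() if pattern.match(identifier)]


for candidate in ["CHEBI:17634", "C00031", "cpd00027", "glc__D"]:
    print(candidate, matching_namespaces(candidate))
```

This runs entirely offline, which is the time advantage over the existence check: answering "does it exist?" would need a web query or a local copy of each database.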

Midnighter commented 7 years ago

I agree, I wouldn't want to spend too much time on this either. In my opinion, for someone reconstructing a model this is a one-time decision. Either you follow a specific namespace or you don't. Then you either fix faulty IDs or you don't.

I'd rather focus on tests that will be continuously useful during the reconstruction process.

ChristianLieven commented 7 years ago

For fixing any issues with annotations and identifiers, we should point people to Filipe Liu's Unification of Genome Scale Models once it is published.