Closed Midnighter closed 7 years ago
What do you mean here? Which ID mapper did you have in mind and how is the BiGG ID check supposed to be different from the one already implemented?
We're working on an ID mapper tool for DD-DeCaF that might be leveraged. I think it's a good idea to provide additional IDs in models for easy cross-referencing, although this might be a stand-alone tool and outside the scope of memote. We should ensure, however, that the IDs are inside some known namespace, maybe with a preference such as BiGG > ChEBI > MNX > KEGG.
I totally agree. A simple check could be the presence and correctness of a MIRIAM-style annotation for each reaction and compound. Checking for presence is simple. Checking for correctness could be done by identifying the main namespace of the model's own IDs, mapping the annotation IDs (from other namespaces) back into that main namespace, and comparing the result with the current ID.
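To make the correctness idea concrete, here is a minimal sketch, not memote code. It assumes a lookup table `to_main` that maps `(namespace, identifier)` pairs back into the model's main namespace; that table, the function name `find_conflicts`, and the toy data are all hypothetical and would in practice come from whatever ID mapper is used.

```python
def find_conflicts(main_id, annotation, to_main):
    """Return cross-references that map back to a different main-namespace ID.

    main_id    -- the component's ID in the model's main namespace
    annotation -- dict of namespace -> ID or list of IDs (MIRIAM-style)
    to_main    -- hypothetical dict of (namespace, ID) -> main-namespace ID
    """
    conflicts = []
    for namespace, ids in annotation.items():
        # Annotations may hold a single ID or a list of IDs.
        for xref in ([ids] if isinstance(ids, str) else ids):
            mapped = to_main.get((namespace, xref))
            # Unknown cross-references are skipped; only contradictions count.
            if mapped is not None and mapped != main_id:
                conflicts.append((namespace, xref, mapped))
    return conflicts


# Toy mapping table, for illustration only.
to_main = {
    ("kegg.compound", "C00031"): "glc__D",
    ("chebi", "CHEBI:15422"): "atp",
}
annotation = {"kegg.compound": "C00031", "chebi": ["CHEBI:15422"]}
print(find_conflicts("glc__D", annotation, to_main))
```

Here the KEGG cross-reference agrees with the main ID, while the ChEBI one maps back to a different metabolite and is reported. An empty annotation dict covers the simpler presence check.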
I think this has mostly been covered already in #103, which contained tests that work out which identifiers belong to which database namespace and whether or not they can be matched by MIRIAM regex patterns. One could envision more elaborate additional tests that go beyond checking the mere formatting or syntax of identifiers. However, I think that would stretch the testing time quite a lot, as it would require either querying the corresponding databases over the web or keeping a local, continuously updated copy of the list of identifiers from the databases of interest.
So, for example, an additional test could be: "Does a given metabolite or reaction identifier exist in BiGG or SEED or MetaNetX?" vs. what we already do: "Does a given metabolite or reaction ID look like those that could be in BiGG or SEED or MetaNetX, and does the associated reaction/metabolite conform to the syntactic rules of the namespace?"
Personally, I'd like to focus on the latter, as that question is cheaper to answer than having to query any of the databases for each metabolite and reaction.
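The difference between the two questions can be sketched as follows. This is an illustration, not the tests from #103: the patterns are modelled on the Identifiers.org registry entries and should be checked against the registry before use, and `known_ids` stands in for the local or remote database copy that an existence check would need.

```python
import re

# Illustrative patterns (assumed, not copied from memote).
PATTERNS = {
    "kegg.compound": re.compile(r"^C\d+$"),
    "chebi": re.compile(r"^CHEBI:\d+$"),
    "seed.compound": re.compile(r"^cpd\d+$"),
    "metanetx.chemical": re.compile(r"^(MNXM\d+|BIOMASS|WATER)$"),
}


def matching_namespaces(identifier):
    """Syntactic check: which namespaces could this ID belong to?"""
    return sorted(ns for ns, pattern in PATTERNS.items()
                  if pattern.match(identifier))


def exists_in(identifier, known_ids):
    """Existence check: requires the full set of IDs from the database."""
    return identifier in known_ids


# A well-formed ID can pass the syntactic check yet not exist.
known_kegg = {"C00031"}  # hypothetical stand-in for a database dump
print(matching_namespaces("C99999999"))
print(exists_in("C99999999", known_kegg))
```

The syntactic check needs only the patterns, whereas the existence check is only as good, and as current, as the identifier list behind it.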
I agree, I wouldn't want to spend too much time on this either. In my opinion, for someone reconstructing a model this is a one-time decision. Either you follow a specific namespace or you don't. Then you either fix faulty IDs or you don't.
I'd rather focus on tests that will be continuously useful during the reconstruction process.
For fixing any issues with annotations and identifiers, we should point people to Filipe Liu's Unification of Genome Scale Models once it is published.
Use the ID mapper to test metabolite IDs. Possibly suggest BiGG IDs and additional cross-references.