opencobra / memote

memote – the genome-scale metabolic model test suite
https://memote.readthedocs.io/
Apache License 2.0

Add tests for model encoding (and corresponding section in report) #661

Open matthiaskoenig opened 5 years ago

matthiaskoenig commented 5 years ago

The key idea of memote is that

Models should, for the benefit of the community and for research gain, live up to certain standards and minimal functionality.

The memote manuscript recommends that models be exchanged as SBML3FBC. So far, no tests exist (and no section in the report) that check whether the model lives up to a minimal encoding standard, i.e., whether the model is encoded in a way that people can read it and run additional tests on it. One could very easily use the output from

model, errors = cobra.io.validate_sbml_file(...)

to add such tests. It provides detailed information on errors and warnings in the SBML file, as well as errors and warnings from parsing the model into an executable LP problem (the COBRA_* categories).

        "SBML_FATAL",
        "SBML_ERROR",
        "SBML_SCHEMA_ERROR",
        "SBML_WARNING",

        "COBRA_FATAL",
        "COBRA_ERROR",
        "COBRA_WARNING",
        "COBRA_CHECK",

If there are no SBML_* or COBRA_* errors or warnings and the model is encoded in SBML3FBC, it would get 100% in this new encoding category. Points are deducted depending on the problems in the file. This would provide a simple way to measure how well the model is encoded in SBML (and thereby how exchangeable it is). It would also give users direct motivation and feedback to improve the model encoding and thereby the model quality.
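As a rough sketch of what such a deduction-based score could look like: the snippet below assumes the errors come as a dict mapping each category name above to a list of messages. The deduction values, the `encoding_score` function, and the `is_sbml3_fbc` flag are all hypothetical and only for illustration; they are not part of memote or cobrapy.

```python
from typing import Dict, List

# Hypothetical deductions in percentage points per category that has at
# least one message. These weights are illustrative assumptions only.
DEDUCTIONS = {
    "SBML_FATAL": 100,
    "SBML_ERROR": 25,
    "SBML_SCHEMA_ERROR": 25,
    "SBML_WARNING": 5,
    "COBRA_FATAL": 100,
    "COBRA_ERROR": 50,
    "COBRA_WARNING": 10,
    "COBRA_CHECK": 5,
}

# Hypothetical deduction when the model is not encoded as SBML3FBC.
NOT_SBML3_FBC_DEDUCTION = 25


def encoding_score(errors: Dict[str, List[str]], is_sbml3_fbc: bool) -> int:
    """Return an encoding score in percent, between 0 and 100."""
    # Sum the deduction of every category that reported at least one message.
    deduction = sum(
        DEDUCTIONS.get(category, 0)
        for category, messages in errors.items()
        if messages
    )
    if not is_sbml3_fbc:
        deduction += NOT_SBML3_FBC_DEDUCTION
    # Clamp at zero so that many problems cannot produce a negative score.
    return max(0, 100 - deduction)
```

With these assumed weights, a clean SBML3FBC model scores 100, while a model with only SBML warnings loses a few points but stays close to the top.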

Best, Matthias

Midnighter commented 5 years ago

The current behavior is to display an entirely different report when there is at least one SBML error. Just capturing the warnings in validate_sbml_file is now a bit off. We thought that this makes more sense since none of the other tests can run without a valid model object.

matthiaskoenig commented 5 years ago

There are many SBML errors which do not affect model reading or evaluation via fbc. E.g., all KBase models are invalid SBML (they have SBML errors) because a bug in the SBML exporter causes problems with some chemical formulas (see log attached), but they are perfectly fine when running with memote. Only if the SBML errors result in COBRA errors is there an issue with the model. See the error log: kbase-errors.txt
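To make the distinction concrete, here is a minimal sketch, again assuming the errors are a dict from category name to a list of messages. The `blocks_testing` helper and the choice of which categories count as blocking are my own assumptions for illustration, not existing memote or cobrapy behavior.

```python
from typing import Dict, List

# Assumption: only COBRA-level fatal messages and errors prevent further tests.
BLOCKING_CATEGORIES = ("COBRA_FATAL", "COBRA_ERROR")


def blocks_testing(errors: Dict[str, List[str]]) -> bool:
    """Return True if any blocking category contains at least one message."""
    return any(errors.get(category) for category in BLOCKING_CATEGORIES)


# A model with SBML errors only (like the KBase case) would still be tested.
kbase_like = {"SBML_ERROR": ["invalid chemical formula"], "COBRA_ERROR": []}
print(blocks_testing(kbase_like))
```

Under this assumption, SBML errors alone would lower the encoding score but would not switch to the alternate report.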

Midnighter commented 5 years ago

Sorry, I was unclear. We indeed only display the alternate report when cobrapy fails to read the model. You are correct, we should make warnings and errors available in the report and score it. The result seems rather binary to me, though, or do you have an idea how a continuous score could be applied?

matthiaskoenig commented 5 years ago

I could imagine a staged score with various categories: [100%, 75%, 50%, 25%, 0%]

With the categories being defined as: