opencobra / memote

memote – the genome-scale metabolic model test suite
https://memote.readthedocs.io/
Apache License 2.0
128 stars 28 forks source link

blocked reactions should report a warning #221

Closed cdanielmachado closed 6 years ago

cdanielmachado commented 7 years ago

Right now blocked reactions are reported as an error. Maybe this should instead be a warning? Or not a warning at all ?

There could be blocked reactions due to problems with gap filling, but there might also be cases where a reaction is blocked and still biologically relevant (for instance by resulting from the side activity of a promiscuous reaction).

These reactions should not be removed from the model, even if they cannot carry flux, as they still represent biological processes that can occur inside the cell if the right conditions are met (eg: being reconnected after insertion of a heterologous pathway during strain design).

ChristianLieven commented 7 years ago

I agree that the test for blocked reactions all is rather strict by accepting no blocked reactions at all. I wouldn't want to reduce it to a mere warning though, because of those cases where it might be problems with gap filling. However, arbitrarily relaxing the test requirements doesn't strike me as a good idea either.

There's got to be a more informed way to set a threshold. Perhaps some extrapolation depending on the genome annotations, or an average value of the portion of blocked reactions across models in ratio to the total amount of reactions?

cdanielmachado commented 7 years ago

What about not setting a threshold, and instead of returning a binary PASS/FAIL result, just report the percentage of blocked reactions as a statistic in the summary?

Like I said, blocked reactions are not necessarily "bad", they might represent a true biological state. One can remove blocked reactions from a model, and make the model "worse", in the sense that it looses biologically relevant information without gaining increased accuracy.

joanarcx commented 7 years ago

Daniel's point is highly relevant; I also agree with his proposal for a % result. Also, if the model includes regulation data of some condition some reactions might be blocked that wouldn't be in another condition. If you really want to issue a fail, maybe more than 25%? but still this is very arbitrary.

ChristianLieven commented 7 years ago

There are two sides for this, in a well-curated model blocked reactions give us information on where real-world knowledge is missing, however in a draft blocked reactions are mostly fixable, indicative of wrong constraints or existing gaps. Hence, I am not convinced to simply 'report' the blocked reactions, as they 'can' be wrong.

We're excluding regulation and different states from the test by determining the blocked reactions with all boundaries open.

This is why I prefer a threshold option, but of course an informed threshold would be best.

cdanielmachado commented 7 years ago

From my perspective, setting a threshold to transform a fraction or percentage into a binary value is just loss of data. Does not make the test richer or more informative, quite the contrary.

To be honest, for most of the tests, I would prefer to see some value/metric reported than a purely binary result (pass/fail). It would make the tests more informative.

Optionally, memote could then output a final global metric, defined as a weighted average of all results (weighted according to priority/severity level), rather than just saying "this model passed x of n tests".

ChristianLieven commented 7 years ago

First up, in principle I agree. Although I would want to keep the scoring simple and transparent.

From my perspective, setting a threshold to transform a fraction or percentage into a binary value is just loss of data. Does not make the test richer or more informative, quite the contrary.

Not sure if you've seen the html report yet, but it does contain all that information. I just want to stress that we don't lose any of the data, it is rather just spread out at the moment. We are planning to have it all unified in the generated reports (and by extension the python API output):

The binary output that you see in the console, is mostly the raw pytest output. This is the package we use to automize the whole testing suite with continuous integration. I'm not sure yet if the output there will ever be as rich as the html reports.

Suggestions on how the results are calculated should go here: https://github.com/opencobra/memote/issues/232

Midnighter commented 7 years ago

There are ways in principle to change how pytest reports the outcome of the test suite. I would prioritize improving the HTML report, though. The command line output is a crutch at the moment.

ChristianLieven commented 6 years ago

In the new report this test is not part of the scored tests and hence just reported as a statistic.

ChristianLieven commented 6 years ago

I liked the discussion on blocked reactions. The first yeast model had many of these “islands”, which disturbed a bit the numbers of reactions used on a certain carbon source etc. With our model, we decided to have no blocked reactions, by mainly deleting information and by closing some gaps (it was interesting to learn that more than 30 reactions were annotated by the yeast community in less than 2 years). Now, with enzyme promiscuity and moonlight functions, I would argue again that it matters why we setup a model. If it is meant to cover the entire biochemical space and is a very well annotated database for the biochemical potential, than the blocked reactions have to be included, if one want to compute a better network (metabolic engineering) or analyse the structure, the blocked reactions do not add much, but might disturb the numbers.

  • Lars Blank, comment on the manuscript