Missing NGAM wrongly reported

cdanielmachado commented 6 years ago

memote is reporting missing NGAM when the lower bound of R_ATPM is zero.

I think it should check for the presence of NGAM but not force a particular value.

The principle of constraint-based modeling is that one iteratively refines a model by applying constraints and reducing the solution space as more data is acquired. The model is valid as long as the "true" phenotype is still somewhere inside the solution space.

If NGAM is unknown because it was not yet calibrated from experimental data, a range of [0, max_flux] is still a valid constraint.
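As a minimal sketch of the argument above (not memote or cobrapy code), the snippet below treats the ATPM bounds as a plain interval. The `Bounds` class, `MAX_FLUX`, and the candidate flux values are all hypothetical and chosen only for illustration. It shows that an uncalibrated range of [0, max_flux] already admits any non-negative maintenance flux, and that calibrating later only narrows the interval.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bounds:
    """Closed flux interval [lower, upper] for a single reaction."""
    lower: float
    upper: float

    def contains(self, flux: float) -> bool:
        return self.lower <= flux <= self.upper

    def tighten(self, lower: float) -> "Bounds":
        """Return a narrower interval once a better lower bound is known."""
        return Bounds(max(self.lower, lower), self.upper)


MAX_FLUX = 1000.0  # hypothetical default upper bound
uncalibrated = Bounds(0.0, MAX_FLUX)

# Hypothetical candidate maintenance fluxes; none of these are measurements.
candidates = [0.5, 3.15, 8.39]

# Every candidate is already admitted by the uncalibrated interval.
all_inside = all(uncalibrated.contains(v) for v in candidates)

# Calibrating later only removes part of the interval; it adds nothing new.
calibrated = uncalibrated.tighten(3.15)
still_subset = (uncalibrated.lower <= calibrated.lower
                and calibrated.upper <= uncalibrated.upper)
```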

ChristianLieven commented 6 years ago

memote expects NGAM to be an ATP hydrolysis reaction with a non-zero lower bound. This test is based on the definition in [1]. I'm not sure how well a "true" cellular phenotype can be represented by a model that doesn't account for this basal level of maintenance.

[1] Thiele, I., & Palsson, B. Ø. (2010, January). A protocol for generating a high-quality genome-scale metabolic reconstruction. Nature protocols. Nature Publishing Group. http://doi.org/10.1038/nprot.2009.203

cdanielmachado commented 6 years ago

Some reasons why I still disagree:

1) If the lower bound is zero, the "true" phenotype is still inside the solution space. So the model is not wrong (this is the fundamental principle of constraint-based modeling).

2) Even if all organisms have some level of non-growth associated ATP costs, a model does not need to account for everything to be correct. There are many other things that are not being accounted for (e.g., regulation). All models are incomplete by nature.

3) What would then be a suitable minimum value to consider? Non-zero is a very vague definition. In linear programming one cannot even distinguish strict inequalities, so "x > 0" and "x >= 0" are essentially the same.
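Point 3 can be sketched numerically. The snippet below is an illustration only: `FEASIBILITY_TOL` and `satisfies_lower_bound` are hypothetical names, and the tolerance value is an assumption rather than the setting of any particular solver. The idea is that a strict bound "x > 0" has to be encoded as "x >= epsilon", and whether zero is excluded then depends entirely on the chosen epsilon.

```python
FEASIBILITY_TOL = 1e-7  # assumed value; real solvers expose their own setting


def satisfies_lower_bound(flux: float, lower: float,
                          tol: float = FEASIBILITY_TOL) -> bool:
    """Mimic a numerical solver accepting flux >= lower up to a tolerance."""
    return flux >= lower - tol


# A flux of exactly zero satisfies "x >= 0".
zero_ok = satisfies_lower_bound(0.0, 0.0)

# A "strict" bound x > 0 must be written as x >= epsilon for some epsilon.
# If epsilon is below the tolerance, a flux of zero still passes.
zero_passes_tiny = satisfies_lower_bound(0.0, 1e-9)

# Only an epsilon clearly above the tolerance excludes zero, and picking
# that epsilon is the "what minimum value?" question again.
zero_passes_large = satisfies_lower_bound(0.0, 1e-3)
```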

ChristianLieven commented 6 years ago

To 1. I guess it depends on the test, but the result of a single test in memote cannot dictate whether a model is wrong or right, so failing a single test doesn't necessarily mean the whole model is unusable. These tests rather provide a guide to incrementally improve COBRA models.

To 2. I understand that regulation may be outside the scope of a metabolic model; however, I think that all aspects of metabolism that can be represented should be represented explicitly (especially if they affect a model's results). I have yet to find a report of an organism without non-growth associated ATP costs, and NGAM is fairly simple to determine experimentally. Thus, requiring it would make sure that NGAM is considered by the user, so that in the next iteration the corresponding constraints will be applied.

To 3. If I understand the concept of NGAM correctly, it may depend both on the environmental and intracellular conditions. Since we're talking about finite amounts of molecules being turned over, I don't think they'd ever get so small that the solvers we use couldn't differentiate "x > 0" from "x >= 0" anymore.

I could agree to split the test in two, one checking for an ATP hydrolysis reaction with the id (and/or SBO term) for NGAM, and another checking if the lower bound of NGAM is constrained.
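A rough sketch of what such a split could look like is given below. It is not memote's implementation: the `Reaction` class is a stand-in rather than a cobrapy object, `NGAM_IDS` is a hypothetical list of accepted identifiers, and `SBO:0000630` is assumed here to be the SBO term for ATP maintenance. For brevity the sketch matches on id or SBO term only and does not inspect the stoichiometry to confirm ATP hydrolysis.

```python
from dataclasses import dataclass
from typing import List, Optional

NGAM_SBO_TERM = "SBO:0000630"  # assumed to be the "ATP maintenance" term
NGAM_IDS = {"ATPM", "R_ATPM"}  # hypothetical list of accepted identifiers


@dataclass
class Reaction:
    """Stand-in for a model reaction; not a cobrapy or memote class."""
    id: str
    lower_bound: float
    upper_bound: float
    sbo_term: Optional[str] = None


def find_ngam(reactions: List[Reaction]) -> List[Reaction]:
    """Check 1: is there a reaction identified as NGAM by id or SBO term?"""
    return [r for r in reactions
            if r.id in NGAM_IDS or r.sbo_term == NGAM_SBO_TERM]


def unconstrained_ngam(reactions: List[Reaction]) -> List[Reaction]:
    """Check 2: which NGAM reactions have a lower bound of zero or less?"""
    return [r for r in find_ngam(reactions) if r.lower_bound <= 0.0]


model = [
    Reaction("PGI", -1000.0, 1000.0),
    Reaction("ATPM", 0.0, 1000.0, sbo_term=NGAM_SBO_TERM),
]

present = bool(find_ngam(model))  # an NGAM reaction exists
uncalibrated = [r.id for r in unconstrained_ngam(model)]  # bound still zero
```

With the two checks separated, a model like the one above would pass the presence check and be flagged only by the bound check.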

cdanielmachado commented 6 years ago

The question is: if you don't measure it, what value should you put there?

I think the result would be people putting there some arbitrary value (like lb=1) just to avoid getting errors in memote.

Could we at least report the lack of an NGAM reaction and the zero lower bound separately, with different severity levels?

cdanielmachado commented 6 years ago

PS: Actually, if memote becomes widely adopted, I think this will probably happen very often.

Whenever you define some kind of metrics, people just fine-tune their systems to fit the metrics, even in cases where it is obviously not the right thing to do.

Just something to consider when designing the tests: don't make them easy to circumvent without making the model better.

ChristianLieven commented 6 years ago

> The question is: if you don't measure it, what value should you put there?

Until it is measured properly, I'd suggest using a value from a closely related organism, the best available estimate, or a value obtained through data fitting.

> I think the result would be people putting there some arbitrary value (like lb=1) just to avoid getting errors in memote.

and

> Whenever you define some kind of metrics, people just fine-tune their systems to fit the metrics, even in cases where it is obviously not the right thing to do.

While this may be the sad reality, it may also trigger people to contemplate how to obtain the true parameter, which is what I'd like to gear the system towards. In any case, that is something we can't know yet, but the chances of getting caught trying to cheat the system are higher when the system covers this case. So, in addition to asserting that the value is non-zero, we should definitely display it in the report.

cdanielmachado commented 6 years ago

To be honest, I am still not totally convinced. A few arguments why:

Defining the bounds for ATPM as something like [0, 1000] is not the same as not accounting for maintenance ATP; it says that such a value exists and lies somewhere within that range.

If you run a simulation where both growth and uptake rates are constrained, it is perfectly possible to get a flux through ATPM without having to force one.

Actually, the ATPM flux would even make a perfectly valid objective function.

If the argument is that all cells have some NGAM costs, and so the lower bound of ATPM should be non-zero, then I would argue that all cells grow, and therefore the lower bound of the biomass reaction should also be non-zero.
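The point about constrained growth and uptake can be sketched with a toy ATP balance. Everything here is an assumption made for illustration: the single-equation balance, the function names, and all numbers (none are measurements or taken from a real model). With uptake and growth fixed, the balance alone determines the ATPM flux, even though its lower bound is zero.

```python
def atpm_flux(uptake: float, growth: float,
              atp_yield: float, gam: float) -> float:
    """ATP left over for maintenance in a toy steady-state ATP balance.

    Balance assumed here: atp_yield * uptake = gam * growth + atpm
    """
    return atp_yield * uptake - gam * growth


def feasible(atpm: float, lower: float = 0.0, upper: float = 1000.0) -> bool:
    """Check the implied ATPM flux against its bounds."""
    return lower <= atpm <= upper


# All numbers are made up for illustration.
atpm = atpm_flux(uptake=10.0, growth=0.8, atp_yield=17.5, gam=60.0)

# The implied flux is positive and within [0, 1000] without forcing it.
within_bounds = feasible(atpm)
```

In a full model the balance is not a single equation, so the same idea would be expressed by fixing growth and uptake and maximizing the ATPM flux.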

opencobra / memote

Missing NGAM wrongly reported #222