opencobra / memote

memote – the genome-scale metabolic model test suite
https://memote.readthedocs.io/
Apache License 2.0

[discussion] reconstructions vs. models #228

Open ChristianLieven opened 6 years ago

ChristianLieven commented 6 years ago

A reconstruction is the broad database of reactions, and becomes a model when it is initialised with constraints.

The difference between a reconstruction and condition-specific models derived from a reconstruction needs to be defined.

  • Ines Thiele, Comment on the Manuscript

Here it is crucial to note that mostly reconstructions and not condition-specific models are published. In an average paper, I have 1 reconstruction but tens to hundreds of different models, which were derived from the reconstruction. It would make no sense to publish hundreds of models, but rather provide scripts (or tables) that permit others to re-create the model-derivation from the reconstruction, which we always do.

  • Ines Thiele, Comment on the Manuscript
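The workflow described in the comment above (one reconstruction plus a published table of conditions from which the models are re-created) could look roughly like the following sketch. It is plain Python for illustration only, not memote or cobrapy code; the `Reconstruction` class, the `derive_model` function, the reaction IDs and the bound values are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Reconstruction:
    """Hypothetical container: reaction ID -> (lower, upper) flux bounds."""
    name: str
    bounds: dict = field(default_factory=dict)


def derive_model(recon, condition_name, condition_bounds):
    """Return a new condition-specific model; the reconstruction is left untouched."""
    unknown = set(condition_bounds) - set(recon.bounds)
    if unknown:
        raise KeyError(f"condition refers to unknown reactions: {sorted(unknown)}")
    new_bounds = dict(recon.bounds)
    new_bounds.update(condition_bounds)
    return Reconstruction(name=f"{recon.name}__{condition_name}", bounds=new_bounds)


# Illustrative reconstruction with all exchanges closed (IDs and values are made up).
recon = Reconstruction(
    name="toy_recon",
    bounds={"EX_glc": (0.0, 0.0), "EX_o2": (0.0, 0.0), "EX_ch4": (0.0, 0.0)},
)

# The "table" that would be published alongside the reconstruction.
conditions = {
    "aerobic_glucose": {"EX_glc": (-10.0, 0.0), "EX_o2": (-20.0, 0.0)},
    "anaerobic_glucose": {"EX_glc": (-10.0, 0.0)},
}

# One reconstruction, many derived models.
models = {
    name: derive_model(recon, name, table) for name, table in conditions.items()
}
```

Publishing `recon` and `conditions` would then be enough for others to regenerate every entry of `models`.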

For memote I see several options here: we try to detect model vs. reconstruction and toggle our tests accordingly; we specialise on either one and offer comprehensive tests; or we offer tests for both simultaneously and let the user interpret the results, while continuing to offer advice through tooltips and documentation.

The problem is that, when testing, we cannot know whether a model is poorly initialised (and thus broken) or whether it is just an uninitialised reconstruction (which can be perfectly functional once proper constraints are added with an external script). Since it would be more explicit and make our lives easier, I'd personally prefer, if reconstructions were ALWAYS initialised to a ’typical’ reference state, like ecoli-models should be initialised to growth on glucose and minimal medium, methanotrophs should be initialised to minimal medium and methane, cyanobacteria to photosynthesis, human cells to glucose consumption and respiration, etc. From there on, scripts should point to the other possible models that can be derived from a reconstruction.

What does everyone think about this?
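To make the "detect and toggle" option above more concrete, here is a minimal heuristic sketch. It only looks at the uptake side of exchange bounds and assumes the common convention that a negative lower bound on an exchange reaction means uptake is allowed. The function name and labels are hypothetical, and, as noted above, such a check cannot tell a broken model from an uninitialised reconstruction; it could only flag candidates.

```python
def classify_exchange_state(exchange_bounds, tol=1e-9):
    """Heuristic label based only on uptake (negative lower) bounds.

    exchange_bounds maps exchange reaction ID -> (lower, upper).
    Assumption: a lower bound below zero means the compound can be taken up.
    """
    if not exchange_bounds:
        return "no exchanges"
    uptake_open = [lower < -tol for lower, _upper in exchange_bounds.values()]
    if not any(uptake_open):
        return "all uptake closed"
    if all(uptake_open):
        return "all uptake open"
    return "specific medium"


# Made-up examples for the three interesting cases.
closed = {"EX_glc": (0.0, 1000.0), "EX_o2": (0.0, 1000.0)}
wide_open = {"EX_glc": (-1000.0, 1000.0), "EX_o2": (-1000.0, 1000.0)}
minimal = {"EX_glc": (-10.0, 1000.0), "EX_o2": (-20.0, 1000.0), "EX_ch4": (0.0, 1000.0)}

labels = [classify_exchange_state(b) for b in (closed, wide_open, minimal)]
```

A test runner could use such a label to decide whether growth-related tests are meaningful, or simply report it next to the results.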

cdanielmachado commented 6 years ago

I think that what Ines refers to as a reconstruction rather than a model, i.e. something that is not initialized with constraints (environmental conditions or context-specific constraints), is related to my comment that a model without environmental constraints (I still call it a model) is still something that is valid to be published.

Nonetheless, I think these un-initialised models (or reconstructions) should still be subject to validation and should pass most (if not all) of memote's tests after being initialized with the most relaxed possible set of constraints (i.e. simulation of growth on complete media).

@ChristianLieven I disagree with the following, sorry :)

" I'd personally prefer, if reconstructions were ALWAYS initialised to a ’typical’ reference state, like ecoli-models should be initialised to growth on glucose and minimal medium"

I think it is very hard to say what a typical reference state should be. For instance, why should the E. coli reference state be growth on glucose minimal medium? E. coli is a gut bacterium, so shouldn't the reference state be the nutritional composition of the gut?

ChristianLieven commented 6 years ago

In the manuscript there were other concerns about the 'default' state of a model:

What if the reconstruction refers to a species which cannot be isolated or cultivated? A model might not be initialized with environmental conditions if they are unknown. Yet such a model might still be perfectly valid and useful (e.g. for EFM analysis, minimal medium calculations, omics data analysis, etc.).

  • Daniel Machado, Comment on the Manuscript

If the model is provided with no defined default (i.e. reference) state (because it isn't known), that could somehow be communicated to, or be detectable by, memote so that we can skip tests intelligently. On the other hand, the goal of memote is to help models converge towards the optimal representation of their corresponding biology. Skipping tests at will, or because there is currently insufficient information on the organism in question, seems to reduce the 'effectiveness' of memote as a driver of this change. I mean, once we define and assign weights to each test and calculate a final score, models that fail some tests may still be considered high-quality if they pass others.

I agree, but my only concern here is that the exchange reactions define the external environment and not the organism (for the particular case of an FBA simulation they simulate the flow of extracellular compounds in a chemostat). So, by forcing them to be defined in the model, we are essentially saying that an SBML model is by definition a model of the organism AND the environment. I guess that would be perfectly fine as it would cover most use cases, but it should be made very clear that this is the definition that the field has adopted (1 SBML model = 1 organism + 1 environment).

  • Daniel Machado, Follow-up comment on the Manuscript
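The idea raised earlier in this comment, assigning a weight to each test and computing a final score, can be sketched as below. This is not memote's actual scoring; the function, the test names and the weights are hypothetical. The flag makes the trade-off explicit: skipped tests can either be left out of the total or be counted as failures.

```python
def weighted_score(results, weights, skipped_counts_as_failure=False):
    """Fraction of weighted tests passed.

    results maps test name -> True (pass), False (fail) or None (skipped).
    Tests without an entry in weights get weight 1.0.
    """
    earned = 0.0
    possible = 0.0
    for test, passed in results.items():
        if passed is None and not skipped_counts_as_failure:
            continue  # skipped test is excluded from the total
        weight = weights.get(test, 1.0)
        possible += weight
        if passed:
            earned += weight
    return earned / possible if possible else 0.0


# Made-up results: the growth test is skipped because no default state is known.
results = {
    "annotation": True,
    "stoichiometric_consistency": True,
    "growth_default_medium": None,
    "dead_ends": False,
}
weights = {
    "annotation": 1.0,
    "stoichiometric_consistency": 3.0,
    "growth_default_medium": 2.0,
    "dead_ends": 1.0,
}

lenient = weighted_score(results, weights)
strict = weighted_score(results, weights, skipped_counts_as_failure=True)
```

With these numbers the lenient score is 4/5 and the strict score is 4/7, which shows how much the treatment of skipped tests can move the final result.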

Here are two more comments on what constitutes a default state:

What are the defined "different conditions"? Meaningful conditions may differ substantially across models... e.g., light and CO2 for photosynthetic models, metals/metal ions for some models, potentially combinations of substrates for some biomass precursors, etc.

  • Nathan Lewis, Comment on the Manuscript

Here, I just summarized the two tests that are currently implemented in memote: 1. Default state of the models 2. All exchanges open. With the first condition we assume that models are published in a ready-to-use state. I agree that these states may differ significantly between models, but I thought it fair to assume that biomass can be generated nonetheless. Using the default state of the model avoids making assumptions about the biology that we as testers do not have. In my opinion this 'burden' of providing a functional default state lies with the model builders. We test for the second condition primarily to help debugging biosynthesis pathways during the reconstruction phase.

  • My response to that on the manuscript

arichelle commented 6 years ago

This discussion really highlights the ambiguity of the terminology used in the field. Personally, I agree with Christian that reconstruction = database of known existing reactions within an organism/cell type, and model = reconstruction that is constrained to represent the case study it is used for. In this context, I also agree with Daniel that the reconstruction should not be pre-constrained but should include the information necessary to constrain it. For example, when a genome-scale reconstruction is published, this library is most often validated using different experimental data. To this end, I think the idea of providing the necessary scripts to reproduce these conditions with automatic constraint generation would be really valuable.

Moreover, I'd like to underline that terms like condition- or context-specific model are also broadly used for models that have been "extracted" from a genome-scale reconstruction using experimental data and only include the part of it that is "active" under the defined condition/context. Such models are no longer genome-scale models (i.e. a genome-scale reconstruction constrained to observed environmental conditions). This aspect is not mentioned in the manuscript at the moment but, in these cases, memote could be a really useful tool to check the validity of these kinds of extracted models. Indeed, when you use metabolomic and/or transcriptomic data to extract a context-specific model, nothing ensures that the latter is still biologically meaningful. Using the comparative assessment of models presented in memote could allow users to ensure that the extracted model behaves globally like the original one (no violation of universal constraints, etc.).
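As a rough illustration of what comparing an extracted model with its parent reconstruction could involve, the sketch below checks three simple properties: every extracted reaction exists in the parent, its stoichiometry is unchanged, and its bounds are not wider than the parent's. This is not how memote compares models; the data layout and the function name are assumptions made for the example.

```python
def check_extracted_model(parent, extracted):
    """Return a list of human-readable problems; an empty list means all checks passed.

    Both arguments map reaction ID ->
        {"stoich": {metabolite ID: coefficient}, "bounds": (lower, upper)}.
    """
    problems = []
    for rid, rxn in extracted.items():
        if rid not in parent:
            problems.append(f"{rid}: not present in parent reconstruction")
            continue
        ref = parent[rid]
        if rxn["stoich"] != ref["stoich"]:
            problems.append(f"{rid}: stoichiometry differs from parent")
        lower, upper = rxn["bounds"]
        ref_lower, ref_upper = ref["bounds"]
        if lower < ref_lower or upper > ref_upper:
            problems.append(f"{rid}: bounds are wider than in parent")
    return problems


# Made-up parent reconstruction and an extracted subset of it.
parent = {
    "R1": {"stoich": {"a": -1, "b": 1}, "bounds": (-1000.0, 1000.0)},
    "R2": {"stoich": {"b": -1, "c": 1}, "bounds": (0.0, 1000.0)},
}
extracted = {
    "R1": {"stoich": {"a": -1, "b": 1}, "bounds": (0.0, 1000.0)},
}

problems = check_extracted_model(parent, extracted)
```

Structural checks like these say nothing about biological meaning; they would only complement simulation-based comparisons.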

bgoli commented 6 years ago

I would agree that any genome-scale metabolic model that is claimed to grow or reproduce an experimental result should, where possible, be provided with a default parameter set that gives a reproducible output.

This also has the positive spin-off that model databases like Biomodels or BiGG can more easily curate a published model and that non-technical users/modellers can be reassured that their software is, to a point, interpreting the model correctly.

But, as Daniel mentions, whether this is part of "model validation" or "best practice" depends on accepted definitions of a "model". I think memote can distinguish between the two without losing any of its evaluation credentials.

intawat commented 6 years ago

Reconstruction = building a network; for me it can be either a database or a model.

Database = a collection of reactions, genes, stoichiometry, metabolites, etc.

Model = a network that contains rich information on metabolism. We can use just the network to integrate omics data and gain insight into cellular processes WITHOUT simulation.

The key purpose of a GEM is quantitative simulation.

For me MEMOTE will help us to improve the reconstruction (network) as well as simulation.

jonovik commented 6 years ago

I agree with Ines' distinction between "reconstruction" and "condition-specific model derived from a reconstruction". @cdanielmachado , I don't think Ines says that a reconstruction is not a model, but that a reconstruction is a more general model.

Let me define my usage of terms before continuing:

I think Memote should validate both recons and condition-specific models as far as applicable. So a recon can be tested for quality of annotation, dead ends, etc., whereas other tests may require the specification of a medium, condition-specific biomass compositions, expressed genes, etc. In any case, it is vital that condition-specific models can be traced back to their parent recon.
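One possible way to keep condition-specific models traceable to their parent recon is to store a fingerprint of the parent together with the changes that were applied. The sketch below uses a SHA-256 hash over a canonical JSON dump of the bounds; the record layout and all function names are hypothetical and not part of memote or of any model format.

```python
import hashlib
import json


def fingerprint(bounds):
    """Deterministic SHA-256 of a reaction ID -> (lower, upper) mapping."""
    canonical = json.dumps(sorted((rid, list(b)) for rid, b in bounds.items()))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def derive_with_provenance(parent_bounds, condition_name, overrides):
    """Build a derived model record that remembers where it came from."""
    derived = dict(parent_bounds)
    derived.update(overrides)
    return {
        "condition": condition_name,
        "parent_sha256": fingerprint(parent_bounds),
        "overrides": overrides,
        "bounds": derived,
    }


def traces_back_to(model, candidate_parent_bounds):
    """True if the recorded hash matches and the overrides reproduce the model."""
    if model["parent_sha256"] != fingerprint(candidate_parent_bounds):
        return False
    rebuilt = dict(candidate_parent_bounds)
    rebuilt.update(model["overrides"])
    return rebuilt == model["bounds"]


# Made-up parent recon and one derived condition-specific model.
recon_bounds = {"EX_glc": (0.0, 1000.0), "EX_o2": (0.0, 1000.0)}
model = derive_with_provenance(
    recon_bounds, "aerobic_glucose", {"EX_glc": (-10.0, 1000.0), "EX_o2": (-20.0, 1000.0)}
)
other_recon = {"EX_glc": (0.0, 1000.0), "EX_ch4": (0.0, 1000.0)}
```

Here `traces_back_to(model, recon_bounds)` holds, while the same check against `other_recon` fails because the hashes differ.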

Furthermore, I agree that it is necessary to distinguish between the medium/environment and the system/cell within it (cf. the mention of .yml config files in the section "Data-dependent tests"). Simulating different media serves as virtual experiments that characterize the model of the system.

This has implications for standards requirements for GEMs. I realize now that MIRIAM (Box 3, point 5) does not address flux balance models, only full-fledged ordinary differential equations (my emphasis):

The encoded model must be instantiated in a simulation. This means that quantitative attributes of the model have to be defined. Therefore, the model must contain, or be associated with, values (or ranges of values) for all initial conditions and parameters, as well as kinetic expressions for all reactions. These values can be provided as a separate file from the model itself.

This means that we cannot claim MIRIAM compliance as per the le Novère et al. 2005 paper, even though we can and should demand MIRIAM-compliant annotations (table 1 of the Memote draft).

While flux balance models draw much of their strength from not requiring kinetic expressions, the wish to specify an external medium is similar to MIRIAM's requirement for initial conditions and parameter estimates. The same goes for biomass composition, which is specific to the organism's "mode of operation", e.g. growth rate, and tissue-specific in multicellular organisms. (This is also why I think it's important to distinguish "objective function" from "biomass composition".)

jonovik commented 6 years ago

It seems I was mistaken when I wrote

I don't think Ines says that a reconstruction is not a model, but that a reconstruction is a more general model.

In a comment to the "Increasing numbers..." paragraph, she did indeed write "reconstructions (not models!)". In that case, I will escalate to polite disagreement: I think "model" should be a wider term than "reconstruction". For the narrower meaning I'd prefer "constraint-based model" or "condition-specific model", whichever fits best.