jotech / gapseq

Informed prediction and analysis of bacterial metabolic pathways and genome-scale networks
GNU General Public License v3.0

gapseq's sbml in memote #42

Closed: Waschina closed this issue 4 years ago

Waschina commented 4 years ago

Hi jotech, thanks for providing the media file. It seems the composition of my medium was wrong; the growth rate is now 0.056.

However, the generated XML file now passes the SBML validator, but the validator shows the following warning:

Warning: As a principle of best modeling practice, the <species> should set an initial value (amount or concentration) rather than be left undefined. Doing so improves the portability of models between different simulation and analysis systems, and helps make it easier to detect potential errors in models. The <species> with the id 'M_cpd00001_c0' does not have an 'initialConcentration' or 'initialAmount' attribute, nor is its initial value set by an <initialAssignment> or <assignmentRule>.

Also, when I try to run it in memote, it shows the following error, which was not shown previously:

critical: The model could not be loaded due to the following SBML errors.
error: Something went wrong reading the SBML model. Most likely the SBML model is not valid. Please check that your model is valid using the cobra.io.sbml.validate_sbml_model function or via the online validator at http://sbml.org/validator .
error: (model, errors) = validate_sbml_model(filename)
error: If the model is valid and cannot be read please open an issue at https://github.com/opencobra/cobrapy/issues .
error: Line 2, Column 0 - #1013: Invalid or undefined XML namespace prefix.
error: - Category: XML content, Severity: 2

The error occurs even with a fresh install of memote in a new virtual environment.

I also checked the file with the cobra validation command cobra.io.sbml.validate_sbml_model('TelongatusBP-1.xml'), and it gives the following error:

(None, {'SBML_FATAL': [], 'SBML_ERROR': ['E0 (Error): XML content (core, L2); Bad XML prefix; Invalid or undefined XML namespace prefix.\n'], 'SBML_SCHEMA_ERROR': [], 'SBML_WARNING': [], 'COBRA_FATAL': [], 'COBRA_ERROR': ['No SBML model detected in file.'], 'COBRA_WARNING': [], 'COBRA_CHECK': []})

Originally posted by @hites77 in https://github.com/jotech/gapseq/issues/41#issuecomment-699020551
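The "Bad XML prefix" (#1013) error can be reproduced without libSBML. Python's standard-library parser (expat, used by xml.etree.ElementTree) processes namespaces, so it also rejects an attribute such as groups:required whose prefix was never declared with xmlns:groups. A minimal sketch; the helper name xml_parse_error is hypothetical, and it returns any well-formedness error, not only namespace problems.

```python
import xml.etree.ElementTree as ET


def xml_parse_error(xml_text: str):
    """Return the parser's error message, or None if the XML is well-formed.

    ElementTree parses with namespace processing enabled, so a prefix that is
    used (e.g. groups:required) but never declared (xmlns:groups=...) fails.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return str(exc)
    return None


CORE = "http://www.sbml.org/sbml/level3/version1/core"
GROUPS = "http://www.sbml.org/sbml/level3/version1/groups/version1"

# Root element that uses the groups: prefix without declaring it.
bad = f'<sbml xmlns="{CORE}" level="3" version="1" groups:required="false"/>'
# The same root element with the missing declaration added.
good = (
    f'<sbml xmlns="{CORE}" xmlns:groups="{GROUPS}" '
    'level="3" version="1" groups:required="false"/>'
)

print(xml_parse_error(bad))   # an "unbound prefix: ..." message
print(xml_parse_error(good))  # None
```

This only checks XML well-formedness; SBML-level validity still needs libSBML or cobra.io.sbml.validate_sbml_model.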

Waschina commented 4 years ago

Dear @hites77,

I will look into this issue. Would it be possible for you to share your model ('TelongatusBP-1.xml')? If you wish, you can also send it to s.waschina[at]nutrinf.uni-kiel.de

Best Silvio

hites77 commented 4 years ago

I have emailed you the model file.

Waschina commented 4 years ago

The main reason memote failed is a mistake in the SBML header:

In line 2, the attribute xmlns:groups="http://www.sbml.org/sbml/level3/version1/groups/version1" is missing. The complete line should look like this:

<sbml xmlns="http://www.sbml.org/sbml/level3/version1/core" xmlns:fbc="http://www.sbml.org/sbml/level3/version1/fbc/version2" xmlns:groups="http://www.sbml.org/sbml/level3/version1/groups/version1" xmlns:html="http://www.w3.org/1999/xhtml" level="3" version="1" fbc:required="false" groups:required="false">

Additionally, line 3 is missing the id attribute. It should look something like this:

  <model id="TelongatusBP_1_1" fbc:strict="true">

In our tests of gapseq, the two above-mentioned attributes are added automatically during the reconstruction process. Hence, it might be an issue of different versions of libsbml and/or the R package "sybilSBML". Could you check the version numbers? In my tests I used libsbml version 5.18.0 and sybilSBML version 3.1.2.
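Until sybilSBML is updated, the two header fixes described above can be applied with a small script. This is a stopgap sketch, not part of gapseq: the helper name patch_sbml_header is hypothetical, and it edits only the first <sbml ...> and <model ...> start tags with regular expressions, assuming the rest of the root tag already matches the corrected line. Because SBML ids may only contain letters, digits and underscores (and must not start with a digit), the example id uses "_" where the file name has "-".

```python
import re

# Namespace declaration that was missing from line 2 (taken from the comment above).
GROUPS_NS = "http://www.sbml.org/sbml/level3/version1/groups/version1"


def patch_sbml_header(xml_text: str, model_id: str) -> str:
    """Hypothetical helper: add xmlns:groups to <sbml> and an id to <model>.

    Only the first <sbml ...> and <model ...> start tags are touched; the
    rest of the file is returned unchanged. Running it twice is a no-op.
    """
    # 1. Declare the groups prefix on the <sbml> root element if it is missing.
    m = re.search(r"<sbml\b[^>]*>", xml_text)
    if m and "xmlns:groups=" not in m.group(0):
        tag = m.group(0).replace("<sbml", f'<sbml xmlns:groups="{GROUPS_NS}"', 1)
        xml_text = xml_text[:m.start()] + tag + xml_text[m.end():]

    # 2. Add an id attribute to <model> if it has none (metaid= does not count).
    m = re.search(r"<model\b[^>]*>", xml_text)
    if m and not re.search(r"\sid=", m.group(0)):
        tag = m.group(0).replace("<model", f'<model id="{model_id}"', 1)
        xml_text = xml_text[:m.start()] + tag + xml_text[m.end():]

    return xml_text


header = (
    '<sbml xmlns="http://www.sbml.org/sbml/level3/version1/core" level="3" '
    'version="1" groups:required="false">\n'
    '  <model fbc:strict="true">'
)
print(patch_sbml_header(header, "TelongatusBP_1_1"))
```

After patching, the file can be re-checked with cobra.io.sbml.validate_sbml_model, as in the original report.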

hites77 commented 4 years ago

Thanks for the response. I am using libsbml version 5.17.0 and sybilSBML version 3.0.1. Also, if you know a proper way to install it on a server, please let me know; so far I have only managed to install sybilSBML on my personal system.

So for other models, if I edit the lines as you mentioned, will they run in memote and the COBRA Toolbox?

I checked the model you sent me in memote using the command: memote report snapshot --skip test_find_metabolites_not_produced_with_open_bounds --skip test_find_metabolites_not_consumed_with_open_bounds TelongatusBP-1.xml

The score comes out at 77%, but there is an error in Biomass consistency (under the Biomass heading). Could this error produce unreliable results in growth simulations with BacArena?

hites77 commented 4 years ago

Hi, while checking the detailed memote report, I found the following issues that should be looked at. In the mass balance section it reports: A total of 20 (1.34%) reactions are mass unbalanced with at least one of the metabolites not having a formula or the overall mass not equal to 0: rxn31154_c0, rxn25839_c0, rxn27289_c0, rxn25838_c0, rxn07291_c0, ...

In the charge balance section it reports: A total of 6 (0.40%) reactions are charge unbalanced with at least one of the metabolites not having a charge or the overall charge not equal to 0: rxn13667_c0, rxn11114_c0, rxn14024_c0, rxn13671_c0, rxn13669_c0, ...

In the Unbounded Flux In Default Medium section it reports: A fraction of 26.89% of the non-blocked reactions (in total 171 reactions) can carry unbounded flux in the default model condition. Unbounded reactions may be involved in thermodynamically infeasible cycles: rxn00069_c0, rxn00070_c0, rxn00083_c0, rxn00086_c0, rxn00117_c0, ...

These are very important parameters, so please have a look and advise.
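The mass and charge balance findings above rest on a simple idea: summing each element (and the charge) over all metabolites, weighted by their stoichiometric coefficients, must give zero, and a metabolite without a formula or charge makes the reaction count as unbalanced. The following sketch illustrates that idea; it is not memote's implementation, the metabolite ids are hypothetical, and the formula parser handles only simple formulas without brackets.

```python
import re
from collections import Counter

TOL = 1e-9  # tolerance for non-integer stoichiometric coefficients


def parse_formula(formula: str) -> Counter:
    """Count atoms in a simple formula such as 'C11H21N2O7PRS' (no brackets)."""
    counts = Counter()
    for element, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(n) if n else 1
    return counts


def check_balance(reaction, formulas, charges):
    """Return (mass_balanced, charge_balanced) for one reaction.

    reaction: metabolite id -> stoichiometric coefficient (negative = consumed).
    A metabolite without a formula (or charge) makes the reaction count as
    unbalanced, mirroring the wording of the memote report.
    """
    if any(formulas.get(met) is None for met in reaction):
        mass_ok = False
    else:
        net = Counter()
        for met, coef in reaction.items():
            for element, n in parse_formula(formulas[met]).items():
                net[element] += coef * n
        mass_ok = all(abs(v) < TOL for v in net.values())

    if any(charges.get(met) is None for met in reaction):
        charge_ok = False
    else:
        charge_ok = abs(sum(coef * charges[met] for met, coef in reaction.items())) < TOL

    return mass_ok, charge_ok


# Hypothetical ids: water dissociation, H2O <=> H+ + OH-.
formulas = {"h2o": "H2O", "h": "H", "oh": "HO"}
charges = {"h2o": 0, "h": 1, "oh": -1}

print(check_balance({"h2o": -1, "h": 1, "oh": 1}, formulas, charges))  # (True, True)
print(check_balance({"h2o": -1, "oh": 1}, formulas, charges))          # (False, False)
```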

Waschina commented 4 years ago

Manually editing the XML files would work, but I recommend updating sybilSBML to version 3.1.2 instead; then manual XML editing is not necessary. Install instructions for the R package can be found on sybilSBML's CRAN page.

Concerning the memote output you mentioned, I am currently working on the corrections.

Thank you for your help!

Best wishes Silvio

hites77 commented 4 years ago

Sure, I will update and check the output. Thank you too for the prompt response and help. Please let me know when the model's memote score improves. For comparison, another model I generated with KBase scored around 90% in memote, but it also scored low on certain important parameters.

Waschina commented 4 years ago

The commit 297a633412aa9e0af85702520101a375a17f5c17 repaired a number of inconsistencies that memote found (see above). We will correct more as we identify them.

I will close this issue for now, as gapseq's SBML models should now be compatible with memote (at least with the latest version of sybilSBML). But please reopen it if memote reports errors for gapseq models.

@hites77 Thank you for your help and feedback.

hites77 commented 4 years ago

Hi, I built a new GMM for one of my genomes with the latest gapseq update and the latest sybilSBML, and found that the charge and mass balance scores are now 100% in memote (good updates in gapseq), but the Unbounded Flux In Default Medium score is only 63.7%. Memote also reports 28 duplicate metabolites in identical compartments, and it still shows an error in biomass consistency. Please have a look at these points, or let me know if there is a way to correct these errors either in sybil (R) or in the COBRA Toolbox.
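The "duplicate metabolites in identical compartments" finding can be pictured as grouping metabolites by compartment and database annotation. The sketch below assumes that two metabolite ids count as duplicates when they carry the same annotation in the same compartment; memote's exact criterion (which annotation namespaces it compares) may differ, and all ids and annotations here are hypothetical.

```python
from collections import defaultdict


def duplicate_metabolites(metabolites):
    """Group metabolite ids that share an annotation within one compartment.

    metabolites: iterable of (met_id, compartment, annotation) tuples, where
    annotation is e.g. a database identifier, or None if unannotated.
    Returns {(compartment, annotation): [met_id, ...]} for groups of size > 1.
    """
    groups = defaultdict(list)
    for met_id, compartment, annotation in metabolites:
        if annotation is None:
            continue  # unannotated metabolites cannot be compared this way
        groups[(compartment, annotation)].append(met_id)
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


# Hypothetical toy data: two cytosolic ids carry the same annotation "X1".
mets = [
    ("met_a_c0", "c0", "X1"),
    ("met_b_c0", "c0", "X1"),
    ("met_a_e0", "e0", "X1"),  # same annotation, other compartment: not a duplicate
    ("met_c_c0", "c0", None),
]
print(duplicate_metabolites(mets))  # {('c0', 'X1'): ['met_a_c0', 'met_b_c0']}
```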

Waschina commented 4 years ago

I am glad the charge and mass balance scores of reactions are better now. We are aware of memote's notes on the biomass and unbounded fluxes. In fact, we are already working on this and will mention this issue in the commits that fix those problems as soon as we have implemented a tidy solution. Thanks again for the feedback!

Waschina commented 3 years ago

Hi, we have updated gapseq's handling of the biomass reactions. It is now ensured that the mass balance of the biomass reaction corresponds to the production of 1 g cell dry weight. Please note that memote's biomass consistency check will still show an error. This is because memote requires that all metabolites in the biomass reaction have a completely defined chemical formula (i.e. without "R"). However, in gapseq's approach to the biomass reaction (the same as in ModelSEED), the biomass reaction involves, for instance, the metabolite

ACP (Acyl-Carrier-Protein, id: cpd11493, formula: C11H21N2O7PRS)

For each unit of ACP consumed, the biomass reaction produces the same molar amount of

apo-ACP (inactive form of ACP, id: cpd12370, formula: HOR)

This handling of ACP ensures that the biochemical modifications of the ACP protein that are required to activate ACP are also part of the biomass reaction, so the FBA solution also includes the metabolic reactions that are important for ACP activation.
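The effect can be illustrated with the two formulas quoted above. Per mole, consuming ACP while producing apo-ACP leaves a net contribution with a fully defined formula, because the R placeholder cancels; a check that inspects each metabolite's formula on its own still flags both. A sketch only: the helper names are hypothetical and this is not memote's code.

```python
import re
from collections import Counter


def parse_formula(formula: str) -> Counter:
    """Count atoms in a simple formula; 'R' is kept as a placeholder group."""
    counts = Counter()
    for element, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(n) if n else 1
    return counts


ACP = "C11H21N2O7PRS"  # cpd11493, consumed by the biomass reaction
APO_ACP = "HOR"        # cpd12370, produced in the same molar amount


def net_composition(consumed: str, produced: str) -> dict:
    """Atoms taken up per mole when `consumed` is turned into `produced`."""
    net = parse_formula(consumed)
    net.subtract(parse_formula(produced))
    return {element: n for element, n in net.items() if n != 0}


def fully_defined(formula: str) -> bool:
    """Per-metabolite check: no unspecified 'R' group in the formula."""
    return "R" not in parse_formula(formula)


# The R group cancels in the net contribution of the ACP/apo-ACP pair ...
print(net_composition(ACP, APO_ACP))  # {'C': 11, 'H': 20, 'N': 2, 'O': 6, 'P': 1, 'S': 1}
# ... but each metabolite taken alone is not fully defined.
print(fully_defined(ACP), fully_defined(APO_ACP))  # False False
```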

I hope this illustrates the underlying causes of memote's error output and the consistency of ModelSEED's and gapseq's biomass reactions.

Concerning the other issue you mentioned: we are on it :)

Thanks again for your help!

hites77 commented 3 years ago

Hi, congratulations on your recent gapseq publications!

What I want to mention is that for my metabolic model, after gap-filling and running memote, I get 65% unbounded flux and hundreds of dead-end metabolites. Is there any way to solve this issue?

Waschina commented 3 years ago

Hi! Thanks for coming back to this. In the last couple of weeks I paid special attention to both dead-end metabolites and unbounded fluxes. In brief, I think that the metrics calculated for gapseq models may in some cases point to minor inconsistencies in the database, but not to fundamental issues when it comes to applying gapseq models for flux balance analysis. Here's why:

Unbounded fluxes

First, to prevent a misunderstanding: the percentage that memote states next to "Unbounded Flux In Default Medium" is the percentage of bounded fluxes, not unbounded ones. With the gapseq models I tested, I get a score of 80-90% bounded fluxes (i.e. 10-20% unbounded).

Second, memote states that unbounded fluxes are not per se an issue but may point towards thermodynamically infeasible reaction cycles. In gapseq, we curated the complete reaction database to ensure that the complete network is free of energy-generating futile cycles. This is described in the gapseq manuscript in the section "Biochemistry database curation and construction of universal metabolic model".

Third, I noticed that a certain number of unbounded reactions in gapseq are due to cycles of isomerases: A <--> B <--> C <--> A. An unbounded flux through such cycles is of course not physiologically realistic, but on the other hand it does not affect, e.g., the FBA solution when biomass formation is maximised. Some other unbounded fluxes are due to duplicate reactions or paths with slightly different metabolite forms or names (e.g. L-/D-isomers); thus an unbounded flux may go one route and return via the other. I am in the process of investigating these cases in more detail, but here too, these cycles should not affect FBA predictions of model growth.

In general, we recommend using pFBA or mtf-FBA for flux predictions. These approaches reduce or prevent fluxes through reaction cycles that do not contribute to the objective function.
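Why an isomerase cycle is unbounded yet harmless for growth predictions, and why pFBA removes it, can be shown on a toy network. Any amount k of extra flux running around A -> B -> C -> A keeps every metabolite at steady state (S·v = 0) and leaves the biomass flux unchanged, so FBA cannot distinguish these solutions; pFBA, which after fixing the optimal objective minimises the total absolute flux, prefers k = 0. The network, reaction names and flux values below are hypothetical, and the cycle is written irreversibly for simplicity.

```python
# Hypothetical toy network: uptake of A, the isomerase cycle A -> B -> C -> A
# from the comment above, and a "biomass" reaction that drains B.
METABOLITES = ["A", "B", "C"]
REACTIONS = {
    "EX_A": {"A": 1},
    "ISO_AB": {"A": -1, "B": 1},
    "ISO_BC": {"B": -1, "C": 1},
    "ISO_CA": {"C": -1, "A": 1},
    "BIOMASS": {"B": -1},
}


def net_production(fluxes):
    """S . v: net production rate of each metabolite (0 at steady state)."""
    return {
        met: sum(REACTIONS[rxn].get(met, 0) * v for rxn, v in fluxes.items())
        for met in METABOLITES
    }


def fluxes_with_cycle(k):
    """10 units of A -> B -> biomass, plus k extra units running round the cycle."""
    return {"EX_A": 10, "ISO_AB": 10 + k, "ISO_BC": k, "ISO_CA": k, "BIOMASS": 10}


def total_flux(fluxes):
    """Quantity pFBA minimises after fixing the objective: sum of |v|."""
    return sum(abs(v) for v in fluxes.values())


for k in (0, 5, 1000):
    v = fluxes_with_cycle(k)
    # Steady state holds and the objective is unchanged for every k ...
    print(k, net_production(v), v["BIOMASS"], total_flux(v))
# ... so FBA cannot tell these solutions apart, while the total flux grows
# as 30 + 3k, which is why pFBA picks k = 0.
```

In a real model the same minimisation is available in cobrapy as cobra.flux_analysis.pfba.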

Dead-end metabolites

Memote correctly states that dead-end metabolites could indicate gaps in the model's network, but also gaps in biochemical knowledge. We support the view of genome-scale metabolic networks as structured knowledge bases, which is why we keep dead-end reactions and metabolites in the model: they may help users to manually add putative pathways and, in general, give an estimate of the biochemical knowledge that we are still missing for the organism. In some cases, dead ends of course also represent gaps in our reaction database, and we are constantly working on improving the reaction database in this respect as well.
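A dead-end metabolite is one that the network can only produce or only consume. The sketch below finds them from stoichiometry and a simple reversibility flag; memote's own test works on the reaction bounds of the full model, so this is only an illustration of the concept, and the network and names are hypothetical.

```python
def dead_end_metabolites(reactions):
    """Metabolites that can only be produced or only be consumed.

    reactions: reaction id -> (stoichiometry dict, reversible flag), where
    negative coefficients are substrates. A reversible reaction can both
    produce and consume each of its metabolites.
    """
    produced, consumed = set(), set()
    for stoichiometry, reversible in reactions.values():
        for met, coef in stoichiometry.items():
            if coef > 0 or (reversible and coef != 0):
                produced.add(met)
            if coef < 0 or (reversible and coef != 0):
                consumed.add(met)
    # Symmetric difference: seen on exactly one side.
    return sorted((produced | consumed) - (produced & consumed))


# Hypothetical toy network: D is made by R3 but nothing uses it.
network = {
    "EX_A": ({"A": 1}, True),
    "R1": ({"A": -1, "B": 1}, False),
    "R2": ({"B": -1, "C": 1}, True),
    "R3": ({"C": -1, "D": 1}, False),
}
print(dead_end_metabolites(network))  # ['D']
```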

Hope this helps