Speed Issue for Large Models

mfcesur commented 1 year ago

Hi,

I am trying to resolve stoichiometric inconsistencies in a newly reconstructed genome-scale metabolic network. To this end, I check the consistency with the MEMOTE test suite after each curation step. Because our metabolic model has a large number of reactions (over 8000), I am using the MEMOTE Python package.

I need to speed up the consistency test. I repeated it on a high-performance computer, but the analysis still took hours. Do you have any suggestions for accelerating the consistency analysis, other than skipping the remaining tests such as the annotation tests?

Thank you so much for your time and consideration.

Best Regards, Müberra

Midnighter commented 1 year ago

Hi,

The most important factor is probably the mathematical solver that you use. The stoichiometric inconsistency test is a mixed-integer problem. By default, memote uses GLPK as the solver, which is really slow at solving such problems. If you are an academic, you can get a free license for GUROBI or CPLEX (if their academic program still exists). With a bit of work, you could also get the optlang CBC solver interface working.

Bottom line, any mixed-integer capable solver should be able to do this in minutes rather than hours.
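
A minimal sketch of checking which solver interfaces are usable before starting a long run. It assumes optlang's `available_solvers` mapping (solver name to boolean); the helper `pick_milp_solver` and its preference order are hypothetical and not part of memote or cobra.

```python
# Hypothetical helper: choose the first available solver from a preference list.
# The preference order is an assumption; GLPK is kept last as the slow fallback.
PREFERRED = ("GUROBI", "CPLEX", "GLPK")


def pick_milp_solver(available, preferred=PREFERRED):
    """Return the lower-case name of the first preferred solver marked available."""
    for name in preferred:
        if available.get(name, False):
            return name.lower()
    raise RuntimeError("No mixed-integer capable solver found.")


if __name__ == "__main__":
    try:
        # optlang reports which solver interfaces could be imported.
        from optlang import available_solvers
    except ImportError:
        # Stand-in mapping so the sketch still runs without optlang installed.
        available_solvers = {"GLPK": True}
    print(pick_milp_solver(available_solvers))
```

The returned name could then be assigned to `cobra.Configuration().solver` before the model is loaded.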

mfcesur commented 1 year ago

Thank you so much for your response. I performed the analysis using both GLPK and GUROBI solvers. Nevertheless, the analysis time did not considerably change.

Do you have any recommendations regarding the function parameters that would increase the speed (e.g., changing the time limit)?

model_.solver = 'gurobi'

result = memote.test_model(
    model_,
    sbml_version=None,
    results=True,
    pytest_args=None,
    exclusive=None,
    skip=("test_inconsistent_min_stoichiometry",),
    experimental=None,
    solver_timeout=10,
)

resultHTML = memote.snapshot_report(result[1], config=None, html=True)

Midnighter commented 1 year ago

Okay, that's very surprising to me. There are two more things that you can try:

Just to be sure, please configure Gurobi as the default solver before you even load the model.

import cobra

config = cobra.Configuration()
config.solver = "gurobi"

# Then follow up with your own code.

You already use a pretty short timeout, so I don't think you can reduce that further. However, you might additionally skip the test for unconserved metabolites.

result = memote.test_model(
    model_,
    sbml_version=None,
    results=True,
    pytest_args=None,
    exclusive=None,
    skip=("test_inconsistent_min_stoichiometry", "test_unconserved_metabolites"),
    experimental=None,
    solver_timeout=10,
)
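
To see whether a solver change or an extra skipped test actually shortens the run, the call can be timed. This is only a sketch: the `timed` helper and the `SLOW_TESTS` name are hypothetical, and a stand-in workload is used so that it runs without memote or a model.

```python
import time


def timed(func, *args, **kwargs):
    """Call func and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


# The two test names mentioned in this thread as candidates for skipping.
SLOW_TESTS = (
    "test_inconsistent_min_stoichiometry",
    "test_unconserved_metabolites",
)

if __name__ == "__main__":
    # Stand-in workload. With memote and a loaded model, one would instead call:
    # timed(memote.test_model, model_, results=True, skip=SLOW_TESTS, solver_timeout=10)
    _, seconds = timed(sum, range(1_000_000))
    print(f"elapsed: {seconds:.3f} s")
```

Running it once per solver and once per skip list gives comparable numbers instead of an impression.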

mfcesur commented 1 year ago

Thank you so much for your suggestion. Excluding several test functions reduced the running time to some extent.

I would also like to ask about another topic related to stoichiometric consistency. In previous model reconstruction studies, I saw fractional consistency scores between 0 and 100, but my consistency analysis results are always either 0 or 100. In my analyses, the stoichiometric consistency was reported as zero even when the model contained only a single inconsistent reaction. Is this related to the parameters or to the MEMOTE version? Can I change this strict scoring system?

Midnighter commented 1 year ago

This might be more suitable as a discussion topic. For some time (so yes, in older versions), we used the percentage of unconserved metabolites as the score, and thus it was continuous from 0 to 100. However, we wanted to split up the test, and a continuous score also doesn't make a lot of sense logically. Either your metabolic network is inconsistent or it is not. You cannot say that your network is 20% inconsistent and that's better than another network that's 90% inconsistent. Since we cannot predict what simulations people will want to run with the models, as soon as a single inconsistent reaction is part of your network, you cannot trust your results. So it makes a lot more sense that it's an all-or-nothing test result now.
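
The difference between the two scoring styles can be illustrated with a few lines of arithmetic. This is not memote's code: both function names are hypothetical, and the direction of the older score (share of conserved metabolites, so that higher is better) is an assumption.

```python
def percentage_score(n_unconserved, n_metabolites):
    """Older-style continuous score: percentage of metabolites that are conserved."""
    if n_metabolites <= 0:
        raise ValueError("n_metabolites must be positive")
    return 100 * (n_metabolites - n_unconserved) / n_metabolites


def all_or_nothing_score(n_unconserved):
    """Current-style score: 100 only if nothing is unconserved, otherwise 0."""
    return 100 if n_unconserved == 0 else 0


if __name__ == "__main__":
    # Made-up numbers: 3 unconserved metabolites in a model with 1500 metabolites.
    print(percentage_score(3, 1500))
    print(all_or_nothing_score(3))
```

A nearly perfect percentage can hide the fact that the network as a whole is still inconsistent, which is what the all-or-nothing score makes visible.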

mfcesur commented 1 year ago

I see. Thank you very much for your reply.

mfcesur commented 1 year ago

This is my last question :) This approach is more reliable, but it may limit the reconstruction process, given how hard it is to curate reactions with missing metabolite formulas or ambiguous metabolite charges. Removing all reactions suspected of causing inconsistency may reduce the metabolic information in the model, information that could be curated further in later model versions as more literature becomes available.

Would you suggest ignoring the inactive reactions (according to FVA) in the MEMOTE analysis regardless of their consistency, instead of removing them from the model, if they cannot be curated with current knowledge?

Midnighter commented 1 year ago

You're broaching some difficult topics.

One question to answer for yourself: is it necessary to have a consistent reconstruction at every step of the model development/curation process? You could choose to add reactions and sort them out later. However, it's possible that you will end up in a situation that is very difficult to resolve.

You can also maintain two reconstructions: One that is more of a draft or an archive of reactions that you consider plausible candidates for your organism/genome of interest and another (a subset of the former) that is a strictly stoichiometrically consistent reconstruction.

Regarding consistency, I do believe being strict on the stoichiometric consistency test is a must, otherwise you violate mass conservation. At the same time, having perfect consistency on the level of chemical formulae is not always possible since we lack information.
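
A small worked example of the mass conservation point, using the usual formulation in which a network is stoichiometrically consistent if every metabolite can be assigned a strictly positive mass that each reaction conserves. The two reactions are made up for illustration.

```latex
% S is the stoichiometric matrix (metabolites x reactions),
% m is the vector of metabolite masses.
\[
  S^{\mathsf{T}} m = 0, \qquad m > 0 .
\]
% Hypothetical reactions: R_1: A \to B and R_2: A \to B + C.
\begin{align*}
  R_1 &: \; -m_A + m_B = 0 \;\Longrightarrow\; m_A = m_B, \\
  R_2 &: \; -m_A + m_B + m_C = 0 \;\Longrightarrow\; m_C = 0 .
\end{align*}
% m_C = 0 contradicts m_C > 0, so no valid mass assignment exists.
```

Here a single extra reaction forces metabolite C to be massless, so the whole network fails the condition even though every other reaction is fine.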

Charge consistency is also tricky. Many existing models do not consider protons moving across membranes separately from protons that are used purely to balance reactions within the same compartment. In that case, you need to have perfectly balanced charges because otherwise you are likely generating charge gradients for free. If you do consider charge transport across membranes separately, you can actually make a case for ignoring all other protons as they will be readily provided by water. This view is supported by @eladnoor and it makes sense to me but I haven't fully worked out for myself how to accurately model this approach. I also haven't thought about it in quite a while now.

Adding blocked reactions serves no purpose at all for simulations so maybe the approach of maintaining two SBML documents is a way forward here?
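
The two-document idea can be sketched at the level of reaction IDs. The helper `strict_reaction_ids` and all IDs below are hypothetical; which reactions count as suspect is a curation decision that no code makes for you.

```python
def strict_reaction_ids(draft_ids, suspect_ids, blocked_ids=()):
    """Return the draft reaction IDs minus suspect and blocked ones, keeping order."""
    excluded = set(suspect_ids) | set(blocked_ids)
    return [rid for rid in draft_ids if rid not in excluded]


if __name__ == "__main__":
    # Made-up reaction IDs standing in for a draft reconstruction.
    draft = ["R_PGI", "R_PFK", "R_candidate_1", "R_candidate_2"]
    suspect = ["R_candidate_1"]  # e.g. missing metabolite formulas
    blocked = ["R_candidate_2"]  # e.g. zero flux range in FVA
    print(strict_reaction_ids(draft, suspect, blocked))
```

With cobra, the draft IDs could come from `[r.id for r in model.reactions]`, the blocked ones from `cobra.flux_analysis.find_blocked_reactions(model)`, and the strict document from `model.copy()` followed by `remove_reactions(...)` and `cobra.io.write_sbml_model(...)`; the draft model stays untouched as the archive.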

mfcesur commented 1 year ago

I am so sorry for the late response. Actually, we have already reconstructed the model, but I agree with you that it is difficult to curate the reactions with missing metabolite formulas after completing the whole reconstruction process.

Thank you so much for your suggestions.

Sincerely yours, Müberra

opencobra / memote

Speed Issue for Large Models #743