opencobra / memote

memote – the genome-scale metabolic model test suite
https://memote.readthedocs.io/
Apache License 2.0
123 stars 26 forks source link

fix: change logic of test production and consumption metabolites to speed up the process #678

Closed carrascomj closed 4 years ago

carrascomj commented 4 years ago

In order to speed up the consistency tests of checking metabolite production and consumption with open bounds, the logic of the test has been changed. Instead of adding a boundary to the model each time, the test now works by sequentially changing the linear coefficients of the metabolites, which should be faster.

Multiprocessing was also added to decrease the running time.

codecov[bot] commented 4 years ago

Codecov Report

Merging #678 into develop will decrease coverage by 0.06%. The diff coverage is 87.5%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop     #678      +/-   ##
===========================================
- Coverage    82.82%   82.76%   -0.07%     
===========================================
  Files           49       49              
  Lines         2608     2622      +14     
  Branches       427      428       +1     
===========================================
+ Hits          2160     2170      +10     
- Misses         357      359       +2     
- Partials        91       93       +2
Impacted Files Coverage Δ
src/memote/support/consistency.py 94.62% <87.5%> (-1.89%) :arrow_down:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update f30c1b1...d1c5e16. Read the comment docs.

ChristianLieven commented 4 years ago

I like your changes especially the use of mutliprocessing and manipulating the underlying model directly.

I'd be interested to see how much faster this is. Would you mind providing a before/after comparison?

And, of course, I'd feel more comfortable merging it if the test wouldn't fail.

carrascomj commented 4 years ago

The tests should pass now. I have the comparison in a notebook, but apparently GitHub doesn't support the format. I'll try to paste the code and the results.

carrascomj commented 4 years ago

The tests were performed on a small model in BiGG.

![ ! -f "iAB_RBC_283.xml" ] && curl -L -O 'http://bigg.ucsd.edu/static/models/iAB_RBC_283.xml'
VERBOSITY = False
PROCESSES = 4

1. Test change in logic

Old functions:

from time import time

import cobra
import numpy as np
from copy import deepcopy
from cobra.exceptions import Infeasible
from tqdm import tqdm
import sys

def run_fba(model, rxn_id, direction="max", single_value=True):
    model.objective = model.reactions.get_by_id(rxn_id)
    model.objective_direction = direction
    if single_value:
        return model.slim_optimize()
    else:
        try:
            solution = model.optimize()
        except Infeasible:
            return solution
            return np.nan

def open_exchanges(model):
    for rxn in model.exchanges:
        rxn.bounds = (-1000, 1000)

def test_old(model):
    """
    old_test in consistency.py
    """
    mets_not_produced = list()
    open_exchanges(model)
    pbar = tqdm(total=len(model.metabolites))
    for met in model.metabolites:
        with model:
            exch = model.add_boundary(
                met, type="irrex", reaction_id="IRREX", lb=0, ub=1000
            )
            solution = run_fba(model, exch.id)
            if np.isnan(solution) or solution < model.tolerance:
                mets_not_produced.append(met)
        pbar.update(1)
    return mets_not_produced

New logic functions:

def solve_boundary(metabolite, rxn, val=-1):
    """
    Solves the model when some reaction `rxn` has been added to the `metabolite`'s contraints.
    """
    metabolite.constraint.set_linear_coefficients({rxn: val})
    solution = metabolite.model.slim_optimize()
    # TODO: it seems like with context doesn't catch these changes, need to check
    # restore constraint
    metabolite.constraint.set_linear_coefficients({rxn: 0})
    return solution

def test_new(model):
    """
    New test
    """
    mets_not_produced = list()
    open_exchanges(model)
    irr = model.problem.Variable("irr", lb=0, ub=1000)
    with model:
        model.add_cons_vars(irr)
        # helper.run_fba() only accepts reactions in the model
        model.objective = irr
        pbar = tqdm(total=len(model.metabolites))
        for met in model.metabolites:
            solution = solve_boundary(met, irr)
            if np.isnan(solution) or solution < model.tolerance:
                mets_not_produced.append(met)
            pbar.update(1)
    return mets_not_produced

Comparison:

model = cobra.io.read_sbml_model("iAB_RBC_283.xml")
model.solver = "glpk"
if VERBOSITY:
    model.solver.interface.Configuration.verbosity = model.solver.interface.Configuration(
        verbosity=3
    )

print(f"Number of metabolites in the model: {len(model.metabolites)}")
start = time()
old = set(test_old(model))
print(
    f"Identified metabolites by the old version: {len(old)} in {time() - start} s"
)

start = time()
new = set(test_new(model))
print(
    f"Identified metabolites by the new version: {len(new)} in {time() - start} s"
)
print(old ^ new)

assert old == new

Results

Number of metabolites in the model: 342
Identified metabolites by the old version: 22 in 2.8081214427948 s
Identified metabolites by the new version: 22 in 2.1633551120758057 s
set()

It's not a huge difference; although profiling revealed the expected behavior.

2. Multiprocessing

Functions for multiprocessing:

import multiprocessing

def _init_worker(model, irr, val):
    """Initialize a global model object for multiprocessing.

    Parameters
    ----------
    model : cobra.Model
        The metabolic model under investigation.
    irr: optlang.Variable || cobra.Reaction
        the reaction to be added to the linear coefficients. It must be in the
        variables of the model.
    val: int
        value of the linear coefficient (1 for consumption, -1 for production)
    """
    global _model
    global _irr
    global _val
    _model = model
    _model.objective = irr
    _irr = irr
    _val = val

def _solve_metabolite_production(metabolite):
    """
    Solves the model when some reaction has been added to a `metabolite`'s
    contraints. The reaction and the model are passed as globals.

    Parameters
    ----------
    metabolite: cobra.Metabolite
        the reaction will be added to this metabolite as a linear coefficient

    Returns
    -------
    solution: float
        the value of the solution of the LP problem, *NaN* if infeasible.
    metabolite: cobra.Metabolite
        metabolite passed as argument (to use map as a filter)
    """
    global _model
    global _irr
    global _val
    constraint = _model.metabolites.get_by_id(metabolite.id).constraint
    constraint.set_linear_coefficients({_irr: _val})
    solution = _model.slim_optimize()
    constraint.set_linear_coefficients({_irr: 0})
    return solution, metabolite

def find_metabolites_not_produced_with_open_bounds(model, processes=None, prod = True):
    """
    Return metabolites that cannot be produced with open exchange reactions.

    A demand reaction is set as the objective. Then, it is sequentally added as
    a coefficient for every metabolite and the solution is inspected.

    A perfect model should be able to produce each and every metabolite when
    all medium components are available.

    Parameters
    ----------
    model : cobra.Model
        The metabolic model under investigation.
    processes: int
        Number of processes to be used (Default to `cobra.Configuration()`).
    prod: bool
        If False, it checks for consumption instead of production. Default True

    Returns
    -------
    list
        Those metabolites that could not be produced.

    """
    if processes is None:
        # For now, borrow the number of processes from cobra's configuration
        processes = Configuration().processes
    n_mets = len(model.metabolites)
    processes = min(processes, n_mets)
    # manage the value of the linear coefficient to be added to each metabolite
    val = -1 # production
    if not prod:
        val = 1 # consumption
    open_exchanges(model)
    irr = model.problem.Variable("irr", lb=0, ub=1000)

    if processes > 1:
        chunk_s = n_mets // processes
        pool = multiprocessing.Pool(
            processes,
            initializer=_init_worker,
            initargs=(model, irr, val),
        )
        # use map as filter
        mets_not_produced = [met for solution, met in pool.imap_unordered(
            _solve_metabolite_production, model.metabolites, chunksize=chunk_s
        ) if np.isnan(solution) or solution < model.tolerance]
        pool.close()
        pool.join()
    else:
        _init_worker(model, irr)
        # use map as filter
        mets_not_produced = [met for solution, met in map(
            _solve_metabolite_production, model.metabolites
        ) if np.isnan(solution) or solution < model.tolerance]
    return mets_not_produced

Comparison:

if VERBOSITY:
    model.solver.interface.Configuration.verbosity = model.solver.interface.Configuration(
        verbosity=3
    )

print(f"Number of metabolites in the model: {len(model.metabolites)}")
start = time()
old = set([met.id for met in test_old(model)])
print(
    f"Identified metabolites by the old version: {len(old)} in {time() - start} s"
)

start = time()
new = set([met.id for met in find_metabolites_not_produced_with_open_bounds(model, PROCESSES)])
print(
    f"Identified metabolites by the new version: {len(new)} in {time() - start} s"
)
print(old ^ new)

assert old == new

Results

Number of metabolites in the model: 342
Identified metabolites by the old version: 22 in 2.826913356781006 s
Identified metabolites by the new version: 22 in 0.7438373565673828 s
set()

3. Test for consumption

The opposite test should also incorporate the new logic and the multiprocessing feature. The corresponding tests:

def consumed_old(model):
    """
    Return metabolites that cannot be consumed with open boundary reactions.
    When all metabolites can be secreted, it should be possible for each and
    every metabolite to be consumed in some form.
    Parameters
    ----------
    model : cobra.Model
        The metabolic model under investigation.
    Returns
    -------
    list
        Those metabolites that could not be consumed.
    """
    mets_not_consumed = list()
    open_exchanges(model)
    for met in model.metabolites:
        with model:
            exch = model.add_boundary(
                met, type="irrex", reaction_id="IRREX", lb=-1000, ub=0)
            solution = run_fba(model, exch.id, direction="min")
            if np.isnan(solution) or abs(solution) < model.tolerance:
                mets_not_consumed.append(met)
    return mets_not_consumed

def consumed_new(model, processes=None):
    return find_metabolites_not_produced_with_open_bounds(model, processes=processes, prod=False)

Comparison:

if VERBOSITY:
    model.solver.interface.Configuration.verbosity = model.solver.interface.Configuration(
        verbosity=3
    )

print(f"Number of metabolites in the model: {len(model.metabolites)}")
start = time()
old = set(consumed_old(model))
print(
    f"Identified metabolites by the old version: {len(old)} in {time() - start} s"
)
old = set([met.id for met in old])

start = time()
new = set(consumed_new(model, PROCESSES))
print(
    f"Identified metabolites by the new version: {len(new)} in {time() - start} s"
)
new = set([met.id for met in new])
print(old ^ new)

assert old == new

Results

Number of metabolites in the model: 342
Identified metabolites by the old version: 20 in 2.8499839305877686 s
Identified metabolites by the new version: 20 in 0.8409318923950195 s
set()
ChristianLieven commented 4 years ago

Oh before I merge could I get you to add an entry to the History.rst briefly describing the changes you made?

carrascomj commented 4 years ago

OK, I'll add an entry as soon as possible!