Closed carrascomj closed 4 years ago
Merging #678 into develop will decrease coverage by
0.06%
. The diff coverage is87.5%
.
@@ Coverage Diff @@
## develop #678 +/- ##
===========================================
- Coverage 82.82% 82.76% -0.07%
===========================================
Files 49 49
Lines 2608 2622 +14
Branches 427 428 +1
===========================================
+ Hits 2160 2170 +10
- Misses 357 359 +2
- Partials 91 93 +2
Impacted Files | Coverage Δ | |
---|---|---|
src/memote/support/consistency.py | 94.62% <87.5%> (-1.89%) |
:arrow_down: |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact)
,ø = not affected
,? = missing data
Powered by Codecov. Last update f30c1b1...d1c5e16. Read the comment docs.
I like your changes especially the use of mutliprocessing and manipulating the underlying model directly.
I'd be interested to see how much faster this is. Would you mind providing a before/after comparison?
And, of course, I'd feel more comfortable merging it if the test wouldn't fail.
The tests should pass now. I have the comparison in a notebook, but apparently GitHub doesn't support the format. I'll try to paste the code and the results.
The tests were performed on a small model in BiGG.
![ ! -f "iAB_RBC_283.xml" ] && curl -L -O 'http://bigg.ucsd.edu/static/models/iAB_RBC_283.xml'
VERBOSITY = False
PROCESSES = 4
Old functions:
from time import time
import cobra
import numpy as np
from copy import deepcopy
from cobra.exceptions import Infeasible
from tqdm import tqdm
import sys
def run_fba(model, rxn_id, direction="max", single_value=True):
model.objective = model.reactions.get_by_id(rxn_id)
model.objective_direction = direction
if single_value:
return model.slim_optimize()
else:
try:
solution = model.optimize()
except Infeasible:
return solution
return np.nan
def open_exchanges(model):
for rxn in model.exchanges:
rxn.bounds = (-1000, 1000)
def test_old(model):
"""
old_test in consistency.py
"""
mets_not_produced = list()
open_exchanges(model)
pbar = tqdm(total=len(model.metabolites))
for met in model.metabolites:
with model:
exch = model.add_boundary(
met, type="irrex", reaction_id="IRREX", lb=0, ub=1000
)
solution = run_fba(model, exch.id)
if np.isnan(solution) or solution < model.tolerance:
mets_not_produced.append(met)
pbar.update(1)
return mets_not_produced
New logic functions:
def solve_boundary(metabolite, rxn, val=-1):
"""
Solves the model when some reaction `rxn` has been added to the `metabolite`'s contraints.
"""
metabolite.constraint.set_linear_coefficients({rxn: val})
solution = metabolite.model.slim_optimize()
# TODO: it seems like with context doesn't catch these changes, need to check
# restore constraint
metabolite.constraint.set_linear_coefficients({rxn: 0})
return solution
def test_new(model):
"""
New test
"""
mets_not_produced = list()
open_exchanges(model)
irr = model.problem.Variable("irr", lb=0, ub=1000)
with model:
model.add_cons_vars(irr)
# helper.run_fba() only accepts reactions in the model
model.objective = irr
pbar = tqdm(total=len(model.metabolites))
for met in model.metabolites:
solution = solve_boundary(met, irr)
if np.isnan(solution) or solution < model.tolerance:
mets_not_produced.append(met)
pbar.update(1)
return mets_not_produced
Comparison:
model = cobra.io.read_sbml_model("iAB_RBC_283.xml")
model.solver = "glpk"
if VERBOSITY:
model.solver.interface.Configuration.verbosity = model.solver.interface.Configuration(
verbosity=3
)
print(f"Number of metabolites in the model: {len(model.metabolites)}")
start = time()
old = set(test_old(model))
print(
f"Identified metabolites by the old version: {len(old)} in {time() - start} s"
)
start = time()
new = set(test_new(model))
print(
f"Identified metabolites by the new version: {len(new)} in {time() - start} s"
)
print(old ^ new)
assert old == new
Number of metabolites in the model: 342
Identified metabolites by the old version: 22 in 2.8081214427948 s
Identified metabolites by the new version: 22 in 2.1633551120758057 s
set()
It's not a huge difference; although profiling revealed the expected behavior.
Functions for multiprocessing:
import multiprocessing
def _init_worker(model, irr, val):
"""Initialize a global model object for multiprocessing.
Parameters
----------
model : cobra.Model
The metabolic model under investigation.
irr: optlang.Variable || cobra.Reaction
the reaction to be added to the linear coefficients. It must be in the
variables of the model.
val: int
value of the linear coefficient (1 for consumption, -1 for production)
"""
global _model
global _irr
global _val
_model = model
_model.objective = irr
_irr = irr
_val = val
def _solve_metabolite_production(metabolite):
"""
Solves the model when some reaction has been added to a `metabolite`'s
contraints. The reaction and the model are passed as globals.
Parameters
----------
metabolite: cobra.Metabolite
the reaction will be added to this metabolite as a linear coefficient
Returns
-------
solution: float
the value of the solution of the LP problem, *NaN* if infeasible.
metabolite: cobra.Metabolite
metabolite passed as argument (to use map as a filter)
"""
global _model
global _irr
global _val
constraint = _model.metabolites.get_by_id(metabolite.id).constraint
constraint.set_linear_coefficients({_irr: _val})
solution = _model.slim_optimize()
constraint.set_linear_coefficients({_irr: 0})
return solution, metabolite
def find_metabolites_not_produced_with_open_bounds(model, processes=None, prod = True):
"""
Return metabolites that cannot be produced with open exchange reactions.
A demand reaction is set as the objective. Then, it is sequentally added as
a coefficient for every metabolite and the solution is inspected.
A perfect model should be able to produce each and every metabolite when
all medium components are available.
Parameters
----------
model : cobra.Model
The metabolic model under investigation.
processes: int
Number of processes to be used (Default to `cobra.Configuration()`).
prod: bool
If False, it checks for consumption instead of production. Default True
Returns
-------
list
Those metabolites that could not be produced.
"""
if processes is None:
# For now, borrow the number of processes from cobra's configuration
processes = Configuration().processes
n_mets = len(model.metabolites)
processes = min(processes, n_mets)
# manage the value of the linear coefficient to be added to each metabolite
val = -1 # production
if not prod:
val = 1 # consumption
open_exchanges(model)
irr = model.problem.Variable("irr", lb=0, ub=1000)
if processes > 1:
chunk_s = n_mets // processes
pool = multiprocessing.Pool(
processes,
initializer=_init_worker,
initargs=(model, irr, val),
)
# use map as filter
mets_not_produced = [met for solution, met in pool.imap_unordered(
_solve_metabolite_production, model.metabolites, chunksize=chunk_s
) if np.isnan(solution) or solution < model.tolerance]
pool.close()
pool.join()
else:
_init_worker(model, irr)
# use map as filter
mets_not_produced = [met for solution, met in map(
_solve_metabolite_production, model.metabolites
) if np.isnan(solution) or solution < model.tolerance]
return mets_not_produced
Comparison:
if VERBOSITY:
model.solver.interface.Configuration.verbosity = model.solver.interface.Configuration(
verbosity=3
)
print(f"Number of metabolites in the model: {len(model.metabolites)}")
start = time()
old = set([met.id for met in test_old(model)])
print(
f"Identified metabolites by the old version: {len(old)} in {time() - start} s"
)
start = time()
new = set([met.id for met in find_metabolites_not_produced_with_open_bounds(model, PROCESSES)])
print(
f"Identified metabolites by the new version: {len(new)} in {time() - start} s"
)
print(old ^ new)
assert old == new
Number of metabolites in the model: 342
Identified metabolites by the old version: 22 in 2.826913356781006 s
Identified metabolites by the new version: 22 in 0.7438373565673828 s
set()
The opposite test should also incorporate the new logic and the multiprocessing feature. The corresponding tests:
def consumed_old(model):
"""
Return metabolites that cannot be consumed with open boundary reactions.
When all metabolites can be secreted, it should be possible for each and
every metabolite to be consumed in some form.
Parameters
----------
model : cobra.Model
The metabolic model under investigation.
Returns
-------
list
Those metabolites that could not be consumed.
"""
mets_not_consumed = list()
open_exchanges(model)
for met in model.metabolites:
with model:
exch = model.add_boundary(
met, type="irrex", reaction_id="IRREX", lb=-1000, ub=0)
solution = run_fba(model, exch.id, direction="min")
if np.isnan(solution) or abs(solution) < model.tolerance:
mets_not_consumed.append(met)
return mets_not_consumed
def consumed_new(model, processes=None):
return find_metabolites_not_produced_with_open_bounds(model, processes=processes, prod=False)
Comparison:
if VERBOSITY:
model.solver.interface.Configuration.verbosity = model.solver.interface.Configuration(
verbosity=3
)
print(f"Number of metabolites in the model: {len(model.metabolites)}")
start = time()
old = set(consumed_old(model))
print(
f"Identified metabolites by the old version: {len(old)} in {time() - start} s"
)
old = set([met.id for met in old])
start = time()
new = set(consumed_new(model, PROCESSES))
print(
f"Identified metabolites by the new version: {len(new)} in {time() - start} s"
)
new = set([met.id for met in new])
print(old ^ new)
assert old == new
Number of metabolites in the model: 342
Identified metabolites by the old version: 20 in 2.8499839305877686 s
Identified metabolites by the new version: 20 in 0.8409318923950195 s
set()
Oh before I merge could I get you to add an entry to the History.rst briefly describing the changes you made?
OK, I'll add an entry as soon as possible!
In order to speed up the consistency tests of checking metabolite production and consumption with open bounds, the logic of the test has been changed. Instead of adding a boundary to the model each time, the test now works by sequentially changing the linear coefficients of the metabolites, which should be faster.
▪ get the linear coefficients of the constraint of each metabolite;
▪ add a coefficient to the variable in the constraint (-1 for production and 1 for consumption) ▪ solve;
▪ restore constraint;
▪ If not solution, add the metabolite to the returned output.
Multiprocessing was also added to decrease the running time.