PathogeNish opened 8 months ago
Yeah, good idea. I think it makes sense to allow the `medium` argument in `grow` to be a list.
There is a way to make this work right now if you exploit the sample-specific media and are fine with storing the models multiple times. The trick is to create one sample with a unique id for each medium and to have a matching `sample_id` column in your medium.
Here is an example with the test data included in MICOM. We will simulate an E. coli community with a normal and anaerobic medium.
```python
import pandas as pd

import micom as mm
from micom.workflows import build, grow

tax = mm.data.test_data(1)  # only one sample
tax["sample_id"] = "aerobic"
tax_noo2 = tax.copy()
tax_noo2["sample_id"] = "anaerobic"
# We now have a table with the same 3 taxa under two sample ids;
# build the models for both
multi_tax = pd.concat([tax, tax_noo2])
manifest = build(multi_tax, mm.data.test_db, "models")

# Now we build a sample-specific medium
med = mm.qiime_formats.load_qiime_medium(mm.data.test_medium)
med_noo2 = med[med.reaction != "EX_o2_m"].copy()
med["sample_id"] = "aerobic"
med_noo2["sample_id"] = "anaerobic"
multi_med = pd.concat([med, med_noo2])

# The rest is straightforward. This will run in parallel and
# will make sure to free the RAM.
results = grow(manifest, "models", multi_med, 1.0, threads=2)

# And let's have a look
print(results.growth_rates)
```
Which gives you something like this:
```
                     abundance  growth_rate  reactions  metabolites               taxon  tradeoff  sample_id
compartments
Escherichia_coli_2    0.609524     0.235910         95           72  Escherichia_coli_2       1.0  anaerobic
Escherichia_coli_3    0.347253     0.134401         95           72  Escherichia_coli_3       1.0  anaerobic
Escherichia_coli_4    0.043223     0.016729         95           72  Escherichia_coli_4       1.0  anaerobic
Escherichia_coli_2    0.609524     0.987139         95           72  Escherichia_coli_2       1.0    aerobic
Escherichia_coli_3    0.347253     0.562384         95           72  Escherichia_coli_3       1.0    aerobic
Escherichia_coli_4    0.043223     0.070001         95           72  Escherichia_coli_4       1.0    aerobic
```
But that is pretty inefficient in terms of storage.
Hi @cdiener. This is interesting! I didn't know one could give specific tags to media as well!
Yeah... the process is quite wasteful because one is forced to create multiple duplicate models. But it is better than the alternative multiprocessing wrapper I wrote, which suffered from an inexplicable RAM buildup (likely due to termination issues with the created worker processes).
The RAM buildup is because of https://github.com/opencobra/cobrapy/issues/568, which affects multiple solvers. You can get around that by setting `maxtasksperchild` to something small (preferably 1) in `multiprocessing.Pool`. Or you can use the MICOM workflow wrapper, which already does that and avoids some other problems as well.
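To illustrate the `maxtasksperchild` workaround in isolation, here is a minimal sketch; `simulate` is a hypothetical placeholder for the per-medium solver call, not a MICOM function:

```python
import multiprocessing as mp


def simulate(medium_id):
    # Hypothetical stand-in for the per-medium simulation; in practice
    # this is where the solver would run (and potentially leak memory).
    return medium_id * 2


def run_all(media, threads=2):
    # maxtasksperchild=1 replaces each worker process after a single
    # task, so any memory leaked by the solver inside a task is
    # released when that worker exits.
    with mp.Pool(processes=threads, maxtasksperchild=1) as pool:
        return pool.map(simulate, media)


if __name__ == "__main__":
    print(run_all(range(4)))
```

Recycling workers this often adds process-startup overhead, but for long-running solver calls that cost is negligible compared to the leak it avoids.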
Checklist
Is your feature related to a problem? Please describe it.
I want to analyse the microbial growth rates for a given microbiome profile (single sample) when run on multiple different diets (or media). The idea is to understand how individual growth rates are affected by changes in the medium. Running multiple samples on the same medium through `grow` with the appropriate models has built-in multithreading to take advantage of a multi-core CPU, but growing a single sample on multiple media doesn't have a multithreading option.

Describe the solution you would like.
Given that `grow` for a single sample uses only 1 thread, I want to be able to run growth calculations on different media (100-200 types of media) using multithreading, so that all available threads are used simultaneously.

Describe alternatives you considered
I currently use the `pool.starmap` function in the standard Python multiprocessing library. I am facing an issue of memory pile-up (likely due to worker processes that are not deallocated/terminated despite the completion of the calculation) that causes the program to crash due to heavy RAM usage.
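A minimal sketch of that `pool.starmap` setup with the `maxtasksperchild` fix suggested earlier in the thread applied; `grow_on_medium` is a hypothetical placeholder, not the actual MICOM call:

```python
import multiprocessing as mp


def grow_on_medium(sample, medium_id):
    # Hypothetical stand-in for a single-sample growth simulation on
    # one medium; replace with the real per-medium computation.
    return (sample, medium_id)


def run_media(sample, n_media, threads=2):
    # One (sample, medium) argument tuple per medium for starmap.
    args = [(sample, i) for i in range(n_media)]
    # maxtasksperchild=1 recycles each worker after one task, so
    # solver memory that is not freed cannot pile up across tasks.
    with mp.Pool(processes=threads, maxtasksperchild=1) as pool:
        return pool.starmap(grow_on_medium, args)
```

`starmap` preserves the input order, so the i-th result corresponds to the i-th medium regardless of which worker computed it.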