Best practices for memote usage on cluster

opencobra / memote

memote – the genome-scale metabolic model test suite

https://memote.readthedocs.io/

Apache License 2.0

Best practices for memote usage on cluster #694

Closed: franciscozorrilla closed this issue 4 years ago

franciscozorrilla commented 4 years ago

This is more of a question than a problem, as I can't find any information about this in the documentation or issues. When submitting memote jobs on the cluster, can memote make use of multiple cores/threads? There does not seem to be a --cores or --threads parameter that I am aware of. Does memote check and use the maximum number of cores available, or am I wasting resources if I allocate more than one core per job?

Midnighter commented 4 years ago

When I ran memote on the cluster for the paper, I used single cores.

The longer answer is that certain tests that use FVA, for example, will use multiple cores by default based on the cobrapy configuration. Certain solvers (like Gurobi) may also attempt to solve problems in parallel if multiple cores are available.
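As a rough illustration of the point about the cobrapy configuration, the sketch below reads the core count granted by the scheduler and, if cobrapy is importable, caps its worker processes to that number. The helper names allocated_cores and limit_cobrapy are hypothetical, the environment variables cover only Slurm, SGE and Torque, and the setting only affects the Python process that runs this code, not a separately launched memote command.

```python
import os


def allocated_cores(env=None, default=1):
    """Return the core count granted by the scheduler, or `default`.

    Hypothetical helper: checks a few common scheduler variables
    (Slurm, SGE, Torque) and falls back to `default` if none is set.
    """
    env = os.environ if env is None else env
    for name in ("SLURM_CPUS_PER_TASK", "NSLOTS", "PBS_NP"):
        value = env.get(name, "")
        if value.isdigit() and int(value) > 0:
            return int(value)
    return default


def limit_cobrapy(processes):
    """Cap cobrapy's default process count; return False if cobrapy is absent.

    Assumption: this only changes the configuration of the current
    Python process.
    """
    try:
        import cobra
    except ImportError:
        return False
    cobra.Configuration().processes = processes
    return True
```

With a single-core allocation, calling limit_cobrapy(allocated_cores()) before running FVA-based code in the same process should keep cobrapy from spawning more workers than the job was given.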

In principle, all tests are independent and you should be able to run all tests in a distributed manner (with something like pytest-xdist). However, the way that we collect the test results is not multiprocess capable. Considering that there are only a handful of tests that take a really long time, it didn't seem worth the effort to try and implement that.

franciscozorrilla commented 4 years ago

That makes sense, thanks for the info!

franciscozorrilla commented 4 years ago

As a broader follow-up on best practices for cluster usage: if I want to compare hundreds or even thousands of models, would you recommend submitting each model individually with something like memote run model.xml and then extracting/compiling the results into a table, or would it be more efficient to submit them as a single job with something like memote report diff *.xml?

Midnighter commented 4 years ago

I definitely recommend submitting individual jobs, collecting all the JSON reports from memote run, and then reporting aggregates. Please take a look at the following repo, which contains the analysis code and supplements for the memote paper. It also contains the results for many existing models using the memote version at that point in time.
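A minimal sketch of the one-job-per-model approach: the function below builds one memote run command per model file, each writing its own JSON report. The function name build_memote_commands and the directory layout are hypothetical, and the --filename option is used on the assumption that it sets the JSON output path (check memote run --help for your version).

```python
from pathlib import Path


def build_memote_commands(model_dir, result_dir, pattern="*.xml"):
    """Return one `memote run` argument list per model file.

    Hypothetical helper: each model gets its own JSON report named
    after the model file, so jobs never write to the same path.
    """
    commands = []
    for model in sorted(Path(model_dir).glob(pattern)):
        result = Path(result_dir) / f"{model.stem}.json"
        commands.append(
            ["memote", "run", "--filename", str(result), str(model)]
        )
    return commands
```

Each argument list can then be written into a scheduler job script or passed to subprocess.run inside a single-core job.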

https://github.com/biosustain/memote-meta-study

memote report diff, first of all, doesn't save the results for you, and its output is also probably illegible beyond a handful of models.
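Since the saved JSON reports are what makes aggregation possible, here is a hedged sketch of compiling them into one table. It assumes each report has a top-level "tests" mapping whose entries carry a "metric" value. That layout is an assumption about the memote result format and may differ between versions, and the function names are hypothetical.

```python
import csv
import json
from pathlib import Path


def collect_metrics(report_dir):
    """Gather (model, test, metric) rows from all JSON reports in a folder.

    Assumption: every report has a top-level "tests" mapping whose
    values are dicts that may contain a "metric" entry.
    """
    rows = []
    for path in sorted(Path(report_dir).glob("*.json")):
        with path.open() as handle:
            report = json.load(handle)
        for test_id, test in report.get("tests", {}).items():
            if not isinstance(test, dict):
                continue  # skip entries that do not match the assumed layout
            rows.append(
                {"model": path.stem, "test": test_id, "metric": test.get("metric")}
            )
    return rows


def write_table(rows, out_path):
    """Write the collected rows to a CSV file with a header line."""
    with open(out_path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=["model", "test", "metric"])
        writer.writeheader()
        writer.writerows(rows)
```

The resulting long-format CSV has one row per model and test, which can be pivoted or summarised with whatever analysis tooling is at hand.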