opencobra / memote

memote – the genome-scale metabolic model test suite
https://memote.readthedocs.io/
Apache License 2.0

[feature] batch tests and comparative reports #231

Closed ChristianLieven closed 6 years ago

ChristianLieven commented 6 years ago

Problem description

Running a benchmark for several models at once could be useful. I'm thinking, for instance, of John Monk's work with the many E. coli strains (several of which would fail on the default biomass reaction because they are auxotrophic, again): output the results for the whole set while still keeping the pass/fail status for each individual model. Right now it is easy for me to run it on my 127 models, but then I have to look at each HTML output, or parse them all into my own database, to get a broad overview of all of them.

  • Joana Xavier, Correspondence
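For the "127 models" situation, the first step is simply gathering the model files for a batch run. A minimal sketch with the standard library (the directory name and the cobrapy loading line in the comment are hypothetical, not part of memote):

```python
from pathlib import Path

def collect_model_files(directory, pattern="*.xml"):
    """Return a sorted list of paths to model files under ``directory``."""
    return sorted(Path(directory).glob(pattern))

# Example (assumed layout): gather all SBML files from a models/ directory,
# then load each one with cobrapy before handing the list to memote:
# model_list = [cobra.io.read_sbml_model(str(p))
#               for p in collect_model_files("models")]
```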
ChristianLieven commented 6 years ago

People who'd like to run memote on a bunch of models and receive a simple table output could try this in a Jupyter notebook:

from memote.suite.api import test_model
import pandas as pd

# Automatically load your models into a list here.
model_list = [model]

for counter, model in enumerate(model_list):

    # This is the API command that runs the tests.
    report_dictionary = test_model(model, results=True)[1]['report']

    super_categories = []
    test_ids = []
    final_results = []

    for i in report_dictionary.keys():
        for j in report_dictionary[i].keys():
            test_ids.append(j)
            result = report_dictionary[i][j]
            super_categories.append(i)
            # Some results are currently stored in long lists containing
            # several datatypes. For readability we only display the length
            # of lists larger than 2 items.
            if isinstance(result, list) and len(result) > 2:
                final_results.append(len(result))
            # Some results are actually pandas DataFrames exported as
            # dictionaries. Here we just calculate a summary of the values
            # from these dictionaries.
            elif isinstance(result, dict):
                summary_dict = {x: sum(1 for b in y if not b) for x, y in
                                result.items() if x != 'index'}
                final_results.append(summary_dict)
            else:
                final_results.append(result)

    # With the first model we generate the structure of the dataframe.
    if counter == 0:
        data = {'Category': pd.Series(super_categories),
                'Test ID': pd.Series(test_ids),
                model.id: pd.Series(final_results)}
        df = pd.DataFrame(data)
    # For every additional model we add a column.
    else:
        df[model.id] = pd.Series(final_results)

# With the dataframe you can now group/sort etc. to analyse the models' performance.
df
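Once the dataframe exists, grouping and filtering give the broad overview asked for above. A small sketch with made-up values in the same shape (the test IDs and model columns here are hypothetical, purely for illustration):

```python
import pandas as pd

# Hypothetical results table in the shape built above; values are invented.
df = pd.DataFrame({
    "Category": ["basic", "basic", "consistency"],
    "Test ID": ["test_genes_presence", "test_reactions_presence",
                "test_stoichiometric_consistency"],
    "model_a": [10, 95, True],
    "model_b": [10, 80, False],
})

# How many tests fall into each category?
per_category = df.groupby("Category")["Test ID"].count()

# Which tests produced different results for the two models?
differing = df[df["model_a"] != df["model_b"]]

print(per_category)
print(differing["Test ID"].tolist())
```

Sorting by a model column, or counting outright failures per column, works the same way and scales to however many model columns the loop added.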
ChristianLieven commented 6 years ago

The above code may be outdated, but in principle that is how one could tackle the issue.

The API facilitates automated testing of multiple metabolic models. For now we've decided against building one large report that can accommodate multiple models, as we believe that users with that request are well served by a functional API. A comprehensive comparative report like this is out of scope for memote.