opencobra / memote

memote – the genome-scale metabolic model test suite
https://memote.readthedocs.io/
Apache License 2.0

Vague Error if Gene in Essentiality Data but not Model #707

Open jonstrutz11 opened 3 years ago

jonstrutz11 commented 3 years ago

I want to test a model against essentiality data. I have data for nearly all genes in this organism, many of which are not included in the model. I expected that Memote would either ignore any genes in the dataset that are not in the model or raise an informative error. However, if the dataset includes genes that are not in the model, this is the result:

Traceback (most recent call last):
  File "c:\users\jonst\anaconda3\envs\gemulate\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "c:\users\jonst\anaconda3\envs\gemulate\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\jonst\Anaconda3\envs\gemulate\Scripts\memote.exe\__main__.py", line 7, in <module>
  File "c:\users\jonst\anaconda3\envs\gemulate\lib\site-packages\click\core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "c:\users\jonst\anaconda3\envs\gemulate\lib\site-packages\click\core.py", line 697, in main
    rv = self.invoke(ctx)
  File "c:\users\jonst\anaconda3\envs\gemulate\lib\site-packages\click\core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "c:\users\jonst\anaconda3\envs\gemulate\lib\site-packages\click\core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "c:\users\jonst\anaconda3\envs\gemulate\lib\site-packages\click\core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "c:\users\jonst\anaconda3\envs\gemulate\lib\site-packages\click\core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "c:\users\jonst\anaconda3\envs\gemulate\lib\site-packages\memote\suite\cli\reports.py", line 121, in snapshot
    solver_timeout=solver_timeout)
  File "c:\users\jonst\anaconda3\envs\gemulate\lib\site-packages\memote\suite\api.py", line 119, in test_model
    experimental.load(model)
  File "c:\users\jonst\anaconda3\envs\gemulate\lib\site-packages\memote\experimental\config.py", line 79, in load
    self.load_essentiality(model)
  File "c:\users\jonst\anaconda3\envs\gemulate\lib\site-packages\memote\experimental\config.py", line 143, in load_essentiality
    experiment.validate(model)
  File "c:\users\jonst\anaconda3\envs\gemulate\lib\site-packages\memote\experimental\essentiality.py", line 80, in validate
    model=model, checks=checks + custom)
  File "c:\users\jonst\anaconda3\envs\gemulate\lib\site-packages\memote\experimental\experimental_base.py", line 101, in validate
    order_fields=True, checks=checks))
  File "c:\users\jonst\anaconda3\envs\gemulate\lib\site-packages\goodtables\validate.py", line 80, in validate
    report = inspector.inspect(source, **options)
  File "c:\users\jonst\anaconda3\envs\gemulate\lib\site-packages\goodtables\inspector.py", line 83, in inspect
    table_warnings, table_report = task.get()
  File "c:\users\jonst\anaconda3\envs\gemulate\lib\multiprocessing\pool.py", line 644, in get
    raise self._value
  File "c:\users\jonst\anaconda3\envs\gemulate\lib\multiprocessing\pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "c:\users\jonst\anaconda3\envs\gemulate\lib\site-packages\goodtables\inspector.py", line 304, in __inspect_table
    'errors': [dict(error) for error in errors],
  File "c:\users\jonst\anaconda3\envs\gemulate\lib\site-packages\goodtables\inspector.py", line 304, in <listcomp>
    'errors': [dict(error) for error in errors],
  File "c:\users\jonst\anaconda3\envs\gemulate\lib\site-packages\goodtables\error.py", line 51, in __iter__
    for key, value in self._to_dict().items():
  File "c:\users\jonst\anaconda3\envs\gemulate\lib\site-packages\goodtables\error.py", line 116, in _to_dict
    'message': self.message,
  File "c:\users\jonst\anaconda3\envs\gemulate\lib\site-packages\goodtables\error.py", line 76, in message
    **self._message_substitutions
KeyError: 'header'

Because this error is quite vague, it took me a few hours to realize that the cause was extra genes in my essentiality CSV file that were not in the model. I think this should be handled more gracefully to save future users time, since I'm sure I'm not the only one who has encountered or will encounter this issue.

Code Sample

data.zip

See the attached zip archive for a minimal reproducible example. Unzip it and then run the following command: memote report snapshot --filename report.html --experimental data\experiments.yml data\toy_model.yml

It should run normally. To reproduce the error, just add a line to knockouts.csv (e.g. "Gene3,no,") for a gene not in the model (there are only two genes in this toy model). Then rerun the above command. You should see the above error message.
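
Until this is handled inside memote, one possible workaround is to drop the unknown genes from the CSV before running the report. The following is only a minimal sketch using the Python standard library, not part of memote. The gene IDs "Gene1" and "Gene2", the "yes"/"no" values, and the helper name split_known_genes are assumptions made for illustration; only the "gene", "essential", "comment" header and the "Gene3,no," line come from this report.

```python
import csv
import io


def split_known_genes(rows, model_gene_ids):
    """Split essentiality rows into genes found in the model and genes not found."""
    known, unknown = [], []
    for row in rows:
        (known if row["gene"] in model_gene_ids else unknown).append(row)
    return known, unknown


# Hypothetical contents of knockouts.csv after adding the extra "Gene3" line.
raw = "gene,essential,comment\nGene1,yes,\nGene2,no,\nGene3,no,\n"

# Assumed gene IDs of the toy model. With cobrapy, these could be collected
# with {g.id for g in model.genes} instead of being hard-coded.
model_gene_ids = {"Gene1", "Gene2"}

rows = list(csv.DictReader(io.StringIO(raw)))
known, unknown = split_known_genes(rows, model_gene_ids)

# Write only the known genes back out in the same three-column layout.
out = io.StringIO()
writer = csv.DictWriter(
    out, fieldnames=["gene", "essential", "comment"], lineterminator="\n"
)
writer.writeheader()
writer.writerows(known)
filtered_csv = out.getvalue()

# Report which genes were dropped so the mismatch is visible to the user.
print("dropped:", [row["gene"] for row in unknown])
```

Listing the dropped genes explicitly makes the mismatch visible up front instead of surfacing later as KeyError: 'header'.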

Midnighter commented 3 years ago

Hi @jonstrutz11,

Thank you for the report and sorry for the trouble. I will look at your data but from the error it seems to me that you might be missing the header line? It is actually the intended behaviour that unknown genes will be ignored.

jonstrutz11 commented 3 years ago

That is what I thought too initially, but if you run the data, you will see (I think) that that wasn't actually the issue, since the header is included in the correct format (with "gene", "essential", "comment" headers). That is actually the main reason why this took me so long to debug 😬