monarch-initiative / monarch-ingest

Data ingest application for Monarch Initiative knowledge graph using Koza
https://monarchinitiative.org

Review and Improve cat-merge output to better support Closurizer #302

Closed by putmantime 1 year ago

putmantime commented 2 years ago

Closurizer is currently failing in Jenkins with:

poetry run ingest closure
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/tmp/workspace/all-ingests/.cache/pypoetry/virtualenvs/monarch-ingest-kct3006u-py3.8/lib/python3.8/site-packages/typer/main.py", line 214, in __call__
    return get_command(self)(*args, **kwargs)
  File "/tmp/workspace/all-ingests/.cache/pypoetry/virtualenvs/monarch-ingest-kct3006u-py3.8/lib/python3.8/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/tmp/workspace/all-ingests/.cache/pypoetry/virtualenvs/monarch-ingest-kct3006u-py3.8/lib/python3.8/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/tmp/workspace/all-ingests/.cache/pypoetry/virtualenvs/monarch-ingest-kct3006u-py3.8/lib/python3.8/site-packages/click/core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/tmp/workspace/all-ingests/.cache/pypoetry/virtualenvs/monarch-ingest-kct3006u-py3.8/lib/python3.8/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/tmp/workspace/all-ingests/.cache/pypoetry/virtualenvs/monarch-ingest-kct3006u-py3.8/lib/python3.8/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/tmp/workspace/all-ingests/.cache/pypoetry/virtualenvs/monarch-ingest-kct3006u-py3.8/lib/python3.8/site-packages/typer/main.py", line 500, in wrapper
    return callback(**use_params)  # type: ignore
  File "/tmp/workspace/all-ingests/monarch_ingest/main.py", line 102, in closure
    apply_closure()
  File "/tmp/workspace/all-ingests/monarch_ingest/cli_utils.py", line 214, in apply_closure
    add_closure(node_file=f"{name}_nodes.tsv",
  File "/tmp/workspace/all-ingests/.cache/pypoetry/virtualenvs/monarch-ingest-kct3006u-py3.8/lib/python3.8/site-packages/closurizer/closurizer.py", line 83, in add_closure
    cur.execute(f"""
sqlite3.OperationalError: near "from": syntax error
Generating closure KG...
node_file: monarch-kg_nodes.tsv
edge_file: monarch-kg_edges.tsv
kg_archive: monarch-kg.tar.gz
closure_file: data/phenio/phenio-relations-non-redundant.tsv
fields: subject,object
output_file: monarch-kg-with-closure_edges.tsv

This looks like a SQLite error, but it is likely caused by the node and edge files not being unpacked from the tar.gz into the right place. We should review, from the top level, how we handle the tarball that comes out of cat-merge versus the individual node and edge files that the later steps need. Then we can create the tar archive just before releasing.
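One possible way to make this failure mode explicit is a small pre-flight check that runs before the closure step. It is only a sketch, not how monarch-ingest or closurizer currently works: `ensure_kg_files` is a hypothetical helper name, and it assumes the archive contains the node and edge TSVs under the same base names printed in the log above.

```python
import shutil
import tarfile
from pathlib import Path


def ensure_kg_files(kg_archive, required_files, workdir="."):
    """Make sure every file in required_files exists in workdir.

    Hypothetical helper: files already on disk are left alone; missing ones
    are looked up by base name in the gzipped tar archive and extracted.
    Raises FileNotFoundError if a file is neither on disk nor in the archive.
    """
    workdir = Path(workdir)
    missing = [name for name in required_files if not (workdir / name).is_file()]
    if missing:
        with tarfile.open(kg_archive, "r:gz") as tar:
            # Index regular files by base name, ignoring any directory prefix.
            members = {Path(m.name).name: m for m in tar.getmembers() if m.isfile()}
            for name in missing:
                if name not in members:
                    raise FileNotFoundError(
                        f"{name} not found on disk or in {kg_archive}"
                    )
                # Stream the member to workdir/name so large TSVs are not
                # loaded into memory and archive paths cannot escape workdir.
                with tar.extractfile(members[name]) as src, \
                        open(workdir / name, "wb") as dst:
                    shutil.copyfileobj(src, dst)
    return [workdir / name for name in required_files]
```

Called as `ensure_kg_files("monarch-kg.tar.gz", ["monarch-kg_nodes.tsv", "monarch-kg_edges.tsv"])` just before the closure step, a missing file would surface as a clear `FileNotFoundError` rather than as a downstream SQL syntax error.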