monarch-initiative / monarch-mapping-commons

Building a fully executable workflow for boomer
Creative Commons Zero v1.0 Universal

Update Mondo mappings and add gene mappings #23

Closed glass-ships closed 1 year ago

glass-ships commented 1 year ago

This will require setting up Git LFS for the gene mappings, since the file exceeds GitHub's 100 MB file size limit.

(If we're going to remove the boomer output, we may also want to run BFG on the repo to shrink its history, which still contains all of the boomer output.)

glass-ships commented 1 year ago

Whoops, I thought this was a duplicate of #6, but it's not. Sorry for the confusion.

glass-ships commented 1 year ago

@kevinschaper or @matentzn, do you have a suggestion for the mapping_set_id for the Monarch gene mappings? It looks like the standard is a PURL or w3id URL, but I'm not sure we have one yet for gene mappings.

glass-ships commented 1 year ago

I'm also running into an error, and it's not immediately clear what the issue is:

Loading file:empty.sssom.tsv 
Traceback (most recent call last):
  File "/work/scripts/gen_boomer_input.py", line 136, in <module>
    cli()
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/work/scripts/gen_boomer_input.py", line 113, in run
    msdf = parse_sssom_table(fp)
  File "/usr/local/lib/python3.10/dist-packages/sssom/parsers.py", line 290, in parse_sssom_table
    msdf = from_sssom_dataframe(df, prefix_map=meta_all.prefix_map, meta=meta_all.metadata)
  File "/usr/local/lib/python3.10/dist-packages/sssom/parsers.py", line 485, in from_sssom_dataframe
    mlist.append(_prepare_mapping(Mapping(**mdict)))
  File "<string>", line 47, in __init__
  File "/usr/local/lib/python3.10/dist-packages/sssom_schema/datamodel/sssom_schema.py", line 278, in __post_init__
    self.MissingRequiredField("mapping_justification")
  File "/usr/local/lib/python3.10/dist-packages/linkml_runtime/utils/yamlutils.py", line 273, in MissingRequiredField
    raise ValueError(f"{field_name} must be supplied")
ValueError: mapping_justification must be supplied
make[1]: *** [Makefile:103: boomer_input/no_mappings/combined.sssom.tsv] Error 1
make[1]: Leaving directory '/work/projects/mondo-all'
make: *** [Makefile:26: symbiont-mondo-all] Error 2

EDIT: I made the mistake of updating SSSOM based on some supposedly very important updates, and mapping_justification is now a required field. I ran into the same issue with the process_biomappings script. I'll reference that and see if I can figure it out, but if it's an empty sssom file...

matentzn commented 1 year ago

mapping_justification was always required - by the spec :) Any version of sssom-py would have had that issue, or at least, should have. You can just add a single column to the file called mapping_justification where all the values are semapv:UnspecifiedMatching.
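A minimal sketch of that suggestion, assuming the file is a tab-separated SSSOM table whose optional metadata block consists of leading lines starting with `#`. The function name `add_mapping_justification` is hypothetical and not part of sssom-py; the sketch uses only the standard library and does not handle quoted fields containing tabs.

```python
COLUMN = "mapping_justification"
DEFAULT_VALUE = "semapv:UnspecifiedMatching"


def add_mapping_justification(tsv_text: str, value: str = DEFAULT_VALUE) -> str:
    """Return SSSOM TSV text with a filled-in mapping_justification column.

    Hypothetical helper: assumes leading '#' lines are metadata and that
    the first non-'#' line is the tab-separated column header.
    """
    lines = tsv_text.splitlines()

    # Keep the leading '#' metadata block untouched.
    n_meta = 0
    while n_meta < len(lines) and lines[n_meta].startswith("#"):
        n_meta += 1
    meta, table = lines[:n_meta], lines[n_meta:]

    if not table:
        # No header row at all, so there is nothing to extend.
        return tsv_text

    header = table[0].split("\t")
    if COLUMN not in header:
        header.append(COLUMN)
    idx = header.index(COLUMN)

    out = ["\t".join(header)]
    for line in table[1:]:
        if not line:
            continue  # skip blank lines
        cells = line.split("\t")
        # Pad short rows so every row has one cell per header column.
        cells += [""] * (len(header) - len(cells))
        if not cells[idx]:
            cells[idx] = value
        out.append("\t".join(cells))

    return "\n".join(meta + out) + "\n"
```

Rows that already carry a justification are left alone, so running the helper twice on the same text gives the same result.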

glass-ships commented 1 year ago

Gotcha. It looks like multiple files are missing this column. I'll go through and add it where I can, assuming semapv:UnspecifiedMatching.
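One way to find the affected files is to scan for SSSOM tables whose header row lacks the column. This is only a sketch: the `*.sssom.tsv` naming pattern is taken from the traceback above, while the helper names `header_columns` and `files_missing_column` are hypothetical. Files with no header row at all are reported as missing the column too.

```python
from pathlib import Path

COLUMN = "mapping_justification"


def header_columns(path):
    """Return the column names from the first non-'#' line of a TSV file."""
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            if line.startswith("#"):
                continue  # skip the SSSOM metadata block
            return line.rstrip("\r\n").split("\t")
    return []  # empty file or metadata only


def files_missing_column(root, column=COLUMN):
    """List *.sssom.tsv files under root whose header lacks the column."""
    return sorted(
        p for p in Path(root).rglob("*.sssom.tsv")
        if column not in header_columns(p)
    )
```

Calling `files_missing_column(".")` from the repository root would list the candidates to edit.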

glass-ships commented 1 year ago

Actually, I've been rummaging through this for a while, and it's very unclear how the gen_boomer_input.py script is being used or which files need to be edited to include mapping_justification. Any chance I could get pointed in the right direction? I'm not sure who even understands this repo at this point.

glass-ships commented 1 year ago

Addressing in #25.