mapping-commons / sssom-py

Python toolkit for SSSOM mapping format
https://mapping-commons.github.io/sssom-py/index.html#
MIT License
49 stars 12 forks source link

Improve testing of round trip serialization #480

Closed hrshdhgd closed 7 months ago

hrshdhgd commented 8 months ago

This fixes: https://github.com/mapping-commons/sssom/issues/321

So the issue was when a tsv file is sssom converted to a JSON file and then the same JSON file is sssom parsed back, there was an error thrown :

ValueError: ['Some prefixes'] are used in the SSSOM mapping set but it does not exist in the prefix map

This happened because the @context block of the JSON file which contained the prefix map was never considered while forming the prefix_map and the metadata items (mapping_set_id, license etc) were also ignored. In addition to this, an absence of the external metadata file path param made it get the default metadata and prefix map internally which is a clear bug in the system.

This PR resolves that issue.

cthoyt commented 8 months ago

I am pretty skeptical of this. Can you make a 100% explicit test that shows what is going on here? Loading these opaque files makes it difficult to review this PR

matentzn commented 8 months ago

@cthoyt I have now updated the code to reflect your design in the parse_sssom_table method. As we wont have Harshad for a while, and I will have to deal with maintaining SSSOM for a bit, if you can find a minute to see if what I am doing makes sense, that would be much appreciated.

cthoyt commented 7 months ago

@matentzn current issues are now due to data quality with the test data. I can't address further since it's going to require debugging some mismatch between the RDF and the LinkML schema

INFO:root:Unmapped predicated: {rdflib.term.URIRef('https://w3id.org/sssom/comment'), rdflib.term.URIRef('https://w3id.org/sssom/creator_id'), rdflib.term.URIRef('https://w3id.org/sssom/license'), rdflib.term.URIRef('https://w3id.org/sssom/mapping_date')}
INFO:root:Triple processed = 2459, unprocessed = 86

Ran 1 test in 6.724s

FAILED (errors=1)

Error
Traceback (most recent call last):
  File "/Users/cthoyt/dev/sssom-py/tests/test_parsers.py", line 250, in test_parse_sssom_rdf
    msdf = from_sssom_rdf(g=self.rdf_graph, prefix_map=self.df_converter, meta=self.metadata)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/cthoyt/dev/sssom-py/src/sssom/parsers.py", line 451, in from_sssom_rdf
    RDFLibLoader().load(
  File "/Users/cthoyt/.virtualenvs/indra/lib/python3.11/site-packages/linkml_runtime/loaders/rdflib_loader.py", line 256, in load
    objs = self.from_rdf_graph(g, schemaview=schemaview, target_class=target_class, prefix_map=prefix_map, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/cthoyt/.virtualenvs/indra/lib/python3.11/site-packages/linkml_runtime/loaders/rdflib_loader.py", line 205, in from_rdf_graph
    return [target_class(**x) for x in root_dicts]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/cthoyt/.virtualenvs/indra/lib/python3.11/site-packages/linkml_runtime/loaders/rdflib_loader.py", line 205, in <listcomp>
    return [target_class(**x) for x in root_dicts]
            ^^^^^^^^^^^^^^^^^
  File "<string>", line 31, in __init__
  File "/Users/cthoyt/.virtualenvs/indra/lib/python3.11/site-packages/sssom_schema/datamodel/sssom_schema.py", line 110, in __post_init__
    self.MissingRequiredField("license")
  File "/Users/cthoyt/.virtualenvs/indra/lib/python3.11/site-packages/linkml_runtime/utils/yamlutils.py", line 273, in MissingRequiredField
    raise ValueError(f"{field_name} must be supplied")
ValueError: license must be supplied
matentzn commented 7 months ago

@cthoyt thank you for your help on this; I fixed the remaining test and now everything passes. If you don't see any further issue, I think this is ready to go!