ucoProject / UCO

This repository is for development of the Unified Cyber Ontology.
Apache License 2.0

Issue 423: Add JSON-LD context dictionary generator #453

Open kfairbanks opened 2 years ago

kfairbanks commented 2 years ago

This Pull Request resolves all requirements of Issue #423 .

Coordination

ajnelson-nist commented 2 years ago

@kfairbanks - I inlined a TODO that you might need to address, about keying some dicts. I'm not sure; your dictionary generator seems to be working well enough now w.r.t. collecting properties.

Your method for building the namespace prefixes looks like it would benefit from a stably available prefix definition, rather than needing to crawl through /ontology. (At the time you call the generators, uco_monolithic.ttl happens to be guaranteed to have been built, so reading it would also save you from an os.walk().) It's probably worth the committee considering using sh:declares or skos:notation to house authoritative prefixes.

All of the ontologies under /ontology/uco/ currently have an rdfs:label you could use. (This turns out to be sufficient coverage, because ontologies under the other /ontology directories don't define things that need to go into a context dictionary.) However, the semantics of rdfs:label don't necessarily encourage the kind of consumption and availability guarantee that you'd want. If you'd like to help justify prefixes and shave away ~50 lines of code, feel free to take inspiration from this SPARQL query:

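PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

# List every ontology, with its rdfs:label where one is declared.
SELECT ?nOntology ?lLabel
WHERE {
  ?nOntology a owl:Ontology .
  OPTIONAL {
    ?nOntology rdfs:label ?lLabel .
  }
}
ORDER BY ?nOntology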
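As a rough sketch of how the query's results could replace the directory crawl: the snippet below turns (?nOntology, ?lLabel) rows into the prefix part of a context dictionary. The rows, the example.org IRIs, the function name, and the rule "namespace = ontology IRI plus a trailing slash" are all assumptions made for illustration; in practice the rows would come from running the query against uco_monolithic.ttl.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical rows, shaped like the (?nOntology, ?lLabel) results of the query.
# The IRIs and labels are illustrative only, not copied from the repository.
ROWS = [
    ("http://example.org/ontology/core", "ex-core"),
    ("http://example.org/ontology/types", "ex-types"),
    ("http://example.org/ontology/unlabelled", None),  # OPTIONAL left unbound
]


def prefixes_from_rows(rows: Iterable[Tuple[str, Optional[str]]]) -> Dict[str, str]:
    """Map each labelled ontology to a namespace IRI ending in '/'.

    Assumption: the namespace is the ontology IRI plus a trailing slash.
    """
    prefixes: Dict[str, str] = {}
    for ontology_iri, label in rows:
        if label is None:
            # No rdfs:label, so no prefix can be derived for this ontology.
            continue
        if label in prefixes:
            # Two ontologies claiming one label would make the context ambiguous.
            raise ValueError(f"duplicate prefix label: {label!r}")
        prefixes[label] = ontology_iri.rstrip("/") + "/"
    return prefixes


context_prefixes = prefixes_from_rows(ROWS)
print(context_prefixes)
```

Raising on a duplicate label is a deliberate choice in this sketch: it surfaces the availability-guarantee concern with rdfs:label as an error instead of silently picking one namespace.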

Meanwhile: I've added a test for the "Concise" dictionary form. It currently fails, and reviewing the generated file, I see the issue is that classes are not being generated for either dictionary form. I was surprised that the "Minimal" JSON-LD sample was working with both the "minimal" and "concise" context dictionaries. The reason that worked was that the classes that weren't in the context dictionary were inlined in each of the nodes.
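To illustrate one reading of that observation: if each node writes its class as a prefixed name in @type, only the prefix has to be in the context dictionary, so a missing class term goes unnoticed. The function below is a much-reduced stand-in for JSON-LD IRI expansion, not a real processor, and the prefix, class name, and IRIs are hypothetical.

```python
from typing import Dict

# Hypothetical context dictionary: it defines a prefix but no class terms.
CONTEXT: Dict[str, str] = {
    "ex-types": "http://example.org/ontology/types/",
}


def expand_type(value: str, context: Dict[str, str]) -> str:
    """Reduced stand-in for JSON-LD expansion of an @type value."""
    if value in context:
        # A term defined in the context, for example a bare class name.
        return context[value]
    prefix, sep, suffix = value.partition(":")
    if sep and not suffix.startswith("//") and prefix in context:
        # A compact IRI such as "ex-types:Hash", resolved through the prefix.
        return context[prefix] + suffix
    # Anything else is returned unchanged here; a real processor applies
    # further rules (@vocab, base IRI) that this sketch leaves out.
    return value


# Prefixed class name: expands even though "Hash" is not a term in CONTEXT.
prefixed = expand_type("ex-types:Hash", CONTEXT)

# Bare class name: stays unexpanded until the context defines the class.
bare_without_term = expand_type("Hash", CONTEXT)

# A context that also carries the class term, as a concise form would need.
CONCISE = dict(CONTEXT, Hash="http://example.org/ontology/types/Hash")
bare_with_term = expand_type("Hash", CONCISE)

print(prefixed, bare_without_term, bare_with_term)
```

Under these assumptions, a sample that uses only prefixed class names cannot tell the two dictionary forms apart; a test for the concise form needs at least one bare class name.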

I also added tests for whether UCO properties and datatype annotations (UCO-sourced and XSD-sourced) were getting picked up by the dictionaries, by adapting the tests/examples/hash_PASS.json file. That file is small enough that it wasn't too painful to add a full IRI expansion of everything and do an exact set-of-triples comparison. The verbose pytest output is pretty instructive - run make check and you'll see.
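The shape of an exact set-of-triples comparison can be sketched as follows. This is not the test in the repository: the triples are hypothetical, written as plain string tuples, and the helper name is made up. It only shows why a set difference makes a dropped datatype annotation easy to spot.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

XSD = "http://www.w3.org/2001/XMLSchema#"

# Hypothetical expected triples, with every IRI written out in full.
expected: Set[Triple] = {
    ("http://example.org/kb/hash-1",
     "http://example.org/ontology/types/hashValue",
     f'"abc123"^^<{XSD}hexBinary>'),
}

# Hypothetical triples computed through a context dictionary that lacks the
# datatype annotation, so the literal degrades to a plain string.
computed: Set[Triple] = {
    ("http://example.org/kb/hash-1",
     "http://example.org/ontology/types/hashValue",
     f'"abc123"^^<{XSD}string>'),
}


def diff_triples(want: Set[Triple], got: Set[Triple]) -> Tuple[Set[Triple], Set[Triple]]:
    """Return (missing, unexpected) triples between two graphs."""
    return want - got, got - want


missing, unexpected = diff_triples(expected, computed)
for triple in sorted(missing):
    print("missing:   ", triple)
for triple in sorted(unexpected):
    print("unexpected:", triple)
```

Plain set comparison like this only works when the data has no blank nodes; with blank nodes, a graph-isomorphism check such as rdflib.compare.isomorphic would be the safer tool.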

The last test is SHACL validation using a context dictionary; I'm not sure it can be run from pyshacl's command line interface. It might be necessary to consider SHACL validation out of scope of this test suite. (case_validate can add a flag to load some context dictionary, but for a test to run in UCO, something would probably have to be written that re-implements a chunk of case_validate for the sake of the test suite only.)
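For a sense of what "loading some context dictionary" might involve before any validator sees the data, here is a minimal sketch that prepends an external context to a JSON-LD document. It is not how case_validate works, and the function and sample values are hypothetical. It relies on JSON-LD allowing @context to be an array processed in order, with later entries overriding earlier ones.

```python
import copy
from typing import Any, Dict

# Hypothetical generated context dictionary (only the inner mapping).
EXTERNAL_CONTEXT: Dict[str, Any] = {
    "ex-types": "http://example.org/ontology/types/",
}


def with_external_context(document: Dict[str, Any],
                          external_context: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `document` whose @context starts with `external_context`.

    The document's own context entries are kept after the external one, so
    local definitions still take precedence.
    """
    result = copy.deepcopy(document)
    local = result.get("@context")
    if local is None:
        contexts = [external_context]
    elif isinstance(local, list):
        contexts = [external_context, *local]
    else:
        contexts = [external_context, local]
    result["@context"] = contexts
    return result


# Hypothetical document that declares only its own knowledge-base prefix.
doc = {
    "@context": {"kb": "http://example.org/kb/"},
    "@id": "kb:hash-1",
    "@type": "ex-types:Hash",
}
merged = with_external_context(doc, EXTERNAL_CONTEXT)
print(merged["@context"])
```

The merged document could then be serialized with json.dumps and handed to whatever parser or validator the test uses; that step is left out because it depends on the chosen tool.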

Next steps: Please get make check to pass; it'll be up to the committee to handle the prefixes question. Getting there might be all that's necessary.

ajnelson-nist commented 2 years ago

Converting to Draft until Solutions Approval vote is logged.