ucoProject / UCO

This repository is for development of the Unified Cyber Ontology.
Apache License 2.0

UCO support for Protégé #449

Closed. ajnelson-nist closed this issue 1 year ago.

ajnelson-nist commented 2 years ago

Disclaimer

Participation by NIST in the creation of the documentation of mentioned software is not intended to imply a recommendation or endorsement by the National Institute of Standards and Technology, nor is it intended to imply that any specific software is necessarily the best available for the purpose.

Background

Protégé is a tool for editing ontologies.

https://protege.stanford.edu/

It can open and interact with ontologies stored as local files in a user's desktop environment. It can also resolve all of the ontology imports (encoded as owl:imports statements), recursively retrieving every ontology referenced from any loaded ontology.

By default, Protégé will do an over-the-wire retrieval on encountering an owl:imports statement: The referenced IRI will be downloaded in whatever RDF serialization is offered (seemingly preferring application/rdf+xml). It is possible to include an "Override" XML file that can be interpreted as: "Whenever Protégé encounters this IRI, instead of loading a file from a network retrieval, load a file from this hard-coded relative or absolute path."
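The override behavior can be approximated in a few lines. This is a minimal sketch, not Protégé's actual code: it assumes the catalog follows the OASIS XML Catalog layout (`<uri name="IRI" uri="path"/>` entries in the `urn:oasis:names:tc:entity:xmlns:xml:catalog` namespace), and the IRI, the `../core/core.ttl` path, and the helpers `load_catalog` and `resolve_import` are hypothetical, chosen only for illustration.

```python
from __future__ import annotations

import posixpath
import xml.etree.ElementTree as ET

# Namespace of OASIS XML Catalog documents, which catalog-v001.xml files use.
CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"

# Illustrative catalog; the IRI and relative path are assumptions, not copied from UCO.
EXAMPLE_CATALOG = """<catalog prefer="public" xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <uri name="https://ontology.unifiedcyberontology.org/uco/core" uri="../core/core.ttl"/>
</catalog>"""


def load_catalog(catalog_xml: str) -> dict[str, str]:
    """Map each entry's IRI (name attribute) to its file path (uri attribute)."""
    root = ET.fromstring(catalog_xml)
    # iter() also reaches <uri> entries nested inside <group> elements.
    return {e.get("name"): e.get("uri") for e in root.iter(f"{{{CATALOG_NS}}}uri")}


def resolve_import(iri: str, catalog: dict[str, str], catalog_dir: str) -> str:
    """Hypothetical helper: local path if the catalog overrides the IRI, else the IRI itself."""
    if iri in catalog:
        # Relative paths are read relative to the directory holding the catalog file.
        return posixpath.normpath(posixpath.join(catalog_dir, catalog[iri]))
    # No override: the IRI would be fetched over the network.
    return iri


catalog = load_catalog(EXAMPLE_CATALOG)
print(resolve_import("https://ontology.unifiedcyberontology.org/uco/core", catalog, "ontology/uco/master"))
# → ontology/uco/core/core.ttl
```

Any IRI without an entry falls through unchanged, which mirrors the default over-the-wire retrieval described above.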

One technical constraint applies to the XML file: it must reside in the same directory as the ontology file that one opens with Protégé.
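Because the catalog lives beside the ontology file being opened, every path it lists has to be expressed relative to that directory. The sketch below generates such a catalog with the standard library; the function name `build_catalog`, the `prefer="public"` attribute, and the example paths are assumptions for illustration, not the generator that UCO actually uses.

```python
from __future__ import annotations

import posixpath
import xml.etree.ElementTree as ET

CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"


def build_catalog(catalog_dir: str, iri_to_file: dict[str, str]) -> str:
    """Hypothetical generator: render a catalog mapping each IRI to a path relative to catalog_dir."""
    # Serialize the catalog namespace as the default namespace (no "ns0:" prefix).
    ET.register_namespace("", CATALOG_NS)
    root = ET.Element(f"{{{CATALOG_NS}}}catalog", {"prefer": "public"})
    for iri in sorted(iri_to_file):  # sorted so regenerated files diff cleanly in Git
        rel_path = posixpath.relpath(iri_to_file[iri], start=catalog_dir)
        ET.SubElement(root, f"{{{CATALOG_NS}}}uri", {"name": iri, "uri": rel_path})
    return ET.tostring(root, encoding="unicode")


# Example (assumed layout): catalog in ontology/uco/master/, ontology file in ontology/uco/core/.
print(build_catalog(
    "ontology/uco/master",
    {"https://ontology.unifiedcyberontology.org/uco/core": "ontology/uco/core/core.ttl"},
))
```

Sorting the entries is a design choice: it keeps the generated file stable, so a version-controlled catalog only changes when the set of ontology files does.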

Requirements

Requirement 1

UCO should store a Protégé catalog-v001.xml file in ontology/uco/master/, enumerating all UCO ontology files.

Requirement 2

The catalog-v001.xml file's hard-coded enumeration must be tested to confirm it stays in sync with UCO's imports.
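One way such a test could be framed is as a set comparison between the IRIs listed in the catalog and the IRIs in the import closure. This is a sketch under the assumption that the closure has already been computed elsewhere (for example by parsing each file's owl:imports statements with an RDF library, not shown); `check_catalog_sync` is a hypothetical name.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"


def catalog_iris(catalog_xml: str) -> set[str]:
    """Collect the IRI (name attribute) of every <uri> entry in the catalog."""
    root = ET.fromstring(catalog_xml)
    return {e.get("name") for e in root.iter(f"{{{CATALOG_NS}}}uri")}


def check_catalog_sync(catalog_xml: str, import_closure: set[str]) -> tuple[set[str], set[str]]:
    """Hypothetical check returning (missing, stale) IRI sets.

    missing: imported somewhere, but with no catalog entry (would be fetched over the network).
    stale:   listed in the catalog, but no longer imported by anything.
    """
    listed = catalog_iris(catalog_xml)
    return import_closure - listed, listed - import_closure
```

In a test suite, the requirement would then reduce to asserting that both returned sets are empty.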

Requirement 3

CASE must provide the same Protégé support as UCO, maintaining its own catalog-v001.xml file in ontology/master/ that enumerates all CASE and UCO ontology files. While this might seem outside UCO's purview, this requirement also ensures that UCO enables any downstream ontology to provide the same Protégé support that UCO does.

Risk / Benefit analysis

Benefits

Risks

Competencies demonstrated

Competency 1

A user is interested in using Protégé to load all of UCO's current state in the develop branch, which has some changes implemented since the last UCO release.

Competency Question 1.1

How does the user see the current version of observable:File in the develop branch?

Result 1.1

If there is only one catalog-v001.xml in UCO, uco.ttl would need to be opened with Protégé. Then, observable:File's current state would be viewable through the class navigator.

If instead each ontology directory gets its own catalog-v001.xml, observable.ttl would need to be opened. The rest is as above.

Solution suggestion

Coordination

ajnelson-nist commented 2 years ago

PR 450 has been filed to start the implementation for this proposal. Thanks again, @DrSnowbird, for contributing the start of this branch.

Unfortunately, a curious issue arose when trying to load the Collections Ontology shape file. It may be resolvable with an additional Git-submodule-based interaction; I'm out of time to test tonight.

My current feeling is that the testing infrastructure complexity for this might be too high for integration with the UCO 1.0.0 release, but it is a backwards-compatible change that could be integrated with any 1.x.0.

ajnelson-nist commented 1 year ago

PR 450 now has an implemented solution for this Issue. In summary, with the associated PRs merged, UCO works in Protégé entirely with locally stored and versioned ontology files, and the generation mechanism is confirmed to work for downstream ontologies. I have tested this with CASE and CASE-Corpora (test links are in PR 450). CASE-Corpora is now able to generate catalog-v001.xml files that cover its ontology import closure.
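Covering an import closure means following owl:imports transitively, so that a downstream ontology's catalog lists its own files plus everything CASE and UCO pull in. The breadth-first walk below is a minimal sketch of that idea, not the PR 450 implementation; it assumes the direct owl:imports of each ontology have already been extracted into a dictionary, and `import_closure` and the example.org IRIs are hypothetical.

```python
from __future__ import annotations

from collections import deque


def import_closure(root_iri: str, direct_imports: dict[str, set[str]]) -> set[str]:
    """Hypothetical helper: every ontology reachable from root_iri via owl:imports, including root_iri."""
    seen = {root_iri}
    queue = deque([root_iri])
    while queue:
        current = queue.popleft()
        for imported in direct_imports.get(current, set()):
            if imported not in seen:  # the seen-set also stops the walk on import cycles
                seen.add(imported)
                queue.append(imported)
    return seen


# Placeholder IRIs: a downstream ontology imports CASE, which imports UCO core.
graph = {
    "https://example.org/corpora": {"https://example.org/case"},
    "https://example.org/case": {"https://example.org/uco/core"},
}
print(sorted(import_closure("https://example.org/corpora", graph)))
```

Each IRI in the resulting set would then need one entry in the downstream catalog-v001.xml.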

Some unexpected developments occurred:

The overall increase in risk falls on projects that track UCO and CASE as Git submodules in order to re-use their virtual environment and/or monolithic ontology build. The virtual environment change means some tracking projects will need to update paths to scripts. I'm aware that this will most affect the CASE Python Utilities' monolithic build tracking and the documentation engines (CASE's and UCO's), but I suggest that overall this is logistically acceptable, as paths only need to be updated once per tracking Git project.