iodepo / odis-arch

Development of the Ocean Data and Information System (ODIS) architecture
https://book.odis.org/

452 document odis metadata graph fundamentals #453

Open pbuttigieg opened 4 months ago

pbuttigieg commented 4 months ago

Closes #452

Describing the ODIS graph at a conceptual level to promote homogeneous alignment.

smrgeoinfo commented 4 months ago

@pbuttigieg it looks like the metadata-about-metadata section is not complete. Based on our discussions and the various specs, as well as the GitHub issues (1, 2), here's the draft text I'm proposing for CDIF:


In a harvesting or federated catalog system, some metadata about the metadata is needed to track where the metadata came from, what format or profile it uses (harvesters need this to process it), and when it was last updated; see Metadata Content Requirements. Expressing this information unambiguously requires making statements about the metadata record that are distinct from statements about the thing in the world that the metadata describes (see GitHub issues 1, 2). In an RDF framework, this requires a distinct identifier for the metadata record object that will serve as the subject of these triples.

Schema.org includes several properties that can be used to embed information about the metadata record in the resource metadata: sdDatePublished, sdLicense, and sdPublisher. It lacks a way to provide an identifier for the metadata record distinct from the resource it describes, to specify agents responsible for the metadata other than the publisher, or to assert specification or profile conformance for the metadata record itself.

In the RDF serialization, Schema.org metadata records are JSON-LD node objects, and include an "@id" keyword with a value that identifies the node. This identifier can be interpreted to represent a thing in the world that the metadata record (the 'node') is about, or to represent the metadata record (a JSON object) itself. Here is a short example record (other '@' properties are explained below):

{   "@context": "https://schema.org",
    "@id": "ex:URIforResource",
    "name": "unique title for the resource",
    "description": "Description of the resource",
    "dateModified": "2017-05-23"
}

When this JSON-LD is converted to RDF triples (e.g. using the JSON-LD Playground), the result is:

<ex:URIforResource> <http://schema.org/description> "Description of the resource" .
<ex:URIforResource> <http://schema.org/name> "unique title for the resource" .
<ex:URIforResource> <http://schema.org/dateModified> "2017-05-23"^^<http://schema.org/Date> .

The first two triples would be interpreted as statements about the thing in the world that the metadata record is about. The third triple is ambiguous: was the metadata content modified, or the described resource in the world? There does not seem to be any recognized best practice or consensus for dealing with this issue, so CDIF defines the following conventions.

Use the schema.org identifier property to identify the thing in the world that is the subject of the JSON-LD node. The identified thing might be physical, imaginary, abstract, or a digital object. The JSON-LD @id property identifies a node in a graph and can be interpreted in different ways; as a URI, it is expected to dereference to the same JSON-LD object in which it is defined. Under this convention, a processor should use the schema:identifier value as the subject of triples about the described resource, which avoids the ambiguity. It follows that when a schema:identifier property is present, the @id property should be interpreted as identifying the JSON object that represents the node in the knowledge graph.
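As a rough illustration of how a processor might apply this convention, here is a minimal Python sketch. The function name `interpret_identifiers` and the returned keys are hypothetical, the sketch only handles a plain string value for identifier, and the fallback when no identifier is present (treat @id as the only candidate subject for the resource) is an assumption rather than something the draft text specifies.

```python
from typing import Any, Dict, Optional


def interpret_identifiers(node: Dict[str, Any]) -> Dict[str, Optional[str]]:
    """Apply the identifier convention to one JSON-LD node (a dict).

    Returns the identifier to use as the subject of statements about the
    described thing ("resource") and the identifier of the JSON-LD record
    itself ("record"). Both key names are illustrative only.
    """
    node_id = node.get("@id")
    identifier = node.get("identifier")

    # schema:identifier can also be a PropertyValue object or a list; this
    # sketch only handles a plain string and treats anything else as absent.
    if isinstance(identifier, str) and identifier:
        # identifier present: it names the thing in the world, and @id is
        # read as naming the JSON object (the metadata record).
        return {"resource": identifier, "record": node_id}

    # Assumed fallback: without an identifier, @id is the only candidate
    # subject for the resource and nothing is asserted about the record.
    return {"resource": node_id, "record": None}
```

For a node with "@id": "ex:URIforNode1" and "identifier": "ex:URIforDescribedResource", this reads ex:URIforDescribedResource as the resource and ex:URIforNode1 as the record.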

Statements about the metadata record as a distinct entity should be made using a separate identified node object. This node object can be embedded in the metadata record about the resource in the world (Example 1 below), or published as a separate node (Example 2 below).

{   "@context": [
        "https://schema.org",
        {"dcterms": "http://purl.org/dc/terms/",
         "ex":"https://example.com/99152/"
        }
    ],
    "@id": "ex:URIforNode1",
    "@type": "appropriate schema.org type",
    "identifier":"ex:URIforDescribedResource",
    "name": "unique title for the resource",
    "description": "Description of the resource",
    "subjectOf": {
        "@id": "ex:URIforNode2",
        "@type": "DigitalDocument",
        "dateModified": "2017-05-23",
        "identifier":"ex:URIforNode1",
        "description":"metadata about documentation for ex:URIforDescribedResource",
        "dcterms:conformsTo": {"@id":"ex:cdif-metadataSpec"}
    }
}

Example 1. Metadata about the metadata embedded.

{
    "@context": [
        "https://schema.org",
        {"dcterms": "http://purl.org/dc/terms/",
         "ex": "https://example.com/99152/"}
    ],
    "@graph": [
        {
            "@id": "ex:URIforNode1",
            "@type": "Dataset",
            "identifier": "ex:URIforDescribedResource",
            "name": "unique title for the resource",
            "description": "Description of the resource"
        },
        {
            "@id": "ex:URIforNode2",
            "@type": "DigitalDocument",
            "dateModified": "2017-05-23",
            "identifier": "ex:URIforNode1",
            "description": "metadata about documentation for ex:URIforDescribedResource",
            "dcterms:conformsTo": {"@id": "ex:cdif-metadataSpec"}
        }
    ]
}

Example 2. Metadata about metadata as a separate graph node.
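The two layouts carry the same statements, so a harvester could normalize one into the other. The following is a minimal sketch, assuming the record has already been parsed into a Python dict and that subjectOf holds a single embedded node object; the function name `split_embedded_metadata` is hypothetical.

```python
import copy
from typing import Any, Dict


def split_embedded_metadata(record: Dict[str, Any]) -> Dict[str, Any]:
    """Turn an Example 1 style record into an Example 2 style @graph document.

    The input is not modified. The @context is carried over unchanged so
    that prefixes such as dcterms and ex stay defined.
    """
    resource = copy.deepcopy(record)
    context = resource.pop("@context", None)
    embedded = resource.pop("subjectOf", None)

    graph = [resource]
    if isinstance(embedded, dict):
        # The embedded metadata-about-metadata node becomes a sibling node.
        graph.append(embedded)
    elif embedded is not None:
        # Anything other than a single node object (e.g. a list or a plain
        # URL string) is left in place; this sketch does not handle it.
        resource["subjectOf"] = embedded

    doc: Dict[str, Any] = {}
    if context is not None:
        doc["@context"] = context
    doc["@graph"] = graph
    return doc
```

After the split, the only link between the two nodes is that the identifier of the metadata node equals the @id of the resource node, as in Example 2.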

Including a schema:description with the string "metadata about documentation for ex:URIforDescribedResource" makes it possible to distinguish this use of the subjectOf property from others. The ex namespace in the examples above is included only so that the examples are valid; actual metadata would likely have its own namespace for resource and metadata URIs. The distinct identifier for the metadata record (ex:URIforNode1) allows statements to be made about the metadata separately from statements about the resource it describes.
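To show how a consumer might use that distinct identifier, here is a small Python sketch that looks up the node making statements about a given record in an Example 2 style @graph. The function name `find_metadata_node` is hypothetical, and matching on a plain string identifier is a simplifying assumption.

```python
from typing import Any, Dict, List, Optional


def find_metadata_node(
    graph: List[Dict[str, Any]], record_id: str
) -> Optional[Dict[str, Any]]:
    """Return the node whose statements are about the record `record_id`.

    A node qualifies when its schema:identifier equals the @id of the
    metadata record; the record node itself is skipped.
    """
    for node in graph:
        if node.get("@id") == record_id:
            continue  # this is the record, not metadata about it
        if node.get("identifier") == record_id:
            return node
    return None
```

With the @graph from Example 2, `find_metadata_node(graph, "ex:URIforNode1")` returns the ex:URIforNode2 node, whose dateModified then refers unambiguously to the metadata record rather than to the described resource.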

pbuttigieg commented 3 months ago

xref https://github.com/iodepo/odis-arch/issues/102

pbuttigieg commented 3 months ago

Thanks @smrgeoinfo - I'll think about the content and iterate on the CDIF repo