opengeospatial / netcdf-ld

Encoding standard to enable RDF graphs to be encoded in and interpreted from netCDF files
http://www.github.com/opengeospatial/netCDF-Classic-LD

Adding a predicate for the location of the NetCDF file #34

Open adamml opened 4 years ago

adamml commented 4 years ago

Rationale

In discussions on the group telecon, it came to light that the Binary Array LD (BALD) specification would describe the contents of the NetCDF file for NetCDF-LD, but not provide a link to the NetCDF file itself. It became apparent that an extra predicate would be needed in the RDF representation of a Binary Array file in order to support this.

The file location should be an optional, user-specified parameter supplied at runtime.

Approach

A number of options have been considered:

Due to the stability and maturity of the vocabularies, it was decided to focus on the Schema.org or DCAT options.

A further consideration was the grouping of NetCDF files into collections, which may be achieved in either Schema.org or DCAT if the contents of the NetCDF file are considered to be a Dataset and the collection of NetCDF files a DataCatalog. The ability to nest catalogues, or to create hierarchies of them, was also considered, for example a collection of NetCDF files made available alongside other files or collections through a THREDDS server. While we do not provide an implementation pathway for this, the consideration motivated us to focus on DCAT, which at the time of writing supports nesting catalogues, whereas Schema.org does not.

Boilerplate code

First an addition to the BALD ontology will be required:

@prefix dcat: <http://www.w3.org/ns/dcat#>.

bald:Container a dcat:Dataset. 

Then the following boilerplate would allow a software agent to traverse the graph to find the file to download the NetCDF data from:

@base <http://foo.bar/my-netcdf-file.nc>.

@prefix bald: <https://www.opengis.net/def/binary-array-ld/>.
@prefix dcat: <http://www.w3.org/ns/dcat#>.
@prefix dct: <http://purl.org/dc/terms/>.

<./> a bald:Container;
    dcat:distribution [
        a dcat:Distribution;
        dcat:downloadURL <>;
        dcat:mediaType [
            a dct:MediaType;
            dct:identifier "application/x-netcdf"
        ];
        dct:format [
            a dct:MediaType;
            dct:identifier <http://vocab.nerc.ac.uk/collection/M01/current/NC/>
        ]
    ].

Graph of the above TTL
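
To illustrate how the optional, user-specified file location could be fed into the boilerplate at runtime, here is a minimal Python sketch. It is not part of any BALD tooling: the function name `container_turtle`, the helper `_check_iri` and the choice to pass full IRIs rather than rely on `@base` are all assumptions made for illustration. It uses plain string templating, so it only rejects obviously invalid IRI characters and does no further escaping.

```python
from typing import Optional

PREFIXES = (
    "@prefix bald: <https://www.opengis.net/def/binary-array-ld/>.\n"
    "@prefix dcat: <http://www.w3.org/ns/dcat#>.\n"
    "@prefix dct: <http://purl.org/dc/terms/>.\n"
)


def _check_iri(iri: str) -> str:
    """Hypothetical guard: reject characters that cannot appear inside <...>."""
    if not iri or any(ch in iri for ch in '<>"{}|\\^` \n\t'):
        raise ValueError(f"not usable as a Turtle IRI: {iri!r}")
    return iri


def container_turtle(container_iri: str, download_url: Optional[str] = None) -> str:
    """Return Turtle for a bald:Container, with a distribution only if a URL is given."""
    subject = f"<{_check_iri(container_iri)}> a bald:Container"
    if download_url is None:
        # The file location is optional, so emit the bare container statement.
        return f"{PREFIXES}\n{subject}.\n"
    return (
        f"{PREFIXES}\n{subject};\n"
        "    dcat:distribution [\n"
        "        a dcat:Distribution;\n"
        f"        dcat:downloadURL <{_check_iri(download_url)}>;\n"
        "        dcat:mediaType [\n"
        "            a dct:MediaType;\n"
        '            dct:identifier "application/x-netcdf"\n'
        "        ];\n"
        "        dct:format [\n"
        "            a dct:MediaType;\n"
        "            dct:identifier <http://vocab.nerc.ac.uk/collection/M01/current/NC/>\n"
        "        ]\n"
        "    ].\n"
    )


if __name__ == "__main__":
    print(container_turtle("http://foo.bar/", "http://foo.bar/my-netcdf-file.nc"))
```

Calling `container_turtle("http://foo.bar/")` with no second argument yields only the `bald:Container` statement, which matches the intent that the location is optional.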

Further Considerations

Questions

@jyucsiro, @marqh a couple of questions/topics for discussion:

  1. Does this look like the approach we discussed on the call?
  2. I think there may be a subtlety I am missing in the way @base is parsed: at least one library I have used ignored the filename beyond the final slash when converting to RDF/XML. We may want to discuss using the full URI if we take this to production.
  3. Are we ok with the introduction of blank nodes here?
  4. Is there a better URI which defines NetCDF than the one I have used here?
  5. The MIME type I used is not actually registered with IANA, and there is a suggestion that THREDDS uses different MIME types for NetCDF 3 and NetCDF 4. Can we handle this?
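
On question 2, the following sketch may help frame the discussion. It uses Python's `urllib.parse.urljoin` as a stand-in for a Turtle parser's relative IRI resolution (both follow RFC 3986 reference resolution); the helper name `resolve` is hypothetical. Under those rules, `<./>` drops the final path segment of the base, while the empty reference `<>` resolves to the base itself, so a library that "ignores the filename" for `<./>` may simply be applying the standard algorithm.

```python
from urllib.parse import urljoin

# Base IRI taken from the boilerplate's @base declaration.
BASE = "http://foo.bar/my-netcdf-file.nc"


def resolve(reference: str, base: str = BASE) -> str:
    """Resolve a relative IRI reference against the base, per RFC 3986 rules."""
    return urljoin(base, reference)


container_iri = resolve("./")  # the subject written as <./> in the Turtle
download_iri = resolve("")     # the object written as <> in the Turtle

print(container_iri)  # http://foo.bar/
print(download_iri)   # http://foo.bar/my-netcdf-file.nc
```

If the container is meant to have an IRI distinct from the server root, writing full IRIs instead of relying on `@base` avoids the ambiguity.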
marqh commented 4 years ago

@adamml @jyucsiro

Rob has raised a query on the PR looking to update the vocabulary with respect to https://github.com/opengeospatial/NamingAuthority/pull/39

Should

@prefix dcat: <http://www.w3.org/ns/dcat#>.

bald:Container a dcat:Dataset. 

be

@prefix dcat: <http://www.w3.org/ns/dcat#>.

bald:Container rdfs:subClassOf dcat:Dataset. 

?

Please may you consider this question?

thank you mark

marqh commented 4 years ago

fwiw, i think that the use of rdfs:subClassOf is valid here

adamml commented 4 years ago

@marqh I've been trying to find headspace to think about this between meetings. I'd agree with you that it's valid here, yes.

marqh commented 4 years ago

many thanks @adamml

I have updated the request for change with the OGC NA

marqh commented 4 years ago

The update to the BALD vocabulary has now been adopted

https://github.com/opengeospatial/NamingAuthority/pull/39

A bald:Container instance is now also a dcat:Dataset

https://www.opengis.net/def/binary-array-ld

marqh commented 4 years ago

is there a more definitive definition of a netCDF file than

dct:format [
    a dct:MediaType;
    dct:identifier <http://vocab.nerc.ac.uk/collection/M01/current/NC/>
]
adamml commented 4 years ago

We should consider adding this information to the Schema.org representation of BALD as well, e.g.:

{
   "@context": "https://schema.org/",
   "@type": "Dataset",
   "distribution": {
     "@type": "DataDownload",
     "contentUrl": "http://",
     "encodingFormat": [
       "application/x-netcdf",
       "http://vocab.nerc.ac.uk/collection/M01/current/NC/"
     ]
   }
}
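
As a rough sketch of how a tool might emit the Schema.org form above with a runtime-supplied location, the Python below builds the same JSON-LD structure with the standard `json` module. The function name `schema_org_dataset` and the example URL are assumptions for illustration only; the `encodingFormat` values are copied from the snippet above.

```python
import json
from typing import Any, Dict, Optional


def schema_org_dataset(content_url: Optional[str] = None) -> Dict[str, Any]:
    """Build a Schema.org Dataset; add a DataDownload only when a URL is supplied."""
    doc: Dict[str, Any] = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
    }
    if content_url is not None:
        doc["distribution"] = {
            "@type": "DataDownload",
            "contentUrl": content_url,
            "encodingFormat": [
                "application/x-netcdf",
                "http://vocab.nerc.ac.uk/collection/M01/current/NC/",
            ],
        }
    return doc


if __name__ == "__main__":
    # Hypothetical download location, reusing the example host from the Turtle.
    print(json.dumps(schema_org_dataset("http://foo.bar/my-netcdf-file.nc"), indent=2))
```

Leaving the argument out produces a Dataset with no `distribution` key, mirroring the optional file location in the DCAT form.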
simonoakesepimorphics commented 3 years ago

Sorry if this discussion is already closed, I can raise a new issue instead if appropriate. The containment part of section 6 states that groups can be "contained by" files, which I interpret as:

<file.nc> a bald:Container ;
    bald:contains <file.nc/> .

Or, "the root group is contained by the file". Is this interpretation valid / should the wording of that section be changed to reflect the intentions discussed above?