Open ghobona opened 2 years ago
Irrespective of RDF/SPARQL, the presence of unambiguous identifiers for data elements using basic JSON-LD allows clients to actually join on common identifiers in two different datasets. I think this really minimal enabler is more important than supporting RDF as a logical model - JSON-LD allows JSON to be joined automatically - without it, it's guesswork, or out-of-band magic configurations are required to specify joins.
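To make the join idea concrete, here is a minimal sketch in Python. It assumes two hypothetical payloads whose `@id` values are compact tokens and whose `@context` maps a prefix onto a namespace URI. The `records` key, the `example.org` namespace and all values are made up for illustration, and the prefix expansion is a deliberately simplified stand-in for a real JSON-LD processor.

```python
# Simplified prefix expansion, NOT a full JSON-LD processor.
# All payloads, keys and URIs below are hypothetical.

def expand_id(value, context):
    """Expand 'prefix:local' to a full URI using the prefix map in context."""
    prefix, sep, local = value.partition(":")
    if sep and prefix in context and not local.startswith("//"):
        return context[prefix] + local
    return value  # already an absolute URI, or an unknown prefix


def join_on_id(left, right):
    """Inner-join two payloads on the expanded URI of each record's @id."""
    right_index = {
        expand_id(rec["@id"], right["@context"]): rec for rec in right["records"]
    }
    joined = []
    for rec in left["records"]:
        uri = expand_id(rec["@id"], left["@context"])
        match = right_index.get(uri)
        if match is not None:
            merged = {"@id": uri}
            merged.update({k: v for k, v in rec.items() if k != "@id"})
            merged.update({k: v for k, v in match.items() if k != "@id"})
            joined.append(merged)
    return joined


population = {
    "@context": {"muni": "https://example.org/id/municipality/"},
    "records": [
        {"@id": "muni:0363", "population": 900000},
        {"@id": "muni:0599", "population": 650000},
    ],
}
area = {
    # A different prefix, but the same namespace, so the identifiers still match.
    "@context": {"m": "https://example.org/id/municipality/"},
    "records": [
        {"@id": "m:0363", "area_km2": 219.3},
        {"@id": "https://example.org/id/municipality/9999", "area_km2": 1.0},
    ],
}

result = join_on_id(population, area)
print(result)
```

The two datasets use different local prefixes, yet the join needs no out-of-band configuration because both expand to the same URI.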
I hadn't heard about OGC API Joins before and don't know how it is envisioned to work. It's hard to speculate about Linked Data/Semantic web support without this...
I can say that from Dutch experiments we concluded that it's hard to combine data from different APIs when it's not clear what the common identifiers are. At the least a common way to express identifiers is needed (some standardized URI pattern). But probably more is needed to actually get to the data (especially if it's in another API).
This API seems to be meant for the ad hoc creation of flat tables from multiple sources, potentially heavy on redundancy. This paradigm is at odds with the semantic web, which instead strives for a linked and federated data paradigm.
However, there are some benefits from adopting semantic web best practices in this API, or any other API for data provision: for example, URIs as unique feature and attribute identifiers, and the application of well-established ontologies such as GeoSPARQL, SOSA, SKOS or QUDT. I wonder if it wouldn't be more efficient to simply lay out a standard for spatial data provision with OGC services instead of figuring it out specifically for each API. GeoSPARQL and the DCAT specialisation for spatial datasets would already go a long way.
Also keep in mind the work currently developed around the Features API with Prez and the ogcldapi profile.
@lvdbrink note that by adding a JSON-LD context via a Link header it is possible to map JSON payloads to URIs for the fields, and so to specify unambiguously whether two data values are in fact the same identifier in different contexts.
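For reference, JSON-LD 1.1 defines the link relation `http://www.w3.org/ns/json-ld#context` for attaching a context to a plain JSON response. A rough sketch of how a client might pick the context URL out of a `Link` header follows. The header value and URLs are hypothetical, and the parsing is intentionally simple (it does not handle every corner of RFC 8288).

```python
import re

# Link relation defined by JSON-LD 1.1 for "interpreting JSON as JSON-LD".
JSONLD_CONTEXT_REL = "http://www.w3.org/ns/json-ld#context"


def find_context_url(link_header):
    """Return the target of the first link whose rel is the JSON-LD context relation."""
    # Split on commas that are followed by the start of a new "<...>" target.
    for part in re.split(r",\s*(?=<)", link_header):
        m = re.match(r"\s*<([^>]*)>(.*)", part, re.S)
        if not m:
            continue
        url, params = m.group(1), m.group(2)
        rel = re.search(r';\s*rel\s*=\s*"?([^";]+)"?', params)
        # A rel parameter may hold several space-separated relation types.
        if rel and JSONLD_CONTEXT_REL in rel.group(1).split():
            return url
    return None


# Hypothetical response header from an API that serves plain JSON.
header = (
    '<https://example.org/collections/a>; rel="self", '
    '<https://example.org/contexts/municipality.jsonld>; '
    'rel="http://www.w3.org/ns/json-ld#context"; type="application/ld+json"'
)
context_url = find_context_url(header)
print(context_url)
```

A client would then fetch that context and apply it to the payload, so the data provider does not have to change the JSON body itself.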
The same mapping of a namespace onto identifier tokens might be possible, but I have yet to bottom this out.
Using a URI as the identifier in the actual data is easier to interpret but harder to achieve. Since many more users need to interpret the data than providers need to deliver it, it would appear to be a reasonable investment - e.g. from a FAIR perspective, URI identifiers are best.
"At the least a common way to express identifiers is needed (some standardized URI pattern)." Quoting myself in order to correct - it's not so much a standardized URI pattern that's needed, but some way to recognize that something is an identifier and where/how to get the information resource that provides information on the thing being identified. Indeed, URI identifiers are the best way to do that on the web.
It's important that the identifiers are unique and persistent.
But that's not a concern for the API, rather something the data source must accommodate.
At a minimum, it would be nice to have a mechanism to define the meaning of each field in the table by pointing to a URI that defines the content of that column.
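A small Python sketch of what such a mechanism could enable: if each column name points to a definition URI, a client can find candidate join columns across two tables even when the local names differ. The column names and `example.org` URIs are placeholders, and nothing here is part of the draft standard.

```python
# Hypothetical column descriptions: column name -> URI defining its content.
table_a_columns = {
    "gm_code": "https://example.org/def/municipalityCode",
    "inw": "https://example.org/def/population",
}
table_b_columns = {
    "municipality": "https://example.org/def/municipalityCode",
    "opp": "https://example.org/def/area",
}


def shared_columns(cols_a, cols_b):
    """Pair up columns from two tables that point to the same definition URI."""
    by_uri = {uri: name for name, uri in cols_b.items()}
    return {name: by_uri[uri] for name, uri in cols_a.items() if uri in by_uri}


candidates = shared_columns(table_a_columns, table_b_columns)
print(candidates)
```

Here `gm_code` and `municipality` are reported as a candidate join pair because both point to the same definition.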
Having now had considerable experience mapping schemas and semantics using JSON-LD, it's certainly a viable option that uses available standards; however, two things make this complicated enough to require standardisation of mechanisms.
Firstly, the JSON-LD context needs to reflect the schema structure. In practice this means tools to bundle contexts using schema fragment mappings. It's too hard to do this manually, and tools aren't good enough to help debug.
Secondly, many structural elements have their own semantics, and if these don't exist in the target ontology then an additional ontology needs to be defined. We can define transformations once we get RDF from the JSON-LD, but can't always map directly to the intended semantics. E.g. there is no mapping from GeoJSON geometry to GeoSPARQL equivalents.
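To illustrate the kind of transformation this implies, here is a minimal Python sketch that turns a GeoJSON geometry object into a typed WKT value that could serve as the object of `geo:asWKT`. It is an assumption-laden example, not an agreed mapping: it covers only Point, LineString and Polygon, drops any Z coordinate, and relies on an untagged `geo:wktLiteral` defaulting to CRS84 (longitude, latitude), which matches GeoJSON axis order.

```python
# Datatype IRI for WKT literals in GeoSPARQL.
GEO_WKT_LITERAL = "http://www.opengis.net/ont/geosparql#wktLiteral"


def _coords(points):
    """Format a list of positions as 'x y, x y, ...', ignoring any Z value."""
    return ", ".join(f"{x} {y}" for x, y, *_ in points)


def geojson_to_wkt(geometry):
    """Convert a GeoJSON Point, LineString or Polygon to a WKT string."""
    kind = geometry["type"]
    coords = geometry["coordinates"]
    if kind == "Point":
        return f"POINT ({coords[0]} {coords[1]})"
    if kind == "LineString":
        return f"LINESTRING ({_coords(coords)})"
    if kind == "Polygon":
        rings = ", ".join(f"({_coords(ring)})" for ring in coords)
        return f"POLYGON ({rings})"
    raise ValueError(f"unsupported geometry type: {kind}")


def as_geosparql_literal(geometry):
    """Return a JSON-LD typed value for use with geo:asWKT."""
    return {"@value": geojson_to_wkt(geometry), "@type": GEO_WKT_LITERAL}


point = {"type": "Point", "coordinates": [4.9, 52.37]}
literal = as_geosparql_literal(point)
print(literal)
```

The point is that this step has to run as code after (or alongside) JSON-LD expansion, because a context alone cannot rewrite a nested coordinate array into a WKT string.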
The OGC building blocks handle all these concerns in a flexible way, and a library of mappings for OGC API components is being developed.
2024-09-24 SWG discussed this and agreed that this could be an enhancement in a future version of the standard.
Considering that many users of OGC API - Joins are likely to want to associate the feature collections with data from third-parties, it might be worth looking into supporting some Linked Data approaches. For example:
Any thoughts are welcome. Let's aim to close the GitHub Issue at the February 2023 OGC Member Meeting.