opengeospatial / sensorthings

The official web site of the OGC SensorThings API standard specification.

Add JSON-LD Context to STA #137

Open KathiSchleidt opened 2 years ago

KathiSchleidt commented 2 years ago

As mentioned in the charter, investigating JSON-LD within STA would be interesting. Here is a first attempt: I created a context file covering all STA classes and attributes, and popped it into the response from an STA endpoint providing Thing and ObsProp:

Here you can play with it in the JSON-LD playground: https://json-ld.org/playground/#startTab=tab-expanded&json-ld=https%3A%2F%2Fgist.githubusercontent.com%2FKathiSchleidt%2F6c897acd177109266e498fe7f42d2572%2Fraw%2Fgistfile1.txt&context=%7B%7D

So - what have I done wrong? ;)

hylkevds commented 2 years ago

Cool, nice work.

Does one Context document describe the entire Service, or does one need a different one for each "page"?

Both OData and JSON-LD define the "@context" property. We should check if that causes issues. Probably not, since OData clients clearly identify that they want OData and not LD.

ksonda commented 2 years ago

if Location is gsp:SpatialObject, can location also be gsp:hasGeometry? Basically I'd be wanting an STA-JSONLD to express geometries as geosparql WKT in addition to schema:geo type geometry, similar to what I've implemented in my STA->OAF/JSON-LD: https://locations.newmexicowaterdata.org/collections/Things/items/2238?f=jsonld

This would require changes to how the geometry is expressed of course...

KathiSchleidt commented 2 years ago

The goal is to leave STA the way it is and find a way of expressing this in the context. I admit that the example above is a hack: I got tired of talking about it, and as issues tend to show up in the doing, I just did it (and the issues are nicely following :) )

First, a disclaimer: I created context entries for classes/attributes where I could find a semi-sane correspondence (these still need to be checked!), then went freestyle where I couldn't find anything, which is the reason for the dummy sta namespace linked to http://something.org/sta/. If anybody can find concepts for the bits I packed under this ns, please tell me!

@ksonda the geosparql bits in the context are leftovers from the (S)ELFIE context I bastardized this from. During (S)ELFIE we were still dealing with JSON-LD V1.0, which had issues with encoding geometry. The issue pertained to polygons: in contrast to JSON, LD arrays carry no ordering by default, as LD is modelled on RDF. That meant that when you provided a polygon, you got an unordered set of coordinate pairs, good luck finding the polygon ;) This has been resolved in LD V1.1, where arrays can be either lists (ordered) or sets (grab-bag), and lists can now be nested as coordinate arrays require. When I did the STA hack, I improvised badly:
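To make the list-versus-set point concrete, here is a minimal Python sketch using only the standard library. It does not run a JSON-LD processor; it only mimics what losing array order does to a ring, and then shows a context that declares the term as an ordered list. The ring values are made up, and the `geojson` prefix and `coordinates` term are borrowed from the GeoJSON-LD vocabulary as an assumption, not from the context discussed in this issue.

```python
import json

# A closed ring as ordered coordinate pairs (made-up example values).
ring = [[0.0, 0.0], [4.0, 0.0], [4.0, 3.0], [0.0, 3.0], [0.0, 0.0]]

# Without a list declaration, JSON-LD treats array values as an unordered set:
# order is lost and the closing point collapses into the starting point.
as_set = {tuple(pair) for pair in ring}

# Declaring the term with "@container": "@list" keeps the order.
# Assumption: GeoJSON-LD vocabulary IRI, used here for illustration only.
context = {
    "geojson": "https://purl.org/geojson/vocab#",
    "coordinates": {"@id": "geojson:coordinates", "@container": "@list"},
}

doc = {"@context": context, "coordinates": ring}

print(len(ring), "ordered pairs vs", len(as_set), "distinct unordered pairs")
print(json.dumps(doc["@context"], indent=2))
```

Whether a given processor accepts nested coordinate arrays under this declaration depends on it implementing JSON-LD 1.1.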

In the STA implementations I've been using (FROST), the geometries in the location & feature attributes are provided as JSON Geometry, but as the STA data model specifies ANY for these types, you can implement them as you please, e.g. as geosparql WKT, and define a context accordingly.

I'm still waiting for the JSON-FG spec to settle, see where OGC is going with JSON Geometries, then continue figuring. Happy for any and all insights!

KathiSchleidt commented 2 years ago

@ksonda 2nd post on your reworked STA/OAF endpoint (COOL!) Questions:

ksonda commented 2 years ago

Thanks for the context (in both senses) @KathiSchleidt ! I understand the goal of having a context for STA that "just works". I'm coming at this from a slightly different use case, which is what motivated my odd STA->OAF thing. Regarding the geometry issue in particular, the main issue with schema:geo is that schema:geo specs have no way of representing multipolygon, multiline, or multipoint features. This is why in my JSON-LD implementation for pygeoapi/OAF, the JSON-LD actually presents 3 separate geometries:

Addressing your questions from your second comment in order:

  1. GeoJSON-LD does not provide a very useful resource for crawlers importing JSON-LD documents into a knowledge graph if the GeoJSON-LD document is meant to represent a real-world feature that happens to have a geometry, rather than specifically a GeoJSON feature with some attributes that happen to correspond to RDF predicate-object statements. To see what I mean, look at how this GeoJSON-LD document gets normalized to RDF triples in the json-ld playground: https://tinyurl.com/ycebhj67. The node with the identifier http://example.com/features/1 gets associated with a blank node, and that blank node is then the subject of the GeoJSON property attributes. What knowledge graph harvesters actually want is for whatever is the @id to be the direct subject of any predicates. For the same reason, GeoJSON-LD also makes it very difficult to situate features in more complex ontological patterns, like this (https://github.com/internetofwater/docs.geoconnex.us/wiki/hydrologic-location-json-ld-guidance)

  2. You are correct about the 2 context objects. This is more of a hack to make the pygeoapi implementation of JSON-LD for OAF work than any particularly normative reason. I am working on this as an alternative approach: https://github.com/geopython/pygeoapi/issues/831

  3. In the context I set up, self.Links would be ignored by JSON-LD harvesters entirely. The OAF/JSON-LD endpoint is representing (more like re-presenting) the STA endpoint as linked features provided via the OAF URL pattern. Each OAF url has the JSON-LD version injected into the HTML version of the page, so there is no need to link to the STA URLs directly. Here is the compacted form of the example document: https://tinyurl.com/37xm7f4r

  4. This whole STA->OAF thing I have constructed was motivated by wanting to represent an STA endpoint as a crawlable set of linked JSON-LD documents, and in some sense would be obviated by successfully implementing the issue we are talking about here. In any case, the idea was that you would point a crawler towards the OAF/Things endpoint. Each Thing/item/id includes the Location geometry information, and links to all relevant OAF/Datastreams/items (which themselves are piping in Sensor, ObservedProperty), which the crawler would then traverse as well. This API is not meant for querying in a practical sense; it is meant to expose metadata being published by STA to crawlers, so that the STA endpoint can be manifested in some higher-order aggregated knowledge graph. I'm preaching to the choir here, but OAF is simply not built for doing complex queries that might concatenate or join entities in a linked-features data model. That said, simplifying the STA data model to this three-entity pattern has been useful for visualization tools we've built that are meant to interact with OAF.

All of this is to say that I was just using OAF as a tool to create more arbitrary/custom JSON-LD out of an STA endpoint, so it has some overlap here but need not be the same thing. One point I'm interested in bringing up here, though, is the possibility of having @id in the json-ld context for STA be set to an arbitrary property, with @iot.id as the default. I would want this to support persistent/external/universal identification of nodes. That is, have https://geoconnex.us/nmwdi/st/locations/2238 redirect to https://st2.newmexicowaterdata.org/FROST-Server/v1.1/Things(2238), and have the geoconnex URI be the @id when the document is rendered in the json-ld playground.
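The configurable @id idea could be sketched roughly as follows in Python. This is an illustration under assumptions, not something FROST or the STA standard provides: the function `build_sta_context`, the key `persistentUri`, and the Thing name are hypothetical, and the sketch only builds the context dictionary without running a JSON-LD processor.

```python
import json

def build_sta_context(id_property: str = "@iot.id") -> dict:
    """Return a minimal context that aliases one response property to @id.

    Hypothetical helper: the default follows the suggestion in the comment
    above, any other property name can be passed in.
    """
    return {
        "schema": "http://schema.org/",
        "name": "schema:name",
        id_property: "@id",
    }

default_context = build_sta_context()

# Hypothetical top-level key carrying a persistent external identifier.
custom_context = build_sta_context("persistentUri")

thing = {
    "@context": custom_context,
    "persistentUri": "https://geoconnex.us/nmwdi/st/locations/2238",
    "@iot.selfLink": "https://st2.newmexicowaterdata.org/FROST-Server/v1.1/Things(2238)",
    "name": "example Thing",  # placeholder value
}

print(json.dumps(thing, indent=2))
```

With the custom context, a JSON-LD processor would take the geoconnex URI as the node identifier, while `@iot.selfLink` is left unmapped and would be dropped from the graph.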

KathiSchleidt commented 2 years ago

Collect insights at https://docs.google.com/document/d/1EJzo10eO1m6yQFvQfrB5SZuxTOL-ZFd39O7VKOfKsrk/edit#heading=h.3jf18irghtme

KathiSchleidt commented 9 months ago

New idea from GeoTech: utilize a JSON-LD context to explain the various properties blocks. The nice thing is that you can provide the @context as a member within the properties block itself and pack your context in there.

ksonda commented 9 months ago

Do you have an example document of how this would work? I don't think standard json-ld clients could directly use it that way.

ksonda commented 9 months ago

Let me clarify. I think no matter what "@context" has to be at the "top" level of the hierarchy of JSON document. You can accomplish something like the properties block solution you describe via context nesting, for one level of hierarchy only. So, something like this could work:

Consider the STA Thing: https://labs.waterdata.usgs.gov/sta/v1.1/Things('AR008-331856091114601')

You can add this context up top:

"@context": {
  "schema": "http://schema.org/",
  "name": "schema:name",
  "@iot.selfLink": "@id",
  "properties": "@nest",
  "monitoringLocationUrl": {
    "@id": "schema:url",
    "@nest": "properties"
  }
}

JSON-LD playground
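As a rough mental model of what "properties": "@nest" does, the following Python toy hoists the members of the nested block onto the enclosing Thing, which is how a JSON-LD 1.1 processor treats nested members. It is a simplification: `hoist_nested` is a hypothetical helper, the Thing is trimmed to two members, and the mapping of terms to IRIs (and the dropping of unmapped keys) is not modelled.

```python
def hoist_nested(node: dict, nest_key: str = "properties") -> dict:
    """Toy model of @nest: treat members of node[nest_key] as members of node."""
    flat = {key: value for key, value in node.items() if key != nest_key}
    flat.update(node.get(nest_key, {}))
    return flat

# Trimmed version of the USGS Thing referenced above.
thing = {
    "name": "AR008-331856091114601",
    "properties": {
        "monitoringLocationUrl": "https://waterdata.usgs.gov/monitoring-location/331856091114601"
    },
}

flat = hoist_nested(thing)
print(sorted(flat))
```

After hoisting, `monitoringLocationUrl` sits next to `name`, so both end up as statements about the same node rather than about an intermediate "properties" object.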

However, as soon as you do anything where STA returns the response nested in a "value" array, the same context would not work.

For example, if the query is https://labs.waterdata.usgs.gov/sta/v1.1/Things?$top=1, see the result:

Example JSON-LD playground

ksonda commented 9 months ago

However, supposing you could set a default vocabulary that covers all the attributes/properties you want to use (and have definition server describing all these properties), you could do something like this:

"@context": {
  "@vocab": "https://my-sta.org/vocab/",
  "schema": "http://schema.org/",
  "name": "schema:name",
  "@iot.selfLink": "@id",
  "properties": "@nest",
  "monitoringLocationUrl": {
    "@id": "schema:url",
    "@nest": "properties"
  }
}

Example JSON-LD Playground

A wild idea, operational for implementations rather than necessarily part of the STA standard: say FROST has a configurable /vocab endpoint that serves as a simple definition server, e.g. /vocab/foo1 points to a JSON-LD document with a definition of /foo1 and, if applicable, a statement that it is owl:sameAs the corresponding schema.org, OGC, W3C etc. definition server URI.
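A minimal Python sketch of what such a /vocab endpoint might return for one term is given below. Everything about it is an assumption for illustration: FROST has no such endpoint that I know of, `definition_document` is a hypothetical helper, and the choice of `rdfs:label` for the human-readable name is mine. The base URL is the placeholder from the context above.

```python
import json
from typing import Optional

OWL = "http://www.w3.org/2002/07/owl#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

def definition_document(vocab_base: str, term: str, same_as: Optional[str] = None) -> dict:
    """Build a small JSON-LD definition document for one vocabulary term."""
    doc = {
        "@context": {"owl": OWL, "rdfs": RDFS},
        "@id": vocab_base + term,
        "rdfs:label": term,
    }
    if same_as is not None:
        # Link the local term to an established definition, if one exists.
        doc["owl:sameAs"] = {"@id": same_as}
    return doc

mapped = definition_document(
    "https://my-sta.org/vocab/", "monitoringLocationUrl", "http://schema.org/url"
)
unmapped = definition_document("https://my-sta.org/vocab/", "wellDepth")

print(json.dumps(mapped, indent=2))
```

Terms without an external equivalent, like `wellDepth` here, would simply get a local definition with no owl:sameAs statement.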

hylkevds commented 9 months ago

Let me clarify. I think no matter what "@context" has to be at the "top" level of the hierarchy of JSON document.

Reading the spec 4.1 Advanced Context Usage:

In general, contexts may be used any time a map is defined.

I interpret that as that an @context does not have to appear at the top of the document. The example also shows multiple Objects with different contexts. So each properties Object can have its own @context.

ksonda commented 9 months ago

It's true you can put contexts anywhere, but an embedded context will only be interpreted if a context at least one level higher in the JSON hierarchy declares a URI for the key that contains it. So you would have to declare that "properties" is something in a context at least one level above for a piped-in context at the properties level to work. And given that many responses are actually wrapped in a "value" array, I think you'd need at minimum to have STA give a minimal context like this at the top level of any response:

{
  "@context": {
    "value": "@graph",
    "properties": "https://sta.org/properties"
  }
}

For example, consider https://labs.waterdata.usgs.gov/sta/v1.1/Things?$top=2&$select=@iot.id,name,properties/district, where we could pipe in context declaring properties/district to be something

{
  "value": [
    {
      "@iot.id": "AR008-331856091114601",
      "name": "AR008-331856091114601",
      "properties": {
        "@context": {
          "district": {
            "@id": "https://schema.org/areaServed",
            "@type": "https://schema.org/AdministrativeArea"
          }
        },
        "district": "Arkansas"
      }
    },
    {
      "@iot.id": "AR008-334933091153501",
      "name": "AR008-334933091153501",
      "properties": {
        "@context": {
          "district": {
            "@id": "https://schema.org/areaServed",
            "@type": "https://schema.org/AdministrativeArea"
          }
        },
        "district": "Arkansas"
      }
    }
  ],
  "@iot.nextLink": "https://labs.waterdata.usgs.gov/sta/v1.1/Things?$top=2&$skip=2&$select=%40iot.id,name,properties%2Fdistrict&$orderby=%40iot.id+asc&$skipFilter=%28%40iot.id+gt+%27AR008-334933091153501%27%29"
}

This would not be interpreted. For it to be interpreted, we would need to do this:

{
  "@context": {
    "value": "@graph",
    "properties": "https://sta.org/properties"
  },
  "value": [
    {
      "@iot.id": "AR008-331856091114601",
      "name": "AR008-331856091114601",
      "properties": {
        "@context": {
          "district": {
            "@id": "https://schema.org/areaServed",
            "@type": "https://schema.org/AdministrativeArea"
          }
        },
        "district": "Arkansas"
      }
    },
    {
      "@iot.id": "AR008-334933091153501",
      "name": "AR008-334933091153501",
      "properties": {
        "@context": {
          "district": {
            "@id": "https://schema.org/areaServed",
            "@type": "https://schema.org/AdministrativeArea"
          }
        },
        "district": "Arkansas"
      }
    }
  ],
  "@iot.nextLink": "https://labs.waterdata.usgs.gov/sta/v1.1/Things?$top=2&$skip=2&$select=%40iot.id,name,properties%2Fdistrict&$orderby=%40iot.id+asc&$skipFilter=%28%40iot.id+gt+%27AR008-334933091153501%27%29"
}

But this IMHO (a) involves a lot of redundant information being presented and (b) means we may as well put everything in a top-level context anyway.

The other issue is that this would be interpreted like this:

image

I'm not sure if this really serves any particular use case. If the use case is just to describe all the properties, then the raw JSON would provide that information to a user, but not due to anything in particular about JSON-LD or RDF. May as well just pipe in an arbitrary JSON object called "data dictionary".
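To see why the first of the two responses above is ignored while the second is interpreted, here is a toy Python model of the relevant rule: a processor only descends into keys that the active context maps, so an embedded @context under an unmapped key is never reached. This is a deliberate simplification of JSON-LD expansion (no @vocab, no keyword handling, no IRI expansion), and `reachable_contexts` is a hypothetical helper, not part of any JSON-LD library.

```python
def reachable_contexts(node, active_terms=frozenset()):
    """Collect the @context objects a processor would actually get to see."""
    found = []
    if isinstance(node, list):
        for item in node:
            found.extend(reachable_contexts(item, active_terms))
    elif isinstance(node, dict):
        local = node.get("@context")
        if isinstance(local, dict):
            found.append(local)
            # Terms defined here are active for this node and everything below.
            active_terms = active_terms | set(local)
        for key, value in node.items():
            # Unmapped keys are dropped together with everything beneath them.
            if key != "@context" and key in active_terms:
                found.extend(reachable_contexts(value, active_terms))
    return found

embedded = {
    "district": {
        "@id": "https://schema.org/areaServed",
        "@type": "https://schema.org/AdministrativeArea",
    }
}
bare = {
    "value": [
        {
            "@iot.id": "AR008-331856091114601",
            "name": "AR008-331856091114601",
            "properties": {"@context": embedded, "district": "Arkansas"},
        }
    ]
}
top = {"value": "@graph", "properties": "https://sta.org/properties"}
with_top = {"@context": top, **bare}

print(len(reachable_contexts(bare)), len(reachable_contexts(with_top)))
```

In the bare response no context is reached at all; with the top-level context both the top-level and the embedded context are seen, which matches the behaviour described above.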

If the use case is to provide a knowledge graph about each Thing in this case, then (1) "value" should be keyed as "@graph", and (2) a graph node identifier for the Thing should be provided in context, either with a redundant URI within the properties block or by declaring a top-level context with nesting, like so:

{
  "@context": {
    "@iot.selfLink": "@id",
    "schema": "http://schema.org/",
    "name": "schema:name",
    "description": "schema:description",
    "properties": "@nest",
    "monitoringLocationUrl": "schema:url",
    "value": "@graph"
  },
  "value": [
    {
      "@iot.selfLink": "https://labs.waterdata.usgs.gov/sta/v1.1/Things('AR008-331856091114601')",
      "@iot.id": "AR008-331856091114601",
      "name": "AR008-331856091114601",
      "description": "Well",
      "properties": {
        "state": "Arkansas",
        "active": true,
        "agency": "Arkansas Natural Resources Commission",
        "county": "Chicot County",
        "country": "US",
        "district": "Arkansas",
        "stateFIPS": "US:05",
        "wellDepth": "80",
        "agencyCode": "AR008",
        "countyFIPS": "US:05:017",
        "countryFIPS": "US",
        "districtCode": "05",
        "altitudeDatum": "North American Vertical Datum of 1988",
        "altitudeMethod": "Interpolated from Digital Elevation Model",
        "hydrologicUnit": "080500020302",
        "altitudeAccuracy": "1.6",
        "monitoringLocationUrl": "https://waterdata.usgs.gov/monitoring-location/331856091114601",
        "monitoringLocationName": "16S01W10CC1 CH-32 WU",
        "monitoringLocationType": "Well",
        "monitoringLocationNumber": "331856091114601",
        "monitoringLocationAltitudeLandSurface": "124"
      },
      "Locations@iot.navigationLink": "https://labs.waterdata.usgs.gov/sta/v1.1/Things('AR008-331856091114601')/Locations",
      "HistoricalLocations@iot.navigationLink": "https://labs.waterdata.usgs.gov/sta/v1.1/Things('AR008-331856091114601')/HistoricalLocations",
      "Datastreams@iot.navigationLink": "https://labs.waterdata.usgs.gov/sta/v1.1/Things('AR008-331856091114601')/Datastreams"
    },
    {
      "@iot.selfLink": "https://labs.waterdata.usgs.gov/sta/v1.1/Things('AR008-334933091153501')",
      "@iot.id": "AR008-334933091153501",
      "name": "AR008-334933091153501",
      "description": "Well",
      "properties": {
        "state": "Arkansas",
        "active": true,
        "agency": "Arkansas Natural Resources Commission",
        "county": "Desha County",
        "country": "US",
        "district": "Arkansas",
        "stateFIPS": "US:05",
        "agencyCode": "AR008",
        "countyFIPS": "US:05:041",
        "countryFIPS": "US",
        "districtCode": "05",
        "altitudeDatum": "North American Vertical Datum of 1988",
        "altitudeMethod": "Interpolated from Digital Elevation Model",
        "hydrologicUnit": "080500020104",
        "altitudeAccuracy": "1.6",
        "monitoringLocationUrl": "https://waterdata.usgs.gov/monitoring-location/334933091153501",
        "monitoringLocationName": "10S02W14DC1 DE-7 WU",
        "monitoringLocationType": "Well",
        "monitoringLocationNumber": "334933091153501",
        "monitoringLocationAltitudeLandSurface": "151"
      },
      "Locations@iot.navigationLink": "https://labs.waterdata.usgs.gov/sta/v1.1/Things('AR008-334933091153501')/Locations",
      "HistoricalLocations@iot.navigationLink": "https://labs.waterdata.usgs.gov/sta/v1.1/Things('AR008-334933091153501')/HistoricalLocations",
      "Datastreams@iot.navigationLink": "https://labs.waterdata.usgs.gov/sta/v1.1/Things('AR008-334933091153501')/Datastreams"
    }
  ],
  "@iot.nextLink": "https://labs.waterdata.usgs.gov/sta/v1.1/Things?$top=2&$skip=2&$orderby=%40iot.id+asc&$skipFilter=%28%40iot.id+gt+%27AR008-334933091153501%27%29"
}

This would be interpreted like this, which IMO is much more useful:

image
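If a server or proxy were to add such a top-level context automatically, the logic might look roughly like the Python sketch below. It is only an illustration under assumptions: `with_sta_context` and `BASE_CONTEXT` are hypothetical names, the base terms are copied from the context above, and service-specific terms such as `monitoringLocationUrl` are passed in separately because they are not part of STA itself.

```python
import copy
import json

# Terms taken from the top-level context shown above.
BASE_CONTEXT = {
    "@iot.selfLink": "@id",
    "schema": "http://schema.org/",
    "name": "schema:name",
    "description": "schema:description",
    "properties": "@nest",
}

def with_sta_context(response: dict, extra_terms=None) -> dict:
    """Return a copy of an STA response with a top-level @context prepended."""
    context = dict(BASE_CONTEXT)
    context.update(extra_terms or {})
    # Collection responses wrap their entities in a "value" array.
    if isinstance(response.get("value"), list):
        context["value"] = "@graph"
    return {"@context": context, **copy.deepcopy(response)}

single = {
    "@iot.selfLink": "https://labs.waterdata.usgs.gov/sta/v1.1/Things('AR008-331856091114601')",
    "name": "AR008-331856091114601",
    "description": "Well",
}
collection = {"value": [single]}

single_ld = with_sta_context(single)
collection_ld = with_sta_context(collection, {"monitoringLocationUrl": "schema:url"})

print(json.dumps(collection_ld["@context"], indent=2))
```

The same function then covers both response shapes: single-entity responses get the plain context, and collection responses additionally get "value" keyed as "@graph".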