opengeospatial / CoverageJSON

Public repo for CoverageJSON project
Apache License 2.0

Station identifiers in CoverageJSON? #174

Closed dblodgett-usgs closed 3 weeks ago

dblodgett-usgs commented 3 weeks ago

@ksonda and I are brainstorming some approaches for encoding site-based timeseries a la WaterML2 Part 1 Timeseries in CoverageJSON.

Has anyone figured out a convention for encoding site metadata like id, name, owner, etc. in a pointSeries CoverageJSON payload?

petergarnaes commented 3 weeks ago

I don't think metadata like this is intended to be encoded in CoverageJSON; it only describes the data itself. You could add station id, station name and station owner as parameters, so that they are listed for each position. That's not the intended use, but it might work for you.

In an EDR API (which returns CoverageJSON for data queries) you would have this metadata on the /locations endpoint, which returns GeoJSON. There is no standard for representing stations in GeoJSON, but at least the metadata is available there.

ksonda commented 3 weeks ago

I think something very minimal, like an id field for a Coverage could be considered, to meet the use case of linking the points in a CoverageCollection of PointSeries to some other fuller metadata about each Point. I'm thinking of being able to be somewhat agnostic to the underlying API in the use of covJSON as a data exchange format.

dblodgett-usgs commented 3 weeks ago

By the logic that "I don't think metadata like this is intended to be encoded in CoverageJSON", the pointSeries type doesn't need X, Y, Z metadata, only the temporal coordinates. An "id" for a location is a fundamental aspect of the coordinates of the geometry and really shouldn't be separated from the data.

As Kyle points out, while this need may not be relevant for some APIs, it's highly relevant for the generic data integration by URL case where a given client may not have knowledge of the URL structure.

Also, one slight correction @petergarnaes -- in EDR, the items endpoint of an EDR collection for site-based data is intended to use the JSON schema linked in section 8.2.7. The locations endpoint without a locationId does not have a prescribed JSON schema and is not really supposed to be a source of features so much as a convenience to get hypermedia about ALL the features. This was a design consideration to make sure we were well aligned with OGC-API Features and could reuse the spatial / parameter filtering capabilities of OAFeatures in EDR.

jonblower commented 3 weeks ago

It's always tricky to set a scope for a data standard. The general guiding principle of CoverageJSON was that it should encode "the stuff you typically need to generate a visualisation", which includes plotting stuff on a map (which is why the x/y/z coordinates are still useful in a PointSeries). We shied away from including other metadata, on the assumption that other standards would cover this (e.g. there have been multiple attempts to create a standard to encode provenance information, and we didn't want to create yet another one).

So personally I had assumed that people would combine CoverageJSON with other standards (perfectly possible in JSON of course) to deal with things like id, name, owner, otherwise the standard would get very large.

CoverageJSON could do something similar to GeoJSON and allow properties to be attached to each object, essentially as key-value pairs. This is discussed to some extent in #52. This would give a framework for this kind of thing, although a more detailed standard would be needed for full interoperability.

ksonda commented 3 weeks ago

I would support properties with KVP. Whether or not that is judged to be too far out of scope, I would still highly recommend allowing a Coverage to have an id. This is analogous to GeoJSON features having an id that allows individual features to be persistently identified within a FeatureCollection. This is a very high-level thing that doesn't even have to do with metadata per se. It seems like a necessary feature for things as basic as being able to write unit tests for software that provides covJSON.

jonblower commented 3 weeks ago

Coverages can already have an id: see the spec. Would that meet your needs?

Anyway, I think that allowing properties is an idea that has merit. I don't think there's anything stopping anyone from adding these already, as I believe you can always add more properties to JSON objects without breaking the schema, but they wouldn't currently be part of the spec. It could be a fairly simple addition.
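To illustrate the linking use case discussed above, here is a minimal Python sketch that joins the coverages in a CoverageCollection to fuller station metadata held in a GeoJSON FeatureCollection, using the optional Coverage "id" as the key. The station ids, coordinates, property names and the helper `join_coverages_to_features` are all hypothetical, and the coverages are trimmed (no ranges, parameters or referencing) to keep the example short.

```python
# Hypothetical, trimmed CoverageCollection: each Coverage carries an "id".
coverage_collection = {
    "type": "CoverageCollection",
    "domainType": "PointSeries",
    "coverages": [
        {"type": "Coverage", "id": "station1",
         "domain": {"type": "Domain", "axes": {
             "x": {"values": [1]}, "y": {"values": [20]},
             "t": {"values": ["2008-01-01T04:00:00Z"]}}}},
        {"type": "Coverage", "id": "station2",
         "domain": {"type": "Domain", "axes": {
             "x": {"values": [2]}, "y": {"values": [21]},
             "t": {"values": ["2008-01-01T04:00:00Z"]}}}},
    ],
}

# Hypothetical station metadata, e.g. as returned by some other endpoint.
station_features = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "id": "station1",
         "geometry": {"type": "Point", "coordinates": [1, 20]},
         "properties": {"name": "Example station", "owner": "Example agency"}},
    ],
}


def join_coverages_to_features(collection, features):
    """Map each Coverage id to the properties of the feature with the same id.

    Coverages without an id are skipped; ids with no matching feature map to None.
    """
    features_by_id = {f["id"]: f for f in features["features"] if "id" in f}
    joined = {}
    for coverage in collection["coverages"]:
        coverage_id = coverage.get("id")
        if coverage_id is None:
            continue
        feature = features_by_id.get(coverage_id)
        joined[coverage_id] = feature["properties"] if feature else None
    return joined


joined = join_coverages_to_features(coverage_collection, station_features)
```

A client that only receives the CoverageJSON can still use the id as a stable key for lookups or unit tests, whether or not the metadata document is available.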

dblodgett-usgs commented 3 weeks ago

Focusing in on just an id and perhaps a label (name) would be my preference here. Doing it via a convention layered on top of a generic list of properties would be all good from my perspective, but more of an add on than a core need.

An id at the coverage level does work with some simplifying assumptions otherwise, I think. Is anyone else using id for pointSeries?

dblodgett-usgs commented 3 weeks ago

Thought I had put this thought in earlier... Perhaps following the concept of the NetCDF-CF Discrete Sampling Geometry convention which includes the idea of a station "id" and arbitrary additional attributes along the "station" dimension? So things like name and owner would be ad hoc and the "id" would be a standardized field.

jonblower commented 3 weeks ago

> Focusing in on just an id and perhaps a label (name) would be my preference here.

Adding an optional label to a Coverage would be an easy, and backward-compatible, addition, I think.

> station "id" and arbitrary additional attributes along the "station" dimension

What is the equivalent of the station dimension in CoverageJSON? Each Coverage would have the data from only one station, I think. Maybe I've misunderstood what you mean!

dblodgett-usgs commented 3 weeks ago

In cases where you have a coverage of irregularly-located points, the points would typically have both an id and x/y coordinates (think weather stations). It wouldn't make sense to support a multi-valued station dimension rather than restricting things to one and only one station per coverage.

I very much appreciate the design decision to avoid the complexity of discrete coverages like this and don't really think coverageJSON would want to go there unless there were strong use cases for it -- which I don't really see.

But to be conceptually consistent with related data models, you would expect a station id in the axes of a pointSeries or other discrete coverage in addition to the station coordinates.

jonblower commented 3 weeks ago

Ah yes, understood, thanks!

I guess you could do something like this, to record timeseries data from a series of stations at different locations (which all record data at the same times):

{
  "type" : "Coverage",
  "domain" : {
    "type": "Domain",
    "axes": {
      "t": { "values": ["2008-01-01T04:00:00Z", "2008-01-01T05:00:00Z"] },
      "composite": {
        "dataType": "tuple",
        "coordinates": ["id","x","y","z"],
        "values": [
          ["station1", 1, 20, 1],
          ["station2", 2, 21, 3],
          ["station3", 2, 20, 4]
          ...

This is actually perfectly valid CoverageJSON, although its real-world interoperability will be limited if clients don't know how to handle that type of domain. You could add "domainType": "MultiPointSeriesWithId", if this domain type were defined somewhere, so that clients know what to expect.
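As a rough sketch of how a client might read such a domain, the Python below turns the tuple axis into one record per station by pairing each value tuple with the names listed in "coordinates". It uses only the three tuples shown above, and the helper name `stations_from_composite` is made up for this example.

```python
# Domain copied from the example above, limited to the three tuples shown.
domain = {
    "type": "Domain",
    "axes": {
        "t": {"values": ["2008-01-01T04:00:00Z", "2008-01-01T05:00:00Z"]},
        "composite": {
            "dataType": "tuple",
            "coordinates": ["id", "x", "y", "z"],
            "values": [
                ["station1", 1, 20, 1],
                ["station2", 2, 21, 3],
                ["station3", 2, 20, 4],
            ],
        },
    },
}


def stations_from_composite(domain):
    """Return one dict per tuple, keyed by the composite axis coordinate names."""
    axis = domain["axes"]["composite"]
    names = axis["coordinates"]
    return [dict(zip(names, values)) for values in axis["values"]]


stations = stations_from_composite(domain)
station_ids = [station["id"] for station in stations]
```

A client that does not recognise the "id" coordinate would have to ignore it or fail, which is the interoperability limit mentioned above.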

The disadvantage with the above is that there isn't a sensible way to add further metadata to each station, unless station1 is a pointer to another document somewhere. Alternatively, you could allow a properties field that is an array of objects, each of which corresponds to a station, in the same order as the values in the composite axis, e.g.:

{
  "type": "Domain",
  "domainType": "MultiPointSeries",
  "axes": {
    "t": { "values": ["2008-01-01T04:00:00Z", "2008-01-01T05:00:00Z"] },
    "composite": {
      "dataType": "tuple",
      "coordinates": ["x","y","z"],
      "values": [
        [1, 20, 1],
        [2, 21, 3]
      ]
    }
  },
  "properties": [
    {
      "id": "station1",
      "label": "My favourite station"
    }, {
      "id": "station2",
      "label": "My second favourite station"
    }
  ]
}
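Because the proposed "properties" array is matched to the composite axis purely by position, a consumer would probably want to check that the two have the same length before pairing them. The Python sketch below shows one way to do that. The "properties" member is only the proposal from this thread, not part of the CoverageJSON spec, and `stations_with_properties` is a hypothetical helper.

```python
# Domain copied from the example above; "properties" is the proposed, non-standard member.
domain = {
    "type": "Domain",
    "domainType": "MultiPointSeries",
    "axes": {
        "t": {"values": ["2008-01-01T04:00:00Z", "2008-01-01T05:00:00Z"]},
        "composite": {
            "dataType": "tuple",
            "coordinates": ["x", "y", "z"],
            "values": [[1, 20, 1], [2, 21, 3]],
        },
    },
    "properties": [
        {"id": "station1", "label": "My favourite station"},
        {"id": "station2", "label": "My second favourite station"},
    ],
}


def stations_with_properties(domain):
    """Merge each properties object with the coordinate tuple at the same index."""
    axis = domain["axes"]["composite"]
    properties = domain.get("properties", [])
    if len(properties) != len(axis["values"]):
        raise ValueError("properties and composite values differ in length")
    return [
        {**props, **dict(zip(axis["coordinates"], values))}
        for props, values in zip(properties, axis["values"])
    ]


stations = stations_with_properties(domain)
```

The positional pairing is fragile: reordering either array silently mislabels stations, which is one argument for carrying the id inside the tuple instead.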

dblodgett-usgs commented 3 weeks ago

This is really good to see as a potential approach, @jonblower -- thanks for taking the time to mock it up. "Real world interoperability" is king here and I would imagine this is not really a use case that would catch on, but this will hopefully answer people's questions if and when they come back to this question in the future. Thanks!!