wmo-im / wis2box

WIS2 in a box is a reference implementation of a WMO WIS2 Node
https://docs.wis2box.wis.wmo.int
Apache License 2.0

add data pipeline for hydrology data #703

Open tomkralidis opened 4 days ago

tomkralidis commented 4 days ago

Add pipeline(s) to:

Notes:

ksonda commented 4 days ago

Just talked to @dblodgett-usgs. We should consider covJSON with WaterML2 use case elements.

tomkralidis commented 4 days ago

Thanks @ksonda. CoverageJSON is default output from pygeoapi EDR support, so we would get it for free once there is an EDR plugin for a relevant backend.

ksonda commented 4 days ago

agree, sounds like the least cost path forward to me...

dblodgett-usgs commented 4 days ago

A couple thoughts about this suggestion.

There are two use cases here -- 1) the "Web data" use case which requires a convention to encode key elements of data for plots and some site metadata and 2) the "data exchange" use case which requires a convention to encode more precise data contents that are unique to the hydrometric station timeseries use cases supported by WaterML2 part 1.

IMHO, it would be best to just use CoverageJSON for the timeseries payload and a GeoJSON-compatible JSON Schema for the site metadata. If there are critical metadata nuances that cannot be captured in a satisfactory way in CoverageJSON, then perhaps we jump into a full JSON encoding of TimeseriesML/WaterML2 Part 1.
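
As a rough illustration of that split (a sketch, not something agreed in this thread), the code below builds a CoverageJSON `PointSeries` coverage for the time series and a separate GeoJSON Feature for the site. The station id, the `discharge` parameter key, the coordinates, and the GeoJSON property names are all made up for the example.

```python
import json


def timeseries_coverage(lon, lat, times, values, param="discharge", unit="m3/s"):
    """Build a minimal CoverageJSON PointSeries coverage for one station."""
    if len(times) != len(values):
        raise ValueError("times and values must have the same length")
    return {
        "type": "Coverage",
        "domain": {
            "type": "Domain",
            "domainType": "PointSeries",
            "axes": {
                "x": {"values": [lon]},
                "y": {"values": [lat]},
                "t": {"values": list(times)},
            },
            "referencing": [
                {"coordinates": ["x", "y"],
                 "system": {"type": "GeographicCRS",
                            "id": "http://www.opengis.net/def/crs/OGC/1.3/CRS84"}},
                {"coordinates": ["t"],
                 "system": {"type": "TemporalRS", "calendar": "Gregorian"}},
            ],
        },
        "parameters": {
            param: {
                "type": "Parameter",
                "observedProperty": {"label": {"en": param}},
                "unit": {"symbol": unit},
            }
        },
        "ranges": {
            param: {
                "type": "NdArray",
                "dataType": "float",
                "axisNames": ["t"],
                "shape": [len(values)],
                "values": list(values),
            }
        },
    }


def site_feature(station_id, name, lon, lat):
    """Build a GeoJSON Feature for site metadata (property names are hypothetical)."""
    return {
        "type": "Feature",
        "id": station_id,
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": {"name": name},
    }


coverage = timeseries_coverage(
    -75.7, 45.4,
    ["2024-01-01T00:00:00Z", "2024-01-01T01:00:00Z"],
    [12.3, 12.9],
)
site = site_feature("station-001", "Example gauge", -75.7, 45.4)
print(json.dumps(coverage["ranges"]["discharge"], indent=2))
```

Keeping the two documents separate means a plain GeoJSON client can read the site without knowing anything about CoverageJSON, at the cost of a second request to join site and series.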

I'd be happy to contribute to this effort as it unfolds and really appreciate your efforts on this!!

ksonda commented 4 days ago

Curious if there's room in the EDR spec to cover both use cases, given that /locations is just supposed to be a GeoJSON endpoint of some kind with the schema defined in the OpenAPI doc.

dblodgett-usgs commented 4 days ago

Probably yes. For the more complex use case, the complicated part is going to be the WaterML2 Part 1 metadata for time-value pairs and the ability to alter the default metadata per time step. EDR has no issue with additional media types from the .../locations endpoint. Same for .../items: additional media types for features are well supported.

ksonda commented 4 days ago

Hmm, that is tricky. Could specify for each parameter a `<parameter>_metadata` whose associated range could be a nested array of metadata elements. But I think that breaks other covJSON use cases about slicing and such that assume unidimensional arrays. Also no obvious query mechanism via the EDR spec.

Alternative 1: a best practice that specifies, for each parameter, a `<parameter>_metadata` whose range is an array of URIs pointing to some other EDR item, which is a nested array of "tvp" metadata elements. But then you can't select specific metadata elements.

Alternative 2: a best practice that specifies, for each parameter, a `<parameter>_metadata_<metadata-element>` that carries its own range? Clunky, but selectable via the parameter query parameter.
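
To make Alternative 2 concrete, here is a minimal sketch under assumptions: each metadata element becomes a companion parameter with its own one-dimensional range along `t`, and a helper mimics what an EDR `parameter-name` filter would return. The key pattern (`discharge_metadata_quality`), the quality codes, and both helper functions are hypothetical and not part of CoverageJSON or EDR.

```python
import copy

# Minimal coverage for one station; "referencing" is omitted for brevity.
coverage = {
    "type": "Coverage",
    "domain": {
        "type": "Domain",
        "domainType": "PointSeries",
        "axes": {
            "x": {"values": [-75.7]},
            "y": {"values": [45.4]},
            "t": {"values": ["2024-01-01T00:00:00Z", "2024-01-01T01:00:00Z"]},
        },
    },
    "parameters": {
        "discharge": {
            "type": "Parameter",
            "observedProperty": {"label": {"en": "Discharge"}},
            "unit": {"symbol": "m3/s"},
        }
    },
    "ranges": {
        "discharge": {"type": "NdArray", "dataType": "float",
                      "axisNames": ["t"], "shape": [2], "values": [12.3, 12.9]}
    },
}


def add_metadata_parameter(cov, parameter, element, values):
    """Attach per-time-step metadata as a companion parameter with its own range."""
    n = cov["ranges"][parameter]["shape"][0]
    if len(values) != n:
        raise ValueError("metadata must have one value per time step")
    key = f"{parameter}_metadata_{element}"  # hypothetical naming convention
    cov["parameters"][key] = {
        "type": "Parameter",
        "observedProperty": {"label": {"en": f"{parameter} {element}"}},
    }
    cov["ranges"][key] = {"type": "NdArray", "dataType": "string",
                          "axisNames": ["t"], "shape": [n], "values": list(values)}
    return key


def select_parameters(cov, names):
    """Mimic an EDR `parameter-name` filter: keep only the requested keys."""
    out = copy.deepcopy(cov)
    out["parameters"] = {k: v for k, v in out["parameters"].items() if k in names}
    out["ranges"] = {k: v for k, v in out["ranges"].items() if k in names}
    return out


quality_key = add_metadata_parameter(coverage, "discharge", "quality", ["good", "estimated"])
subset = select_parameters(coverage, {quality_key})
print(sorted(subset["parameters"]))
```

Every range stays a flat array aligned with the `t` axis, so slicing assumptions hold, but the number of parameters grows with each metadata element.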

dblodgett-usgs commented 4 days ago

I need to go back and read the spec and think about it some. As an initial take, just doing the happy path CoverageJSON with as much of the WaterML2 spec as "just works" would be a really great step!!

ksonda commented 3 days ago

Straightforward:

Seems hacky but does in fact have relevant guidance in the spec:

Unclear:

EDR can maybe handle via /locations, but custom handling by the service is one thing and cross-protocol data exchange is another :(

Options to force this into covJSON:

  1. Ignore
  2. Each Coverage is one time series for one station. Each parameter is actually a unique combination of observedProperty and method. Add a method object as a custom field in the parameter, and add station metadata fields as custom fields at the Coverage level. I think this is closest to the WaterML2 XML structure, but I'm not sure if it would break things with vanilla covJSON clients.
  3. Each parameter is a unique combination of station-observedProperty-method, and every station and method metadata element is a custom field in the parameter. Use ParameterGroups liberally to group these parameters by observedProperty.
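
A small sketch of what option 3 might look like, assuming hypothetical custom field names (`station`, `method`) and made-up station and method identifiers. It only builds the `parameters` and `parameterGroups` members; the domain and ranges are left out.

```python
from collections import defaultdict


def build_parameters(series):
    """Build one parameter per station/observedProperty/method plus grouping."""
    parameters = {}
    members = defaultdict(list)
    for s in series:
        key = f"{s['station']}-{s['observed_property']}-{s['method']}"
        parameters[key] = {
            "type": "Parameter",
            "observedProperty": {"label": {"en": s["observed_property"]}},
            "unit": {"symbol": s["unit"]},
            # Custom, non-standard fields (names are hypothetical).
            "station": s["station"],
            "method": s["method"],
        }
        members[s["observed_property"]].append(key)
    # One ParameterGroup per observed property, listing its member parameters.
    groups = [
        {"type": "ParameterGroup", "label": {"en": prop}, "members": keys}
        for prop, keys in members.items()
    ]
    return parameters, groups


series = [
    {"station": "st1", "observed_property": "discharge", "method": "adcp", "unit": "m3/s"},
    {"station": "st2", "observed_property": "discharge", "method": "rating", "unit": "m3/s"},
    {"station": "st1", "observed_property": "stage", "method": "radar", "unit": "m"},
]
parameters, parameter_groups = build_parameters(series)
print(parameter_groups)
```

A vanilla client that only reads `observedProperty` and `unit` should still work here, while a hydro-aware client could pick up the extra fields.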

unep-gwdc commented 3 days ago

That's a quite different approach to what was discussed last week during the HDWG meeting between the colleagues from the WQ IE @sgrellet, @KathiSchleidt, @hylkevds and Rob Atkinson to move towards an update of TSML with a hydro profile/extension and JSON encoding.

I agree with @dblodgett-usgs that we need a fair bit of metadata for the data exchange use case to make it work with WHOS. For station and measurement metadata, WIS relies on WIGOS OSCAR/Surface metadata, but this is fairly complex XML and hardly implemented by the hydro community so far.

I think this needs a more in-depth discussion within the HDWG and maybe beyond, as this is also relevant for other domains.

ksonda commented 3 days ago

I think more discussion is good, but as part of that discussion I think it is worth seeing what is reusable or adaptable from straight geojson and covjson rather than assuming a priori we must have an entirely custom new json format from first principles. That may end up being the case, it may not. Above was just getting a start on how covJSON could fit in on the record.

dblodgett-usgs commented 3 days ago

I regret that I was not able to take part in the discussion at the HDWG meeting -- family vacation took precedence.

I fully expect that there is a need to do both. The Web use case could be (kind of has to be) satisfied by geojson and covjson because a boutique format won't be broadly supported / adoptable for Webby use cases.

There may be a world where a JSON encoding of TSML with a WaterML2 Part 1 profile or best practice would be a critical format for data exchange but it would need to be in addition to more accessible Web formats.

There also may be a world where we could establish a convention that would "just work" as geojson and covjson but I have a hard time seeing the compromises necessary for such a convention being acceptable to either Web or data exchange use cases. This is why I make the assertion up front that we probably need to do both at some level.

So, let's run with use of existing accessible formats and focus on Web use cases with as much data exchange content as fits easily?

ksonda commented 3 days ago

Here is something we've been batting about as an experiment to reveal the opportunities and limitations of the existing constellation of standards for the "webby" use case:

  1. Target a best practice doc for STA that allows a specific STA query to be proxied by EDR in a manner that allows there to be a covJSON output format that delivers the information that the community would want to see in a hydro profile of TSML.

  2. Define a best practice in covJSON for the packaging of station metadata with their time series.
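
For discussion, a minimal sketch of what such a mapping could look like, assuming the input is a SensorThings Datastream fetched with something like `$expand=Thing($expand=Locations),ObservedProperty,Observations`, that the location is a GeoJSON Point, and that each `phenomenonTime` is an instant in one consistent ISO 8601 format. The `station` member on the coverage is a hypothetical custom field, not part of CoverageJSON.

```python
def datastream_to_coverage(ds):
    """Convert an expanded STA Datastream dict into a CoverageJSON-style dict."""
    # ISO 8601 instants in one format and time zone sort correctly as strings.
    obs = sorted(ds["Observations"], key=lambda o: o["phenomenonTime"])
    lon, lat = ds["Thing"]["Locations"][0]["location"]["coordinates"][:2]
    prop = ds["ObservedProperty"]
    key = prop["name"]
    return {
        "type": "Coverage",
        # Hypothetical custom member carrying the station name with the series.
        "station": {"name": ds["Thing"]["name"]},
        "domain": {
            "type": "Domain",
            "domainType": "PointSeries",
            "axes": {
                "x": {"values": [lon]},
                "y": {"values": [lat]},
                "t": {"values": [o["phenomenonTime"] for o in obs]},
            },
        },
        "parameters": {
            key: {
                "type": "Parameter",
                "observedProperty": {"id": prop["definition"],
                                     "label": {"en": prop["name"]}},
                "unit": {"symbol": ds["unitOfMeasurement"]["symbol"]},
            }
        },
        "ranges": {
            key: {
                "type": "NdArray",
                "dataType": "float",
                "axisNames": ["t"],
                "shape": [len(obs)],
                "values": [float(o["result"]) for o in obs],
            }
        },
    }


# Made-up Datastream used only to exercise the function.
datastream = {
    "name": "Discharge at example gauge",
    "unitOfMeasurement": {"name": "cubic metre per second", "symbol": "m3/s"},
    "Thing": {
        "name": "Example gauge",
        "Locations": [{"location": {"type": "Point", "coordinates": [-75.7, 45.4]}}],
    },
    "ObservedProperty": {"name": "discharge",
                         "definition": "https://example.org/def/discharge"},
    "Observations": [
        {"phenomenonTime": "2024-01-01T01:00:00Z", "result": 12.9},
        {"phenomenonTime": "2024-01-01T00:00:00Z", "result": 12.3},
    ],
}
coverage_from_sta = datastream_to_coverage(datastream)
print(coverage_from_sta["station"], coverage_from_sta["ranges"]["discharge"]["values"])
```

The `referencing` member is omitted for brevity; a real mapping would also need to decide how to handle interval `phenomenonTime` values and non-numeric results.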

Why?

  1. The webby use case for a time series JSON data packet implies a strong preference for at least a station name or id to go along with the time series in the same document.
  2. Between the WQIE and Hydroserver2, STA is positioned to gain prominence in the hydro community, while EDR is gaining prominence in the Met community. An STA -> EDR mapping has been under discussion before, and this would be a way to move that conversation forward at the same time.
  3. EDR gives covJSON