Concept mapping to EDR - Githubissues

ksonda commented 2 years ago

This may be a can of worms.

Over at https://github.com/geopython/pygeoapi/issues/817#issue-1058624908 there is discussion of implementing an STA provider for an EDR endpoint. This of course raises the question of how to map STA queries to EDR queries. I think this discussion is better had here for possible general application and broader thoughts from STA community than silo'd within pygeoapi.

@hylkevds @KoalaGeo @jkreft-usgs

KoalaGeo commented 2 years ago

@chris-little will probably have some ideas

KoalaGeo commented 2 years ago

Making a start - amended table from the STA docs...

SensorThings API Entities	O&M 2.0 Concepts	OGCAPI-EDR
Thing (and Locations, HistoricalLocations)	-	Item
Datastream	-	Instance?
Sensor	Procedure	Item
Observation	Observation	Item
ObservedProperty	Observed Property	x
FeatureOfInterest	Feature-Of-Interest	x

ksonda commented 2 years ago

This gets at the thorny issue of what an EDR "collection" means in a sensor data context. I could see collections being mapped to Datastreams, Thing/Locations, or even ObservedProperties(x)/Datastreams.

This is why I think we need to consider the query mapping in parallel with the concept mapping.

EDR Query	STA Queries required to get equivalent information
/	/{version} (info not provided by STA)
/conformance	/{version} (serverSettings: conformance)
/collections	/{version} (or some transformation depending on what we think STA collections are)
(e.g.) /collections/monitoring-locations/items	/Things?$expand=Locations
(e.g.) /collections/{observedProperty}/items	/ObservedProperties(x)/Datastreams

hylkevds commented 2 years ago

I think ObservedProperty can be mapped to the EDR parameter parameter-name. OMS 3 (O&M 3) introduces the concept of ObservationCollection, that maps to the STA Datastream. Thing+Location+HistoricalLocation usually (but not always!) maps to the O&M Platform.

The hard part is indeed defining what a "collection" is. When applying Features I would normally map each EntityType to a collection, but clearly that doesn't work for EDR.

Let's build a query:

An EDR query generally returns Observations, so the STA query would be on v1.1/Observations.
Since we also need the geolocation of the Observations, we'd also need an $expand=FeatureOfInterest.
If we indeed map parameter-name to ObservedProperty, we can use that in the filter: $filter=Datastream/ObservedProperty/name eq '[parameter-name]'
The rest of the EDR query can be added to the filter too. I don't recall any EDR query features that can't be translated to STA.
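The query-building steps above can be sketched in a few lines of Python. This is only an illustration of the proposed mapping, not an agreed implementation: the function name and the example base URL are made up, and it assumes parameter-name maps directly onto the ObservedProperty name.

```python
from urllib.parse import quote


def edr_to_sta_observations_url(sta_base, parameter_name):
    """Build an STA Observations query for one EDR parameter-name.

    Assumption: parameter-name is the name of the STA ObservedProperty.
    """
    # OData string literals escape a single quote by doubling it.
    literal = parameter_name.replace("'", "''")
    sta_filter = f"Datastream/ObservedProperty/name eq '{literal}'"
    # Percent-encode spaces but keep the path-like filter readable.
    encoded = quote(sta_filter, safe="/()'")
    return (
        f"{sta_base.rstrip('/')}/Observations"
        f"?$expand=FeatureOfInterest&$filter={encoded}"
    )


# Hypothetical STA service root, for illustration only.
print(edr_to_sta_observations_url("https://example.org/v1.1", "temperature"))
```

Other EDR query parts (time, area) would be appended to the same $filter with `and`.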

For many STA servers this would work fine, since generally services don't mix observations from different domains. In this case there is only 1 collection for the entire service. Of course for servers where there are very different uses for the same ObservedProperty, like indoor and outdoor temperatures, this would not work, and additional filtering is needed. This additional filtering would need to come from the collection.

For example:

Temperature is measured inside and outside.
Outside measurements are made by environmental monitoring stations, modelled as Things that have properties/type with value 'station'.
Indoor measurements are connected to Things that have properties/type with value 'room'.

We can then make two collections, that add the filter parts:

Collection 'outdoor' with filter: and Datastream/Thing/properties/type eq 'station'
Collection 'indoor' with filter: and Datastream/Thing/properties/type eq 'room'
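As a rough sketch of how a server might hold these per-collection filters, the snippet below keeps them in a lookup table and appends them to the parameter-name filter. The dictionary, the function name, and the collection ids 'outdoor' and 'indoor' are illustrative assumptions, not part of either standard.

```python
# Hypothetical configuration: EDR collection id -> extra STA filter fragment.
COLLECTION_FILTERS = {
    "outdoor": "Datastream/Thing/properties/type eq 'station'",
    "indoor": "Datastream/Thing/properties/type eq 'room'",
}


def collection_filter(collection_id, parameter_name):
    """Combine the parameter-name filter with the collection's own filter."""
    base = f"Datastream/ObservedProperty/name eq '{parameter_name}'"
    extra = COLLECTION_FILTERS.get(collection_id)
    # Collections without an entry fall back to the parameter filter alone.
    return f"{base} and {extra}" if extra else base


print(collection_filter("indoor", "temperature"))
```

A service with a single collection would simply leave the table empty.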

Of course one could also make an EDR endpoint that returns Things instead of Observations...

dblodgett-usgs commented 2 years ago

This is a productive conversation -- thanks for opening it @ksonda -- a couple things to help steer the analysis as I don't think much of the above is taking in the full scope of what EDR specified. Jumping to the end, the EDR location end point and how it relates to the items endpoint is the key here that hasn't been discussed yet.

First, in regards to the idea of a mapping in the first place.
A collection is essentially an arbitrary "collection of geospatial data". There is a lot of leeway for an API-developer to decide how to interpret the EDR query patterns for the back-end dataset. Given this, the idea of a mapping from STA to EDR is going to run up against implementation patterns that break the mapping and we really need to take it on an STA-implementation to EDR-implementation basis.

Second, in regards to the nature of EDR implementations.
The majority of EDR implementations are going to provide access to data that are relatively continuous in space and time with no a-priori identified features that one might query by identifier. Rather, the typical pattern will be point/area/trajectory queries against some data-cube on the back end. So forget items for most implementations of EDR (for now). In this case, I guess I don't see a strong use case for mapping EDR to STA at all unless you wanted to set up some pre-determined EDR queries that could be represented as things (these could be sensor things or virtual sensor things). In which case, read on.

For STA implementations that choose to use the "items" endpoint, we have something to work with. But what we have is actually just a collection that has some set of items that can be accessed as OGC-API Features. These items have a specific set of attributes that map onto the valid EDR query patterns for each item -- e.g. valid parameter-names, time range, etc. see here for more on this pattern. Note that these items need not be sensors -- they could also be pre-determined EDR queries against any backing data. e.g. pre-defined useful point, area, or trajectory queries that people may want to discover as items.

I could go on, but will leave it there and add my two cents on where there are some firm mappings.

An EDR location is key -- the location endpoint is how you access observations for discrete locations, period. items just gives you metadata about what you might access as a location or via another EDR query pattern. An STA Thing is an item -- that is actually just saying that a thing is a feature with some properties that describe the kind of timeseries data you can access for it at the location end point.

Apologies for not being more specific on STA details -- I have to admit that I still am just kind of confused by the API pattern and how it maps to backing data. Hopefully the description of the intention for EDR implementation patterns above is useful.

ksonda commented 2 years ago

I agree that STA implementation patterns are infinite which means we're not going to come up with a mapping that could be considered "standard". For use cases that STA is particularly suited for, like moving platforms or platforms that change their feature of interest over time and space, and extremely complex OData queries with filters and selects of attributes of multiple entities, etc. then EDR may just not be the correct interface for that kind of query. In this category I would put maybe many of the "Smart Cities" implementations that track transportation infrastructure and vehicles.

In practice though we have a relatively small community in the environmental monitoring space, where we are dealing with stationary monitoring locations with discrete, stable features of interest. The motivating use cases are essentially:

Discover monitoring location metadata, filtering on parameters, period of record, and space
Get observations based on parameters, time, and space.

Sure, for providers of such STA endpoints, we could simply ask them to publish netCDF/zarr versions of the underlying data and publish EDR based on that. But I think there is desire among STA providers to move directly from STA to EDR, so that EDR can be supported for simple RESTful queries and STA for those users who need OData, without needing to construct and maintain a separate back end for each of them. For these implementations I think we have some traction between the monitoring network mockup from @dblodgett-usgs and what @hylkevds put together above.

In practice, I think the major sources of variation in STA implementations of stationary environmental sensor networks have been

1. how to distribute "station" metadata over Things, Locations, and Sensors,
2. whether to model Sensors as generalized procedures or particular, discrete sensors with unique identifiers,
3. whether to model Things as having multiple Sensors or just assigning all Datastreams at the Thing to a single Sensor,
4. whether FeaturesOfInterest are bothered with explicitly at all, or just auto-generated to be the Location of the Thing the Sensor is associated with at the time the Observation was made.

I think 1. and 2. and 3. can be handled across STA implementations (in the stationary sensor case) by just concatenating all the information into a ThingLocationSensorDatastreamObservedProperty feature set that could be its own collection with items.

I think 4. is both trickier and more important. Maybe there is some room for flexibility in whether the EDR location always be the STA FeatureOfInterest or STA Location. Implementations of an STA->EDR server maybe need to be configurable as to which one.
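A minimal sketch of what "configurable as to which one" could look like, assuming the Observation has been fetched with the relevant entities expanded. The function name and the `location_source` setting are hypothetical; only the `feature` and `location` property names come from the STA data model.

```python
def observation_geometry(observation, location_source="FeatureOfInterest"):
    """Return the GeoJSON geometry to treat as the EDR location.

    Assumes the Observation was requested with either
    $expand=FeatureOfInterest or
    $expand=Datastream($expand=Thing($expand=Locations)).
    """
    if location_source == "FeatureOfInterest":
        return observation["FeatureOfInterest"]["feature"]
    if location_source == "Location":
        locations = observation["Datastream"]["Thing"]["Locations"]
        # Stationary case: take the Thing's single current Location.
        return locations[0]["location"] if locations else None
    raise ValueError(f"unknown location_source: {location_source}")
```

With auto-generated FeaturesOfInterest both settings should return the same geometry; they only diverge when a provider models explicit features such as a stream segment.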

dblodgett-usgs commented 2 years ago

👍 from me -- this is spot on.

Getting at your:

"whether FeaturesOfInterest are bothered with explicitly at all"...

If we rewind the clock a bit to the HDWG best practice for SOS2 and WaterML2-Timeseries... For the sake of interoperability, the "feature of interest" was restricted to be the sensor's location. That was not intended to limit association to other features of interest, but the interpretation of "what feature does this observation characterize" is a bit too open ended for interoperability purposes -- especially when we start to think about the schema used to describe a feature of interest.

I think a similar approach would need to be taken here -- and clearly is the approach that some STA implementers are taking.

EDR doesn't attempt to take on the issue of "domain" feature of interest. Rather, it focuses on the sampling feature which may or may not be an a-priori identified (EDR) location and by extension discoverable at the items end point of the collection. This lends itself well (is identical) to the pattern of FeatureOfInterest being the Location of the Thing/Sensor when the Observation was made.

ksonda commented 2 years ago

This suggests that to cover bases within the intention of EDR, EDR location should be STA Location. I know @jkreft-usgs as well as myself with some water utility work and the cross agency work in NM may be taking on FeatureOfInterest explicitly, which would complicate the STA query necessary for the general "Observations" collection of an STA-backed EDR.

chris-little commented 2 years ago

@ksonda And of course we put location and Item into EDR for strong compatibility with OGC API-Features. Perhaps adding an API-Features column on your table above may be helpful. As below.

Making a start - amended table from the STA docs...

SensorThings API Entities	O&M 2.0 Concepts	OGCAPI-EDR	OGCAPI-Features
Thing (and Locations, HistoricalLocations)	-	Item	Item
Datastream	-	Instance?	?
Sensor	Procedure	Item	?
Observation	Observation	Item	?
ObservedProperty	Observed Property	x	?
FeatureOfInterest	Feature-Of-Interest	x	?

ksonda commented 2 years ago

Yes, and I think we need to concatenate many of the STA entities into some EDR collection to enable the kind of combined space + parameter query that EDR is fundamentally about. I think location EDR at a minimum needs to include parameter-names right? Essentially I want the EDR endpoint /edr/collections/{collectionId}/locations/{locationId} to include information about the observedproperty and units.

Dealing with a similar problem with OAF, me and @webb-ben put together an ad-hoc STA->OAF mapping for pygeoapi that @KoalaGeo was generous enough to try out

That mapping roughly looks like this

OGCAPI-Features Collections	SensorThings API Queries
/collections/{Things collection name}/items	`/Things?$expand=Locations,Datastreams($select=@iot.id,properties)`
/collections/{Datastreams collection name}/items	`/Datastreams?$expand=Sensor,ObservedProperty,Thing($select=@iot.id),Thing/Locations`
/collections/{Observations collection name}/items	`/Observations?$expand=FeatureOfInterest` or `/Observations?$expand=Datastream($select=@iot.id)/Thing($select=@iot.id)/Locations($select=@iot.id,location)`

Maybe all of the above is way too opinionated though, and my ambitions for even that level of consensus are not feasible.

dblodgett-usgs commented 2 years ago

We are making progress here. A few important points to surface.

1) STA applies to (moving) sensor use cases, whereas EDR is focused on accessing (virtual) sampling data. This difference seems to be the source of complexity here.
2) In EDR, "locations" (and their "items" representation) are really intended to be static identified locations where data can be retrieved. This is a clear intersection with STA where the "Thing" is a monitoring location. [1]
3) The EDR GeoJSON schema seems to be a key here. [2]
4) Alignment of STA with collections is arbitrary, but for the sake of interoperability we should consider supporting an STA endpoint to be a valid view of a collection.

So we might have:

/collections/{collectionid}/ -> collection metadata
/collections/{collectionid}/items -> a GeoJSON point feature collection that adheres to the EDR GeoJSON schema [2] -- note that the items end point works as an OAFeatures end point.
/collections/{collectionid}/locations/ -> same as .../items but with no OAFeat functionality.
/collections/{collectionid}/locations/{locationid} -> access timeseries data from the location with parameter and temporal filtering per EDR.

Note that the .../items end point is the location as a feature; this maps onto the STA implementation pattern where every observation location is the same as the feature of interest. Any additional feature of interest goes beyond the scope of EDR but would not be precluded through some links in the response from .../items and .../locations queries.

Taking another stab at this table...

SensorThings API Entities	O&M 2.0 Concepts	OGCAPI-EDR	OGCAPI-Features
Thing (and Locations, HistoricalLocations)	-	item or location	Item
Datastream	- [3]	parameter-names	?
Sensor	Procedure	parameter-names	?
Observation	Observation	item or location	?
ObservedProperty	Observed Property	parameter-names	?
FeatureOfInterest	Feature-Of-Interest	item or location	?

[1] Note that the more typical EDR use case (a data cube that we want to sample with some sampling geometry) is quite different with respect to STA -- potentially different enough that it isn't worth pursuing a mapping between the two.

[2] The EDR GeoJSON schema contains required parameters for a geojson document:

datetime - the date range for which data are available.
parameter-name - the parameter names that can be queried for this location
label - a name to put on a hyperlink
edrqueryendpoint - A url where timeseries data can be accessed for this location.
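To make the four properties in [2] concrete, here is a hedged sketch that assembles one such GeoJSON feature from an STA Thing fetched with its Locations and Datastreams (and each Datastream's ObservedProperty) expanded. The function name, the EDR base URL, and the choice of the Thing's @iot.id as the location id are assumptions for illustration.

```python
def thing_to_edr_feature(thing, edr_base, collection_id):
    """Build an EDR-style GeoJSON feature from an expanded STA Thing.

    Assumes $expand=Locations,Datastreams($expand=ObservedProperty).
    """
    datastreams = thing.get("Datastreams", [])
    names = sorted({ds["ObservedProperty"]["name"] for ds in datastreams})
    # Datastream phenomenonTime is an ISO 8601 interval "start/end".
    # Plain string comparison assumes one consistent UTC timestamp format.
    times = [ds["phenomenonTime"] for ds in datastreams if ds.get("phenomenonTime")]
    starts = [t.split("/")[0] for t in times]
    ends = [t.split("/")[-1] for t in times]
    datetime_range = f"{min(starts)}/{max(ends)}" if times else None
    thing_id = thing["@iot.id"]
    return {
        "type": "Feature",
        "id": thing_id,
        # Stationary case: use the Thing's single current Location.
        "geometry": thing["Locations"][0]["location"],
        "properties": {
            "datetime": datetime_range,
            "parameter-name": names,
            "label": thing["name"],
            "edrqueryendpoint": (
                f"{edr_base.rstrip('/')}/collections/{collection_id}"
                f"/locations/{thing_id}"
            ),
        },
    }
```

The same document could serve both .../items and .../locations, since the two differ only in the OAFeatures functionality offered.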

[3] Note that EDR "instances" are intended to be instances of a collection. The real intention here is for collections that have versions (like forecast models that have runs or ensembles). Use of instances for Datastreams would be a very different implementation pattern where a collection is a Thing with one item and one location. I don't think this is worth pursuing.

ksonda commented 2 years ago

Thanks, this is helpful. This combined with these docs in particular is helping me think through this better.

In the stationary sensor/discrete sample location use case I'm imagining (@KoalaGeo and @jkreft-usgs chime in if I'm off base), /instances is irrelevant, and the entire STA endpoint would be a single /collections/{collectionId}. Multiple collections could be delineated by some property of any STA entity (or combination thereof) if desired, but this would complicate implementation by necessitating an appropriate STA $filter for any queries generating the information being proxied into EDR. From a best practices perspective this could be written down somehow, but from a software development perspective (e.g. an STA provider for pygeoapi EDR), I am skeptical anything other than "an STA endpoint is one EDR collection" can be implemented in a way that applies across many STA providers.

As for STA entities, HistoricalLocations are irrelevant, and FeaturesOfInterest may or may not be explicit entities like a stream segment, while Locations are always the location of the Things and Sensors

The EDR Capabilities endpoints /, /groups, and /collections would need to be specified in a custom way, as much of that information is not included anywhere in an STA endpoint. /conformance could be populated with a transformation of an /sta-server/v1.1 serverSettings/conformance node.
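One possible shape for that /conformance transformation, sketched under the assumption that the proxy simply lists its own EDR conformance classes followed by whatever the STA root document declares. Both function names are hypothetical, and whether STA classes belong in the EDR response at all is an open design choice.

```python
def sta_conformance(sta_root):
    """Read the conformance list from an STA v1.1 root document."""
    return list(sta_root.get("serverSettings", {}).get("conformance", []))


def edr_conformance_document(sta_root, edr_classes):
    """Build an OGC API style conformance response.

    edr_classes: conformance class URIs the proxy itself implements
    (supplied by configuration, since STA knows nothing about them).
    """
    return {"conformsTo": list(edr_classes) + sta_conformance(sta_root)}
```

An STA 1.0 service has no serverSettings node, in which case only the configured EDR classes are returned.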

Now, the other endpoints get more complicated. The EDR schema for the responses for requests for a given metadata endpoint requires information from multiple STA entities, such that it doesn't make much sense to me to map for example an STA Thing to an EDR item. Rather, an EDR item corresponds to a document that would need to be cobbled together from an STA Thing, its Location, and its Datastreams and linked ObservedProperty. And possibly also Sensor.

EDR Collection metadata endpoints

`/edr/collections/{collectionId}/items` and `/edr/collections/{collectionId}/locations`

Return a document of "sampling features".

In parentheses next to each element of the EDR response schema I propose the STA query and specific element that the information would need to come from

FeatureCollection/GeoJSON with
- type (/Locations -- location/type)
- datetime or interval (/Locations?$expand=Things/Datastreams -- phenomenonTime)
- parameter-name (/Locations?$expand=Things/Datastreams($expand=ObservedProperty) -- name)
- parameters [list with members each with properties...] (/ObservedProperties)
  - id (/ObservedProperties(x) -- @iot.id)
  - type (/ObservedProperties(x)/Datastreams -- observationType)
  - description (/ObservedProperties(x) -- description)
  - label (/ObservedProperties(x) -- name)
  - data-type (/ObservedProperties(x)/Datastreams -- observationType)
  - unit (name, label, symbol) (/ObservedProperties(x)/Datastreams -- unitOfMeasurement)
  - observedProperty (/ObservedProperties(x) -- name)
  - extent (temporal, spatial, other) (/ObservedProperties(x)/Datastreams -- phenomenonTime and observedArea)
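The parameter part of this proposal could be sketched as below. It follows the list above literally, including taking both type and data-type from observationType, and has not been checked against the EDR parameter schema. The function name is hypothetical, and it assumes one Datastream fetched with $expand=ObservedProperty.

```python
def datastream_to_parameter(datastream):
    """Map one expanded STA Datastream to a proposed EDR parameter entry."""
    prop = datastream["ObservedProperty"]
    uom = datastream.get("unitOfMeasurement", {})
    return {
        "id": str(prop["@iot.id"]),
        "type": datastream.get("observationType"),
        "description": prop.get("description"),
        "label": prop["name"],
        "data-type": datastream.get("observationType"),
        # STA unitOfMeasurement carries name, symbol and definition.
        "unit": {
            "name": uom.get("name"),
            "label": uom.get("name"),
            "symbol": uom.get("symbol"),
        },
        "observedProperty": prop["name"],
        "extent": {
            "temporal": datastream.get("phenomenonTime"),
            "spatial": datastream.get("observedArea"),
        },
    }
```

Where several Datastreams share an ObservedProperty, their extents would still need to be merged into one parameter entry.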

This type of thing would also need to be done for the data query endpoints themselves, but it is late and I will return to this later -- I do agree with @hylkevds mostly on this, with the possible exception of using Locations (a more complicated STA query) instead of FeatureOfInterest to accommodate explicit FoI.

chris-little commented 2 years ago

@ksonda A minor point, but what did you intend by group? EDR Capabilities endpoints /, /groups and /collections endpoints

ksonda commented 2 years ago

@chris-little the short answer is I don't know. It's in both the ReDoc and swagger docs under Capabilities endpoints but not in the standard (at least according to a ctrl + f).

https://developer.ogc.org/api/edr/edr_api.html

chris-little commented 2 years ago

@mburgoyne Any suggestions about the /group endpoint?

dblodgett-usgs commented 2 years ago

Looks like it's coming from this? https://github.com/opengeospatial/ogcapi-environmental-data-retrieval/blob/master/standard/openapi/schemas/groups.yaml - I opened https://github.com/opengeospatial/ogcapi-environmental-data-retrieval/issues/327 to track this.

KoalaGeo commented 2 years ago

@ksonda at the moment, agree instances would be irrelevant for straight sensor data.

However I have talked with @hylkevds previously about adding our groundwater forecast ensembles to our STA endpoint which occurs already do - "

Observation/phenomenonTime: The time in the future for when the given water level is expected. Either a time instant or interval.
Observation/resultTime: The time the forecast was made (i.e. now())
Observation/validTime: The time interval where this forecast is the best fitting one. Usually from now, till when the next forecast is expected to be made
Sensor: The forecasting algorithm.

If the Borehole is your Thing, you can make a new Datastream for the forecast data.

Someone who wants the current forecast can request Observations and filter on "validTime not lt now()" or with FROST: "overlaps(validTime, now())". And for research purposes you can still access all old forecasts, and group them by resultTime to make separate plots."
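The "group them by resultTime" step in the quote could look roughly like this on the client side. It is a sketch only: the function name is made up, and it assumes each Observation carries a time-instant phenomenonTime plus resultTime and result, as described above.

```python
from collections import defaultdict


def group_by_result_time(observations):
    """Split forecast Observations into one series per forecast run.

    Each run is identified by resultTime (when the forecast was made);
    each series is a list of (phenomenonTime, result) pairs in time order.
    """
    runs = defaultdict(list)
    for obs in observations:
        runs[obs["resultTime"]].append((obs["phenomenonTime"], obs["result"]))
    # ISO 8601 strings in one consistent format sort chronologically.
    return {run: sorted(points) for run, points in sorted(runs.items())}
```

Each key of the returned dictionary then corresponds to one plot, or, in EDR terms, to one candidate instance.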

For this use case, then instances would need to be included...

chris-little commented 2 years ago

@KoalaGeo @ksonda Another common use case for instances would be for test/development/production data, for example.

ksonda commented 2 years ago

I retract the irrelevance of instances :)

However, it does present similar problems as collections, in that the STA provider has wide latitude in defining what an instance is, the indicator for which could be instantiated as a name, description, or arbitrary property attribute of Things, Sensors, Datastreams, or even just Observations (as a parameter). Is there any best practices guidance within STA for how to deal with an idea similar to instance? If not, should there be?

chris-little commented 2 years ago

@ksonda Any suggestions about the /group endpoint? This has been identified as an historical relic, when we thought there may be groups of collections in OGC API-Common and other standards, which were moving goalposts at the time, and some still are!

A PR is being prepared to remove it, in V1.0.1 of the standard, which is the Master branch now.

Paleo-ware, software archaeology?

ksonda commented 2 years ago

If /group is not actually part of the standard then I don't think there's anything to align about it!

hylkevds commented 2 years ago

How the EDR concept of /instances is encoded in a STA endpoint depends on the use-case. Most use cases won't have instances at all, but only an ever growing set of Observations. The use case that I can think of that may have instances is the one with distinct prediction runs, where each run may be seen as an instance. And even there the details may differ: Is there a new Sensor (and Datastreams) for each run, or are Observations of different runs mixed into one (set of) Datastreams?

dblodgett-usgs commented 2 years ago

Spot on @hylkevds -- You've illustrated the perfect example of why there are very few standards that attempt to handle "versions" of a dataset where there is any semantic structure to what the versions are.

The fact is, STA does not have a meta-dimension to say "this set is the same in all ways from the containing set except in this one way that makes it a useful version of the same thing that we want to track uniquely."

In EDR, "instance" is really intended to support use cases where the same model has been run multiple times and produced slightly different results. But as you all have pointed out above, there are other use cases where this kind of grouping may be useful.

IMHO, we would do ourselves a favor to avoid instances in this conversation except in the case that we have multiple STA endpoints that are identical to each other except in some specific difference, like software version, model initialization conditions, etc. In that case, we would call the two or more STA endpoints that are essentially identical a separate instance in EDR.

KoalaGeo commented 2 years ago

Connected discussion started on https://github.com/opengeospatial/ogcapi-environmental-data-retrieval/issues/373

ksonda commented 2 years ago

IMO if we can use the water quality IE to define what the semantics are for this kind of observation data, we may make progress on this as both EDR and STA concepts could be mapped more clearly to O&M/SSN/SOSA semantics.

KoalaGeo commented 1 year ago

@liangsteve @chris-little is there scope for an "Official" OGC interoperability experiment/sprint or similar to look at definitive mappings between SensorThingsAPI & EDR?

@ghobona @hylkevds @KathiSchleidt @m-burgoyne for info

chris-little commented 1 year ago

@KoalaGeo @liangsteve I suppose there is scope, though spare effort is scarce. The current EDR focus is on V1.1 (supporting POST/GET, and custom and categorical dimensions) and the V1.2 focus is Pub-Sub, which has overlap with streaming/async and perhaps STA.

dblodgett-usgs commented 1 year ago

Discussion of this was raised at the HDWG Spring meetings today. We should look for an opportunity to advance the topic. Perhaps stepping back just a bit and looking at it through the lens of an OMS mapping to EDR and STA? https://github.com/opengeospatial/ogcapi-environmental-data-retrieval/issues/373#issuecomment-1568568691

dblodgett-usgs commented 1 year ago

@ksonda -- my read of this is that you were on a good track with your mapping of expansions of location to EDR items and locations. You had said that you would come back to it in your comment above but never did. Do you think you could revisit that comment and refine it based on your latest understanding? I feel like there's a best practice report in this or something. See https://github.com/opengeospatial/ogcapi-environmental-data-retrieval/issues/373#issuecomment-1583611847 for my latest take over in the EDR issue on monitoring networks.

ksonda commented 1 year ago

See my response there https://github.com/opengeospatial/ogcapi-environmental-data-retrieval/issues/373#issuecomment-1584797400 . I know EDR less than STA but I think a de minimis mapping is possible, despite variety in STA implementations.

dblodgett-usgs commented 1 year ago

cool -- so what do you think closure criteria for this issue should look like?

opengeospatial / sensorthings

Concept mapping to EDR #135
