opengeospatial / ogcapi-environmental-data-retrieval

A Web API that provides a family of lightweight interfaces for accessing Environmental Data resources.
https://ogcapi.ogc.org/edr

rename /items query to /identifier #28

Closed m-burgoyne closed 4 years ago

m-burgoyne commented 4 years ago

The concept behind adding /items was to provide a query that uses a location identifier, rather than a coordinate definition, to select data. To avoid confusion with the WFS /items query, where each feature has a unique item_id, I suggest that this end point is renamed to /identifier and that the identifier_id is a unique identifier for the location.

dblodgett-usgs commented 4 years ago

This goes counter to what we've been working on in #12 and in the sprint. I feel strongly that EDR Items should be compatible with OGC-API Features. (Note WFS terminology is only applicable to the older OWS services)

I would rather say that the EDR geoJSON Items schema have a uri meant to contain a unique identifier for an EDR item. That would allow queries like: /collection/{collectionid}/items?uri=http://feature_identifier

chris-little commented 4 years ago

@dblodgett-usgs If Items in EDR API is API-Features compatible, why not use the Features API to retrieve the items? Then we leave the EDR API to be a coordinate based query API.

dblodgett-usgs commented 4 years ago

That's what I'm proposing. But the items would be EDR-features -- which extends API Features schema to be GeoJSON with particular properties that can be used as hypermedia.

chris-little commented 4 years ago

@dblodgett-usgs @m-burgoyne @tomkralidis Then I think your proposal should be an extension of API-Features. I see no reason to complicate the EDR, and perhaps confuse the users, by having EDR-Features. I agree that we need a name for the Point, Timeseries, Trajectory, Polygon, ..., components of the EDR API, but these are Discrete Sampling Geometries, or 'shapes' or 'data patterns', retrieving from a relatively persistent and dense (not sparse) data store. I think the name 'features' is too confusing.

I would rather keep the scope tight and deliver quickly for others to explore and experiment with.

dblodgett-usgs commented 4 years ago

It would be a real shame to lose site-based environmental data from the scope of EDR. This is going against my understanding of our intended scope here, so let me make sure I follow the proposal.

We are saying we might have urls like:
/collections/col-1234/identifier/
Which would return available sampling geometries if they are identifiable.

Rather than: /collections/col-1234/items/
Which would return the same as above.

I feel like introducing a new API pattern, /identifier, for sampling features adds complexity in that we are introducing a new way of handling collections of features.

If you want to simplify EDR even further and say that these sampling geometries are not identifiable beyond the scope of a given API call, then dropping discussion of items or identifier altogether would be appropriate.

On the other hand, going by Jeff Yutzler's point that the type of thing that comes back from a given API URI key (collections was the subject of what I'm referring to) should be consistent, we are proposing something slightly different from API-Features and having different API URI patterns would make that clear.

So what's the logic / design pattern you guys want to use here @m-burgoyne and @chris-little ? I can get on board with switching to identifier rather than items but we need to be clear what the reasoning is and make sure we can justify adding diversity for those who would rather see closer alignment of the API pattern.

p.s. @m-burgoyne shouldn't it actually be identifiers? A given EDR collection will potentially have many pre-defined identifiers.

m-burgoyne commented 4 years ago

@dblodgett-usgs I agree it should be identifiers, but I don't think I am doing a good job of explaining the idea and the difference from the WFS items query. The identifier would be a label for a location. That label could be a name (e.g. London), a station id (e.g. 03772) or even something like a GeoHash, but it is not intended as a reference to an individual feature; it is a shorthand for the geospatial information required for the query. The query would still allow the requester to subset the data available from the underlying store by time, parameters etc., but the geospatial part of the query is already predefined by the list of available identifiers.
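One way to read the "shorthand" idea is as a lookup that expands an identifier into the coordinate-based parameters the query would otherwise need. The following is only a minimal sketch under that assumption: the `LOCATIONS` registry, the `resolve_query` helper, and the coordinate values are hypothetical; `crs`, `coords`, `parametername` and `time` are the parameter names used elsewhere in this thread.

```python
# Hypothetical registry: each identifier is a label for predefined geometry.
# Coordinate values are illustrative only.
LOCATIONS = {
    "London": {"crs": "EPSG:4326", "coords": "POINT(-0.1276 51.5072)"},
    "03772": {"crs": "EPSG:4326", "coords": "POINT(-0.4494 51.4790)"},
}


def resolve_query(identifier, parameter_names, time_range):
    """Expand an identifier-based request into a coordinate-based one.

    The geospatial part comes from the registry; time and parameters
    are still chosen by the requester.
    """
    geometry = LOCATIONS[identifier]  # KeyError for an unknown identifier
    return {
        "crs": geometry["crs"],
        "coords": geometry["coords"],
        "parametername": ",".join(parameter_names),
        "time": time_range,
    }
```

For example, `resolve_query("London", ["rainfall"], "2019-06-01T00:00:00Z/2019-09-30T00:00:00Z")` yields the same kind of parameter set a coordinate query would carry, with the location filled in from the registry.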

dblodgett-usgs commented 4 years ago

Before we go any further, can we please be more precise here? WFS does not have an items query. OGC-API Features does. 🤓

I'm fine with this distinction. Station ids, in the sense of monitoring locations, almost always represent a complex system of sensors and other field activities so treating them as simple features tends to miss important details that people inevitably need. Opening up the potential to handle that complexity outside the scope of OGC-API Features sounds good to me. We can still have feature collections as part of a given OGC-API instance but associate the feature collections to identifiers in the EDR side.

Leading a future discussion a bit -- I think we are very likely going to have to reckon with whether or not EDR datasets are collections or something different, but we'll leave that for after the OAB has had their say.

cportele commented 4 years ago

@m-burgoyne - If the identifier is a label for a location, shouldn't it be /collections/col-1234/location/{locationId}, e.g., /collections/col-1234/location/london?

m-burgoyne commented 4 years ago

@cportele - It depends on the decision for the suggestion to rename the query end points. I agree that there could be better descriptions location-id or location-identifier for instance.

dr-shorthair commented 4 years ago

I was trying to make the same point about sites here: https://github.com/opengeospatial/Environmental-Data-Retrieval-API/issues/20#issuecomment-594146963 - they are emphatically not just coordinate locations, they are usually complex features with real identity.

dblodgett-usgs commented 4 years ago

Thanks @dr-shorthair I didn't appreciate the nuance earlier.

I like plain location because it is parallel to point, line, polygon, etc.

dblodgett-usgs commented 4 years ago

@m-burgoyne I see you got this started in #32 but did not remove the "item" query type. Are you still working on that stuff or would you like some help moving this stuff along?

@jkreft-usgs and I worked up a draft JSON-Schema for the spatial representation of these locations here: https://github.com/opengeospatial/EDR-API-Sprint/issues/12 and here https://jkreft-usgs.github.io/edr_site_based/ that should be ready to be incorporated into what you've been working on. Happy to hack at that unless you have the docs open and want to keep going as the primary editor.

m-burgoyne commented 4 years ago

@dblodgett-usgs, I think /identifier could be complementary to /items so I haven't removed it from the API definition. I intend to put in a fresh pull request to fix an error in the trajectory definition so I could add in the JSON-Schema code at the same time.

dblodgett-usgs commented 4 years ago

Interesting -- based on @chris-little's stance above,

I see no reason to complicate the EDR, and perhaps confuse the users, by having EDR-Features.

I would have expected us to let items be an API-Features concept rooted in feature-collections which have items and EDR to define locations in the sense of monitoring / sampling locations.

tervo commented 4 years ago

That's what I'm proposing. But the items would be EDR-features -- which extends API Features schema to be GeoJSON with particular properties that can be used as hypermedia.

Is the idea here that the very same service could be OGC API Features (Core) conformant and EDR conformant at the same time?

jkreft-usgs commented 4 years ago

@tervo I would say that having EDR be an extension of OGCAPI-Features would be ideal, and could drive adoption. The ecosystem of tools to interact with OGC API Features is rapidly growing. We should be doing our best to fit within that paradigm.

dblodgett-usgs commented 4 years ago

Yep -- some parts of EDR could be extensions of things in Features such that a single OGC API could present the same dataset represented as a collection(s) of features or via EDR query patterns.

tervo commented 4 years ago

I would say that having EDR be an extension of OGCAPI-Features would be ideal, and could drive adoption. The ecosystem of tools to interact with OGC API Features is rapidly growing.

Yes. I very much agree.

Coming back to original question here. Having a different terminology with OGC API Features looks more confusing to me. I see environmental items as items whether they are sampled from a data cube or not.

dblodgett-usgs commented 4 years ago

It's more to do with the payload than the API path. If a client hits an .../items end point, it expects to get back something that is like the .../items it got elsewhere. In EDR, our sampling features are a little more complicated than simple-features/geojson -- so we could put them behind a different path (/locations or /identifier) such that clients have an easier time figuring out what's what.

Note that this would not preclude implementation of API Features conformant views of the sampling features, but they would first and foremost be just that, API Features conformant views of the features.

tomkralidis commented 4 years ago

+1 to keep as close to OGC API - Features as possible. This will help greatly drive adoption and reuse of code (just like OGC API - Common is doing the same for /, /conformance, and /collections).

dblodgett-usgs commented 4 years ago

@tomkralidis do you have a stance on whether .../collections should always be feature collections and whether .../items should always return (be capable of anyways) simple features?

Using the /items endpoint by overloading it with sampling semantics could cause as much confusion as it helps through adoption and reuse.

tomkralidis commented 4 years ago

@dblodgett-usgs /collections has an itemType property that we could use to delineate accordingly. IMHO for /collections/{collectionId}/items, as long as we can communicate the model (JSON schema in OpenAPI document), then we can safely (enough) say what we are serving to the client?

dblodgett-usgs commented 4 years ago

Yeah -- I see that potential for sure.

I think the issue is that it's a bit of a slippery slope in the big picture. The more complexity (like flexible typing) we put into these end points, the less adoptable they become. Some is needed because people expect it. Too much becomes a deal breaker.

On this issue in particular, I think we've basically determined that:

At a minimum, there should be an EDR best practice to include any desired sampling geometries in one or more API-Features conformant collections.

In #38 I've opened the discussion of whether the EDR spec should even touch the collections topic or leave that to features and extensions of features. That's a related issue, but since "locations" is already in the draft spec, I think this particular issue is tapped out.

@m-burgoyne do you have more to explore here? A lot of moving parts right now.

geopyue commented 4 years ago

@m-burgoyne @dblodgett-usgs @chris-little As I understand, the statement, "using a location identifier to select data rather than a coordinate definition", can be rephrased as using a feature rather than the specific required parameters, i.e. crs and coords. The feature can be represented using a named place or feature id predefined in a feature store. In this regard, it is about features of interest like /collections/typhoons/items/dianmu in the WHU case, serving as a replacement for crs/coords, to be used for the rain data retrieval. @dr-shorthair Features of interest (like in O&M) can be an optional parameter in point, polygon, and trajectory. In other words, the geometric part of the sampling objects (point, polygon, and trajectory) can reuse existing features, which is user friendly, like using named places instead of coordinates.

Another note is about the vague yet distinct terms: EDR-feature, sampling feature, and OGC-API features. 1) I would prefer to leave items the same as the OGC API Features. I concur with the scope #20 . The definition of EDR-Feature is out of the current scope and will complicate the current API as a kind of profiles for OGC-API Features. 2) In addition, I would propose to rename the terms point and polygon to samplingpoint and samplingpolygon, to avoid the confusion with the public understanding of geometries, since the terms in EDR-API have more semantics than geometric terms.

dblodgett-usgs commented 4 years ago

Thanks @geopyue -- Let me see if I am following your proposal.

1) You would leave /items out of EDR, allowing EDR compliant services to also be compatible with features but not include /items specific conformance classes in EDR? 2) You want paths to use sample... like: /collections/{collectionID}/samplingpoint?... ?

geopyue commented 4 years ago

@dblodgett-usgs 1) Yes. I prefer to leave /items out of EDR. Items are individual resources included in a collection resource. EDR-API "retrieve various common data patterns", where we may argue patterns are items. For example, in an environmental data collection, an item is a sampling subset or data pattern determined by querytypes/samplingmethods. Currently we may agree that items are sampling features. Some may argue later that items are sampling coverages, or sampling processes (if sampling subsets can be represented using samplingmethods from a WPS perspective). 2) Yes. /collections/{collectionID}/point is a little bit hard to digest for new end users. I would prefer to see /collections/{collectionID}/samplingpoint or /collections/{collectionID}/pointsampling, e.g. /collections/{collectionID}/pointsampling?parametername=rainfall&featureofinterestes=*/collections/typhoons/items/dianmu&...

dblodgett-usgs commented 4 years ago

I follow now. Interesting approach -- I worry that we have trouble describing what EDR features of interest exist. Would that be left to the API-Features or an EDR-based extension of it?

geopyue commented 4 years ago

Geometric conception of place and named place are two ways for place consumption. We leave the options for service vendors. The EDR vendors can choose to host a feature store accessible through OGC-API features, where interested features can be used conveniently by end users to interact with EDR. But sure the coordinate way still can be used.

dblodgett-usgs commented 4 years ago

What about things like available parameters and time range? How do the features relate to EDR collections?

dr-shorthair commented 4 years ago

Thanks @geopyue that was also where I was going with https://github.com/opengeospatial/Environmental-Data-Retrieval-API/issues/20#issuecomment-594146963

geopyue commented 4 years ago

@dblodgett-usgs The available parameters and time range are still used as before. We only provide an option for replacing crs and coords. I agree with the comment @dr-shorthair in #20. At the implementation level, features can be transformed into crs and coords to be applied to the original EDR collections.

Here are some examples.

m-burgoyne commented 4 years ago

@geopyue the reason I suggested the /identifiers (currently /locations in the OpenAPI docs) endpoint was to allow the query to be structured as:

http://geos.whu.edu.cn/edr_api/collections/hainan_weather/locations/dianmu?parametername=rainfall&time=2019-06-01T00:00:00Z/2019-09-30T00:00:00Z

Where

http://geos.whu.edu.cn/edr_api/collections/hainan_weather/locations/ would return a list of location identifiers for the collection and a description of what they are and their extents.
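The two request shapes described above (a listing of identifiers and a data query for one identifier) can be sketched as a small URL builder. This is an illustration only, assuming the `/locations` layout from the example URL in this comment; the `locations_url` helper is hypothetical and not part of any EDR client library.

```python
from urllib.parse import quote, urlencode

# Base URL taken from the example in this thread.
BASE = "http://geos.whu.edu.cn/edr_api"


def locations_url(collection_id, location_id=None, **params):
    """Build a /locations URL.

    Without location_id: the listing of available location identifiers.
    With location_id: a data query, optionally subset by query parameters.
    """
    url = f"{BASE}/collections/{quote(collection_id)}/locations/"
    if location_id is None:
        return url
    url += quote(location_id, safe="")
    if params:
        # Keep ':' and '/' readable so ISO 8601 intervals stay legible.
        url += "?" + urlencode(params, safe=":/")
    return url
```

Calling `locations_url("hainan_weather", "dianmu", parametername="rainfall", time="2019-06-01T00:00:00Z/2019-09-30T00:00:00Z")` reproduces the query URL shown above, while `locations_url("hainan_weather")` gives the listing URL.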

m-burgoyne commented 4 years ago

@dblodgett-usgs I do think there is value in having a features core end point in EDR.

I saw the /items EDR endpoint as an EDR profile of the features core specification (i.e. it has a well defined schema for the GeoJSON output), but I think it is essential that it does not extend the behaviour and functionality of features core.

dblodgett-usgs commented 4 years ago

OK, it seems that we are converging on keeping /items as a way to discover existing EDR sampling geometries and keeping the behavior of the /items endpoint we have in EDR 100% compatible with Features core. i.e. The /items endpoint returns a feature collection of EDR features (sampling geometry metadata).

I think this means that we

  1. can add an EDR JSON-Schema that is GeoJSON-compatible, but we
  2. can not have extended functionality like time and parameter filtering on the items where we would expect to get back EDR data rather than a feature collection.

Agreement here?

Where I think we are not quite converging but I see a path is the issue of locations. Let me describe what I think I see as the path and we can go from there. I'm using will and could to try and capture what I think we have agreed to and what I think we might be able to agree to.

  1. EDR will clearly have endpoints for position, area, trajectory, etc. which normally would be identified by coordinates but may also be identifiable by an identifier (from the items endpoint).
  2. EDR could have a /locations endpoint that can only be queried by location identifier.
  3. If a data provider wants to describe collections of locations, they do so using API-Features core feature collections.
  4. Any EDR endpoint that wants to allow access by identifier will have (an) {EDR-query-pattern}/items endpoint(s), which would behave as the items endpoint of any API-Features API would.
  5. EDR data for a desired item will be accessed using a {EDR-query-pattern}/items?identifier={item local id}

e.g. (noting this pattern applies to any EDR query type) /position/items would give a feature collection adhering to this schema where we might discover a feature of interest with local id position_1234. These features would be positions available from the EDR dataset. /position/items?identifier=position_1234 or /position/items/position_1234 would give data available from position_1234 and additional query parameters could be used to limit data returned.

In addition to the items endpoint on EDR query patterns, /locations/items would also return a feature collection of locations. These are monitoring locations, or otherwise identifiable locations with nondescript or otherwise abstract geometry. They would necessarily have some representative geometry, but it is not a "sampling geometry" in the case of locations. /locations/items?identifier=loc_1234 would behave the same as the other EDR-Query-Pattern endpoints.
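As a rough illustration of the layout proposed in this comment, the paths could be generated as follows. The list of query patterns is assembled from names mentioned in this thread (position, area, cube, trajectory, corridor, locations); the helper functions are hypothetical and only build path strings.

```python
# Query patterns named in this thread; the final draft set may differ.
QUERY_PATTERNS = ("position", "area", "cube", "trajectory", "corridor", "locations")


def items_path(pattern):
    """Path of the Features-style discovery endpoint for one query pattern."""
    if pattern not in QUERY_PATTERNS:
        raise ValueError(f"unknown EDR query pattern: {pattern}")
    return f"/{pattern}/items"


def data_path(pattern, local_id):
    """Path that requests EDR data for one discovered item by its local id."""
    return f"{items_path(pattern)}?identifier={local_id}"
```

Here `items_path("position")` gives `/position/items` and `data_path("locations", "loc_1234")` gives `/locations/items?identifier=loc_1234`, matching the examples in the comment.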

jkreft-usgs commented 4 years ago

I think that @dblodgett-usgs's proposal seems really reasonable

chris-little commented 4 years ago

Agreed at EDR API SWG 11

dblodgett-usgs commented 4 years ago

@m-burgoyne can you close this with a commit when the change has been applied? Or do you want help applying this change?

m-burgoyne commented 4 years ago

@dblodgett-usgs to keep this better aligned with the feature specification, wouldn't it be better to have the following: /position/items would give a feature collection adhering to schema where we might discover a feature of interest with local id point_1234. These features would be points available from the EDR dataset. /position/items/point_1234 would give data available from point_1234 and additional query parameters could be used to limit data returned. The same approach would apply to /area, /cube, /trajectory and /corridor.

dblodgett-usgs commented 4 years ago

I updated my comment above to use up-to-date EDR query patterns and further describe what my intention was. I think we are in agreement? Or am I missing a nuance that is different?

m-burgoyne commented 4 years ago

The only real difference is passing the identifier_id as a path parameter rather than a query parameter.

dblodgett-usgs commented 4 years ago

Ahh I see -- yeah, they should both work. The path parameter is the internal ID of the features in the feature collection. As long as the feature collection has an attribute, "identifier" that is mapped onto the feature ID, then this:
/position/items?identifier=position_1234
is equivalent to: /position/items/position_1234
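The equivalence holds only when the `identifier` attribute mirrors the feature ID. A minimal sketch of that condition, using a hypothetical in-memory feature collection (the feature and its coordinates are illustrative):

```python
# Hypothetical feature collection; "identifier" is mapped onto the feature id.
FEATURES = [
    {
        "type": "Feature",
        "id": "position_1234",
        "geometry": {"type": "Point", "coordinates": [110.3, 20.0]},
        "properties": {"identifier": "position_1234"},
    },
]


def get_by_path(feature_id):
    """Mimic /position/items/{featureId}: look up by the feature's own id."""
    return next((f for f in FEATURES if f["id"] == feature_id), None)


def get_by_query(identifier):
    """Mimic /position/items?identifier=...: filter on the property value."""
    return next(
        (f for f in FEATURES if f["properties"]["identifier"] == identifier),
        None,
    )
```

Both lookups return the same feature for `position_1234`. If a provider stored a different value in `identifier` than in `id`, the two forms would no longer be interchangeable.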

Realizing my examples above were not quite right still -- will fix.

m-burgoyne commented 4 years ago

I am modifying the documents, but looking at the result, do we need an /items endpoint after every query type?

If we modify the EDR GeoJSON schema to add a value to the properties which describes the structure of each of the available items in the collection (i.e. cube, trajectory, point, polygon etc), a stand-alone /items query will allow users to more easily discover all available items for a collection. Putting it after the query type forces the user to know the shape of the item before they make the query.

It would also be worth adding the output format(s) value that the item will be delivered in to the EDR GeoJSON properties
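A sketch of what such a stand-alone /items response and a discovery step over it might look like. The property names `query_type` and `output_formats`, the feature ids, and the format labels are all assumptions made for illustration; the thread does not define them.

```python
# Hypothetical /items response: each feature declares its shape (query type)
# and the output formats it can be delivered in.
ITEMS = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "id": "point_1234",
            "geometry": {"type": "Point", "coordinates": [110.3, 20.0]},
            "properties": {
                "query_type": "position",
                "output_formats": ["CoverageJSON"],
            },
        },
        {
            "type": "Feature",
            "id": "traj_5678",
            "geometry": {
                "type": "LineString",
                "coordinates": [[110.0, 19.5], [111.0, 20.5]],
            },
            "properties": {
                "query_type": "trajectory",
                "output_formats": ["CoverageJSON", "GeoJSON"],
            },
        },
    ],
}


def discover(items, query_type=None):
    """List (id, query_type) pairs, optionally limited to one query type."""
    return [
        (f["id"], f["properties"]["query_type"])
        for f in items["features"]
        if query_type is None or f["properties"]["query_type"] == query_type
    ]
```

With this layout a client can call `discover(ITEMS)` without knowing any shapes in advance, then narrow with `discover(ITEMS, "trajectory")` once it knows what it wants.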

dblodgett-usgs commented 4 years ago

That's a good question. I think we are getting a little far afield on this issue.

Let's get the existing spec using items as it stands and take up the additional related issues in #38 and possibly an additional issue related to /items being available for each query type vs. one for a whole API?

chris-little commented 4 years ago

Discussed at EDR API SWG number 12.