opengeospatial / ogcapi-environmental-data-retrieval

A Web API that provides a family of lightweight interfaces for accessing Environmental Data resources.
https://ogcapi.ogc.org/edr

EDR API and the `collections` endpoint. #38

Closed dblodgett-usgs closed 4 years ago

dblodgett-usgs commented 4 years ago

(updated to reflect latest estimate of terminology and move proposal to top on 4-9-20)

Proposal: EDR should not use /collections/

EDR should focus on query patterns for distribution of a dataset such as are organized in THREDDS catalogs.

EDR dataset metadata would be included in the landing page or a clearly linked metadata search API that provides cataloging over many EDR endpoints.

/position, /area, /location, would be allowed off the root of an EDR compliant API.

/collections would be reserved for use with API-Features feature collections -- which may be useful to an API that also implements the EDR query patterns but would not be required. See #28 for more on this front.

/groups as explored in https://github.com/opengeospatial/EDR-API-Sprint/issues/14 would work the same only they would be aggregations of EDR datasets rather than EDR collections.

Summary of current situation

At present, we have planned to overload the collections endpoint such that:

/collections/

Would return a list of available EDR collections with links and collection metadata. The premise is that EDR will have a unique collection metadata json-schema.

/collections/{collectionID}

Would return collection metadata for the selected collection.

/collections/{collectionID}/items

would not be required because an EDR collection may not be accessible as a collection of items.

/collections/{collectionID}/{queryType}...

With one or more allowed queryTypes would be the pattern to access EDR functions.
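As a rough illustration of the pattern just described, the following Python sketch builds the paths for one collection. The function name, the dictionary keys, and the example collection id are hypothetical and only serve the illustration; `/items` is left out because it would not be required.

```python
from urllib.parse import quote


def collection_paths(collection_id: str, query_types: list[str]) -> dict[str, str]:
    """Build the paths described above for a single EDR collection.

    The keys "list" and "metadata" are illustrative labels, not spec terms.
    """
    # Percent-encode the id so it is safe to use as one path segment.
    base = f"/collections/{quote(collection_id, safe='')}"
    paths = {"list": "/collections/", "metadata": base}
    for query_type in query_types:
        # One path per allowed query type, e.g. /collections/{collectionID}/position
        paths[query_type] = f"{base}/{quote(query_type, safe='')}"
    return paths


# Hypothetical collection id and query types.
print(collection_paths("gfs", ["position", "area"]))
```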

Consideration 1: collections is already in API Features as feature collections.

There have been some strong arguments for keeping the typing of OGC API resources simple, e.g. what you get back when you hit a collection endpoint shouldn't change depending on which OGC API you are hitting. Given that collections is already defined in API-Features, overloading the endpoint goes against this. A client that thinks it knows what a collection is will hit an EDR API and get something much more than feature collection metadata.

Consideration 2: collections is a convenient data-user-centric wrapper around useful stuff.

@cmheazel has a great discussion in the API-Common wiki. I'll leave a couple quotes from the wiki here.

There appears to be consensus on /collections/{collectionId}. At least for those resources which are typically distributed as collections. The final part of the path tells you something about the collection.

...

We tend to think of Features as vector data. But at its most fundamental level, a Feature is an empty container. It has neither type, identity, nor properties. So a Feature can grow up to be just about anything. Which is exactly the sort of construct we are looking for.

Consideration 3: collections is used as an analogue for dataset.

In this issue-ending comment, @heidivanparys lays out the case for typing collection with care (e.g. feature collection is a type of collection) and this well-reasoned comment that dataset ≠ (feature) collection.

Additionally, in API-Features Core, an API instance is limited to one and only one dataset which can have one or more collections. I recently poked around this issue in #33 which leads me to want to push for EDR to focus on describing EDR "datasets" and not EDR Collections.

dr-shorthair commented 4 years ago

I suggest going back to look at the ObservationOffering idea from SOS. I really think the same concerns apply here.

dblodgett-usgs commented 4 years ago

Can you expand on what you mean a little and/or give some pointers to more info?

dr-shorthair commented 4 years ago

A single SOS can serve observations with a variety of sensors, observed properties, features of interest, time periods. If you ask for a random combination of these, there is a strong likelihood that you get no results. So ObservationOfferings were conceived as signposts to dense areas in a sparse matrix - i.e. particular combinations of the dimensions where you will most likely find stuff.
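A minimal Python sketch of the "dense areas in a sparse matrix" idea, assuming observations are plain dictionaries with hypothetical `sensor`, `property`, and `feature` keys. This is not how SOS defines ObservationOfferings; it only shows how populated combinations could be listed so that a client avoids empty queries.

```python
from collections import Counter


def dense_combinations(observations, min_count=1):
    """Return (sensor, property, feature) combinations that actually hold data."""
    counts = Counter(
        (obs["sensor"], obs["property"], obs["feature"]) for obs in observations
    )
    # Keep only combinations with at least min_count observations.
    return sorted(combo for combo, n in counts.items() if n >= min_count)


# Made-up observations for illustration only.
observations = [
    {"sensor": "s1", "property": "temp", "feature": "A"},
    {"sensor": "s1", "property": "temp", "feature": "A"},
    {"sensor": "s2", "property": "wind", "feature": "B"},
]
print(dense_combinations(observations, min_count=2))
```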

dr-shorthair commented 4 years ago

Josh Lieberman was responsible ... In the context of the query, it seemed to serve a similar purpose to 'layer' in WMS, and 'collection' in WFS.

dblodgett-usgs commented 4 years ago

Right -- got it.

I think I see a future (extension of EDR or maybe Features -- potentially just a best practice?) that would create (sampling) feature collections in the ObservationOfferings paradigm. That could use the collections endpoint, and the items in the collection would be non-simple features. Using collections for handling ad-hoc collections of EDR sampling geometries feels right to me, but I still want a /locations to hold all the available sampling locations.

At this juncture, EDR is really more focused on providing query interfaces to data-cubes with a nice side benefit that the sampling geometries can be saved and/or pre-defined. It just so happens that the API for those saved sampling geometries is a simple core spec for real-world sampling so we are kind of tagging that use case on here with an eye on the future.

tervo commented 4 years ago

Reserving collections to handle (possibly ad-hoc) collections of sampled geometries makes sense to me.

But a short note on /locations. You wrote:

[...] but I still want a /locations to hold all the available sampling locations

In many cases there are arbitrarily many available locations. From data cubes, you may sample from any location with any accuracy.

dblodgett-usgs commented 4 years ago

The idea with /locations and possibly using /collections/{id}/items is not to require all possible sampling features. It's a convenience to allow exposure of them however an API wants to. I could see it used for pre-defined locations, monitoring stations, caching / saving user submitted geometries, etc.

cmheazel commented 4 years ago

Some definitions from API Features:

Dataset: collection of data
Feature Collection: a set of features from a dataset

Using substitution: a Feature Collection is a set of features from a collection of data.

We can argue that the terms "set" and "collection" are synonymous, and that "Feature" is a subset of "data". So a Feature Collection is a subset of a collection of data where the elements of that subset are all Features.

What then is a collection? How about:

"A body of resources that belong or are used together. An aggregate, set, or group of related resources." (API-Common Part 2: Collections)

This definition is derived from Webster's definitions for collection, set, and aggregate. The items in the collection are untyped. If they are data, then we have a dataset (a collection of data). You can also have a collection of styles (is a style data?), or processes, or anything else you may want to collect. "/collections" is where you go to find out about the collections available from this API. Note that "/collections" returns metadata. It does not specify nor assume a type for the items in the collections. Nor does it require that the collections all be of the same type.

dblodgett-usgs commented 4 years ago

What's your point?

Are you arguing that EDR should use the collections endpoint as a cataloging mechanism for EDR datasets or do you support the proposal that I put forward -- that EDR should not take on the cataloging use case but, rather, focus efforts on representation of a single EDR dataset?

chris-little commented 4 years ago

The definition of Collection is not a new issue. From an OGC 1999 document:

"Much fundamental work on Feature Collection is needed. What are the fundamental classes and subclasses of Feature Collection. How do they behave? What are the relations between them and Features, Catalogs, Metadata, Schema, and other objects, and between themselves? What are the essential temporal and spatial behaviors of Feature Collections?" Abstract Specification Topic 10: Feature Collections, Version 4, OGC 99-110 (still current)

cmheazel commented 4 years ago

@dblodgett-usgs Neither. Just trying to lay out some core concepts. What is a collection? What is a dataset? How do they relate to each other? Without a common understanding of these terms, we will continue to talk past each other.

cmheazel commented 4 years ago

@chris-little 99-110 is about Feature Collections. But not all resources exposed through OGC APIs are Features. So not all collections are Feature Collections. I think we will make better progress, and have more coherent discussions, if we acknowledge that Collection and Feature Collection are two separate concepts.

cmheazel commented 4 years ago

@dblodgett-usgs Why not do both? If you are exposing one dataset then branch it directly off of the landing page. If you have more than one, then use the /collections construct. Are we making this harder than it has to be?

dblodgett-usgs commented 4 years ago

Thanks, Chuck. I agree that we need common definitions. When we are talking about URL path semantics, definitions are nuanced beyond dictionary definitions by the social and web-engineering context.

What we are talking about here is the "collections endpoint" NOT a generic definition or concept of "collections".

re:

Why not do both?

Fair question. I think doing both adds a lot of complexity and re-invents a wheel that is being established elsewhere, e.g. OGC-API Records, various EO-related cataloging efforts, etc. From a best-practice point of view, the argument is that the EDR API should have a single responsibility: providing access to an EDR Dataset. Providing cataloging for EDR Datasets should be the responsibility of another OGC-API standard (records?).

chris-little commented 4 years ago

@dblodgett-usgs But there is a requirement to group EDR 'datasets'. A typical production dataset of an NWP or Climate model, whether meteorological or oceanographic, will have multiple vertical coordinates. We have agreed that these be split apart to make EDR datasets, each of which has one consistent (4D) CRS. There is a case that these be grouped for EDR purposes, because they are closely tied: they are the same production dataset. This may be too detailed for some catalogues or not efficient enough, or too messy to implement in a production environment.

The workflow we have been using is: NWP Model Dump -> several EDR datasets -> one collection/group

A user wishing to compare several forecasts from different providers, or different times, or different experiments, needs to interrogate several collections in turn. Hence a collection of collections, or group of groups, is needed.

I propose that we leave this functionality in the current version of the EDR API to minimise the dependencies on other OGC APIs. When we have cross-consulted the other API SWGs and they have demonstrated that their implementations work and have the functionality that we require, we can then simplify the EDR API and take the collection/group function out.

dblodgett-usgs commented 4 years ago

@chris-little I'm all for having a grouping / nested group capability. However, given that what we are grouping are not feature collections, I don't think they should be grouped under a collections endpoint. The current OGC API baseline doesn't have typed collections and I don't feel comfortable with them.

Let me dive into the argument a little bit... sorry this is kinda long. As soon as you add type as a potential-variable characteristic to an endpoint you have to carry it around explicitly all over the place -- something we know people won't do.

I'm thinking of use cases like displaying the list of collections an OGC-API instance that conforms to Features-core provides. (example)

If you have typed collections, that list has to be nuanced by some formal type -- more complicated than media-type. We don't have that ability now and adding it is complexity we don't need. The other option is pre-fetching and introspection of hypermedia -- something we also know people just don't do. The alternative here is hard coding things specific to API instances -- a pure anti-pattern when it comes to reusability.
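To illustrate the burden, here is a minimal Python sketch of what a Features-only client would have to do before listing collections if they were typed. It assumes each collection's metadata is a dictionary with an `id` and an optional `itemType` key whose absence means "feature"; the function name and the example ids are hypothetical.

```python
def feature_collection_ids(collections):
    """Return ids of the collections a Features-only client can safely list."""
    return [
        c["id"]
        for c in collections
        # Assumption: a missing itemType is treated as "feature".
        if c.get("itemType", "feature").lower() == "feature"
    ]


# Made-up /collections response content for illustration only.
collections = [
    {"id": "roads", "itemType": "feature"},
    {"id": "gfs", "itemType": "edr"},  # hypothetical non-feature type
    {"id": "lakes"},
]
print(feature_collection_ids(collections))
```

Every client that displays or follows collections would need a filter like this, which is the extra complexity being argued against.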

I appreciate that the debate on this is still in full swing and can agree to leaving the collections endpoint in EDR to support nesting EDR datasets (as items) but think that the EDR query patterns should be exposed either 1) at the root of an API or 2) under a different endpoint -- maybe sample to convey the action the EDR endpoint enables.

dr-shorthair commented 4 years ago

Again, I urge you to look back at the ObservationOffering aspect of SOS - which is the logical predecessor to EDR. OO grouped sensors/observableProperties/features-of-interest (i.e. locations) to guide the user to well-populated areas of the data space, so they don't get frustrated querying for data that doesn't exist. I think the concerns addressed in this issue relate to the same principle.

dblodgett-usgs commented 4 years ago

This issue is about reserving the collections for feature collections. I'm pushing back on the idea that an EDR API would distribute multiple unrelated datasets by dumping them all in separate "collections".

dr-shorthair commented 4 years ago

Indeed. Collections should make sense, by being homogeneous on at least one dimension. That's what is currently being proposed for the `ObservationCollection` class in the O&M revision - https://github.com/opengeospatial/om-swg

chris-little commented 4 years ago

@dblodgett-usgs Perhaps we need to pull the update to API-Common, Part 1: Core into our standard, then look at what is now in API-Common, Part 2: Collections?

dblodgett-usgs commented 4 years ago

Yes -- I think we should look at how we are going to incorporate Part 2, collections.

The question remains, will we use Part 2 and limit our use of collections to feature collections?

e.g. Option 1:

/collections/{collectionid}/items/{id}
/collections/{collectionid}/area
/collections/{collectionid}/position
/collections/{collectionid}/radius
/collections/{collectionid}/Parameters
...

Where an EDR-API has a catalog of EDR Resource Collections and introduces a collection itemType other than Feature?

or option 2:

/collections/{collectionid}/items/{id}
/area
/position
/radius
/Parameters
/group/{memberid}/area
/group/{memberid}/Parameters
/group/{memberid}/collections/{collectionid}/items/{id}
...

Where we use the literal collections path element only for collections of features and other literal path elements for consistently typed sets.
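To make the contrast concrete, here is a minimal Python sketch of where the same EDR query would live under each option. The function names and the example ids are hypothetical; only the path shapes come from the two lists above.

```python
from typing import Optional


def option1_path(collection_id: str, query: str) -> str:
    """Option 1: every EDR query hangs off a (typed) collection."""
    return f"/collections/{collection_id}/{query}"


def option2_path(query: str, member_id: Optional[str] = None) -> str:
    """Option 2: queries sit at the root, or under /group/{memberid}."""
    prefix = f"/group/{member_id}" if member_id is not None else ""
    return f"{prefix}/{query}"


def option2_feature_item(collection_id: str, item_id: str,
                         member_id: Optional[str] = None) -> str:
    """Option 2: the literal 'collections' element is used only for features."""
    prefix = f"/group/{member_id}" if member_id is not None else ""
    return f"{prefix}/collections/{collection_id}/items/{item_id}"


print(option1_path("gfs", "area"))
print(option2_path("area"))
print(option2_path("area", member_id="run-00z"))
print(option2_feature_item("stations", "42", member_id="run-00z"))
```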

dblodgett-usgs commented 4 years ago

As discussed on https://github.com/opengeospatial/Environmental-Data-Retrieval-API/wiki/2020-05-21, there is a third option -- or perhaps a refinement of 2. The EDR query pattern ends up in the {collectionid} URL path.

/area
/position
/radius
/collections/area/items/{id}
/collections/position/items/{id}
/group/{memberid}/area
/group/{memberid}/collections/{collectionid}/items/{id}

Need to follow up with API Common and others.

dblodgett-usgs commented 4 years ago

Based on the outcome of https://github.com/opengeospatial/oapi_common/issues/140 (https://github.com/opengeospatial/oapi_common/issues/140#issuecomment-637664012 is especially relevant) keeping the current approach to collections is probably going to be best.

Given that

  1. Features defines the most granular spatial data resource at the collection level (that is, there is no spatial metadata at the landing page level)
  2. We do not want to proliferate dataset metadata across multiple access APIs as was done in W*S

We need to take on the complexity of listing available spatial data resources at the collection level.

These slides are being used to document the complete logic for this such that the community can move forward using this scheme with a common understanding of why.

Above all else, we are agreeing that a collection is:

"A geospatial data resource that may be available as one or more sub-resource distributions that conform to one or more OGC API standards."

and that:

Any OGC API that uses the /collections path should define its resource as a representation of a collection of geospatial data.

So in the context of EDR:

An EDR resource is a collection of spatiotemporal data that can be sampled using OGC-API Environmental Data Retrieval query patterns.

dblodgett-usgs commented 4 years ago

On https://github.com/opengeospatial/Environmental-Data-Retrieval-API/wiki/2020-06-04 we agreed that this issue can be closed and we can move forward with:

/collections/{collectionid}/items/{id}
/collections/{collectionid}/area
/collections/{collectionid}/position
/collections/{collectionid}/radius
/collections/{collectionid}/Parameters
...

As long as we implement some changes implied in: https://github.com/opengeospatial/oapi_common/issues/140#issuecomment-637664012

#71 and #72 are follow-up issues that need discussion.

We need to add the definition of an EDR resource I gave above to the spec, which I will do with a commit that closes this issue.