Collections Discussion - Githubissues

cmheazel commented 4 years ago

This issue attempts to pull the various /collections discussions into a single issue.

dblodgett-usgs commented 4 years ago

(EDIT: 6/4/20) This issue is essentially agreed to. See: https://github.com/opengeospatial/oapi_common/issues/140#issuecomment-637664012

Actions here are still

Work through documenting the outcome in these slides
Triage the open issues and make sure we haven't missed anything.
Open new issues that describe how to implement the outcome here.

Both API-Coverages and API-Environmental Data are in an awkward position pending progression of this discussion. Here's my attempt at summarizing to get us moving.

I think there are two issues at play here, let's call them data-resource and items.

The data-resource issue comes down to:

1) Will the collections end point be a container (catalog) for flexibly typed resources that are ostensibly datasets? 2) Will the collections end point be reserved for collections of things that are ostensibly features?

See #17, #36, #39, #45, #47, #74, #86, #99, #105, #106, #111, #116, #120, #122, #128, #130

The items issue comes down to:

1) Will a consistent approach to metadata for sets of items (at whatever path) and an items API path literal be used? 2) Will an approach to "sets of items" be left to each individual API?

See #45, #80, #82, #87, #83, #107, #110, #128 (and likely others) Broadly, this wiki post by @cmheazel discusses this wiki post.

The status of addressing these issues for now:

Collections and items have been moved to its own specification part so core can move forward: http://docs.opengeospatial.org/DRAFTS/20-024.pdf

In the current (5-21-20) collections spec, the /collections path literal is used for "A body of resources that belong or are used together. An aggregate, set, or group of related resources."

The items path literal is used for: "the individual member resources that make up the collection".

Narrative:

Some are advocating for a flexible definition of collection to allow spatial data resources to be the resources that make up the collections path literal resources. Some spatial data resource types in this approach would use the items path literal but not all.

Others are advocating for strict typing of the collections path literal. Essentially saying that the things you get from a collections resource should all be the same (e.g. feature collections). I don't think there are issues with using items in this approach as long as a different path literal is used high up in the chain.

A path forward:

In follow up comments here, please take care to focus on the issues and seek to find ways to communicate unique characteristics of proposals with clarity. Please make fully-fledge proposals and try to name them such that we can discuss with clarity.

References OGC-API specifications of interest:

There is some discussion of the API-Features approach to collections in the core requirements class.

Coverages discusses their approach in collection access

EDR discusses their approach in environmental resources

Records discusses their approach in collection access

Processes does not use the collection endpoint

There may be other relevant specs (styles?) to consider, but I think these are the core that we need to consider at this juncture.

dblodgett-usgs commented 4 years ago

To give the group something to get started, I want to propose that we 1) do NOT add a collections literal path to API-Common, but rather, 2) include a /{spatialResource} at the root of an OGC API which API-Features /collections would be conformant with. 3) Have a reusable set of conformance classes for items that could be used on any set of resources in an OGC API. API-Features items would also be conformant with this common items.

jerstlouis commented 4 years ago

As recently discussed in #11 and #111, I believe the most controversial aspect of the Collections problem is its generic name which sows confusion, while what we are trying to define here is something better described as OGC API - Common (Geospatial data).

As such, and in line with what both @dblodgett-usgs and @jeffharrison are suggesting, let's entertain the idea that we can drop the literal 'collections' from being fixed, and that a compliant client must instead rely on finding the list of geospatial data resource by following "rel" : "data" from the landing page. As far as compatibility with OGC API - Features is concerned, this would either require that any server offering Features (which may be served along with other APIs) stick to "/collections", or that the Features standard is revised with a breaking change where clients can no longer rely on "rel" : "data" pointing to "/collections". Let's assume we are okay with this for now and continue.

Now "OGC API - Common - Part 2: Geospatial data" could say:

The "rel" : "data" of the "OGC API - Common - Part 1: Core" links to a URL we will call {dataResourcesURL}
A GET request on {dataResourcesURL} will return a list of data resources, with a schema matching what is currently returned by Features as the reponses to /collections (NOTE: This does imply a "collections" : property at the top of that JSON response, there's sadly no way I can think of around that.).
In that list of data resources, as per the OGC API - Features specifications, and the current draft OGC API - Common (Collections) specifications, two key link relations are defined inside the "collections" property array:
- "items" -- This is where you actually retrieve your data (In Features, this links to /collections/{collectionID}/items, in Coverage, that currently links to /collections/{collectionID}/coverage/all I believe. For 3D Tiles, this could point to your tileset.json (linking to one or more node of the BVH). For i3s, this could point to your root BVH node.
- "self" -- In Features, this links to /collections/{collectionID} where you normally retrieve the exact same content of that element for the one collection object of the "collections" array, all by itself. Same here, this links to {dataResourcesURL}/{dataResourceID} with same response.

The Tiles API could also be tied here by having a "tiles" link relation within each element of that array.

I also foresee the need for additional optional conformance classes to be able to arbitrarily retrieve data using bounding boxes and resolution, without the client having to know anything about the type of geospatial data.

akuckartz commented 4 years ago

Maybe some ideas from https://www.w3.org/TR/ldp/#ldpc can be used?

dblodgett-usgs commented 4 years ago

@akuckartz Please note the "A Path Forward" section above.

Can you please flesh out what you think is of value from the Linked Data Platform Containers list?

akuckartz commented 4 years ago

@dblodgett-usgs That same comment ends with "There may be other relevant specs (styles?) to consider, ..." LDP is such a spec - and even a standard.

Can you please flesh out what you think is of value from the Linked Data Platform Containers list?

I will try, but can not guarantee that I find enough time.

dblodgett-usgs commented 4 years ago

I see -- that closing comment was meant to close out the list of OGC API Specifications that have work in progress related to the data-resource and items issues. If you think there are elements of the LDP specification that are useful in bringing closure, please present them here, but that comment was not intended to be an open-ended ask for additional concepts from outside the current baseline.

dblodgett-usgs commented 4 years ago

@jerstlouis -- why carry rel: data forward from Features at all? Couldn't rel: data be an OGC API Features link relation that gets you to a feature collection view of the data-distribution at / ?

Your response does not address the two aspects of the issue as posed in my top-level summary and seems to be mixing thing up.

Re: the "data-resource" issue, which seems to be the one you are addressing, you are clearly interested in collections being a container for flexibly typed resources that are datasets. But your examples just describe a design pattern -- not why it is better than another design pattern.

Your:

(NOTE: This does imply a "collections" : property at the top of that JSON response, there's sadly no way I can think of around that.).

Indicates to me that it is a bit of a hack and is probably not a solution that we want to pursue.

Specifically, what is wrong with:

/ as a data-resource
/collections as a feature-collection spatial-resource
/coverages as a coverage spatial-resource
/tiles as a tiles spatial-resource
etc. etc.

Where API-Common would specify the common aspects of a spatial-resource but not the path semantics -- only that a given spatial-resource view over a data-resource should have a literal path for its API.

jeffharrison commented 4 years ago

I don't think there's anything wrong with it. There should be room in OGC API for straightforward geospatial resources too.

Best Regards, Jeff

cportele commented 4 years ago

In general, I see two general approaches for moving forward, if the existing Collections resource does not work for the SWGs that specify other spatial data resources.

Let me start with the proposal from @dblodgett-usgs, which I would characterise as follows:

Restrict an API that shares spatial data to a single dataset.
- Note: There will also be APIs with no dataset involved, e.g. to share processing capabilities, styles, etc.
- Note: A metadata catalogue is a dataset, too.
- Note: An API that implements Records and that catalogs data APIs is essentially an API that provides access to multiple datasets.
Group spatial data resources so that they provide reliable access patterns and do not require that code does a detailed analysis of the resources to determine how to access sub-resources. Use a fixed token for the path element.
Each spatial data resource represents one or more distributions of the dataset (one per supported media type).
The same dataset may be shared via multiple spatial data resources ("views") with different access patterns. For example, the data may be shared as features, as a coverage or as vector tiles, each with their own access patterns (sub-resource structures, applicable query parameters, etc.).

That is, /collections would be restricted to cases, where the data items are accessed at /collections/{collectionId}/items with support for paging and filtering via bbox/datetime. Example: features and records. Neither /collections nor an "items" link relation type would be discussed in Common Part 2. Their specification would remain in Features.

For spatial data resources with other access characteristics, other resource types with other tokens should be defined / used in the respective specifications. For example, Tiles, Coverages, EDR.

Note that I don't see how Common Part 2 could define "a reusable set of conformance classes for items that could be used on any set of resources in an OGC API." The items resource is a specific sub-resource of a collection and other spatial data resources may have different access patterns (I assume, for example, coverages).

One aspect that would need more thought are the link relation types. In Features, the "data" link relation type references the Collections resource at /collections. The link relation type is defined as "refers to the root resource of a dataset in an API." Since we cannot register something like "data" with IANA, we need a fresh start in Common anyhow. I see two options:

define a link relation type for each spatial data resource type (e.g., Features could define "ogc:collections"), or
define something general like "ogc:spatial-data-resource" or "ogc:distributions" in Common Part 2.

The second option works, if/since we use fixed tokens like "collections" or "tiles" for the spatial data resource types. However, for clients navigating the API by following links, the first option should be easier to use.

The second approach that I see is to also move away from fixed paths like /collections in Common Part 2, but define a flexible architecture for other resource types that represent dataset distributions. This should work, too, is maybe a cleaner architecture, but also more complex to develop client code.

For simplicity, let's assume that we still restrict an API that shares spatial data to a single dataset. If this changes, the solution would become more complex.

Without fixed paths we need to rely on other mechanisms so that humans and software can understand an API, both from the API definition and from navigating the resources:

For a general solution we would need a richer set of link relation types to distinguish links to different types of data resources - individual data items and aggregations of them with datasets and distributions having a special role. The downside is that we would need to develop a sound resource model and I doubt that we have enough experience yet to standardize one.
Since the starting point is that not all spatial data resources are under /collections, we need to be able to link to other spatial data resources and be able to understand what they are. We probably would need to a) register a JSON media type for each resource type representation in the API and b) should include a "type" member in each JSON object (in the current Collection(s) resource types these members would have to be optional for backwards compatibility). The structure of these resources would be out-of-scope for Common Part 2, which to me also would exclude any discussions about "items".

A potential risk with all that is that the flexibility on the server side comes also at a cost for client developers. Developing a generic client that works out-of-the-box would require good knowledge about all these concepts. (NB: there is also more work for document editors / OGC as more and more IANA registrations would be the likely result.)

One of the key drivers behind the WFS 3.0 / OGC API Features activity, and I hope also behind the OGC API idea in general, was/is to reduce the learning curve and the complexity for developers compared to many of the standards from the OWS/XML stack. Yes, we also want to improve the overall architecture in the OGC baseline in this process, but we should avoid approaches that add complexity/flexibility that is not needed by the majority of the deployed APIs.

If we go down a path with a very flexible resource structure, there should be agreement that OGC API standards (e.g., Features) can remove flexibility for "their" resource types. In Features we ended up with the current structure after implementation feedback and intensive discussions (see, e.g., issues 90, 64 and others) and that approach has proven to work well for Features.

dblodgett-usgs commented 4 years ago

Thanks for this @cportele.

I am admittedly out on a limb with the items idea. I was thinking that the link relation for items and at least limit/paging could be reused in other places that are not under a collections API path literal?

The reason I'm leaning toward an approach where each API access pattern gets its own literal path is largely what you point to as a key driver for OGC API.

... to reduce the learning curve and the complexity for developers compared to many of the standards from the OWS/XML stack.

Your notes are really important to what I'm seeing as a path forward:

Note: A metadata catalogue is a dataset, too. Note: An API that implements Records and that catalogs data APIs is essentially an API that provides access to multiple datasets.

In this world view, an OGC API can be cataloged in an OGC API Records and referenced as a dataset in its own right. I've used ISO19139 (services metadata) to integrate dataset services into processing workflows very successfully and see this as putting the complexity in the right place but keeping it "in band".

Where are others at on this? @cmheazel @joanma747 What would you suggest as a path forward? I am pushing here because of how much work is bound up in EDR and Coverages pending this discussion. Coverages and EDR folks: @Schpidi @pebau @chris-little @m-burgoyne where do you stand on this?

jerstlouis commented 4 years ago

@cportele @dblodgett-usgs @cmheazel @joanma747 What I was hoping to see in OGC API - Common Part 2: Geospatial data is the following...

A common mechanism (regardless of the type of geospatial data or available views) by which to list all data layers within a dataset (starting from its landing page), including common information useful to a client, using a Common schema (though it can be extended for the specific module). This includes:
- Identifier
- Title
- Spatial & Temporal extent
- Intended scale / resolution

/collections/{collectionID} in both OGC API - Features and the current draft of OGC API - Coverage satisfies this for the most part.

A common mechanism by which links are provided within this schema for each of these data layers, which return the data in one or more forms in which it is being distributed. Based on the link properties ("rel", "type" and/or other properties), a client knows what it will get when it follows that link.

Currently, the "rel" : "items" of OGC API - Features, also used in the current draft of OGC API - Coverage, linking respectively to /collections/{collectionID}/items and /collections/{collectionID}/coverage/all also satisfies this.

Relations could be changed as needed, additional properties for the links could be added, but this is the functionality I hope ends up in this Common approach to Geospatial data.

Then I also hope for conformance classes supporting a simple retrieval mechanism from BBOX+resolution to retrieve the data, either from that same link relation, or from a separate "rel" if necessary. /collections/{collectionID}/items?bbox=30,40,50,60 -- Returns me a GeoJSON for my bbox for a vector layer /collections/{collectionID}/coverage/all?bbox=30,40,50,60 -- Returns a CoverageJSON for my bbox for my raster layer /collections/{collectionID}/coverage/all?bbox=30,40,50,60&f=geotiff -- Returns a GeoTIFF for my bbox for my raster layer

Similarly, rather than BBOX+resolution, one might use the Tiles API instead in a consistent manner, for retrieving the data either as vector and/or raster. And one might use the Maps API to render that data in a consistent way, and one might refer to this data layer the same way as an input to a Process.

These are the use cases I care the most about, and so far the newer proposals seem to move away from this and I see it as a major setback in terms of having a common approach to geospatial data.

If we can resolve this, then we could discuss about how one might represent a hierarchical structure both within a single dataset, and as a way to organize multiple datasets, and whether that capability could be one and the same, or implemented in a similar manner, but that is largely a separate issue.

My rationale for wishing to have this functionality is also entirely based on reducing the learning curve and the complexity for developers. By implementing these simple capability once, clients automatically handle the generic aspect of working with any type of geospatial data, and can gradually implement additional support for the special handling or capabilities specific to a particular data type or retrieval mechanism.

As a practical example of the value of this, based on the current draft Coverage specifications, the only thing currently missing in our Features & Tiles API client from supporting Coverages is parsing CoverageJSON, because the current generic common geospatial data approach already allows it to follow the links all the way to /collections/{collectionID}/coverage/all which returns the data as CoverageJSON. Without writing any special code for Coverage, it could already see the titled coverages and their geospatial and temporal extents.

cportele commented 4 years ago

@jerstlouis - You have lost me now. I thought you wanted to get rid of /collections as a root resource for dataset distributions, but now you seem to say that every data API should use /collections?

jerstlouis commented 4 years ago

@cportele I never wanted to get rid of /collections, but because I thought the name collection in the path was the source of all this controversy, I suggested in a previous post that if we could figure out a way for the published Features standard to relax that 'collections' literal in /collections, and understand /collections to be wherever the landing page "rel" : "data" point to, then it might be easier to move forward. However you seem to indicate that Features would like to remain restrictive in this regard, which would at least imply that the literal 'collections' must remain if a dataset contains at least one Features data layer.

In my last post, you could substitute /collections to /roses, with the exception that currently OGC API - Features, Coverage and Common (Collections) draft all prescribe /collections at the moment.

The dataset / hierarchy discussion is separate and I was trying to avoid it until the most fundamental aspects are settled (i.e. points 1 & 2 which currently work with current draft specs). In an ideal world, I would combine the datasets/collections landing page, collections, and 'collection resource' to a single schema. Then such a resource could have links to data representations/views at the current level, links to sub-datasets and/or links to sub-collections, And you would have an indicator saying whether a particular resource constitutes a dataset per the DCAT definition. A service could have a higher up service landing page with service info, but not representing any specific datasets, linking to "data" (the root hierarchy for datasets and collections) and "processes". There could be links to api and conformance at whichever level(s) it makes sense. That root data resource being "/collections" would have been the easiest way to be compatible with Features as it is currently specified.

cportele commented 4 years ago

@jerstlouis - I don't see Features moving away from /collections as the current approach works plus changing it would be a breaking change.

I also don't think it is the name; if the resource definition (contents, sub-resources, parameters, etc) would work for other data items, the name shouldn't be a real issue.

Also note that in the current drafts we already have dataset distributions that are not under /collections like /tiles, which has been the idea from early on.

To move forward on this issue, I think we need broader input, e.g. from those mentioned by @dblodgett-usgs.

jerstlouis commented 4 years ago

@cportele /tiles at the root of the dataset works for tiles containing all data layers, but we also have /tiles inside each {collectionID} to retrieve tiled layers individually. Also a service may serve both the raw data tiles, or may want rendered map tiles, which should be distinguished.

I agree that the name shouldn't be a real issue, but I believe for some it is the main issue (e.g. see https://github.com/opengeospatial/oapi_common/issues/11#issuecomment-633266619 , and contrast that with Jeff's previous comments.).

jeffharrison commented 4 years ago

Uhh, what I said was -> OGC shouldn't mandate the use of the term 'collections' as the identifier for all geospatial resources. But at this point in the OGC API development process it's reasonable for OGC to say the identifier of a {geospatialResource} could be "/collections/{collectionId}" or a coverage or another geospatialResource.

Best Regards, Jeff

jerstlouis commented 4 years ago

@jeffharrison is it the term 'collection' that you have an issue with, or the idea of a common approach to geospatial data consistent across different APIs (common way to get from a landing page to your data layers, which has e.g. a spatiotemporal extent / volume, and links to resources to access that data, e.g. features items, coverage, bounding volume hierarchy tileset for 3D data)?

In that comment I linked on issue 11 you seemed to welcome that proposal without the term 'collection'.

dblodgett-usgs commented 4 years ago

Thanks @cportele. We really do need input from others here. So far, most of the discussion between @jerstlouis and others has been talking past one another without some shared use cases and assumptions to root the discussion in.

I attempted to provide some focus in my opening comment: https://github.com/opengeospatial/oapi_common/issues/140#issuecomment-632190005 and we need to focus this and iterate toward consensus rather than continue to air old arguments.

chris-little commented 4 years ago

@dblodgett-usgs @cportele @jeffharrison @jerstlouis @joanma747 To be honest, I am getting lost in all this. In EDR, we support a few queries 'sampling' against a single geospatial resource. We would like a common OGC API mechanism to identify the resources that fall within the client's query's spatio-temporal bounds of interest.

I think that the OGC API Common Part 1 can do this, as can Part 2 Collections, and probably Records.

Grouping of several resources quite tightly is desirable (e.g. all the Météo-France forecasts for today at a certain resolution, both upper air and surface), as are more loosely coupled groups (e.g. all forecasts and observation datasets for NW Europe, at differing resolutions, from Latvia to Portugal, issued on 13 October 1987)

There are some use cases for compatibility with OGC API - Features collections/collectionId/items.

"Layers" do not make sense to EDR, as a single datastore resource may have 10 million "layers", each of which could be MBs or even a GB in size.

I am not sure that this gives you a clear direction.

dblodgett-usgs commented 4 years ago

I'm doing my best to remain neutral but also push people on the issues and try to focus this discussion. I want to bring some comments from https://github.com/opengeospatial/ogc_api_coverages/issues/65 over here.

Thus far, the discussion is focusing heavily on the nature of the /collections end point and not really concerned with the less contentious issue of a consistent approach to items.

@jerstlouis offers a helpful set of benefits for treating the /collections end point as a dataset catalog in https://github.com/opengeospatial/ogc_api_coverages/issues/65#issuecomment-636194296-- providing justification for answering yes to the question I posed at the outset of this issue:

Will the collections end point be a container (catalog) for flexibly typed resources that are ostensibly datasets?

excerpting @jerstlouis:

We are trying to represent a specific "leaf" (most granular) data entity, regardless of its data type, at a single end-point. ...

We are also enabling to list all such leaf data entities at the same level, e.g. to list all of them part of a single dataset.

We are providing a generic manner by which to query the description of such an entity, e.g. its spatio-temporal extent, or to retrieve a list of these entity descriptions.

I find this very helpful for the following reason:

API-Features specifies that an API is for one and only one dataset, but
the only place to get spatial metadata is at the collection level.
So, while Features may say it is for one dataset, it is set up to represent 1:n spatial data entities (feature-collections).
If we are going to have parity in "leaf" spatial data entities (collections), then adding additional access methods for a collection is totally logical.

OK, so running with this a bit, what is a collection?

I think @pvretano offers some good words over in https://github.com/opengeospatial/ogc_api_coverages/issues/65#issuecomment-636207840.

In my simple-minded view of coverages, I see them as a collection of measurements (samples) taken with reference to some subdivision/tessellation of some object space that is somehow geo-located.

@pvretano, your attempt at self deprecation isn't working on me. I know you are way ahead of us. ;)

I find this idea of "a collection of measurements (samples)" to be the profound bit.

In API Features, API Coverages, and API Environmental Data, we are all circling around this notion of accessing a digital representation of the world, potentially bounded to some spatial domain. In Features, the representation is entities we have identified and want to share for whatever reason. In Coverages, the representation is a tessellation that, in an ideal world, approaches the continuum it is sampling. EDR accepts (cynically?) that people don't really care about features and coverages, and just want to ask what the dataset's estimate of the value of the real-world is for a location, point, area, trajectory, etc.

So is that what a collection is? A spatially bounded collection of samples of a real world phenomena that (depending on the nature of the samples) can be accessed via a variety of APIs?

One other interesting comment before I call out some others and look for a way forward.

@tomkralidis says:

whether we go with /collections, or /coverages, /tiles etc., have the respective collectionInfo.yaml inherit from a generic collection content model (from what would become Common Part 2). Or maybe even an OGC API - Records record model?

I want to call attention to: "Or maybe even an OGC API - Records record model? because, if we are going to go down this road, we must define the relationship (it can be flexible) between a collections and datasets that are going to show up in API Records. Elsewhere in @tomkralidis' comment, he points out that "this would also help servers provide "on board" catalogues of the data they serve pretty easily". The question that might get people thinking is: "Is there a cross walk between collection metadata and DCAT?!?"

Now -- let's assume we go with /collections and use rel: data to get you there.

How do we fix the issue that you have to parse a bunch of garbage you might not care about and find the stuff you do care about / have client code to deal with? @jyutzler described it over here: https://github.com/opengeospatial/oapi_common/issues/47#issuecomment-598734879 Some have suggested a "collectionType" enum but that's gotten quick push back with counter suggestion of an "accessTypes" array. but I don't think that goes quite far enough.

There is a strong desire to minimize the diversity of functionality that exists at a given API path. Is there a middle-way here? Can we define common collection info that sets us up to allow diversity without introducing undue complexity when implementing general client code?

How do we bridge the gap between the advanced geospatial perspective where we have these abstract hierarchical datasets made up of collections with varied access patterns and a non-geospatial web developer who just needs to get their client or server code to work and be conformant?

I want to suggest that the path forward

must be based on simple, use-case-oriented, building blocks that happen to fit together into a coherent (and complex) whole,
must include tight and simple definitions of things like collections that provide clarity rather than convey generality.
must have a clear architecture for dataset-catalogs, data-distribution, processing, and integrated representations (maps etc.) (or whatever this taxonomy should actually be).

If we can define the initial building blocks in the APIs that are in motion (including Common), get our shared definitions right, and define this architecture in common, I think we can move forward. But we must stop talking past each other and seeking to understand other's requirements and find common ground.

At this point, I'm curious where @jyutzler and @cmheazel are at on the issues.

KathiSchleidt commented 4 years ago

Hi Dave,

many thanks for having done the painful work of collecting all these insights into the collection conundrum!!!

what I'm seeing is two worlds colliding:

the spatial world, focus is the geometry, a few attributes added for context. Also seems to have a fairly clear dataset concept, aligned with a layer, aligned with a velum overlay as a map layer
the data world, focus on the data, a bit of geometry added for context. No clear dataset concept as this is in the eye of the beholder, whatever bits make sense in a thematic context.

Trying to force data from the 2nd world into the simple clean concepts stemming from the first doesn't seem to be working, the reason we have our SensorThings (STA) and to my understanding the background of EDR.

In STA, we at least have a real-world-object to ground our observations (I'm just missing how to link our observational concepts to these items).
In EDR, the spatial object the resulting dataset pertains to doesn't exist until the EDR query has been sent.

Taking this a step further, I see many cases where the provision of the spatial (1) vs. data (2) aspects are performed by different organizations or institutions, thus firming up the requirements on being able to link data on a spatial object (or area to also support EDR) from one source with spatial information from a different source. Related to this is the requirement to 'represent a specific "leaf" (most granular) data entity, regardless of its data type, at a single end-point.' While this sounds very good, to my view (S)ELFIE has put up clear requirements to the contrary, at least when it gets to a real-world-object. We're also encountering issues when using multiple OGC Standards, what is the 'single end-point' for a data object being provided by both OAF and STA?

Sorry, no solutions, just the concern that by ignoring the dichotomies engendered by the 2 worlds described above, we will continue to come up short of real world requirements. The modern spatial world requires more information on spatial objects than can be provided by an integrated set of attributes!

My 2 cents

:)

Kathi

jerstlouis commented 4 years ago

@KathiSchleidt @dblodgett-usgs

In an attempt to bridge these two worlds, I would like to clarify what I meant by this "leaf (most granular) data entity" concept.

Leaf / most-granular might have been an overstatement, as e.g. you could split a FeatureCollection into individual features, polygons, points. Similarly you could split a coverage in its individual grid cells or samples. In the context of IoT / sensors, each invidual sensor may provide one or more measurements/observations, and the sensor itself is positioned at a point in space, and the measurement/observation is captured at a given point in time.

So what I was picturing as the "leaf data layer" in the case of sensor data, is not the individual sensor or its measurements, but a collection of mutiple sensors, along with their geospatial and temporal aspects.

Potentially, a single SensorThings API could be the source of one or more such data layers, or multiple SensorThings API could be sourced to provide one or more integrated "leaf data layer(s)" (e.g. based on the thematic context). Each of these data layer could then additionally be offered as either or both Feature Collections and Coverages, to facilitate the use of this information in GIS tools without built-in support for SensorThings API. When one SensorThings API maps directly to one such data layers, or when describing the SensorThings API itself, the spatio-temporal extent for it would be the overall extent of the temporal and geospatial coordinates for all measurements provided by that API.

KathiSchleidt commented 4 years ago

@jerstlouis thanks for this clarification! following up - how would you see the various classes of STA? All as one collection (so a wild mix of Things, Sensors, ObservableProperties...) or as a collection per class type? (this is where I always get lost). Adding a Sensor Collection, especially if this already includes "their geospatial and temporal aspects", starts out seeming pretty straightforward. The tricky bits show up a bit later:

How do you deal with moving sensors?
Are ObservableProperties a collection on their own, or do they get denormalized into their sensors?
???

I'd much appreciate a simple sketch of how to bring this fairly simple STA world into OAF

jerstlouis commented 4 years ago

@KathiSchleidt I believe an overall SensorThings API would best map to a single collection, at least based on my limited familiarity with STA so far and your brief description (and a glance at https://docs.opengeospatial.org/is/15-078r6/15-078r6.html#24). It would also be possible for a SensorThings API to map to multiple collections, but each of these collections would likely map to some thematic regrouping of sensors along with their associated Things, ObservableProperties, etc., rather than each of those aspects of the SensorThings APIs being separate collections. But each of these collections could also stand on their own as individual SensorThings APIs.

By class type, am I correct in understanding that you were referring to SensorThings conformance classes? In other OGC API specifications, such as Features and Coverage, conformance classes describe different capabilities of the API, which applies to the multiple available collections.

Moving sensors -- each set of observation is taken at certain time, and the difference with non-moving sensors is that the geospatial coordinates changes along with the time. A collection of sensor measurements/observations still has an overall spatio-temporal extent. The Moving Features standard should also be considered, and it would be interesting to see how this can integrate with the Features API.

I don't think Observable properties would be a collection on their own. e.g. if one was to create a feature collection out of information coming from a SensorThings API, observable properties become the associated data attributes (properties), while the geospatial coordinates of the sensor become geometry points, and the time of the observation becomes the temporal aspect (which may also be stored as a property of the point feature).

If one creates a coverage out of information coming from SensorThings API, then again the sensor position becomes the coordinates of the coverage sample, the measurement/observation is the value (sample) at that position, while time is an additional dimension of the coverage, and separate types of measurements can either be represented on separate planes (extra dimension?) or by splitting it into separate coverages.

So the idea of how to regroup this sensor information would be to have the possibility to present this dataset of observations/measurements, which could potentially be retrieved using one or more SensorThings API, as one or more features collection, and/or as one or more coverages.

I don't really believe that these worlds are that far apart, because people have been building GIS vector and raster datasets from measurements and observations for a long time. The only difference with SensorThings API and the IoT is a lot more information is available and it is real-time. But I don't think this prevents the representation of the information as classic Features collections and/or Coverages. However it presents some additional challenges due to that greater quantity and flow of information, and I think space partitioning mechanisms and dynamic distributed processing are key tools to solve those challenges.

jerstlouis commented 4 years ago

It would be good to hear @liangsteve's and @sarasaeedi 's perspective on the above :)

KathiSchleidt commented 4 years ago

@jerstlouis - fear we're again running up at the divide between

the spatial world
the data world

as described above. From your description of Sensor and Coverage data, I'm getting the impression that in your perspective, the data is reduced to an attribute of the spatial object, not something being described in its own right (question of what's the first class citizen, the spatial or the data).

This leads to errors such as conflating the sensor with the object of measurement, fine for a simple system where you just want to indicate something by the color of a polygon, doesn't work for more complex use cases such as environmental reporting. There are solid reasons behind separating:

Sensor: the device measuring a property of a spatial object
Thing: the spatial object the Sensor is connected to
FeatureOfInterest: the spatial object the Sensor is measuring on

Putting them all into one collection leaves you with a grab-bag of classes with diverse semantics and structures; am I misinterpreting collections as being too uniform?

hylkevds commented 4 years ago

When Kathi mentions the classes of STA, she refers to the different entity types that STA defines: Location, Thing, HistoricalLocation, Sensor, ObservedProperty, (Multi)Datastream, Observation, FeatureOfInterest. Only Location and FeatureOfInterest hold geospatial data.

The mapping between a STA instance and the dataset and collection concepts is not so simple, and will change depending on who you ask. The server https://airquality-frost.docker01.ilt-dmz.iosb.fraunhofer.de/v1.0 holds air quality data for all European countries, for 9 different observed properties and thousands of stations. This can be seen as one dataset. Or it could be seen as different data sets, one for each country Or one for each ObservedProperty, or one for each OP/Country combination, or one for each station...

The service at https://lubw-frost.docker01.ilt-dmz.iosb.fraunhofer.de/v1.1/ contains water quality measurements for water in Baden-Wurttemberg. Measurement "stations" at point locations, with datastreams, sensors, and observations. But the same service also holds the (geo and non-geo) data of all rivers. And the (geo and non-geo) data for all Aquifers. One might say that these three types of data (water quality observations, rivers, aquifers) belong in different datasets and thus in different STA endpoints, but putting them in the same server allows us to query across their boundaries and find rivers that have measurement stations that have observations for a given ObservedProperty. Or find measurement stations that are on side rivers of the Rhine.

I often have the feeling that the whole "collections" concept is only born from a lack of good querying capabilities.

jerstlouis commented 4 years ago

@KathiSchleidt

In both the Features and Coverage API, I am not sure it is fully accurate to say that the data is a second class citizen. e.g. in Features, the geometry of a feature is much like one property along with the other data properties. In Coverage, the 'values' are the measurements themselves, and the spatio-temporal coordinates situate those measurements in space & time.

Now that same service (or its source service from which it derives information) can also be offering a SensorThings API, which has that sensors-oriented data model.

The Thing vs. FeatureOfInterest distinction seems slightly problematic to integrate with Features/Coverage, as there are two different spatial objects associated with the same measurements. I am guessing one of them should be considered as the "primary" geometry.

But if we are looking specifically at how sensors data can also be made available as Features and/or Coverage (in addition to the SensorThings API), I think the guiding principle should be how to best organize that information so that Features/Coverage clients can easily perform analytics and/or visualization with it, rather than trying to directly map the Sensors data model (for which the SensorThings API probably will always work best).

In a general sense, I think the idea is that a collection is this whole set of information, regardless of what the data model behind it is. So anything that could be split out in its own set of information organized per (Sensor, Thing, FeatureOfInterest, Observation, ObservedProperty, Datastream, Location, HistoricalLocation) would be considered a single collection.

jerstlouis commented 4 years ago

@hylkevds Thanks for the insights and clarifying about the entity types.

This can be seen as one dataset. Or it could be seen as different data sets, one for each country Or one for each ObservedProperty, or one for each OP/Country combination, or one for each station...

I think the definition of what constitutes the dataset is based on how and by whom that data is collected and/or integrated and published. A larger integrated dataset facilitates larger scale visualization & analysis.

As for putting together different data sources in the same dataset, on one hand I hope that the OGC API will facilitate integrating and relating data available separately (even from different sources). On the other, a dataset can be made up of multiple collections which complement each other, so that is a very good example use case. The same server could also potentially serve them as different datasets. Some of this goes back to the distinction between collections and datasets, and hierarchies of them, and whether allowing more flexibility in that regard may eventually be useful.

dblodgett-usgs commented 4 years ago

(this comment is superseded by: https://github.com/opengeospatial/oapi_common/issues/140#issuecomment-637664012)

This diversion into sensor things is fascinating and definitely good food for thought, but we need to focus on the issue at hand.

If we are going to use /collections as a container for "geospatial data entities" / sub-datasets other than feature-collections,

how do we clarify what is meant by the term "collection"?
how do we hide the complexity from server and client developers who couldn't be bothered with this collection abstraction?

Proposed solution 1:

As @jerstlouis has proposed, I think renaming API Common Part 2 to:

"OGC API - Common - Part 2: Geospatial Data"

would be a good start.

Proposed solution 2:

The definition of Collection needs to be less obtuse and more evocative for developers.

Current def from 20-024

A body of resources that belong or are used together. An aggregate, set, or group of related resources. (OGC 20-024)

I think what we are saying here is that a (OGC-API) Collection is: "A geospatial data resource that may be available as one or more sub-resource distributions that conform to one or more OGC API standards."

Some statements we can say about an OGC-API Collection:

A collection has spatial-temporal extent.
A collection has one or more representations/distributions available as OGC APIs.
If available as identifiable entities, the collection will implement items consistently.

Perhaps these kinds of basic statements could be included in an informative note in the definitions section?

Proposed solution 3

Introduce an accessTypes array in the collection information schema.

For backwards compatibility with API Features, if accessTypes is omitted, then the collection should provide an itemType of Feature.

This accessTypes would be a key into an OGC API such as coverages or EDR that would provide a distribution of the collection at a path other than items.

Proposed Solution 4

A clean relationship between link relations, path literals, and the accessTypes array needs to be implied.

Since Features uses "items" as the link relation and the path literal, the accessTypes is easy -- items. For others, that path literal would be used for all three. For coverages, coverage. For EDR, position, area, trajectory, etc.

Proposed Solution 5

Each OGC API needs a clear and concise definition of their distribution of a collection. @pvretano gives some helpful words for coverages over here https://github.com/opengeospatial/ogc_api_coverages/issues/65#issuecomment-636207840 that I would offer as an example:

A coverage is a collection of measurements each made inside of some defined subdivision (i.e. "cell") of an object space.

For EDR, I'd offer:

An EDR Resource is a collection of spatiotemporal data defined on shared axes that can be accessed using one or more sampling geometries.

Or something like that.

Are there others that are needed?

@jyutzler @jeffharrison -- are these getting us moving in a direction that would bring you closer to accepting collection as a set of spatial data resource distributions? What else would help lower the barrier to entry?

No matter what we do, we push the complexity some where so tackling it here or out at the root of the API kind of makes no difference. Since Features has decided to model it this way -- I think we should think very hard about a way to elegantly deal with the complexity in this way.

rob-metalinkage commented 4 years ago

If it quacks like a duck...

+1 to "I think we should think very hard about a way to elegantly deal with the complexity in this way."

What matters is not the definition you choose for dataset and collection, but the contract you make about how the various facets of description, subsetting, typing and behaviour interact and are linked. At one end of the spectrum make up a series of simple undocumented APIs to do a few things you know about in advance - at the other end fully describe every possible aspect in a machine readable way, including service quality, data quality, statistical rigour etc. If the answer is you need something in-between then accept it will be arbitrary - and what you need to do is not ad-hoc layering of behaviour assumptions on more interface methods but a general way to map interface methods onto a meta-model that can be extended as required. DCAT's linking of DataSet and DataDistribution is a pretty good starting point here.

i think there are enough pieces, debates and options in play that trying to visualise the current state of play as a formal model is worth the effort - maybe it exists (?) but if so why isnt it being quoted here?

modelling how the different concepts interrelate will help:

identifying ambiguities
finding the minimal. cleanest model
documenting and explaining the intended behaviour to people in future
identifying where information needs to be present to navigate between different aspects
comparing the model with other similar models (explicit or implicit)

dblodgett-usgs commented 4 years ago

@rob-metalinkage -- are you suggesting someone should model a proposed path out formally in UML? I am not aware of a model other that OpenAPI documents. Looks like there is an OpenAPI to UML plug in for Eclipse. Anyone want to give it a rip?

rob-metalinkage commented 4 years ago

personally I've moved on from UML - and would approach by trying to refine the DCAT OWL model (create a few subclasses and associations) - but realistically any Class modelling approach would probably do - even just a diagramming tool - Just being able to put the terms in context visually with the relationships they have is the thing that allows them to be better understood.

rob-metalinkage commented 4 years ago

PS - doing it RDF means that the definitions can be easily published for reuse - but UML application schema with a compilable pathway to RDF might be easier for people familiar with those environments.

dblodgett-usgs commented 4 years ago

OK -- I agree that some visualization would be helpful. Does anyone want to put their hand up to take that on?

cportele commented 4 years ago

For Features Core there is an attempt at a UML representation that was developed during the discussions last year:

https://github.com/opengeospatial/ogcapi-features/tree/master/uml

jeffharrison commented 4 years ago

Given the situation I would recommend three practical steps that could help move things forward...

1) Don't assume or mandate every resource identifier will be collections.

2) Examine existing implementations for resource patterns and templates. There are multiple OGC APIs out there for multiple resource types... so build from the 'ground up'.

3) Capture Common patterns in templated conformance classes.

(And don't forget there are multiple HTTP methods, not just GET).

Best Regards, Jeff

dblodgett-usgs commented 4 years ago

1 is done. Core does not depend on collections.

2 is exactly what we are doing here. Existing implementations that use collections have uncertainty about how to move forward because some have voiced strong disagreement with using collections for more than things that are accessed as items.

Is 3 an argument that we shouldn't be specifying literal path fragments? But rather finding common functional patterns and specifying conformance classes around those?

I don't think you really addressed the question I posed though.

@jyutzler @jeffharrison -- are these [proposed steps forward here] getting us moving in a direction that would bring you closer to accepting collection as a set of spatial data resource distributions? What else would help lower the barrier to entry?

Are you saying this is the wrong question and lowering the barrier to entry to a shared collections concept isn't what we should be trying to do? Please help focus this discussion and get us moved toward consensus.

jeffharrison commented 4 years ago

'Is 3 an argument that we shouldn't be specifying literal path fragments? But rather finding common functional patterns and specifying conformance classes around those?'

Yes.

'Are you saying this is the wrong question and lowering the barrier to entry to a shared collections concept isn't what we should be trying to do?'

Yes.

In my opinion, once we have identified common functional patterns and reusable conformance classes they may be applied against many kinds of {geospatialResource} ... including /collections if that's what the community decides.

I don't think we're actually doing 2. We should be looking at working examples in each discussion, trying to find those common functional patterns and conformance classes, in my opinion. Not forgetting that there are more HTTP methods than GET.

Best Regards, Jeff

tomkralidis commented 4 years ago

must be based on simple, use-case-oriented, building blocks that happen to fit together into a coherent (and complex) whole,

must include tight and simple definitions of things like collections that provide clarity rather than convey generality.

+1 for the simplicity. Below, a small, simple example of the building blocks in action, across a few different OGC API efforts:

dblodgett-usgs commented 4 years ago

@jeffharrison OK, so we have four specifications (Features, Coverages, Records, and EDR) that are currently using /collections. Three that use items one that does not. We are debating how to get those four to coexist around a common concept of the /collections endpoint.

What we you suggest we do that would get us to your suggested:

Examine existing implementations for resource patterns and templates.

???

tomkralidis commented 4 years ago

Hi @jeffharrison

In my opinion, once we have identified common functional patterns and reusable conformance classes they may be applied against many kinds of {geospatialResource} ... including /collections if that's what the community decides.

Could/would this mean having a core schema, so that if /collections, /coverages and so on exist, that their response pattern would be based off of some core/common schema, which they would be able to extend?

..Tom (with scars from a few too many Capabilities XML parsers over the years)

jeffharrison commented 4 years ago

First we need to fully identify all the efforts ongoing ;-)

Features Coverages Records EDR Processed Routing Tiles Styles

(I think I maybe missed one?)

Best Regards, Jeff

jeffharrison commented 4 years ago

'Processes' above

jeffharrison commented 4 years ago

Apologies... 3D API as well

Best Regards, Jeff

dblodgett-usgs commented 4 years ago

@jeffharrison -- I want to point out that your replies here are overly brief, unhelpful, and not moving us toward consensus. You have raised strong objection to the direction that three pending APIs have in their drafts and it is on you as a community member to help form a constructive path forward.

Are you saying that representatives from some APIs are not taking part in the API-Common discussion? There is a policy directive (47 here) instructing them to do so. I would appreciate your help getting them engaged if that is the case.

Can you please provide links to drafts of Routing, Tiles, Styles, and 3D API with some analysis of how they inter-relate or in some way have implications for how /collections is being used in the other specifications? I linked to the others at the top of this issue but was unable to find others.

On a separate note, can you please not abuse the comment thread? GitHub allows you to edit your comments rather than spam watchers with post-applied edits.

jeffharrison commented 4 years ago

@dblodgett-usgs

First, in my opinion your response is a bit rude. I am trying to help discussions move forward, with limited time to do so.

Second, I have tried to edit my comments and am unable to do so from this device. So I am not trying to 'spam' anyone.

Jeff

dblodgett-usgs commented 4 years ago

@jeffharrison Apologies if I'm coming off as rude -- we are all committing a lot of time and energy here and I just want to push everyone to take part in moving us forward and not just lob disagreement in without helping move forward.

Please take your time in responding on a device that will allow that -- we don't need to rush this.

jeffharrison commented 4 years ago

Hello @tomkralidis

'In my opinion, once we have identified common functional patterns and reusable conformance classes they may be applied against many kinds of {geospatialResource} ... including /collections if that's what the community decides.'

'Could/would this mean having a core schema, so that if /collections, /coverages and so on exist, that their response pattern would be based off of some core/common schema, which they would be able to extend?'

Yes, exactly.

Best Regards, Jeff

opengeospatial / ogcapi-common

Collections Discussion #140

The data-resource issue comes down to:

The items issue comes down to:

The status of addressing these issues for now:

Narrative:

A path forward:

References OGC-API specifications of interest:

Proposed solution 1:

"OGC API - Common - Part 2: Geospatial Data"

Proposed solution 2:

Proposed solution 3

Proposed Solution 4

Proposed Solution 5