collection links at collections endpoint

pomakis commented 3 years ago

"OGC API - Common - Part 2" is ambiguous about how complete the representation of the collection objects in the collections array at the collections endpoint should be. The three alternatives are:

a complete representation of the collection, including all links

This allows a client to have a complete description of each collection merely by fetching the collections endpoint (which could be filtered based on a client-supplied filter, and could be broken up into multiple pages). This is certainly bulkier than the other alternatives, but since most servers will have at most a few dozen (or perhaps a few hundred) collections, this isn't a big deal. And for the extreme cases, the paging mechanism ensures that the sizes of individual responses don't get unwieldy.

This is what the CubeWerx OGC API server at https://test.cubewerx.com/cubewerx/cubeserv/demo/ogcapi/Daraa currently does.

a minimal representation of the collection whose only required links are "self" and perhaps "alternate"

This is the approach tentatively settled on, for example, at the tilesets endpoint of "OGC API - Tiles". The idea is that the client would slurp in the array to know what collections are available, and then make a further fetch of the specific collections of interest to get their complete descriptions.

The problem with this approach is that many client applications begin by determining the set of available collections (to present to the user or otherwise), along with the capabilities of each collection. For example, such a client would typically like to know, for each collection, whether it has vector features, whether it's available as a coverage, whether a schema is available for it, whether it can serve maps or tiles, what styles are available for it, etc. These things are all communicated by its set of links. If the collections endpoint only provides a minimal set of links, the client would need to fetch the collections endpoint (which could be broken up into multiple pages) and then make a separate request for each and every collection (of which there may be hundreds) to collect its complete information.

something squishy in-between

Allow the server to choose, at its discretion, how much of a subset of the collection links to communicate at the collections endpoint. This is what the current wording of "OGC API - Common - Part 2" seems to allow. In my opinion this is the worst possible alternative, since client applications will never be able to trust the completeness of the information communicated at the collections endpoint, and to be safe will be forced to always make individual collection requests anyway.

A variation of this alternative is to let the individual specifications dictate which collection links must and must not be communicated by the collections endpoint. But this really seems like an unnecessary complication that client and server implementations are always going to trip up on.

For example, the collections endpoint of the OGC API server at https://t17.ldproxy.net/fns reports the items links for each collection, but not the schema-item, styles or tilesets-vector links, etc. I suspect it does this because Requirement 25 of "OGC API - Features - Part 1" requires the collections endpoint to list all "items" links, but is silent on whether or not the other links are required.

To me, minimal or complete are the only reasonable approaches. And I have a strong preference for complete so that the client doesn't have to fetch the description of each and every collection separately.
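To make the cost of the minimal alternative concrete, here is a minimal Python sketch of the client pattern described above. It assumes JSON responses shaped like `{"collections": [...], "links": [...]}`, paging via a `next` link, and an injected `fetch(url)` callable standing in for an HTTP GET. `CAPABILITY_RELS` and its short relation-type names are hypothetical; real servers may use full OGC relation URIs.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

Json = dict
Fetch = Callable[[str], Json]

# Hypothetical short relation types mapped to capabilities a GUI might expose.
CAPABILITY_RELS: Dict[str, str] = {
    "items": "features",
    "coverage": "coverage",
    "map": "maps",
    "tilesets-vector": "vector tiles",
    "styles": "styles",
}


def href_for(obj: Json, rel: str) -> Optional[str]:
    """Return the href of the first link with the given relation type, if any."""
    return next((l["href"] for l in obj.get("links", []) if l["rel"] == rel), None)


def list_collections(fetch: Fetch, url: Optional[str]) -> List[Json]:
    """Follow 'next' links until every page of the collections endpoint is read."""
    collections: List[Json] = []
    while url:
        page = fetch(url)
        collections.extend(page.get("collections", []))
        url = href_for(page, "next")
    return collections


def capabilities(fetch: Fetch, collections_url: str,
                 drill_down: bool) -> Tuple[Dict[str, Set[str]], int]:
    """Return {collection id: capabilities} plus the number of requests made.

    With drill_down=True the client does not trust the summary links and
    issues one extra request per collection to its 'self' link.
    """
    calls = 0

    def counted(url: str) -> Json:
        nonlocal calls
        calls += 1
        return fetch(url)

    caps: Dict[str, Set[str]] = {}
    for coll in list_collections(counted, collections_url):
        if drill_down:
            coll = counted(href_for(coll, "self"))
        caps[coll["id"]] = {CAPABILITY_RELS[l["rel"]]
                            for l in coll.get("links", [])
                            if l["rel"] in CAPABILITY_RELS}
    return caps, calls
```

With two pages of summaries covering three collections, `drill_down=True` costs 2 + 3 = 5 requests; the per-collection term is the part that grows with the size of the server's catalogue.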

cmheazel commented 3 years ago

@pomakis The problem with including the complete representation is that the response can become very large. True you can filter which collections to include, but only if the server implements the simple query conformance class. What we can do is move the requirements for the collection resource into a separate requirement module. That module would define the required and optional properties of the collection resource. Both rc-md-items and src-md-success would include conformance with this requirements module. This would provide a single definition for the resource (including mandatory properties) while leaving the implementer the freedom to do what is best for their users.

pomakis commented 3 years ago

I'm not sure I understand your way forward on this. I just want to make sure that in the end, questions like the following have a clear answer: If I wanted to write a client application which, say, presents a list of mappable collections and allows the user to select one or more of them to display in a map, would this application be able to determine which collections are mappable from the collections endpoint alone, or would it have to make a separate HTTP request for the definition of each collection? There need to be clear rules about this.

cmheazel commented 3 years ago

@pomakis My previous proposal would guarantee that one set of rules govern collection resources regardless of whether they are accessed through /collections or through /collections/{collectionId}. Properties that are mandatory for /collections/{collectionid} would also be mandatory for /collections. The second part of your question goes to what the minimal content of the collections resource should be. Common Part 2 must be able to support any resource collection. The set of common mandatory properties that we have been able to identify is already captured in the draft. It's very small. Resource-specific API standards can expand that list, and make optional properties mandatory. But we have done about all we can on this topic within the scope of Common.

pomakis commented 3 years ago

But we have done about all we can on this topic within the scope of Common.

I disagree. As a specific example, look at the definitions of the "notam" collection at these two endpoints:

https://t17.ldproxy.net/fns/collections?f=json
https://t17.ldproxy.net/fns/collections/notam?f=json

The former only provides a subset of the links that the latter does. Is this okay? If so, what rules are in place to let the client know what links can safely be determined from the collections endpoint versus having to drill down further? "OGC API - Common - Part 2" is ambiguous about this, but shouldn't be. There should be a clear rule or client-server interaction will be guesswork.
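The comparison between the two endpoints can be automated. A minimal Python sketch, assuming both responses have already been parsed into dicts with a `links` array; the function names are hypothetical. It reports which relation types appear at /collections/{collectionId} but not in the same collection's entry at /collections, which is exactly what a client cannot know in advance without a clear rule.

```python
from typing import Dict, Set


def rel_types(collection: Dict) -> Set[str]:
    """Collect the relation types of a collection object's links."""
    return {link["rel"] for link in collection.get("links", [])}


def links_missing_from_summary(collections_doc: Dict, full: Dict) -> Set[str]:
    """Relation types present in the full description but absent from the summary.

    `collections_doc` is the parsed /collections response and `full` is the
    parsed /collections/{collectionId} response for one collection.
    """
    summary = next(c for c in collections_doc["collections"] if c["id"] == full["id"])
    return rel_types(full) - rel_types(summary)
```

An empty result means the summary carries every relation type the full description does, for that collection at least.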

cportele commented 3 years ago

The former only provides a subset of the links that the latter does. Is this okay?

Of course. It conforms to the requirements of the standards that are implemented by the API.

If so, what rules are in place to let the client know what links can safely be determined from the collections endpoint versus having to drill down further?

The requirements stating which information (including the links) has to be included for each collection in the Collections resource are defined by the standards that have requirements for the Collections / Collection resources. These should be the standards that define data resources; Common documents the minimal framework.

pomakis commented 3 years ago

So you're going with "something squishy in-between".

This is my last rebuttal, I promise. I just want to make sure my concern is perfectly clear.

In my opinion, allowing the individual specifications to distinguish between what links go in the representation of a collection at the collections endpoint versus what links go in the representation of a collection at the collections/{collectionId} endpoint is a recipe for disaster. It adds a considerable amount of complexity:

complexity in the specifications

Rather than a specification (e.g., "OGC API - Maps" or "OGC API - Coverages", etc.) simply indicating what additional links it's adding to a collection, it must distinguish, for each added link, whether or not it must be included in the representation of a collection at the collections endpoint. The successful coordination of this type of minutiae between all of the different specifications is likely to fail.

complexity in the server implementations

A server implementation will need to have two separate JSON-representation generators for a collection object, or at least a complex generator which takes a boolean indicator of which endpoint it's being generated for. It would need to incorporate a hardcoded list of which link relation types are meant to be included in the representation of a collection at the collections endpoint versus which ones are not. And this list will need to be consistent with what's dictated separately by each of the individual specifications.

complexity in the client implementations

A client implementation will also need to be aware of which collection links it can safely rely on the collections endpoint for and which ones it will need to drill down further to the individual collections/{collectionId} endpoints for. For a client to be robust, if there's any ambiguity in the wording of the specification, it'll have no choice but to assume that it can't trust what's in the representation at the collections endpoint, and will drill down just in case. And in the common situation where the client application needs to determine the list of the collections that are available on the server and what they're capable of (e.g., maps, tiles, coverages, features, styles, etc.) so that it can provide the appropriate GUI options to the user, it would have little choice but to drill down separately with an HTTP request to each and every one of the individual collections/{collectionId} endpoints, of which there may be hundreds.
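The server-side complexity described above can be sketched in a few lines of Python. This assumes a collection is held internally as a dict with a `links` list; `SUMMARY_RELS` is a hypothetical stand-in for the hardcoded list of relation types that the various specifications would require at /collections, and keeping it in sync with every specification is the maintenance burden at issue.

```python
import copy
from typing import Dict

# Hypothetical: relation types that specifications require at /collections.
# Every new or revised specification could force an edit here.
SUMMARY_RELS = {"self", "alternate", "items"}


def collection_json(collection: Dict, at_collections_endpoint: bool) -> Dict:
    """Render one collection, trimming links when it is embedded in /collections."""
    rendered = copy.deepcopy(collection)  # never mutate the stored description
    if at_collections_endpoint:
        rendered["links"] = [l for l in rendered["links"] if l["rel"] in SUMMARY_RELS]
    return rendered
```

If both endpoints had to serve identical representations, the boolean flag and `SUMMARY_RELS` would disappear and both could serialize the same object.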

And all of this is for what? To save maybe 15% in the size of the response of the collections endpoint?

This can all be avoided simply by stating, in "OGC API - Common - Part 2", that the representation of a collection at the collections endpoint must be identical to its representation at the collections/{collectionId} endpoint. That one sentence resolves all of this.

(A less favoured alternative is to go in exactly the other direction and state that the only reliable link in the representation of a collection at the collections endpoint is the "self" link to the complete representation. It wouldn't resolve the extra grunt work that the client application would have to do, but at least it would simplify the specifications and the server implementations.)
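Under the one-sentence "identical representation" rule proposed above, conformance becomes trivially testable. A minimal Python sketch, assuming parsed JSON and an injected `fetch(url)` callable (hypothetical) that returns the document at a collection's `self` link; the function names are also hypothetical. It lists the ids of collections whose two representations differ.

```python
from typing import Callable, Dict, List


def self_href(collection: Dict) -> str:
    """Return the href of the collection's 'self' link (raises if absent)."""
    return next(l["href"] for l in collection["links"] if l["rel"] == "self")


def non_identical_collections(collections_doc: Dict,
                              fetch: Callable[[str], Dict]) -> List[str]:
    """Ids whose entry in /collections differs from /collections/{collectionId}."""
    return [c["id"] for c in collections_doc["collections"]
            if fetch(self_href(c)) != c]
```

This strict equality check also treats a reordered `links` array as a difference; a real test suite might want to relax that.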

jerstlouis commented 3 years ago

Coverages has related open issue 109

I agree that there should be Common guidance about this, because it affects all specifications which would need guidance for this (e.g. Tiles).

cmheazel commented 3 years ago

Requirement /req/collections/src-md-success states that "The content of that response SHALL be consistent with the content for this collection in the /collections response. That is, the values for id, title, description and extent SHALL be identical."

This implies that they are two representations of the same resource. A rather soft requirement I admit, but it is something that we can build on.

I think we should toss this over the wall to one of the code sprints. I see two questions: 1) What is the minimum required Collection metadata needed to select a Collection (/collections)? 2) What is the minimum required Collection metadata needed to exploit (use) a Collection (/collections/{collectionId})?
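The consistency requirement quoted in this comment (/req/collections/src-md-success) is easy to check mechanically. A minimal Python sketch, assuming both responses are parsed dicts; `CONSISTENT_KEYS` restates the four properties the requirement names, and the function name is hypothetical. Note that the properties it names explicitly do not include `links`.

```python
from typing import Dict, List

# The properties /req/collections/src-md-success says SHALL be identical.
CONSISTENT_KEYS = ("id", "title", "description", "extent")


def inconsistent_properties(summary: Dict, full: Dict) -> List[str]:
    """Return the required-identical properties whose values differ.

    A property absent from both representations counts as consistent.
    """
    return [k for k in CONSISTENT_KEYS if summary.get(k) != full.get(k)]
```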

pvretano commented 3 years ago

@cmheazel actually I think the question is how deep do you want a client to have to dive before it has gathered enough information to be able to interact with a collection offered by a server. As things stand right now, the client needs to read /collections and then, for the collections of interest, dive into /collections/{collectionId}. For some clients, especially ones like catalogues, that extra hop can be very costly since they have to do it for every single collection that the server offers.

rob-metalinkage commented 3 years ago

I'm not going to push for something I don't need to implement myself, but sharing my experiences with various different scales of systems, I think it needs to be pointed out that simplicity in one place can increase complexity in another, and we really should be looking from a systems viewpoint, not a component viewpoint.

Acknowledging that humans are part of a system, and that getting content published at all is a big driver of overall capability...

The actors that need to be considered are aggregators and humans - it is an anti-pattern to have a large and growing aggregation of poorly documented data - such systems have been present for the last 25 years on the web and have failed to evolve into a satisfactory solution.

This is because we also need to consider

complexity in the client interpretation

complexity in the client discovery of data

and recognise that this complexity is incurred every time data is accessed, as opposed to once-off when servers or clients are implemented.

"Something scalable in-between" is thus a profitable trade-off. Look at DNS resolution, or Maven and modular Java libraries: they scale, although they are not that easy to set up.

I.e. the downside of enforcing a "one-size-fits-all" aggregation model is that it cannot scale: it cannot be partially cached, it will be difficult to keep stable in a network-scale aggregation mode, and it will always compromise on the descriptiveness of resources. It guarantees a long list of poorly described resources with no possibility of machine mediation or efficient caching.

The other extreme is compromised by the combinatorial explosion of different levels of detail for resources that will be inevitable, and the burden on the final user to determine what is compatible with their target use (i.e. with the target domain of interoperability).

So the optimal solution is something "in the middle": one that defines specific, simple-to-use statements about compatibility with a domain of interoperability, allows cacheable descriptions of these domains, and is extensible to arbitrary per-resource details. This means that a simple aggregation, filterable by domain, meets the basic requirement of an efficient overview, but also allows efficient and flexible access to both standardisable and customisable machine-readable descriptions for both domains (analogy: Java interfaces) and resources (analogy: object documentation). If you can make this simple, then job done.

For comparison - this is equivalent to the dcterms:conformsTo in DCAT for data cataloguing - with some thought about what an API needs to say about conformance. Not exactly a radical pattern ;-) - but acknowledging DCAT hasn't fully addressed API vs data interoperability yet.

jerstlouis commented 2 years ago

Discussing today, we are saying that Common - Part 2 requires a minimum at /collections: only an id and links for specific relation types (self).

Additional standards can be more specific as to what needs to be included at both /collections and /collections/{collectionId}.

Common - Part 2 is more flexible than features which requires all links to be the same at both levels.

Requirement 8C will be updated.

cmheazel commented 2 years ago

Update document so that only self link is required in /collections response. Required elements are ID, links.

cmheazel commented 2 years ago

This issue will be closed once the necessary updates are made.

cmheazel commented 2 years ago

A review of the document shows that only the self and alternate links are required. The alternate link is only required if the self resource is also available in another media type. So no change to the document appears to be needed.
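The minimum summarized in the last few comments can be expressed as a small validator. A minimal Python sketch, assuming a parsed collection object from the /collections response; `other_media_types` is a hypothetical parameter telling the check whether the resource is also served in another media type, in which case an `alternate` link is expected. Specifications that build on Common - Part 2 may add further requirements on top of this.

```python
from typing import Dict, List


def minimal_collection_errors(collection: Dict, other_media_types: bool) -> List[str]:
    """Check the minimum described above for a collection listed at /collections."""
    errors: List[str] = []
    if "id" not in collection:
        errors.append("missing id")
    if "links" not in collection:
        errors.append("missing links")
    rels = {link.get("rel") for link in collection.get("links", [])}
    if "self" not in rels:
        errors.append("missing self link")
    # 'alternate' is only required when the resource exists in another media type.
    if other_media_types and "alternate" not in rels:
        errors.append("missing alternate link")
    return errors
```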

opengeospatial / ogcapi-common