Open clynnes opened 4 years ago
allow servers to advertise maximum request limit
For large datasets, such as satellite products covering long periods of time, it is easy to submit a synchronous coverage request that cannot be fulfilled within typical timeout limits. Even if a large request CAN be fulfilled, the user may be impacted by receiving much more data volume than expected. There is likely no single solution for the diversity of users and servers. An assemblage of solutions might include the following capabilities:
- page through the coverage response by spatial or temporal chunk or tile
- obtain estimated size of a response before actually requesting the data (like a HEAD request)
- allow servers to advertise maximum request limit
- have a specific error response that indicates the request is too big to satisfy
- support asynchronous response
How would servers be able to evaluate whether a request was too big?
For coverages that are regular grids, the response volume can often be computed from the bounding box size and the time period requested. For servers in the cloud, a response may also be too big for cost reasons, so the next step would be to estimate egress and processing charges. For multi-point coverages, this may be more difficult.
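To sketch the kind of back-of-the-envelope estimate a server could make for a regular grid (purely illustrative; the function name, resolutions and byte sizes below are assumptions, not anything defined by the specification):

```python
# Hypothetical estimate of the uncompressed size of a regular-grid coverage
# response, based only on the requested bounding box, time period, resolution
# and number of fields.
def estimate_response_bytes(bbox, res_deg, time_steps, num_fields, bytes_per_cell=4):
    """bbox = (min_lon, min_lat, max_lon, max_lat) in degrees."""
    min_lon, min_lat, max_lon, max_lat = bbox
    cols = int((max_lon - min_lon) / res_deg)
    rows = int((max_lat - min_lat) / res_deg)
    return rows * cols * time_steps * num_fields * bytes_per_cell

# e.g. a global 0.01-degree grid, one time step, 3 float32 fields:
size = estimate_response_bytes((-180, -90, 180, 90), 0.01, 1, 3)
print(f"~{size / 1e9:.1f} GB uncompressed")  # roughly 7.8 GB
```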
API-Common Part 2 will allow a server to place a limit on the size of a response. If a response will exceed that limit, then the response is "paged", returning a partial response and the metadata necessary to retrieve the rest. With some adjustments, this existing capability can be applied to coverages.
@cmheazel @clynnes I suggest that this is split into two issues: response too big, and asynchronous response. See async text in the EDR API
@chris-little So far this discussion has not touched on async. However, I have captured the async issue under API-Common issue 231.
SWG 2021-05-26: Agreed that service-metadata is the best place to describe API-wide limits, in terms of a maximum number of cells and/or uncompressed megabytes of data that the server will accept to return. Furthermore, we need a permission allowing the server to reject a client request (some standardized 400 code?) if the client is asking for too much data.
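A minimal sketch of what advertising and enforcing such limits could look like (the property names "maxCells" and "maxMegabytes", their location, and the exception code are assumptions for illustration only; the actual naming and placement would have to be defined by the specification):

```python
# Hypothetical API-wide limits as they might be advertised in service metadata.
ADVERTISED_LIMITS = {
    "maxCells": 100_000_000,   # maximum number of cells the server will return
    "maxMegabytes": 512,       # maximum uncompressed size of a response
}

def check_request(estimated_cells, estimated_megabytes):
    """Return None if the request is acceptable, else a (status, exception body) tuple."""
    if (estimated_cells > ADVERTISED_LIMITS["maxCells"]
            or estimated_megabytes > ADVERTISED_LIMITS["maxMegabytes"]):
        return 400, {
            "code": "RequestTooLarge",   # hypothetical exception code
            "description": "The requested coverage exceeds the advertised server limits.",
        }
    return None
```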
413 Payload Too Large ~ https://datatracker.ietf.org/doc/html/rfc7231#section-6.5.11
@nmtoken As we discussed in the call, I believe 413 as defined might apply to the payload provided by the client, as opposed to the payload returned in the response, though that should be clarified.
@jerstlouis Sorry, wasn't on the call and missed the discussion.
Now you say it, I agree, it could mean that too or either may apply. My next pick would be 400 Bad Request ~ https://datatracker.ietf.org/doc/html/rfc7231#section-6.5.1
The 400 (Bad Request) status code indicates that the server cannot or will not process the request due to something that is perceived to be a client error (e.g., malformed request syntax, invalid request message framing, or deceptive request routing).
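From the client side, handling such a rejection might look like this (a sketch with a hypothetical URL; it assumes the server returns a 400 with a code/description exception body, which is not mandated here):

```python
import requests

# Hypothetical coverage request that is too large for the server.
r = requests.get(
    "https://example.org/ogcapi/collections/demo/coverage",
    params={"bbox": "-180,-90,180,90"},
)
if r.status_code == 400:
    exception = r.json()  # assuming a JSON exception body
    print("Rejected:", exception.get("code"), "-", exception.get("description"))
else:
    data = r.content  # the coverage payload
```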
As a related aside to this, I would like to suggest that, for servers supporting the scaling conformance classes, a default /coverage request could provide downsampled data if scaleFactor=1 is not explicitly provided. This makes it easy for clients to get an overview of the whole coverage regardless of its dimensions or resolution, and prevents the default request from returning an error.
e.g. https://maps.ecere.com/ogcapi/collections/SRTM_ViewFinderPanorama/coverage?scaleFactor=1 returns an error, but https://maps.ecere.com/ogcapi/collections/SRTM_ViewFinderPanorama/coverage does not.
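A sketch of how a server might implement this suggested default behaviour (an illustration of the idea only, assuming that a scale factor greater than 1 downsamples the grid; the exact semantics are those of the Scaling conformance class):

```python
import math

def effective_scale_factor(requested_scale_factor, full_res_cells, max_cells):
    """Pick the scale factor to apply to a /coverage request."""
    if requested_scale_factor is not None:
        return requested_scale_factor      # client explicitly asked; honour it
    if full_res_cells <= max_cells:
        return 1                           # full resolution fits within the limit
    # Otherwise downsample just enough to fit: scaling a 2D grid by factor f
    # reduces the number of cells by roughly f**2.
    return math.ceil(math.sqrt(full_res_cells / max_cells))
```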
It is not a good idea to change the behavior of Core in a way that is not intuitive and not consistent, and that makes it more complicated to understand. Why not focus energy on a good set of examples, which would be extremely beneficial...
@pebau while I agree on the examples, that is a separate topic.
The reason I make this suggestion here is that the link to /coverage is provided from the collection resource, and for many coverages it is effectively a broken link that asks for too much data unless additional parameters are specified.
Principle 23 of the OGC API Web Guidelines mentions:
Good practice for defining default behavior is avoiding exception, error or empty pages as the response.
So I really like the idea of being able to return the downsampled coverage by default if the full resolution is too much for the server to process or for the client to receive, while specifying scaleFactor=1 would mean the client knows what it is doing and insists on retrieving the full-resolution data.
I also believe it is part of the solution to the problem raised in this issue by @clynnes . Also to address the original suggestions, all of these are good capabilities as well:
page through the coverage response by spatial or temporal chunk or tile
The specification now has conformance classes for subset and coverage tiles.
obtain estimated size of a response before actually requesting the data (like a HEAD request)
We should probably add this as a new requirement?
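For instance, a client could probe the size with a HEAD request and inspect Content-Length, if the server chooses to provide it (a sketch with a hypothetical URL; whether servers must answer HEAD with a size is exactly what such a requirement would need to specify):

```python
import requests

head = requests.head(
    "https://example.org/ogcapi/collections/demo/coverage",
    params={"bbox": "-10,40,10,60"},
)
size = head.headers.get("Content-Length")
if size is not None and int(size) > 100 * 1024 * 1024:
    print("Response would exceed 100 MB; consider subsetting or scaling instead.")
```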
allow servers to advertise maximum request limit
We were just discussing adding this to service metadata at the API level.
have a specific error response that indicates the request is too big to satisfy
We were just discussing that 400 is likely the proper error response for this.
support asynchronous response
Discussed in #66 -- multiple OGC API specifications are going to use the Prefer header to indicate that an async response is desired, and this could be defined in Common.
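A client asking for an asynchronous response with the Prefer header (RFC 7240) could look like this (the URL and the 202/Location handling are illustrative assumptions, not something defined by Coverages today):

```python
import requests

r = requests.get(
    "https://example.org/ogcapi/collections/demo/coverage",
    headers={"Prefer": "respond-async"},
)
if r.status_code == 202:
    # The server accepted the request for asynchronous processing; one possible
    # pattern is to poll a status resource it points to.
    print("Processing asynchronously, check:", r.headers.get("Location"))
else:
    data = r.content  # the server answered synchronously anyway
```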
@jerstlouis you commonly "like the idea" - let me note, for the record, that others do not necessarily like it (in this case: at least me). In my world, more technically based arguments tend to be used. In the case at hand, concerns get mixed: consistent service logic and resource-dependent limitations such as quotas.
@pebau I included the reason why I like the idea: to avoid a default behavior that returns an error or returns more data than the client (or webcrawler) intended to request. I understand your opinion and concerns about service consistency; I think it's a question of balancing perceived consistent service logic against the implications of a default behavior which is normally not what users/clients intend.
The webcrawler case is probably the best reason to go for this -- this was much less of a concern with WCS as the GetCoverage request link was not exposed in the same web-friendly way as it is with OGC API - Coverages. (and returning an error to indexing crawlers negatively affects SEO and thus Findability).
@jerstlouis I do not share the idea that users would intend what is sketched (leaving aside that a complete specification is still missing), also with webcrawlers. If this is a use case, how is it solved by other tools and services, in the geo and other domains? Is any investigation available?
Admittedly we haven't actually had any real clients yet for our OGC API - Coverages implementation, but I can tell you that all the people within our company whom I got to test-drive what's been implemented so far have complained that our /coverage endpoint is broken because they get a "400 Bad Request" response. So in this respect it seems to me that having endpoints that actually work without having to append parameters to them would be a good thing.
If a client wants to explicitly request the full-scale coverage, it can always do so by adding scaleFactor=1. It doesn't even have to check whether the server actually supports the scaling conformance class, since servers that don't support this conformance class would presumably just ignore this parameter (or throw an error if its value is something other than 1).
I don't feel strongly about this one way or another, and am happy to let the industry decide, but I just figured I'd share my own personal experience and opinion in this matter.
SWG 2021-11-25: We agreed to include permissions and requirements along these lines:
Permission: A server advertising conformance for the scaling conformance class MAY return a downsampled version of the coverage, if the client does not explicitly request scale-factor=1
Requirement: Even if the scaling conformance class is not supported, the server SHALL understand requests specifying scale-factor=1 as indicating that the client wishes to retrieve the data at full resolution.
If a client wants to page and the server supports the Coverage Tiles conformance class, that can be used and the client can negotiate e.g. its preferred TileMatrixSet to page/request in the most convenient way.
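To illustrate the agreed requirement (a sketch only; the parsing details and capability flag are assumptions), even a server that does not implement the Scaling conformance class would still recognize scale-factor=1 as a request for full resolution rather than rejecting it as an unknown parameter:

```python
SUPPORTS_SCALING = False  # assumed server capability flag

def resolve_scale_factor(query_params):
    raw = query_params.get("scale-factor")
    if raw is None:
        return 1          # or a downsampled default, where that permission applies
    value = float(raw)
    if not SUPPORTS_SCALING and value != 1:
        raise ValueError("scale-factor other than 1 is not supported")  # maps to a 4xx
    return value
```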
Just for the record, I am strongly against this.
2022-01-12: Suggestion from @tomkralidis to have a boolean property in the metadata indicating that the default /coverage request will return downsampled data.
Motion to update the spec to include permissions to return downsampled data, introduce this boolean property, and a requirement to accept scale-factor=1 even when the scaling conformance class is not supported, and to allow advertising limits as per #110.
Moved: Jerome
Second: Stephan
Discussion:
Noting Peter's objection.
3 votes in favor in this meeting.
@tomkralidis Question regarding your suggestion: to have a boolean property in the metadata indicating that the default /coverage request will return downsampled data.
There is a permission now in place:
In the case where a response to a request without a scale-factor, scale-axes or scale-size parameters would be larger than an advertised server limit, an implementation MAY automatically downsample the coverage to a suitable resolution instead of returning a 4xx client error.
and a Note:
A client retrieving a coverage from an implementation advertising support for this "Scaling" requirements class should explicitly use scale-factor=1 if it wants to ensure retrieving the coverage in its native resolution and prefers receiving an error instead of a downsampled version of the data.
I am wondering now how to implement this suggestion, and having second thoughts about whether it is really necessary / useful. The only place we could include this metadata (e.g., "defaultCoverageMayBeDownsampled": true) would be in /collections/{collectionId}.
My feeling, as per the note, is that any client requesting /coverage needs to explicitly use ?scale-factor=1 as a query parameter if it absolutely does not want downsampled data. Then these clients should not care what this property says, and should just assume this to be the case for all implementations.
Adding more things to the collection description complicates things.
Adding this could also have an effect opposite to the intent of preventing downsampled data from being used inadvertently.
It could create a situation where clients think they can rely on looking at defaultCoverageMayBeDownsampled, and that if it is not there or false, they do not need to use ?scale-factor=1 to avoid receiving downsampled data, but a server implementation could then fail this requirement. It actually makes things more complicated to implement for both client and server, having to look for this property and branch one way or the other, and introduces more potential points of failure.
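To make the contrast concrete, here are the two client strategies side by side (a sketch; defaultCoverageMayBeDownsampled is the hypothetical flag from this thread, not something defined by the specification):

```python
def coverage_params_branching(collection_doc):
    # Strategy 1: branch on the hypothetical metadata flag -- an extra code path,
    # which breaks if a server forgets to advertise the flag correctly.
    if collection_doc.get("defaultCoverageMayBeDownsampled"):
        return {"scale-factor": 1}
    return {}

def coverage_params_simple():
    # Strategy 2: always ask explicitly for full resolution when that is what
    # the client requires -- behaves the same against every implementation.
    return {"scale-factor": 1}
```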
In OGC API - Maps we already have this expectation that /map is not necessarily the most detailed resolution.
In Features there is a paging mechanism, so an expectation that /items will not necessarily return all items.
Thoughts about whether it is necessary to include something like "defaultCoverageMayBeDownsampled": true, or whether we may be better off without it?
@jerstlouis so to summarize: no extra metadata property, and clients explicitly request scale-factor=1 if they so choose. That works for me; we should have some informative text on this behaviour (probably something similar in OAMaps while we're at it).