Closed ShaneMill1 closed 3 years ago
I see a similar pattern for the z
parameter (z=ALL
). We should choose a convention and apply it across similar query parameters.
However, the downfall could be that a user accidentally requests everything, without necessarily wanting everything, thus causing unnecessary burden on the servers.
A server could impose a limit on the size of the query (or response). When the limit is exceeded, server would respond with 413 (Request Entity Too Large) accompanied with an error message explaining the limit.
I would prefer an explicit request of parameter-name=ALL
, rather than defaulting to returning all of the data if parameter-name
is not specified.
All:
I'd be happy with explicitly requesting parameter-name=ALL
On Mon, Mar 1, 2021 at 4:03 AM m-burgoyne notifications@github.com wrote:
I would prefer an explicit request of parameter-name=ALL, rather than defaulting to returning all of the data if parameter-name is not specified.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/opengeospatial/ogcapi-environmental-data-retrieval/issues/229#issuecomment-787782825, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADYCGUK2FKI3LVN775R5JXLTBNJ6RANCNFSM4YIVZ3LQ .
Would this <url_parameter>=ALL
syntax apply to z
and datetime
as well or to parameter-name
only?
@iandruska-ibl As @tomkralidis suggested I think it would be best that the convention applied to all similar query parameters
If I understand correctly, this change would make z
, datetime
and parameter-name
mandatory.
What about future extensions/additions? For example, in our implementation we added realisation
parameter. It allows selecting a specific member (or members) from an ensemble. The convention pushes us to make the realisation
parameter mandatory as well. But that would break our compliance with the EDR standard.
I will try to explain by example. Let's suppose an ensemble NWP dataset and a collection EPS_isobaric
exposing data from the dataset. The following query returns all members of the ensemble for given parameter-name
, z
and datetime
:
/edr/collections/EPS_isobaric?parameter-name=geopotential&z=850&datetime=2021-03-02T00:00:00Z
Optionally, a client can issue a non-compliant query to select specific members using the realisation
parameter:
/edr/collections/EPS_isobaric?parameter-name=geopotential&z=850&datetime=2021-03-02T00:00:00Z&realisation=1,2,3,4
However, the new convention pushes us to make the realisation
parameter mandatory. To get the same result, the client would now have to write:
/edr/collections/EPS_isobaric?parameter-name=geopotential&z=850&datetime=2021-03-02T00:00:00Z&realisation=ALL
Now the only option the client have is a non-compliant query. Of course, we could drop the convention in case of the realisation
parameter but we would lose the API consistency doing that.
I believe that the new convention limits the API extensibility (or at least makes it harder). Not just for the "unofficial" extensions but also the official ones in the future.
@iandruska-ibl Having the same policy for parameter-name
, z
and datetime
, =ALL
with the API controlling volume of response with a 413
seems sensible.
I have no problem with there being exceptions, though it makes implementation more tedious.
Retrieving from Ensemble Collections, other than by using Instances, have been identified as future work.
@chris-little My concern is not specifically about ensembles. My concern is about openness of the API for future extensions in general. Ensembles served just as an example. I apologize for not being more lucid about that.
Anyway, my proposal is to keep the status quo on z
and datetime
, i.e. to keep <url_parameter>=ALL
implicit. And to do the same for parameter-name
as @ShaneMill1 originally suggested. And to control the volume of response with a 413. Most production implementations (especially for public use) will have to implement some sort of output limiting anyway no matter whether the <url_parameter>=ALL
is implicit or explicit.
Most importantly, this approach keeps the API more open for future extensions, I believe.
@iandruska-ibl My concern is this approach leaves a server vulnerable to badly formed requests, for some types of data the volume of data that will be generated may be unknown until some processing occurs. Enforcing a policy of making the user actively specify that they require all data values would both help reduce unnecessary processing on the server and excess data in the clients,
So far, OGC API GET requests have followed a pattern that no query parameter is mandatory and that the server simply uses a sensible default and declares that for each query parameter. This makes it easier for newcomers to an API (in particular to those that have never heard about EDR or OGC) to get started quickly with successful requests.
If "all" is not a good default why not leave it to the API provider to decide what a good default is (this could be just the most commonly requested parameter)?
@m-burgoyne server without an output limiting mechanism is vulnerable no matter whether the =ALL
is implicit or explicit. Nothing prevents a client from issuing an exhausting query explicitly. Explicit =ALL
does not improve security too much in my opinion.
Implementing a generic output limiting mechanism is fairly trivial:
if count(z) * count(datetime) * count(parameter-name) > LIMIT:
raise HTTPError(413, "Query limit exceeded. The limit works like this: ...")
Where count(<url_parameter>)
is the number of data coordinates implied by the value of <url_parameter>
.
Although I still prefer the idea of a simple check to see if a query parameter has been specified. z
and datetime
are optional parameters, so there is a good argument to change the approach from z=all
to don't specify z
if you require all height levels to be returned. Then making parameter-name
optional and returning all of the available data parameters if the parameter-name
query parameter is not specified would be consistent.
PR in #241
close when PR #251 merged
PR251 merged. EDR API SWG 47 agreed to close issue
According to the spec, it looks like parameter-name is always required.
Thinking this through, suppose a collection has many many parameters. If a user wants all of them, they would have to specify all of the parameters.
If parameter-name was optional, and a user does not specify parameter-name, it could return all of the parameters and keep the API request simple. However, the downfall could be that a user accidentally requests everything, without necessarily wanting everything, thus causing unnecessary burden on the servers.
An option for parameter-name remaining required, but an implementer keeping the query simple for all parameters would be: ¶meter-name=all
This may be an implementation detail but curious if other's have thoughts, and maybe this could contribute to a best practices in the future.
CC: @m-burgoyne @tomkralidis @solson-nws @chris-little