opengeospatial / ogcapi-environmental-data-retrieval

A Web API that provides a family of lightweight interfaces for accessing Environmental Data resources.
https://ogcapi.ogc.org/edr
54 stars 25 forks source link

parameter-name always required? #229

Closed ShaneMill1 closed 3 years ago

ShaneMill1 commented 3 years ago

According to the spec, it looks like parameter-name is always required.

Thinking this through, suppose a collection has many many parameters. If a user wants all of them, they would have to specify all of the parameters.

If parameter-name was optional, and a user does not specify parameter-name, it could return all of the parameters and keep the API request simple. However, the downfall could be that a user accidentally requests everything, without necessarily wanting everything, thus causing unnecessary burden on the servers.

An option for parameter-name remaining required, but an implementer keeping the query simple for all parameters would be: &parameter-name=all

This may be an implementation detail but curious if other's have thoughts, and maybe this could contribute to a best practices in the future.

CC: @m-burgoyne @tomkralidis @solson-nws @chris-little

tomkralidis commented 3 years ago

I see a similar pattern for the z parameter (z=ALL). We should choose a convention and apply it across similar query parameters.

iandruska-ibl commented 3 years ago

However, the downfall could be that a user accidentally requests everything, without necessarily wanting everything, thus causing unnecessary burden on the servers.

A server could impose a limit on the size of the query (or response). When the limit is exceeded, server would respond with 413 (Request Entity Too Large) accompanied with an error message explaining the limit.

m-burgoyne commented 3 years ago

I would prefer an explicit request of parameter-name=ALL, rather than defaulting to returning all of the data if parameter-name is not specified.

solson-nws commented 3 years ago

All:

I'd be happy with explicitly requesting parameter-name=ALL

On Mon, Mar 1, 2021 at 4:03 AM m-burgoyne notifications@github.com wrote:

I would prefer an explicit request of parameter-name=ALL, rather than defaulting to returning all of the data if parameter-name is not specified.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/opengeospatial/ogcapi-environmental-data-retrieval/issues/229#issuecomment-787782825, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADYCGUK2FKI3LVN775R5JXLTBNJ6RANCNFSM4YIVZ3LQ .

iandruska-ibl commented 3 years ago

Would this <url_parameter>=ALL syntax apply to z and datetime as well or to parameter-name only?

m-burgoyne commented 3 years ago

@iandruska-ibl As @tomkralidis suggested I think it would be best that the convention applied to all similar query parameters

iandruska-ibl commented 3 years ago

If I understand correctly, this change would make z, datetime and parameter-name mandatory.

What about future extensions/additions? For example, in our implementation we added realisation parameter. It allows selecting a specific member (or members) from an ensemble. The convention pushes us to make the realisation parameter mandatory as well. But that would break our compliance with the EDR standard.

I will try to explain by example. Let's suppose an ensemble NWP dataset and a collection EPS_isobaric exposing data from the dataset. The following query returns all members of the ensemble for given parameter-name, z and datetime:

/edr/collections/EPS_isobaric?parameter-name=geopotential&z=850&datetime=2021-03-02T00:00:00Z

Optionally, a client can issue a non-compliant query to select specific members using the realisation parameter:

/edr/collections/EPS_isobaric?parameter-name=geopotential&z=850&datetime=2021-03-02T00:00:00Z&realisation=1,2,3,4

However, the new convention pushes us to make the realisation parameter mandatory. To get the same result, the client would now have to write:

/edr/collections/EPS_isobaric?parameter-name=geopotential&z=850&datetime=2021-03-02T00:00:00Z&realisation=ALL

Now the only option the client have is a non-compliant query. Of course, we could drop the convention in case of the realisation parameter but we would lose the API consistency doing that.

I believe that the new convention limits the API extensibility (or at least makes it harder). Not just for the "unofficial" extensions but also the official ones in the future.

chris-little commented 3 years ago

@iandruska-ibl Having the same policy for parameter-name, z and datetime, =ALL with the API controlling volume of response with a 413 seems sensible. I have no problem with there being exceptions, though it makes implementation more tedious. Retrieving from Ensemble Collections, other than by using Instances, have been identified as future work.

iandruska-ibl commented 3 years ago

@chris-little My concern is not specifically about ensembles. My concern is about openness of the API for future extensions in general. Ensembles served just as an example. I apologize for not being more lucid about that.

Anyway, my proposal is to keep the status quo on z and datetime, i.e. to keep <url_parameter>=ALL implicit. And to do the same for parameter-name as @ShaneMill1 originally suggested. And to control the volume of response with a 413. Most production implementations (especially for public use) will have to implement some sort of output limiting anyway no matter whether the <url_parameter>=ALL is implicit or explicit.

Most importantly, this approach keeps the API more open for future extensions, I believe.

m-burgoyne commented 3 years ago

@iandruska-ibl My concern is this approach leaves a server vulnerable to badly formed requests, for some types of data the volume of data that will be generated may be unknown until some processing occurs. Enforcing a policy of making the user actively specify that they require all data values would both help reduce unnecessary processing on the server and excess data in the clients,

cportele commented 3 years ago

So far, OGC API GET requests have followed a pattern that no query parameter is mandatory and that the server simply uses a sensible default and declares that for each query parameter. This makes it easier for newcomers to an API (in particular to those that have never heard about EDR or OGC) to get started quickly with successful requests.

If "all" is not a good default why not leave it to the API provider to decide what a good default is (this could be just the most commonly requested parameter)?

iandruska-ibl commented 3 years ago

@m-burgoyne server without an output limiting mechanism is vulnerable no matter whether the =ALL is implicit or explicit. Nothing prevents a client from issuing an exhausting query explicitly. Explicit =ALL does not improve security too much in my opinion.

Implementing a generic output limiting mechanism is fairly trivial:

if count(z) * count(datetime) * count(parameter-name) > LIMIT:
    raise HTTPError(413, "Query limit exceeded. The limit works like this: ...")

Where count(<url_parameter>) is the number of data coordinates implied by the value of <url_parameter>.

m-burgoyne commented 3 years ago

Although I still prefer the idea of a simple check to see if a query parameter has been specified. z and datetime are optional parameters, so there is a good argument to change the approach from z=all to don't specify z if you require all height levels to be returned. Then making parameter-name optional and returning all of the available data parameters if the parameter-name query parameter is not specified would be consistent.

tomkralidis commented 3 years ago

PR in #241

chris-little commented 3 years ago

close when PR #251 merged

chris-little commented 3 years ago

PR251 merged. EDR API SWG 47 agreed to close issue