Subset requirements class definition incomplete and does not match its examples

(Please let me know if you'd like this sort of feedback to be split up in the future. I'm trying to avoid polluting the tracker with tons of related feedback.)

Inconsistencies:

The BNF specifies that intervals are colon-separated. Examples are comma-separated. It's not clear to me which is correct, but I prefer colon pretty strongly based on it being easier to parse and understand when accepting unexploded parameters and using a syntax potentially familiar to Python users (though comma is conventional for mathematical intervals). Note also that both comma and colon cause encodeURIComponent to return an encoded value, potentially impacting readability and ability for end-users constructing URLs to generate correct values.
The definition uses lowercase subset= and the examples use uppercase SUBSET=. Even if case-insensitive, consistency is preferable.
Intervals are defined differently to how datetime defines them, both in how they separate components and their notation for open and closed intervals. FWIW, I prefer the subset notation because it doesn't tend to trip
Examples use inconsistent capitalization of axes. I can't tell if this is intentional. Any case sensitivity in axis names should be noted. If it's case sensitive, I'd suggest capitalizing Time for consistency and to suggest case sensitivity. If it's not, I'd suggest lowercasing everything for consistency with the rest of the API.
bbox and datetime naming and syntax are not simply carried forward for spatial and temporal subsetting, causing clients to, in very common cases, specify the same information two different ways. "Give me the coverages matching this bounding box, then subset their latitude and longitude using these two ranges" vs "Give me the coverages matching this bounding box, then give me only the points within this bounding box."

Incomplete items:

The meaning of asterisk (*) is not explained in this section. I assume it has similar meaning to open / closed interval in datetime
The document does not specify whether intervals are inclusive or exclusive. The examples seem to suggest they're inclusive, which would make sense, but it should be spelled out.

Clashing Syntax / Interpretation:

Mathematical intervals use ( to denote exclusive intervals and [ to denote inclusive intervals, where subset= (I think) describe inclusive intervals and use (
C-style function calls. lat(0, 10) triggers "function call" in developer brains, especially if comma is correct (unclear)
Mathematical points. (0, 10) looks a lot more like a point than an interval, if the comma is correct notation
The OpenAPI doc includes rangeSubset which seems to be some sort of variable subsetting. If that's intended to be added to the spec, its meaning in relation to this is very confusing without referencing the docs

I'm willing to fix the obviously necessary items above (BNF and examples using different syntax and caps, documenting asterisk) if someone can clarify/confirm the correct interpretation.

Alternative syntax options for consideration, so I'm not just throwing stones:

subset=lat[0:10] Advantages: Uses widely recognized array syntax, along with a common slicing syntax. Does not trigger any mathematical interpretation, but if you wanted to look at it as an interval, it's clearly inclusive.
subset[lat]=0,10 Advantages: Consistent with bounding box syntax. May be easier for some languages to parse into appropriate structures.
bbox=0,0,10,10 Advantages: Can literally carry forward a well-known query parameter that's available in core for querying collections without having to (a) detect which axes correspond to which spatial dimension or (b) provide the alternative syntax subset=lat(0:10)&subset=lon(0:10). Note, I'd propose this as an additional convenience syntax, as I can se the utility in also being able to subset arrays not expressible by bbox or datetime

@bilts: thanks a lot for this scrutiny! a bunch of issues indeed, I am giving them a try one by one.

(Please let me know if you'd like this sort of feedback to be split up in the future. I'm trying to avoid polluting the tracker with tons of related feedback.)

Inconsistencies:

* The BNF specifies that intervals are colon-separated.  Examples are comma-separated.  It's not clear to me which is correct, but I prefer colon pretty strongly based on it being easier to parse and understand when accepting unexploded parameters and using a syntax potentially familiar to Python users (though comma is conventional for mathematical intervals).  Note also that both comma and colon cause `encodeURIComponent` to return an encoded value, potentially impacting readability and ability for end-users constructing URLs to generate correct values.

good catch! I somewhat favour colons as well, OTOH WCS uses commas - let us decide about the preferred way.

and yes, we frequently hit the limits of https - I could not find a way how to circumvent this while at the same time keeping intuitive syntax.

* The definition uses lowercase `subset=` and the examples use uppercase `SUBSET=`.  Even if case-insensitive, consistency is preferable.

such a convention has been discussed in yesterday's telecon, based on your comment, and resolution is: lower case. we will adjust all text accordingly. An incentive to keep names short ;-)

* Intervals are defined differently to how `datetime` defines them, both in how they separate components and their notation for open and closed intervals.  FWIW, I prefer the subset notation because it doesn't tend to trip

yes, unfortunately feature folks have not been approachable on harmonization here. WFS & children is inconsistent across space and time whereas coverages have a tradition of having a unified syntax.

we only have closed intervals, BTW.

Hm, can you please explain "tend to trip" ?

* Examples use inconsistent capitalization of axes.  I can't tell if this is intentional.  Any case sensitivity in axis names should be noted.  If it's case sensitive, I'd suggest capitalizing `Time` for consistency and to suggest case sensitivity.  If it's not, I'd suggest lowercasing everything for consistency with the rest of the API.

see above, thanks to your contribution this has been clarified now.

* `bbox` and `datetime` naming and syntax are not simply carried forward for spatial and temporal subsetting, causing clients to, in very common cases, specify the same information two different ways.  "Give me the coverages matching this bounding box, then subset their latitude and longitude using these two ranges" vs "Give me the coverages matching this bounding box, then give me only the points within this bounding box."

these are the same, actually - there is no natural preference of one axis over the other in selection, they are independent and can be done in any sequence or, at least conceptually, even in parallel.

Again, this is the legacy of WFS which has added time late in the process whereas coverages have been designed multidimensional by nature. Same with vertical axis, BTW.

Incomplete items:

* The meaning of asterisk (`*`) is not explained in this section.  I assume it has similar meaning to open / closed interval in `datetime`

no, it means "take the current boundary, even though I may not know the actual value". So ":" would mean "gimme all".

Indeed, this needs (more) explanation as we go forward.

* The document does not specify whether intervals are inclusive or exclusive.  The examples seem to suggest they're inclusive, which would make sense, but it should be spelled out.

inclusive by nature. Again, you are right: needs to be stated.

Clashing Syntax / Interpretation:

* Mathematical intervals use `(` to denote exclusive intervals and `[` to denote inclusive intervals, where `subset=` (I think) describe inclusive intervals and use `(`

that tried to be minimize deviation from existing syntax way back when WCS was adopted, to decrease friction if you will. I am fine with brackets!

* C-style function calls.  `lat(0, 10)` triggers "function call" in developer brains, especially if comma is correct (unclear)

* Mathematical points.  `(0, 10)` looks a lot more like a point than an interval, if the comma is correct notation

* The OpenAPI doc includes `rangeSubset` which seems to be some sort of variable subsetting.  If that's intended to be added to the spec, its meaning in relation to this is very confusing without referencing the docs

stay tuned, description of range subsetting still to come!

I'm willing to fix the obviously necessary items above (BNF and examples using different syntax and caps, documenting asterisk) if someone can clarify/confirm the correct interpretation.

thank you, appreciated, but I got this task already assigned in the last telecon.

Alternative syntax options for consideration, so I'm not just throwing stones:

* `subset=lat[0:10]` Advantages: Uses widely recognized array syntax, along with a common slicing syntax.  Does not trigger any mathematical interpretation, but if you wanted to look at it as an interval, it's clearly inclusive.

yes, looks good to me.

* `subset[lat]=0,10` Advantages: Consistent with bounding box syntax.  May be easier for some languages to parse into appropriate structures.

while I perfectly understand and like the idea of seeing subset as an array that seems counter-http could be difficult to implement given the parsing methods of http libraries, and changes the paradigm to have semantics in the left-hand side.

* `bbox=0,0,10,10` Advantages: Can literally carry forward a well-known query parameter that's available in core for querying collections without having to (a) detect which axes correspond to which spatial dimension or (b) provide the alternative syntax `subset=lat(0:10)&subset=lon(0:10)`.  Note, I'd propose this as an additional convenience syntax, as I can se the utility in also being able to subset arrays not expressible by `bbox` or `datetime`

indeed, disadvantage: axis order not visible, limited to 2D, etc. But as feature folks dominate OAPI this has been adopted on coverages, too. Even more, the subset technique has been put into a separate conformance class so that implementations are not obliged to support it. Not that I am happy with this.

I see the argument on bbox, but there are some features that are then missing from the coverages API. Rather than being prescriptive about a solution, I'll note some common use cases I don't think are quite doable yet.

A general purpose, visual client (imagine QGIS) is showing a user a map. It wants to fetch a coverage corresponding to the area of interest with minimal additional input or client-side hacks. This does not seem possible, because a conformant client cannot infer enough information to know, say, what the name of the axis corresponding to longitude is (lon? long? x? x_axis? longitude? some alternative capitalization?). Perhaps not all coverages have longitude, but for the many, many that do, the spec and its extensions are not powerful enough to allow such a general-purpose visual client to exist.
A user is harmonizing data and wants to retrieve a coverage within a given area of interest in a desired CRS to allow it to be used with some other data with minimal data retrieval and preprocessing. The spec does not provide a way to request a range-based subset of a coverage in relation to the CRS being requested. Consider a polar projection of geospatial data with a non-polar native CRS. It is likely that the original data has a horizontal axis corresponding to longitude, with name, for example "lon." In the projected coordinate system, the horizontal axis is no longer longitude. I don't see a way to define that the server wants to call that axis "x" anywhere in the spec, so you can't subset in terms of the CRS you are requesting, which is arguably the most natural way to subset.
A user wants to find coverages that have data corresponding to given axis ranges and then subset to only fetch data within those ranges, with minimal code and inference. While usually possible to perform this operation in the current spec, it requires two completely different representations of "axis range" (bbox at the /collections level, subset at the rangeset level) with potentially two different meanings. In this case, a user would also still need to know axis ordering and undesirable things about bbox. It's a worst-of-both-worlds scenario.

I think bbox, as is, can easily accomplish 1 and 2 (as evidenced by its utility in prior W*S versions). It's not the only way, though. The third item just calls for some consistency / reconciliation in the spec.

have fixed now:

SUBSET -> subset
explanation of "*"

@bilts, thank you for the detailed use cases, this helps tremendously in discussing! Giving it a try:

1. A general purpose, visual client (imagine QGIS) is showing a user a map.  It wants to fetch a coverage corresponding to the area of interest with minimal additional input or client-side hacks.  This does not seem possible, because a conformant client cannot infer enough information to know, say, what the name of the axis corresponding to longitude is (lon? long? x? x_axis? longitude? some alternative capitalization?).  Perhaps not all coverages have longitude, but for the many, many that do, the spec and its extensions are not powerful enough to allow such a general-purpose visual client to exist.

A coverage as per CIS has the axis labels listed explicitly (and in proper sequence), such as "Lat Lon day". It could be inferred from the CRS (by resolving the URL), but for efficiency reasons axis names are included. Same for uom. Hence, a client can indeed know axis names and extents, and can do subsetting. Actually, coverages services do that all the time. Example: http://inspire.rasdaman.org/rasdaman/apps/wcs-console/index.html - click on any coverage name.

2. A user is harmonizing data and wants to retrieve a coverage within a given area of interest in a desired CRS to allow it to be used with some other data with minimal data retrieval and preprocessing.  The spec does not provide a way to request a range-based subset of a coverage in relation to the CRS being requested.  Consider a polar projection of geospatial data with a non-polar native CRS.  It is likely that the original data has a horizontal axis corresponding to longitude, with name, for example "lon."  In the projected coordinate system, the horizontal axis is no longer longitude.  I don't see a way to define that the server wants to call that axis "x" anywhere in the spec, so you can't subset in terms of the CRS you are requesting, which is arguably the most natural way to subset.

range subsetting is another story, that goes into a dedicated extension. The CRS extension offers a subsetcrs parameter where the input coordinate bbox can be specified in its own CRS. Without that (ie, per Core) the coverage's Native CRS, as stored in the server, will be assumed for subsetting coordinates.

3. A user wants to find coverages that have data corresponding to given axis ranges and then subset to only fetch data within those ranges, with minimal code and inference.  While usually possible to perform this operation in the current spec, it requires two completely different representations of "axis range" (`bbox` at the `/collections` level, `subset` at the `rangeset` level) with potentially two different meanings.  In this case, a user would also _still_ need to know axis ordering and undesirable things about `bbox`.  It's a worst-of-both-worlds scenario.

no, you do not need to know axis order in subset (as opposed to bbox where the implicit order is a first-class source of errors). You can write subset=Lat(...)&subset=Lon(...) just as well as subset=Lon(...)&subset=Lat(...) So with subset the user has a maximum of convenience.

If you want to know axis order: it is given by the srsLabels attribute in the coverage. The subset mechanism uses it (whereas for bbox this is frozen in the spec = less general AFAICS).

I think bbox, as is, can easily accomplish 1 and 2 (as evidenced by its utility in prior W*S versions). It's not the only way, though. The third item just calls for some consistency / reconciliation in the spec.

Hope to have clarified how subset is safe, simple, and exactly defined.

cheers, Peter

@bilts Could you please review this issue in light of the latest SWG decisions and changes / fixes to the subsetting conformance class? (Please consider pull request https://github.com/opengeospatial/ogc_api_coverages/pull/98 as well). Thank you!

Coverages SWG call: Please note that PR #98 has just been merged.

Coverages SWG call: Closing now, please reopen if anything is still missing.

opengeospatial / ogcapi-coverages

Subset requirements class definition incomplete and does not match its examples #58