opengeospatial / ogcapi-coverages

OGC API - Coverages draft specification
https://ogcapi.ogc.org/coverages
Apache License 2.0

Remarks around latest draft #185

Open m-mohr opened 5 days ago

m-mohr commented 5 days ago

I'm reading through v0.0.7 of Coverages for the first time. Here are the questions that came up while reading it:

  1. 5.6 - What's the difference between http://www.opengis.net/def/rel/ogc/1.0/geodata and http://www.opengis.net/def/rel/ogc/1.0/data?
  2. 7.2.4 - How should the Content-Datetime: value be encoded in case of an interval? How can I encode temporal information that is not valid according to RFC3339 (e.g. 2020-01 until 2021-01)?
  3. 7.2.2 - "Any additional temporal dimension SHALL include in a trs property a URI or safe CURIE" This is somewhat confusing. "Any additional temporal dimension SHALL include a trs property with a URI or safe CURIE ... as a value." or so.
  4. 7.2.2 - "the dimension SHALL include a definition property corresonding to a URI for the observed or measured property." How do I get such a URI?
  5. 7.2.2 - What are the allowed values for unitLang? Can I use UDUNITS2? If not predefined, this will not be very interoperable. What is UCUM? (I googled it, but it wouldn't hurt to add a link.)
  6. 7.2.2 - "The name (JSON dictionary key) of each additional dimension SHALL correspond to the axis abbreviation for the CRS of that axis" That is very abstract to me. Additional to what? Does this only apply to spatial dimensions? What would this be for a dimension that describes air pressure for example?
  7. 7.2.2 grid property: What would that look like? It's not clear whether this would be valid for example: grid: {cellsCount: 123, resolution: "1m"} (regular) or grid: {coordinates: ["B1", "B2", "B5"]} (irregular)?
  8. 7.2.2 - It's not quite clear for which purpose two CRS properties are needed.
  9. 7.2.3 3C - Are the title, type and description top-level? {type: ...} Or are they part of an object schema? {type: object, properties: {type: ...}}
  10. 7.2.3 3D - It's not quite clear what the sequential order (x-ogc-property-seq) should express.
  11. 7.2.4 Rec. 1 - Do I understand correctly that the x-OGC-limits.coverage and related properties should go into the OpenAPI document for the .../coverage endpoint? It's not quite clear where they should go.
  12. 8 - It looks like subset=Lat(40:50),Lon(10:20) and bbox=... parameters do the same thing. Why does the API specify two ways for doing the same thing? Same seems to apply for time.
  13. 8.3.3 - If I have two temporal dimensions, does the datetime parameter apply to both? What happens if at least one of the dimensions doesn't have RFC3339 compliant instances?
  14. 8.3.4 - Lat/Lon have uppercase first letters. Does this imply that I can't name my spatial dimensions lat/lon or x/y? Same for time. Can I have a single temporal dimension called t?
  15. 8.3.4. Rec. 3: That's a lot of aliases. Wouldn't it be simpler if the server exposes specific names and only those are allowed for subsetting?
  16. 9.3. - It seems axisName is defined in 8, but not in 9. If you didn't care for 8 you may not know the definitions for axisName. Do all the aliases of 8 also apply here?
  17. 10.3.1 - Where do I get the fieldIndex from? Searching the document for fieldIndex doesn't find anything outside of the EBNF.
  18. 11.2.1 - It says I shall support URI/CURIE, but it doesn't say what happens if I provide e.g. WKT1/2, EPSG Code or similar, PROJJSON (as string), etc. Generally, this sounds like an interoperability nightmare, also with Permission 6 in mind.
  19. 13 - Could the Scenes API just define relation types as used in Records or STAC and not define any new endpoints? This way you could just point to existing OAR and STAC endpoints and make the whole req. class much simpler.
  20. 13 - Do I understand correctly that the scenes endpoint should return responses compliant either to OAR or STAC?

I didn't read chapter 12, because Coverage Tiles etc. were not relevant for me at this point.

jerstlouis commented 5 days ago

Thanks so much for the deep dive @m-mohr .

I tried to go through all these points with the necessary amount of detailed explanation. We should see whether we need to split some of these into separate issues, add comments to existing issues, and what needs to be fixed/clarified/improved in the spec.

  1. 5.6 - What's the difference between http://www.opengis.net/def/rel/ogc/1.0/geodata and http://www.opengis.net/def/rel/ogc/1.0/data?

Granted that this distinction might not be super clear, but these link relations are inherited from Features and Tiles, respectively. [ogc-rel:data] is for the landing page to point to the list of collections, whereas [ogc-rel:geodata] is for pointing to the relevant collection in the link's context. I think the only place the latter is used for OGC API - Coverages is for linking to the collection from the tileset metadata of a coverage tileset at /collections/{collectionId}/tiles/{tilesetId}, inside the list of layers making up the tileset (which is more relevant for vector tiles where you typically have multiple layers, and the tileset is at /tiles/{tilesetId} rather than inside a single collection).

  2. 7.2.4 - How should the Content-Datetime: value be encoded in case of an interval? How can I encode temporal information that is not valid according to RFC3339 (e.g. 2020-01 until 2021-01)?

Thanks for spotting that, I guess I did not realize that RFC 3339 does not seem to cover time periods at all. The intent was to follow the same syntax as the datetime parameter in Features (and Common), which cites RFC 3339 section 5.6, and to use the / separator for intervals. For your particular example, couldn't you encode that as 2020-01-01 / 2020-12-31 ?

  3. 7.2.2 - "Any additional temporal dimension SHALL include in a trs property a URI or safe CURIE" This is somewhat confusing. "Any additional temporal dimension SHALL include a trs property with a URI or safe CURIE ... as a value." or so.

Will fix that, thanks. I'm also wondering if we should make that a SHOULD rather than a SHALL, since now we have the semantic definition and unit. Not sure whether these could be more appropriate than a trs/crs in some cases.

  4. 7.2.2 - "the dimension SHALL include a definition property corresonding to a URI for the observed or measured property." How do I get such a URI?

They could technically come from anywhere, but I've found QUDT most helpful as an ontology so far: https://www.qudt.org/pages/QUDToverviewPage.html e.g., the QuantityKind

  5. 7.2.2 - What are the allowed values for unitLang? Can I use UDUNITS2? If not predefined, this will not be very interoperable. What is UCUM? (I googled it, but it wouldn't hurt to add a link.)

They are not restricted, but so far we've done examples with UCUM and QUDT. We're trying to align all this (/collections/{collectionId}/schema) with Features - Part 5: Schemas, as well as possibly a requirement class in Common. I agree it would help to link to UCUM. We'll try to synchronize / reference better with Features and Common.

  6. 7.2.2 - "The name (JSON dictionary key) of each additional dimension SHALL correspond to the axis abbreviation for the CRS of that axis" That is very abstract to me. Additional to what? Does this only apply to spatial dimensions? What would this be for a dimension that describes air pressure for example?

We'll clarify the language, but this is the concept of Uniform Additional Dimensions defined in a requirement class of Common. They are the dimensions other than the primary temporal dimension and the 2 or 3 spatial dimensions. Yes, air pressure is the typical example.

  7. 7.2.2 grid property: What would that look like? It's not clear whether this would be valid for example: grid: {cellsCount: 123, resolution: "1m"} (regular) or grid: {coordinates: ["B1", "B2", "B5"]} (irregular)?

I have some examples here:

2D elevation: https://maps.gnosis.earth/ogcapi/collections/SRTM_ViewFinderPanorama?f=json

2D + time + pressure: https://maps.gnosis.earth/ogcapi/collections/climate:cmip5:byPressureLevel:windSpeed?f=json

The unit should not be inside the resolution property, but implied from the dimension CRS for the spatial dimension, the trs for the primary temporal dimension, and the unit for other dimensions. Note that for spatial, the grid property is an array of objects -- one for each of the 2 or 3 spatial dimensions.

It would be OK to have the bands on a dimension (considering them an axis of the domain rather than different fields of the range), but that makes more sense for hyperspectral data. With the typical LANDSAT or Sentinel-2 datasets, we implement them as fields of the range. Our visualization client tools right now would struggle a lot with the bands as a dimension and would not be able to do much with it. It would be interesting to ask others in the group how they can handle data organized one way or the other.

  8. 7.2.2 - It's not quite clear for which purpose two CRS properties are needed.

Are you talking about storageCrs vs. crs?

The storageCrs (inherited from Features - Part 2: CRS) denotes the native CRS of the data i.e., the most efficient CRS in which to request the data and/or the one with the least potential for re-projection issues when requesting that same CRS as the output CRS (crs parameter). It's also the CRS in which the storageCrsBbox property of the spatial extent and the grid of the spatial extent is specified.

The crs property (an array) is the list of CRS supported as output CRS for the crs parameter.

  9. 7.2.3 3C - Are the title, type and description top-level? {type: ...} Or are they part of an object schema? {type: object, properties: {type: ...}}

Yes, they are top-level. The /schema is a JSON Schema resource, corresponding to Features - Part 5: Schemas and the relevant requirement class of Common as well. We probably need to improve this section in Coverages, either by bringing in more detailed content or by referencing the Common and/or Features section more clearly and pointing out only the differences.

  10. 7.2.3 3D - It's not quite clear what the sequential order (x-ogc-property-seq) should express.

For coverage fields, this would be the order in which these fields come by default (when not using properties= to re-order them or omit some of them) in a format where order matters (e.g., GeoTIFF bands). There's also a related discussion to clarify that the fields by default might be a subset of those which can be obtained when explicitly retrieving all of them (our sentinel-2 datacube for example returns B04,B03,B02 by default which conveniently matches to a natural color GeoTIFF, but you can request other bands with e.g., properties=B01,B02,B03,B04,B08).
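As a rough illustration of that default ordering (a sketch, not text from the draft), a client could sort the fields of the /schema resource by their x-ogc-property-seq value. The schema fragment and the sequence numbers below are hypothetical and chosen only to reproduce the B04,B03,B02 default mentioned above.

```python
def default_field_order(schema):
    """Return field names sorted by their x-ogc-property-seq value.

    Fields without a sequence number are left out of the default order.
    """
    props = schema.get("properties", {})
    sequenced = [
        (spec["x-ogc-property-seq"], name)
        for name, spec in props.items()
        if "x-ogc-property-seq" in spec
    ]
    return [name for _, name in sorted(sequenced)]


# Hypothetical schema fragment; the sequence numbers are assumptions.
schema = {
    "type": "object",
    "properties": {
        "B02": {"type": "number", "x-ogc-property-seq": 3},
        "B04": {"type": "number", "x-ogc-property-seq": 1},
        "B03": {"type": "number", "x-ogc-property-seq": 2},
    },
}

print(default_field_order(schema))
```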

  11. 7.2.4 Rec. 1 - Do I understand correctly that the x-OGC-limits.coverage and related properties should go into the OpenAPI document for the .../coverage endpoint? It's not quite clear where they should go.

The x-OGC-limits.coverage go into the service metadata as defined in Common - Part 1 -- see example B.4:

https://docs.ogc.org/is/19-072/19-072.html#_56682cbf-76dc-4c75-a266-a58186d638aa

From the API landing page, there's a link relation service-meta to service metadata. This may or may not be to the same end-point at /api, but in either case would include an info section as described in that example.

  12. 8 - It looks like subset=Lat(40:50),Lon(10:20) and bbox=... parameters do the same thing. Why does the API specify two ways for doing the same thing? Same seems to apply for time.

Correct. bbox and datetime are basically a convenience around the more powerful and explicit subset parameter (which is the only one that supports additional dimensions). These convenience parameters are also more familiar to users of WMS and OGC API - Features. This is particularly convenient if you want to tweak request URLs to return /map vs. /items and /map vs. /coverage for the same collection.

Considering that a lot more clients are being developed than servers, and that OGC APIs are intended for users to play with directly in their browser, requiring both options from the server introduces a very small burden for server implementors (who can simply map both syntaxes to their implementation), while providing the flexibility for clients and users to pick their favorite option.
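A minimal sketch of how a server might map the convenience parameters onto subset, under some assumptions that are not confirmed by this thread: the bbox is in CRS84 order (lon_min, lat_min, lon_max, lat_max), the temporal axis is named time, and time values are written in double quotes. The function name is hypothetical, and open-ended intervals (..) are not handled.

```python
def bbox_datetime_to_subset(bbox=None, datetime=None):
    """Translate bbox/datetime convenience parameters into a subset expression."""
    parts = []
    if bbox is not None:
        # Assumption: CRS84 axis order, i.e. lon_min, lat_min, lon_max, lat_max.
        lon_min, lat_min, lon_max, lat_max = bbox
        parts.append(f"Lat({lat_min}:{lat_max})")
        parts.append(f"Lon({lon_min}:{lon_max})")
    if datetime is not None:
        if "/" in datetime:
            # Interval: start/end separated by a slash.
            start, end = (p.strip() for p in datetime.split("/", 1))
            parts.append(f'time("{start}":"{end}")')
        else:
            # Single instant.
            parts.append(f'time("{datetime.strip()}")')
    return ",".join(parts)


print(bbox_datetime_to_subset(bbox=(10, 40, 20, 50)))
```

With the bounding box from the question, this yields the same Lat/Lon expression as the subset example, which is the sense in which the two syntaxes are equivalent.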

  13. 8.3.3 - If I have two temporal dimensions, does the datetime parameter apply to both? What happens if at least one of the dimensions doesn't have RFC3339 compliant instances?

No, the datetime parameter applies only to the temporal dimension defined in temporal of the collection extent. The other dimensions can use another trs, crs or unit as defined in the additional dimension of the extent, and can be subset with the subset parameter using those units as described.

  14. 8.3.4 - Lat/Lon have uppercase first letters. Does this imply that I can't name my spatial dimensions lat/lon or x/y? Same for time. Can I have a single temporal dimension called t?
  15. 8.3.4. Rec. 3: That's a lot of aliases. Wouldn't it be simpler if the server exposes specific names and only those are allowed for subsetting?

This entire situation is very thorny and there is unfortunately no simple solution. "Simpler" is relative, depending on whose perspective we consider. From the perspective of the client, a simple solution is accessing any OGC API - Coverages deployment, using the same dimension abbreviations and having things just work. First, the spatial and temporal dimensions from the extent (initially inherited from Features) do not name dimensions at all. In the Coverage Implementation Schema, there were also several related difficulties in resolving the dimensions (compound CRS issues, abbreviation issues... quite a nightmare). It is particularly difficult to find out the abbreviated names of the dimensions for a particular CRS, and they have changed in the past. What may appear slightly complicated is actually what I honestly believe to be the simplest and most consistent solution, after working on this and hitting my head against all kinds of related issues for the last five years.

To summarize, there is a fixed list of axis names for spatial and temporal dimension that all deployments need to support + a list of recommended equivalent axis names to ease the pain of those who may be suffering from axis confusion (whether from a fault of their own or their client's).

This is consistent across OGC API - Coverages, OGC API - Maps, and OGC API - DGGS and makes the extent description consistent with OGC API - Features and Common as well, so that you can access the same collection of data from one or more OGC API data access mechanisms.

  16. 9.3. - It seems axisName is defined in 8, but not in 9. If you didn't care for 8 you may not know the definitions for axisName. Do all the aliases of 8 also apply here?

Yes, the {axisName} values are exactly the same for both scaling and subsetting. The recommended aliases to be supported also apply here (but they remain a recommendation that clients cannot rely on). We should make the section self-sufficient, as you point out.

  17. 10.3.1 - Where do I get the fieldIndex from? Searching the document for fieldIndex doesn't find anything outside of the EBNF.

We could probably clarify that, but it refers to the x-OGC-property-seq in 14 C. We are also considering whether we should get rid of the option to specify a fieldIndex altogether as a way to explicitly pick or order a band.

  18. 11.2.1 - It says I shall support URI/CURIE, but it doesn't say what happens if I provide e.g. WKT1/2, EPSG Code or similar, PROJJSON (as string), etc. Generally, this sounds like an interoperability nightmare, also with Permission 6 in mind.

With this requirement class, providing a WKT1/2 or PROJJSON is not an option. Just like Features - Part 2: CRS this is strictly a "by reference" option, with the available CRS options listed in the collection crs property of the collection. You can pass an EPSG code as e.g., crs=[EPSG:3395].

The Permission 6 is to allow for servers to be compatible with the WxS syntax e.g., crs=EPSG:3395.

A future or vendor extension or some CRUD extension might allow a client to provide an arbitrary CRS defined in WKT or PROJJSON or the future PROJ CRS derivative, but this is way beyond the scope of Coverages - Part 1.
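The "by reference" behaviour described above could be sketched as follows. This is an illustration only: the function name is hypothetical, only EPSG CURIEs are handled, and the expansion to the http://www.opengis.net/def/crs/EPSG/0/{code} URI form is an assumption about how a server resolves them. Anything else (WKT, PROJJSON, ...) is rejected, and the bare EPSG:3395 form is accepted only when the server opts in, in the spirit of Permission 6.

```python
import re

# Assumed URI pattern for EPSG codes.
EPSG_URI = "http://www.opengis.net/def/crs/EPSG/0/{code}"


def normalize_crs(value, supported, allow_bare_curie=False):
    """Resolve a crs parameter value to a URI listed in the collection's crs array."""
    value = value.strip()
    uri = None
    if value.startswith(("http://", "https://")):
        uri = value  # already a URI
    else:
        safe = re.fullmatch(r"\[EPSG:(\d+)\]", value)  # safe CURIE, e.g. [EPSG:3395]
        bare = re.fullmatch(r"EPSG:(\d+)", value)      # WxS-style, e.g. EPSG:3395
        if safe:
            uri = EPSG_URI.format(code=safe.group(1))
        elif bare and allow_bare_curie:
            uri = EPSG_URI.format(code=bare.group(1))
    if uri is None:
        raise ValueError(f"not a CRS reference (URI or safe CURIE): {value!r}")
    if uri not in supported:
        raise ValueError(f"CRS not listed for this collection: {uri}")
    return uri


supported = ["http://www.opengis.net/def/crs/EPSG/0/3395"]
print(normalize_crs("[EPSG:3395]", supported))
```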

  19. 13 - Could the Scenes API just define relation types as used in Records or STAC and not define any new endpoints? This way you could just point to existing OAR and STAC endpoints and make the whole req. class much simpler.

There are multiple use cases for the Scenes requirement classes, which all fit nicely together:

You could implement both the Scenes requirement class at /collections/{coverageId}/scenes AND a STAC API at /collections/{stacCatalogID}/items. I personally really do not like the latter pattern, and I would never implement it, because it mixes up data and metadata in a way which I find truly awful.

The Scene requirement class re-uses the same query parameters as Records (many of which shared with the STAC API) and works together with the STAC item / collection media type for the list of scenes at /collections/{coverageId}/scenes and individual scenes at /collections/{coverageId}/scenes/{sceneId}, but it's nicely nested inside a particular coverage/collection/geodatacube, avoiding any of that data / metadata mixup.

  20. 13 - Do I understand correctly that the scenes endpoint should return responses compliant either to OAR or STAC?

The OGC API - Coverages "Scenes" requirement class integrates with STAC and Records in two ways:

m-mohr commented 5 days ago

I won't be able to answer in detail today, but thanks for all the details. I think most of that should be clarified in the spec in the end. But to avoid that I forget that detail:

For your particular example, couldn't you encode that as 2020-01-01 / 2020-12-31 ?

Also not allowed in RFC3339, so I'd need to provide 2020-01-01T00:00:00Z / 2020-12-31T23:59:59Z, I guess.

jerstlouis commented 5 days ago

I think most of that should be clarified in the spec in the end.

Agreed -- if it was not clear in the first read, there's room for improvement. PRs welcome ;)

A big part of the challenges is trying to keep things consistent across the data access APIs and with OGC API - Common, though I believe that we are really almost there now.

Also not allowed in RFC3339

Thanks for clarifying that. I was always a bit confused about that because section 5.6 defines a grammar but does not very clearly indicate that only date-time is a valid top-level rule entry point for that grammar in that section.

I would tend to agree that this syntax with time is quite cumbersome when dealing with daily, monthly or yearly time series, and the server should at least be allowed to interpret shorter client requests, so that users can easily tweak parameters directly in the browser.

Our server does support shorter syntaxes, and our client may also be sending out these shorter requests in some cases. We should probably discuss / clarify this. ISO8601 is quite complicated, so RFC 3339 is nice as a simpler subset, but it might be a bit too strict in this aspect.

strobpr commented 15 hours ago

I think most of that should be clarified in the spec in the end.

Agreed -- if it was not clear in the first read, there's room for improvement. PRs welcome ;)

A big part of the challenges is trying to keep things consistent across the data access APIs and with OGC API - Common, though I believe that we are really almost there now.

Also not allowed in RFC3339

Thanks for clarifying that. I was always a bit confused about that because section 5.6 defines a grammar but does not very clearly indicate that only date-time is a valid top-level rule entry point for that grammar in that section.

I would tend to agree that this syntax with time is quite cumbersome when dealing with daily, monthly or yearly time series, and the server should at least be allowed to interpret shorter client requests, so that users can easily tweak parameters directly in the browser.

Our server does support shorter syntaxes, and our client may also be sending out these shorter requests in some cases. We should probably discuss / clarify this. ISO8601 is quite complicated, so RFC 3339 is nice as a simpler subset, but it might be a bit too strict in this aspect.

If someone could help me in understanding this: image

The problem I see is that all these 'time' expressions are intervals, and that this is being neglected. Even a millisecond is an interval! If we define that a time interval is represented only by its starting point, then a full calendar year (in this case 2021) has to be denoted as 2021-01-01/2022-01-01, or more simply 2021/2022. Of course, it would be simpler still if we just agreed to see time-related terms as what they really are: intervals! Then 2021 is all you'd need to state!
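To make the interval reading concrete, here is a small sketch (not part of any draft) that expands a reduced-precision expression into the half-open interval it covers, written as RFC 3339 date-times joined by a slash. The function name is hypothetical, UTC is assumed, and only the YYYY, YYYY-MM and YYYY-MM-DD forms are handled.

```python
from datetime import datetime, timedelta, timezone


def expand_to_interval(text):
    """Interpret 'YYYY', 'YYYY-MM' or 'YYYY-MM-DD' as a half-open UTC interval."""
    parts = [int(p) for p in text.split("-")]
    utc = timezone.utc
    if len(parts) == 1:
        (year,) = parts
        start = datetime(year, 1, 1, tzinfo=utc)
        end = datetime(year + 1, 1, 1, tzinfo=utc)
    elif len(parts) == 2:
        year, month = parts
        start = datetime(year, month, 1, tzinfo=utc)
        # December rolls over into January of the following year.
        end = datetime(year + (month == 12), month % 12 + 1, 1, tzinfo=utc)
    elif len(parts) == 3:
        start = datetime(*parts, tzinfo=utc)
        end = start + timedelta(days=1)
    else:
        raise ValueError(f"unsupported expression: {text!r}")
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    # The end instant is exclusive: it is the start of the next interval.
    return f"{start.strftime(fmt)}/{end.strftime(fmt)}"


print(expand_to_interval("2021"))
```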

strobpr commented 15 hours ago
  1. 5.6 - What's the difference between http://www.opengis.net/def/rel/ogc/1.0/geodata and http://www.opengis.net/def/rel/ogc/1.0/data?

Maybe that helps: https://isotc211.geolexica.org/concepts/202/ https://isotc211.geolexica.org/concepts/104/

strobpr commented 14 hours ago

I won't be able to answer in detail today, but thanks for all the details. I think most of that should be clarified in the spec in the end. But to avoid that I forget that detail:

For your particular example, couldn't you encode that as 2020-01-01 / 2020-12-31 ?

Also not allowed in RFC3339, so I'd need to provide 2020-01-01T00:00:00Z / 2020-12-31T23:59:59Z, I guess.

I'd say that is one second short of the full year! In this case it should read: 2020-01-01T00:00:00Z / 2021-01-01T00:00:00Z

jerstlouis commented 14 hours ago

@strobpr I agree that 2020-01-01T00:00:00Z / 2021-01-01T00:00:00Z makes more sense.

More obvious if we write it as 2020-01-01T00:00:00.000Z / 2021-01-01T00:00:00.000Z.

Maybe we should clarify this.

strobpr commented 12 hours ago

@strobpr I agree that 2020-01-01T00:00:00Z / 2021-01-01T00:00:00Z makes more sense.

More obvious if we write it as 2020-01-01T00:00:00.000Z / 2021-01-01T00:00:00.000Z.

Maybe we should clarify this.

I believe we should. Honestly I don't think it makes much sense to treat the expressions we use for time as 'instants', in the meaning of 'points in time'. We should acknowledge that all (non-mathematical) ways to address time are not dimensionless and are therefore to be understood as tessellations of the time dimension. Then we can agree to directly use them as intervals and eventually build a useful hierarchical order in different granularity levels, where all intervals have an appropriate index that uniquely describes their duration and place in the continuum. I did something like that some 20 years ago and it looked like this:

Level | Interval Width | Coding | Example | Code Range | Refinement
-- | -- | -- | -- | -- | --
0 | Century | YY | 20 | -99 to 99 | 2
1 | half Century (50 Years) | YY-q | 20-A | A,B | 2
2 | quarter Century (25 Years) | YY-qq | 20-AA | AA,AB,BA,BB | 2,5
  |   |   |   |   |  
3 | Decade | YYY | 200 | 0 to 9 | 2
4 | half Decade (5 Years) | YYY-q | 200-A | A,B | 2
5 | quarter Decade (2.5 Years) | YYY-q | 200-AB | AA,AB,BA,BB | 2,5
  |   |   |   |   |  
6 | Year | YYYY | 2007 | 0 to 9 | 2
7 | half Year (6 Months) | YYYY-q | 2007-B | A,B | 2
8 | quarter Year (3 Months) | YYYY-qq | 2007-BA | AA,AB,BA,BB | 3
  |   |   |   |   |  
9 | Month | YYYYMM | 200706 | 01 to 12 | 2
10 | half Month | YYYYMM-q | 200706-A | A,B | 1,5
  |   |   |   |   |  
11 | ten Day | YYYYDdd | 2007D16 | 00 to 35 | 1,4
12 | Week | YYYYWww | 2007W24 | 00 to 52 | 2,3
13 | three Day | YYYYTttt | 2007T24 | 00 to 121 | 3
14 | Day | YYYYMMDD | 20070616 | 00 to 364 | 2
15 | half Day (12 Hours) | YYYYMMDD-q | 20070616-A | A,B | 2
16 | quarter Day (6 Hours) | YYYYMMDD-qq | 20070616-AB | AA,AB,BA,BB | 2
17 | eigth Day (3 Hours) | YYYYMMDD-qqq | 20070616-ABB | AAA,AAB,ABA, … | 2
18 | sixteenth Day (1.5 Hours) | YYYYMMDD-qqqq | 20070616-ABBA | AAAA,AAAB,… | 1,5
  |   |   |   |   |  
19 | Hour | YYYYMMDDhh | 2007061611 | 00 to 23 | 2
20 | half Hour (30 Minutes) | YYYYMMDDhh-q | 2007061611-A | A,B | 2
21 | quarter Hour (15 Minutes) | YYYYMMDDhh-qq | 2007061611-AB | AA,AB,BA,BB | 1,5
  |   |   |   |   |  
22 | ten Minutes | YYYYMMDDhhm | 20070616115 | 0 to 5 | 2
23 | five Minutes | YYYYMMDDhhm-q | 20070616115-A | A,B | 2
24 | two and a half Minutes | YYYYMMDDhhm-qq | 20070616115-AB | AA,AB,BA,BB | 2,5
25 | Minute | YYYYMMDDhhmm | 200706161154 | 00 to 59 | 2
26 | half Minute (30 Seconds) | YYYYMMDDhhmm-q | 200706161154-A | A,B | 2
27 | quarter Minute (15 Seconds) | YYYYMMDDhhmm-qq | 200706161154-AB | AA,AB,BA,BB | 1,5
  |   |   |   |   |  
28 | ten Seconds | YYYYMMDDhhmms | 2007061611543 | 0 to 5 | 2
29 | five Seconds | YYYYMMDDhhmms-q | 2007061611543-B | A,B | 2
30 | two and a half Seconds | YYYYMMDDhhmms-qq | 2007061611543-BA | AA,AB,BA,BB | 2,5
31 | Second | YYYYMMDDhhmmss | 20070616115436 | 00 to 59 |  

Unfortunately no one in my surrounding found it very useful, so it didn't see much application beyond my own stuff.

chris-little commented 11 hours ago

@strobpr The workplan (that is an exaggerated term) for the Temporal DWG is to register a Temporal CRS that could address most of the above examples: T-10, T-9, T-8, ... T-1, Zero! T+1, ... "Count Down" or "Count Up". It is an indexed grid TCRS. Exactly as with imagery, an indication of centre-point or edge-point (the "cell-method") is also needed. Then you do not have to follow an ISO8601-like syntax. Meteorologists have been labelling their forecasts like this for decades (HH+72, HH+84, HH+96, ...)
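A rough sketch of what such an indexed temporal grid could mean in practice. No such TCRS is registered yet, so everything here is an assumption: the function name, the origin/step parameters, and the reading of "edge" as "the label marks the start of the cell" and "centre" as "the label marks its midpoint".

```python
from datetime import datetime, timedelta, timezone


def cell_interval(origin, step, index, cell_method="edge"):
    """Return (start, end) of the cell with the given index on an indexed time grid."""
    label_time = origin + index * step
    if cell_method == "edge":
        # The label marks the start of the cell.
        return label_time, label_time + step
    if cell_method == "centre":
        # The label marks the midpoint of the cell.
        half = step / 2
        return label_time - half, label_time + half
    raise ValueError(f"unknown cell method: {cell_method!r}")


# Hypothetical forecast run at 2024-01-01T00:00Z with 12-hourly steps;
# index 6 then corresponds to the HH+72 label.
run = datetime(2024, 1, 1, tzinfo=timezone.utc)
start, end = cell_interval(run, timedelta(hours=12), 6)
print(start.isoformat(), end.isoformat())
```

The same index yields different intervals depending on the cell method, which is why that indication would need to accompany the grid definition.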