zarr-developers / geozarr-spec

This document aims to provides a geospatial extension to the Zarr specification. Zarr specifies a protocol and format used for storing Zarr arrays, while the present extension defines conventions and recommendations for storing multidimensional georeferenced grid of geospatial observations (including rasters).
116 stars 10 forks source link

Understanding concerns with CF encoding of CRS #20

Open christophenoel opened 1 year ago

christophenoel commented 1 year ago

As it has been reported multiple times that the CF encoding of CRS (Coordinate Reference System) may pose issues, I propose starting a dedicated thread to gain a better understanding of these concerns.

Furthermore, I recently converted TerraSAR GeoTiff to Zarr, using rioxarray in conjunction with gdal and xarray, which resulted in the following CF grid mapping. I am somewhat surprised by the way GDAL encoded the CRS, which could be a result of the combination of gdal with xarray.

{
    "GeoTransform": "10.50014820907392 0.00012070250150375875 -2.609127035245679e-05 53.85791404256921 -1.5387410410647694e-05 -7.246156895736279e-05",
    "_ARRAY_DIMENSIONS": [],
    "crs_wkt": "GEOGCS[\"WGS 84\",DATUM[\"WGS_1984\",SPHEROID[\"WGS 84\",6378137,298.257223563,AUTHORITY[\"EPSG\",\"7030\"]],AUTHORITY[\"EPSG\",\"6326\"]],PRIMEM[\"Greenwich\",0],UNIT[\"degree\"
,0.0174532925199433,AUTHORITY[\"EPSG\",\"9122\"]],AXIS[\"Latitude\",NORTH],AXIS[\"Longitude\",EAST],AUTHORITY[\"EPSG\",\"4326\"]]",
    "geographic_crs_name": "WGS 84",
    "grid_mapping_name": "latitude_longitude",
    "inverse_flattening": 298.257223563,
    "longitude_of_prime_meridian": 0.0,
    "prime_meridian_name": "Greenwich",
    "reference_ellipsoid_name": "WGS 84",
    "semi_major_axis": 6378137.0,
    "semi_minor_axis": 6356752.314245179,
    "spatial_ref": "GEOGCS[\"WGS 84\",DATUM[\"WGS_1984\",SPHEROID[\"WGS 84\",6378137,298.257223563,AUTHORITY[\"EPSG\",\"7030\"]],AUTHORITY[\"EPSG\",\"6326\"]],PRIMEM[\"Greenwich\",0],UNIT[\"degr
ee\",0.0174532925199433,AUTHORITY[\"EPSG\",\"9122\"]],AXIS[\"Latitude\",NORTH],AXIS[\"Longitude\",EAST],AUTHORITY[\"EPSG\",\"4326\"]]"
}

EDIT NOTE: A key topic of this discussion concerns the need to make grid_mapping mandatory in order to exclude the possibility of having an implicit (undefined) CRS. Furthermore, grid_mapping includes several formats, and it would be wise to recommend one of them (WKT, Proj4j, ...).

dblodgett-usgs commented 1 year ago

What are you surprised by?

Most of what's in there is as would be expected per: https://cfconventions.org/cf-conventions/cf-conventions.html#use-of-the-crs-well-known-text-format

This page provides a fairly complete set of mappings: https://github.com/cf-convention/cf-conventions/wiki/Mapping-from-CF-Grid-Mapping-Attributes-to-CRS-WKT-Elements

"spatial_ref" does not appear in the CF convention but has been around in the community for a while. Someone implemented it quite a few years ago and it's been adopted in a number of places.

"_ARRAY_DIMENSIONS" is presumably being added by xarray?

christophenoel commented 1 year ago

That's not my point. I'm just trying to understand why there is a concern (discussed many times during the meeting) with this CF way of encoding CRS while GDAL/Xarray seems to generate it without problem when converting a GeoTiff to Zarr.

Why do we want to change this ?

dblodgett-usgs commented 1 year ago

The rub is the need to implement a cross walk between the CRS-WKT / EPSG representation of CRS and the grid_mapping representation of CRS. The crs_wkt is optional and all the particular grid_mapping attributes are required.

CF-NetCDF requires that you include parameters using the attribute specification from cf that duplicate what most geospatial software supports via PROJ.

I think the premise here is that we would be saying that crs_wkt is required and that the other grid_mapping attributes are encouraged to allow compatibility with software that does not support crs_wkt?

djhoese commented 1 year ago

Jumping in on this issue after this spec repository was pointed out to me. I'm the creator of the very very new geoxarray package which basically tries to take all the non-GDAL specific CRS and dimension/coordinate information functionality from rioxarray but without the GDAL/rasterio dependency. It is also meant to be more generic than rioxarray.

Anyway, I wanted to chime in with some information from my understanding having working on a similar issue in geoxarray. First, my understanding is that spatial_ref is for GDAL compatibility. Second, rioxarray and geoxarray both use the crs_wkt as a shortcut/short-circuit when reading CRS information to avoid having to convert/parse all the CF grid mapping attributes that might also exist. Both libraries depend on pyproj to do the CRS/WKT -> CF grid_mapping conversion.

There is also the case of projections that aren't supported by CF. In those cases Pyproj (and therefore rioxarray and geoxarray) will only produce the crs_wkt/spatial_ref attributes with WKT since there are no corresponding CF attributes. I believe it has been brought up on various CF discussions that it is likely a bad idea to re-invent a way of representing CRS information especially when it isn't all inclusive/supportive and the PROJ library already has a pretty good handle on it. I'm not sure what the end result of those discussions was though.

dblodgett-usgs commented 1 year ago

Great info. Thanks @djhoese -- UW Madison representing here, I love it! (I'm an '09 Civil Engineering grad)

christophenoel commented 1 year ago

I want to take this good news of a new expert partipating to summarize the question for novices like me. Please correct me if I'm wrong.

CF conventions (v1.10) allow to define the CRS implicitly as the geographic coordinate system based on a spherical Earth (so not WGS 84 ellipsoid) when declaring the axisor standard_name property. CF conventions also allow to express CRS definitions via the grid_mapping attribute which refers to a separate variable.

This grid mapping variable provides the description of the mapping via a collection of attached attributes. There are several ways to describe the CRS:

Note: spatial_ref is the property commonly used by GDAL to define the CRS in WKT. I believe therefore that GDAL ignores crs_wkt in grid_mapping variable.

My conclusion

From my understanding, CF advocates for using its standard properties to express any type of representation (WKT, PROJ.4, etc.). Since we want to base GeoZarr on CF, I would lean towards making the grid_mapping attributes as mandatory (using the mapping table above for PROJ.4, WKT representation) and encouraging crs_wkt to facilitate compatibility with tools that only support WKT (but GDAL will still have to implement the support of crs_wkt property)

christophenoel commented 1 year ago

Another useful information reported by Roger Lott (OGC CRS SWG co-chair):

Please be aware that the OGC CRS SWG has been developing an exchange format for gridded geodetic data (GGXF). At the outset of this project we assumed that the netCDF-CF conventions would be used as a starting point for GGXF. We quickly found that they were not suitable for the rigorous geodetic requirements for CRS definitions, and have developed a separate set of GGXF Conventions.

On superficial look it does not seem that these address anything in your github issue, but if you are interested in the GGXF conventions they are in Annex B of OGC 22-051r2. r3 of this doc will be posted shortly but the changes are unlikely to be significant to you (mainly editorial following the OGC public comment review). A not-quite-complete draft of r3 is in the GGXF Github issue #59 (use the 2023-07-24 version towards the end of the issue chain).

A couple of other comments that may be more relevant to the thread: • CF netCDF conventions requires the CRS-WKT to adhere to what is sometimes called "WKT2". (See CF conventions 5.6.1 for "The crs_wkt attribute should comprise a text string that conforms to the WKT syntax as specified in reference [OGC_WKT-CRS]"). The CF conventions quote adherence to 12-063r3 (=WKT2 v1) but that has been superseded by 18-010r11 (=WKT2 v2). The difference in the two versions of WKT2 is an upgrade for dynamic CRSs (coordinates that change with time). In the the snippet that you give in the Github issue, neither crs_wkt nor spatial_ref is WKT2 of any variety. They use one of the many flavours of WKT1 (that supported by Proj/GDAL). You can tell whether the string is WKT1 or WKT2 by examination of the first keyword. If it ends in CS (e.g. GEOGCS) it is WKT1, if it ends in CRS (e.g. GEOGCRS) it is WKT2. The snippet therefore is not compliant with the CF conventions. • GGXF, like netCDF-CF, uses WKT2 for describing CRSs and transformations. • There has been criticism of WKT2 by some. My understanding is that this is due to the need to escape certain characters, in particular the degree symbol (°) and the double quote character ("), when WKT2 strings are embedded in other languages. It is my conjecture, but perhaps this may be the concern that the Github issue is trying to capture.

dwilson1988 commented 1 year ago

My conclusion

From my understanding, CF advocates for using its standard properties to express any type of representation (WKT, PROJ.4, etc.). Since we want to base GeoZarr on CF, I would lean towards making the grid_mapping attributes as mandatory (using the mapping table above for PROJ.4, WKT representation) and encouraging crs_wkt to facilitate compatibility with tools that only support WKT (but GDAL will still have to implement the support of crs_wkt property)

This solves a decent part of the original concern around the original issue (https://github.com/christophenoel/geozarr-spec/issues/3) just to mention it here. I just want to make sure I'm understanding a few things:

  1. If a Zarr dataset is created in a spatial reference that the CF conventions does not explicitly support, there would be no path to create a compliant GeoZarr dataset, correct?

  2. If starting from an EPSG code or WKT/ProjJSON representation of a CRS, there is no straightforward way to map it to CF if I'm understanding the table you provided

  3. Just specifying the crs_wkt attribute WOULD NOT be a compliant GeoZarr dataset

The reason I bring this up specifically is we've been outputting Zarr from our geospatial system for the last few years and all of the CRS information is specified as EPSG Code, WKT, or ProjJSON. Unless there's a way to always (and reliably!) map from CRS-WKT to CF, supporting GeoZarr would be a nonstarter for us, but we'd VERY much like to adopt GeoZarr.

Any thoughts on this?

djhoese commented 1 year ago

Speaking from a user POV and not on the "standards"/specification design side of this: I would plan/hope to use the pyproj library in all of my Python code to do conversions between PROJ.4/WKT2/EPSG/CF. Not sure that is considered "straightforward" though.

I would hope that the issue in your point 1 above could be resolved in the future by suggesting to CF to adopt crs_wkt as an alternative to the CF grid mapping attributes (either the attributes or the crs_wkt or both with priority going to crs_wkt). I don't know how open they would be to that but the fact that not every projection is supported currently would make putting the responsibility on WKT more enticing. Similarly, add the functionality to GDAL to read crs_wkt in addition to the already supported spatial_ref.

dwilson1988 commented 1 year ago

That's really the root of the rub here (correct me if I'm wrong): I can represent any CF grid mapping as WKT but not vice versa.

dblodgett-usgs commented 1 year ago

@christophenoel -- I don't think that this is quite accurate.

CF conventions (v1.10) allow to define the CRS implicitly as the geographic coordinate system based on a spherical Earth

This closed issue https://github.com/cf-convention/cf-conventions/issues/410 indicates that there "is no default shape for the Earth in CF (spherical or otherwise)."

dblodgett-usgs commented 1 year ago

I would be in full support of mandating grid mapping and allowing use of only a crs_wkt in cases that the CF grid mapping attributes do not support the projection of the data in question.

dwilson1988 commented 1 year ago

And to underscore @christophenoel's note above, RECOMMENDING that everyone also provide crs_wkt to ensure maximum interoperability.

christophenoel commented 1 year ago

@christophenoel -- I don't think that this is quite accurate.

CF conventions (v1.10) allow to define the CRS implicitly as the geographic coordinate system based on a spherical Earth

This closed issue cf-convention/cf-conventions#410 indicates that there "is no default shape for the Earth in CF (spherical or otherwise)."

Does it has any impact (on a user point of view) of is it only a question of definition "geographic coordinate system based on a Earth" ?

dblodgett-usgs commented 1 year ago

The impact from a user's point of view is that when given no grid_mapping specifying the shape of the earth, CF is explicitly ambiguous (there is no default shape) rather than implicit with regard to the shape of the spheroid.

christophenoel commented 1 year ago

Thank you very much for trying to explain. I think that my gaps in the geospatial field could lead to an endless discussion...

For me, if we provide latitude/longitude in degrees, I expect to be able to navigate on Earth based on the real shape of our Earth. So it works for a single point, however, I assume that you're suggesting that the gridded data is projected in any case, so we should know the exact projection (e.g. WGS84) to be able to interpret the data correctly.

djhoese commented 1 year ago

For me, if we provide latitude/longitude in degrees, I expect to be able to navigate on Earth based on the real shape of our Earth. So it works for a single point, however, I assume that you're suggesting that the gridded data is projected in any case, so we should know the exact projection (e.g. WGS84) to be able to interpret the data correctly.

I could be misunderstanding what you're saying or asking, but there is no "real shape of our Earth" when it comes to this stuff. You must have some definition/model of the Earth (a coordinate system) or else you have no guarantee that degrees (lon/lat) in your data are the same as the degrees (lon/lat) of someone else's data (different radius of the Earth, different "center" point of the coordinate system, etc). Software could always make an assumption or have some default Earth for a geographic (degrees) coordinate system, but if we're talking about accuracy then there is no guarantee unless it is explicitly defined. In my opinion nothing should be implied for new data file formats and everything should be explicit.

dblodgett-usgs commented 1 year ago

Exactly @djhoese.

@christophenoel -- every (lat/lon or projected) coordinate that references the earth must either declare (explicit) or assume (implicit) an earth spheroid model upon which the latitude / longitude coordinate system is defined. The GPS system is based around the WGS84 geodetic datum.

Note that it is common practice in some fields (weather and climate in particular) to use different earth models interchangeably. i.e. use weather stations with elliptical lat/lon as if they use a spherical earth assumption. This practice has led to tremendous confusion when using CF data.

These slides may be of interest.

christophenoel commented 1 year ago

Okay, understood. There's no notion of degrees without a model. Intuitively, when I think of degrees, it's implicitly WGS84, which assumes an ellipsoid.

So, to get back to a your statement, I get that CF doesn't guarantee that WGS84 is presupposed.

dwilson1988 commented 1 year ago

Just wanted to circle back on this topic - what's the timeline look like for discussing/implementing some of these changes around CRS? Is there a general consensus about supporting crs_wkt in GeoZarr?

christophenoel commented 1 year ago

@dwilson1988 : When the SWG is approved by the OGC, a specification document (following OGC standards) will be created with the necessary structure, allowing anyone interested to collaborate on topics of their choice. However, nothing prevents someone from starting work on the tickets or in the repository

rouault commented 10 months ago

I haven't followed what GeoZarr has decided regarding CRS encoding, but FYI GDAL 3.9 per https://github.com/OSGeo/gdal/pull/9108 will be able to infer CRS in a Zarr dataset using a CF-1 grid_mapping variable (basically raw conversion of netCDF CF-1 to Zarr)

christophenoel commented 9 months ago

This is good news, and I assume that the library supports all the extensions such as: Mapping-from-CF-Grid-Mapping-Attributes-to-CRS-WKT-Elements

The meetings were interrupted for several months to create the OGC SWG, and they will resume this week.