Open christophenoel opened 1 year ago
What are you surprised by?
Most of what's in there is as would be expected per: https://cfconventions.org/cf-conventions/cf-conventions.html#use-of-the-crs-well-known-text-format
This page provides a fairly complete set of mappings: https://github.com/cf-convention/cf-conventions/wiki/Mapping-from-CF-Grid-Mapping-Attributes-to-CRS-WKT-Elements
"spatial_ref" does not appear in the CF convention but has been around in the community for a while. Someone implemented it quite a few years ago and it's been adopted in a number of places.
"_ARRAY_DIMENSIONS" is presumably being added by xarray?
That's not my point. I'm just trying to understand why there is a concern (discussed many times during the meeting) with this CF way of encoding CRS while GDAL/Xarray seems to generate it without problem when converting a GeoTiff to Zarr.
Why do we want to change this ?
The rub is the need to implement a cross walk between the CRS-WKT / EPSG representation of CRS and the grid_mapping representation of CRS. The crs_wkt
is optional and all the particular grid_mapping
attributes are required.
CF-NetCDF requires that you include parameters using the attribute specification from cf that duplicate what most geospatial software supports via PROJ.
I think the premise here is that we would be saying that crs_wkt
is required and that the other grid_mapping
attributes are encouraged to allow compatibility with software that does not support crs_wkt
?
Jumping in on this issue after this spec repository was pointed out to me. I'm the creator of the very very new geoxarray package which basically tries to take all the non-GDAL specific CRS and dimension/coordinate information functionality from rioxarray but without the GDAL/rasterio dependency. It is also meant to be more generic than rioxarray.
Anyway, I wanted to chime in with some information from my understanding having working on a similar issue in geoxarray. First, my understanding is that spatial_ref
is for GDAL compatibility. Second, rioxarray and geoxarray both use the crs_wkt
as a shortcut/short-circuit when reading CRS information to avoid having to convert/parse all the CF grid mapping attributes that might also exist. Both libraries depend on pyproj to do the CRS/WKT -> CF grid_mapping conversion.
There is also the case of projections that aren't supported by CF. In those cases Pyproj (and therefore rioxarray and geoxarray) will only produce the crs_wkt/spatial_ref
attributes with WKT since there are no corresponding CF attributes. I believe it has been brought up on various CF discussions that it is likely a bad idea to re-invent a way of representing CRS information especially when it isn't all inclusive/supportive and the PROJ library already has a pretty good handle on it. I'm not sure what the end result of those discussions was though.
Great info. Thanks @djhoese -- UW Madison representing here, I love it! (I'm an '09 Civil Engineering grad)
I want to take this good news of a new expert partipating to summarize the question for novices like me. Please correct me if I'm wrong.
CF conventions (v1.10) allow to define the CRS implicitly as the geographic coordinate system based on a spherical Earth (so not WGS 84 ellipsoid) when declaring the axis
or standard_name
property. CF conventions also allow to express CRS definitions via the grid_mapping
attribute which refers to a separate variable.
This grid mapping
variable provides the description of the mapping via a collection of attached attributes. There are several ways to describe the CRS:
lambert_conformal_conic
, mercator
, etc.) based on the associated properties.crs_wkt
also to express multiple CRS properties in WKT (used by GDAL/OGR) in the property crs_wkt
Note: spatial_ref
is the property commonly used by GDAL to define the CRS in WKT. I believe therefore that GDAL ignores crs_wkt
in grid_mapping variable.
My conclusion
From my understanding, CF advocates for using its standard properties to express any type of representation (WKT, PROJ.4, etc.). Since we want to base GeoZarr on CF, I would lean towards making the grid_mapping
attributes as mandatory (using the mapping table above for PROJ.4, WKT representation) and encouraging crs_wkt
to facilitate compatibility with tools that only support WKT (but GDAL will still have to implement the support of crs_wkt property)
Another useful information reported by Roger Lott (OGC CRS SWG co-chair):
Please be aware that the OGC CRS SWG has been developing an exchange format for gridded geodetic data (GGXF). At the outset of this project we assumed that the netCDF-CF conventions would be used as a starting point for GGXF. We quickly found that they were not suitable for the rigorous geodetic requirements for CRS definitions, and have developed a separate set of GGXF Conventions.
On superficial look it does not seem that these address anything in your github issue, but if you are interested in the GGXF conventions they are in Annex B of OGC 22-051r2. r3 of this doc will be posted shortly but the changes are unlikely to be significant to you (mainly editorial following the OGC public comment review). A not-quite-complete draft of r3 is in the GGXF Github issue #59 (use the 2023-07-24 version towards the end of the issue chain).
A couple of other comments that may be more relevant to the thread: • CF netCDF conventions requires the CRS-WKT to adhere to what is sometimes called "WKT2". (See CF conventions 5.6.1 for "The crs_wkt attribute should comprise a text string that conforms to the WKT syntax as specified in reference [OGC_WKT-CRS]"). The CF conventions quote adherence to 12-063r3 (=WKT2 v1) but that has been superseded by 18-010r11 (=WKT2 v2). The difference in the two versions of WKT2 is an upgrade for dynamic CRSs (coordinates that change with time). In the the snippet that you give in the Github issue, neither crs_wkt nor spatial_ref is WKT2 of any variety. They use one of the many flavours of WKT1 (that supported by Proj/GDAL). You can tell whether the string is WKT1 or WKT2 by examination of the first keyword. If it ends in CS (e.g. GEOGCS) it is WKT1, if it ends in CRS (e.g. GEOGCRS) it is WKT2. The snippet therefore is not compliant with the CF conventions. • GGXF, like netCDF-CF, uses WKT2 for describing CRSs and transformations. • There has been criticism of WKT2 by some. My understanding is that this is due to the need to escape certain characters, in particular the degree symbol (°) and the double quote character ("), when WKT2 strings are embedded in other languages. It is my conjecture, but perhaps this may be the concern that the Github issue is trying to capture.
My conclusion
From my understanding, CF advocates for using its standard properties to express any type of representation (WKT, PROJ.4, etc.). Since we want to base GeoZarr on CF, I would lean towards making the
grid_mapping
attributes as mandatory (using the mapping table above for PROJ.4, WKT representation) and encouragingcrs_wkt
to facilitate compatibility with tools that only support WKT (but GDAL will still have to implement the support of crs_wkt property)
This solves a decent part of the original concern around the original issue (https://github.com/christophenoel/geozarr-spec/issues/3) just to mention it here. I just want to make sure I'm understanding a few things:
If a Zarr dataset is created in a spatial reference that the CF conventions does not explicitly support, there would be no path to create a compliant GeoZarr dataset, correct?
If starting from an EPSG code or WKT/ProjJSON representation of a CRS, there is no straightforward way to map it to CF if I'm understanding the table you provided
Just specifying the crs_wkt attribute WOULD NOT be a compliant GeoZarr dataset
The reason I bring this up specifically is we've been outputting Zarr from our geospatial system for the last few years and all of the CRS information is specified as EPSG Code, WKT, or ProjJSON. Unless there's a way to always (and reliably!) map from CRS-WKT to CF, supporting GeoZarr would be a nonstarter for us, but we'd VERY much like to adopt GeoZarr.
Any thoughts on this?
Speaking from a user POV and not on the "standards"/specification design side of this: I would plan/hope to use the pyproj library in all of my Python code to do conversions between PROJ.4/WKT2/EPSG/CF. Not sure that is considered "straightforward" though.
I would hope that the issue in your point 1 above could be resolved in the future by suggesting to CF to adopt crs_wkt as an alternative to the CF grid mapping attributes (either the attributes or the crs_wkt or both with priority going to crs_wkt). I don't know how open they would be to that but the fact that not every projection is supported currently would make putting the responsibility on WKT more enticing. Similarly, add the functionality to GDAL to read crs_wkt in addition to the already supported spatial_ref.
That's really the root of the rub here (correct me if I'm wrong): I can represent any CF grid mapping as WKT but not vice versa.
@christophenoel -- I don't think that this is quite accurate.
CF conventions (v1.10) allow to define the CRS implicitly as the geographic coordinate system based on a spherical Earth
This closed issue https://github.com/cf-convention/cf-conventions/issues/410 indicates that there "is no default shape for the Earth in CF (spherical or otherwise)."
I would be in full support of mandating grid mapping and allowing use of only a crs_wkt
in cases that the CF grid mapping attributes do not support the projection of the data in question.
And to underscore @christophenoel's note above, RECOMMENDING that everyone also provide crs_wkt to ensure maximum interoperability.
@christophenoel -- I don't think that this is quite accurate.
CF conventions (v1.10) allow to define the CRS implicitly as the geographic coordinate system based on a spherical Earth
This closed issue cf-convention/cf-conventions#410 indicates that there "is no default shape for the Earth in CF (spherical or otherwise)."
Does it has any impact (on a user point of view) of is it only a question of definition "geographic coordinate system based on a Earth" ?
The impact from a user's point of view is that when given no grid_mapping
specifying the shape of the earth, CF is explicitly ambiguous (there is no default shape) rather than implicit with regard to the shape of the spheroid.
Thank you very much for trying to explain. I think that my gaps in the geospatial field could lead to an endless discussion...
For me, if we provide latitude/longitude in degrees, I expect to be able to navigate on Earth based on the real shape of our Earth. So it works for a single point, however, I assume that you're suggesting that the gridded data is projected in any case, so we should know the exact projection (e.g. WGS84) to be able to interpret the data correctly.
For me, if we provide latitude/longitude in degrees, I expect to be able to navigate on Earth based on the real shape of our Earth. So it works for a single point, however, I assume that you're suggesting that the gridded data is projected in any case, so we should know the exact projection (e.g. WGS84) to be able to interpret the data correctly.
I could be misunderstanding what you're saying or asking, but there is no "real shape of our Earth" when it comes to this stuff. You must have some definition/model of the Earth (a coordinate system) or else you have no guarantee that degrees (lon/lat) in your data are the same as the degrees (lon/lat) of someone else's data (different radius of the Earth, different "center" point of the coordinate system, etc). Software could always make an assumption or have some default Earth for a geographic (degrees) coordinate system, but if we're talking about accuracy then there is no guarantee unless it is explicitly defined. In my opinion nothing should be implied for new data file formats and everything should be explicit.
Exactly @djhoese.
@christophenoel -- every (lat/lon or projected) coordinate that references the earth must either declare (explicit) or assume (implicit) an earth spheroid model upon which the latitude / longitude coordinate system is defined. The GPS system is based around the WGS84 geodetic datum.
Note that it is common practice in some fields (weather and climate in particular) to use different earth models interchangeably. i.e. use weather stations with elliptical lat/lon as if they use a spherical earth assumption. This practice has led to tremendous confusion when using CF data.
Okay, understood. There's no notion of degrees without a model. Intuitively, when I think of degrees, it's implicitly WGS84, which assumes an ellipsoid.
So, to get back to a your statement, I get that CF doesn't guarantee that WGS84 is presupposed.
Just wanted to circle back on this topic - what's the timeline look like for discussing/implementing some of these changes around CRS? Is there a general consensus about supporting crs_wkt in GeoZarr?
@dwilson1988 : When the SWG is approved by the OGC, a specification document (following OGC standards) will be created with the necessary structure, allowing anyone interested to collaborate on topics of their choice. However, nothing prevents someone from starting work on the tickets or in the repository
I haven't followed what GeoZarr has decided regarding CRS encoding, but FYI GDAL 3.9 per https://github.com/OSGeo/gdal/pull/9108 will be able to infer CRS in a Zarr dataset using a CF-1 grid_mapping variable (basically raw conversion of netCDF CF-1 to Zarr)
This is good news, and I assume that the library supports all the extensions such as: Mapping-from-CF-Grid-Mapping-Attributes-to-CRS-WKT-Elements
The meetings were interrupted for several months to create the OGC SWG, and they will resume this week.
As it has been reported multiple times that the CF encoding of CRS (Coordinate Reference System) may pose issues, I propose starting a dedicated thread to gain a better understanding of these concerns.
Furthermore, I recently converted TerraSAR GeoTiff to Zarr, using rioxarray in conjunction with gdal and xarray, which resulted in the following CF grid mapping. I am somewhat surprised by the way GDAL encoded the CRS, which could be a result of the combination of gdal with xarray.
EDIT NOTE: A key topic of this discussion concerns the need to make grid_mapping mandatory in order to exclude the possibility of having an implicit (undefined) CRS. Furthermore, grid_mapping includes several formats, and it would be wise to recommend one of them (WKT, Proj4j, ...).