Open dblodgett-usgs opened 1 year ago
Is there any sense in tailoring / simplifying / extending the approach in CF 1.10 to suit these needs? https://cfconventions.org/Data/cf-conventions/cf-conventions-1.10/cf-conventions.html#example-Two-dimensional-tie-point-interpolation
Ouch, I hadn't seen that CF "coordinate subsampling" method yet (always amazed by the universe of possibilities offered by the CF spec...). It is super complex: a kind of subsampled geolocation array, but with several subsampling/interpolation sub-areas.
GeoTIFF is much, much simpler than that (and far less capable). The basic model of GeoTIFF is a geotransformation matrix with 6 terms:
        [ X0  X1  Xoff ]
    M = [ Y0  Y1  Yoff ]
        [ 0   0   1    ]
to express that [Xgeoref Ygeoref 1] = M [column row 1]. Said otherwise:

    Xgeoref = X0 * column + X1 * row + Xoff
    Ygeoref = Y0 * column + Y1 * row + Yoff

(Xoff, Yoff) is the georeferenced coordinate of the pixel at index (0, 0). In most situations X1 = Y0 = 0, expressing no rotation ("north-up" image). In the GDAL geotransform convention, this is encoded as an array of 6 double values: [Xoff, X0, X1, Yoff, Y0, Y1]. So basically, you don't need to define a variable corresponding to the X and Y dimensions indexing the array of the data. In netCDF CF, for an image without rotation, an alternative is to have 1D X and Y variables that contain the coordinate of each pixel center. If they are regularly spaced, you can deduce the values of the M matrix.
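As a minimal sketch, those two equations look like this in Python (the geotransform values here are made up for illustration):

```python
# The 6-term GDAL geotransform: [Xoff, X0, X1, Yoff, Y0, Y1]
# (illustrative values: origin at (440720, 3751320), 60 m pixels, north-up)
gt = [440720.0, 60.0, 0.0, 3751320.0, 0.0, -60.0]

def apply_geotransform(gt, column, row):
    """Xgeoref = X0*column + X1*row + Xoff ; Ygeoref = Y0*column + Y1*row + Yoff"""
    xoff, x0, x1, yoff, y0, y1 = gt
    return (x0 * column + x1 * row + xoff,
            y0 * column + y1 * row + yoff)

print(apply_geotransform(gt, 0, 0))   # -> (440720.0, 3751320.0), i.e. (Xoff, Yoff)
print(apply_geotransform(gt, 10, 5))  # -> (441320.0, 3751020.0)
```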
This is helpful.
I've not read CF 1.10 §8.3 closely yet -- I wonder if you could actually use some simple subset of what's possible there to support the [Xoff, X0, X1, Yoff, Y0, Y1] encoding and relate that geotransform to variables?
"deduce the values of the M matrix" is something that I think we need to get away from. I know it's possible and done in various software, but I don't think it is "right".
An "as simple as possible" convention for this could be: (this is just a strawman)
...
    int coord ;
        coord:geotransformation_matrix = "Xoff, X0, X1, Yoff, Y0, Y1" ;
    float data(latitude, longitude) ;
        data:grid_mapping = "crs: latitude, longitude" ;
...
    int crs ;
        crs:grid_mapping_name = "latitude_longitude" ;
        crs:longitude_of_prime_meridian = 0.0 ;
        crs:semi_major_axis = 6378137.0 ;
        crs:inverse_flattening = 298.257223563 ;
        crs:crs_wkt =
            GEODCRS["WGS 84",
              DATUM["World Geodetic System 1984",
                ELLIPSOID["WGS 84",6378137,298.257223563,
                  LENGTHUNIT["metre",1.0]]],
              PRIMEM["Greenwich",0],
              CS[ellipsoidal,3],
                AXIS["(lat)",north,ANGLEUNIT["degree",0.0174532925199433]],
                AXIS["(lon)",east,ANGLEUNIT["degree",0.0174532925199433]],
                AXIS["ellipsoidal height (h)",up,LENGTHUNIT["metre",1.0]]]
...
I know it's possible and done in various softwares, but I don't think it is "right".
Yes, this is indeed fragile, especially when file creators decide to use Float32 instead of Float64, in which case you need to add fuzziness to your comparison for identical spacing.
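A small illustration of that fragility (the grid values here are made up; any fine-resolution geographic grid behaves similarly):

```python
import numpy as np

# A regular 0.0001-degree grid written as Float32: the stored spacings are
# no longer identical, so "regular spacing" detection needs a tolerance.
lon64 = -117.0 + 0.0001 * np.arange(100)
lon32 = lon64.astype(np.float32)

spacings = np.diff(lon32.astype(np.float64))
print(spacings.min(), spacings.max())  # not all equal to 0.0001
```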
An "as simple as possible" convention for this could be: (this is just a strawman)
This is quite close to what GDAL does.
$ gdal_translate byte.tif byte.nc
$ ncdump byte.nc
netcdf byte {
dimensions:
    x = 20 ;
    y = 20 ;
variables:
    char transverse_mercator ;
        transverse_mercator:grid_mapping_name = "transverse_mercator" ;
        transverse_mercator:longitude_of_central_meridian = -117. ;
        transverse_mercator:false_easting = 500000. ;
        transverse_mercator:false_northing = 0. ;
        transverse_mercator:latitude_of_projection_origin = 0. ;
        transverse_mercator:scale_factor_at_central_meridian = 0.9996 ;
        transverse_mercator:long_name = "CRS definition" ;
        transverse_mercator:longitude_of_prime_meridian = 0. ;
        transverse_mercator:semi_major_axis = 6378206.4 ;
        transverse_mercator:inverse_flattening = 294.978698213898 ;
        transverse_mercator:spatial_ref = "PROJCS[\"NAD27 / UTM zone 11N\",GEOGCS[\"NAD27\",DATUM[\"North_American_Datum_1927\",SPHEROID[\"Clarke 1866\",6378206.4,294.978698213898,AUTHORITY[\"EPSG\",\"7008\"]],AUTHORITY[\"EPSG\",\"6267\"]],PRIMEM[\"Greenwich\",0,AUTHORITY[\"EPSG\",\"8901\"]],UNIT[\"degree\",0.0174532925199433,AUTHORITY[\"EPSG\",\"9122\"]],AUTHORITY[\"EPSG\",\"4267\"]],PROJECTION[\"Transverse_Mercator\"],PARAMETER[\"latitude_of_origin\",0],PARAMETER[\"central_meridian\",-117],PARAMETER[\"scale_factor\",0.9996],PARAMETER[\"false_easting\",500000],PARAMETER[\"false_northing\",0],UNIT[\"metre\",1,AUTHORITY[\"EPSG\",\"9001\"]],AXIS[\"Easting\",EAST],AXIS[\"Northing\",NORTH],AUTHORITY[\"EPSG\",\"26711\"]]" ;
        transverse_mercator:crs_wkt = "PROJCS[\"NAD27 / UTM zone 11N\",GEOGCS[\"NAD27\",DATUM[\"North_American_Datum_1927\",SPHEROID[\"Clarke 1866\",6378206.4,294.978698213898,AUTHORITY[\"EPSG\",\"7008\"]],AUTHORITY[\"EPSG\",\"6267\"]],PRIMEM[\"Greenwich\",0,AUTHORITY[\"EPSG\",\"8901\"]],UNIT[\"degree\",0.0174532925199433,AUTHORITY[\"EPSG\",\"9122\"]],AUTHORITY[\"EPSG\",\"4267\"]],PROJECTION[\"Transverse_Mercator\"],PARAMETER[\"latitude_of_origin\",0],PARAMETER[\"central_meridian\",-117],PARAMETER[\"scale_factor\",0.9996],PARAMETER[\"false_easting\",500000],PARAMETER[\"false_northing\",0],UNIT[\"metre\",1,AUTHORITY[\"EPSG\",\"9001\"]],AXIS[\"Easting\",EAST],AXIS[\"Northing\",NORTH],AUTHORITY[\"EPSG\",\"26711\"]]" ;
        transverse_mercator:GeoTransform = "440720 60 0 3751320 0 -60 " ;
    double x(x) ;
        x:standard_name = "projection_x_coordinate" ;
        x:long_name = "x coordinate of projection" ;
        x:units = "m" ;
    double y(y) ;
        y:standard_name = "projection_y_coordinate" ;
        y:long_name = "y coordinate of projection" ;
        y:units = "m" ;
    byte Band1(y, x) ;
        Band1:long_name = "GDAL Band Number 1" ;
        Band1:_Unsigned = "true" ;
        Band1:valid_range = 0s, 255s ;
        Band1:grid_mapping = "transverse_mercator" ;

// global attributes:
        :GDAL_AREA_OR_POINT = "Area" ;
        :Conventions = "CF-1.5" ;
        :GDAL = "GDAL 3.7.0dev-afbe14d34f, released 2023/04/12 (debug build)" ;
        :history = "Wed Apr 12 20:04:20 2023: GDAL CreateCopy( byte.nc, ... )" ;

data:

 transverse_mercator = "" ;

 x = 440750, 440810, 440870, 440930, 440990, 441050, 441110, 441170, 441230,
    441290, 441350, 441410, 441470, 441530, 441590, 441650, 441710, 441770,
    441830, 441890 ;

 y = 3750150, 3750210, 3750270, 3750330, 3750390, 3750450, 3750510, 3750570,
    3750630, 3750690, 3750750, 3750810, 3750870, 3750930, 3750990, 3751050,
    3751110, 3751170, 3751230, 3751290 ;

 Band1 = [...]
}
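The GeoTransform attribute in that file is enough to regenerate the x and y arrays. A sketch, assuming pixel-center coordinates (which is what GDAL writes for these variables):

```python
# Sketch: regenerating the x/y pixel-center coordinates from the
# "GeoTransform" attribute shown in the ncdump above.
gt = [float(v) for v in "440720 60 0 3751320 0 -60".split()]
xoff, x0, x1, yoff, y0, y1 = gt  # the cross terms x1, y0 are 0 here (north-up)

nx = ny = 20
x = [xoff + x0 * (col + 0.5) for col in range(nx)]
y = [yoff + y1 * (row + 0.5) for row in range(ny)]  # row 0 = top of the image

print(x[0], x[-1])  # 440750.0 441890.0, matching the x array above
print(y[0], y[-1])  # 3751290.0 3750150.0 (note: the file stores y ascending)
```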
But I realize that putting the GeoTransform attribute inside the transverse_mercator grid_mapping variable is not the best modeling choice, if you'd want to have different variables with the same CRS but a different extent in the same file.
Oh interesting.... Great minds think alike I guess! :)
In your example, the "projection_x_coordinate" and "projection_y_coordinate" are superfluous for clients that work with the "GeoTransform" attribute of the grid mapping, correct?
I actually think this is a great solution. If, for GeoZarr, we relax the CF-NetCDF requirement to include the explicit coordinate variables, this structure would "just work" with GDAL and not be totally incompatible with CF.
So we could add a clause to GeoZarr Coordinates:
A GeoZarr Coordinates variable is a one dimensional Zarr Array, or an Auxiliary variable whose "grid_mapping" includes a GeoTransform, that indexes a dimension of a GeoZarr DataArray (e.g. latitude, longitude, time, wavelength).
A GeoTransform is a space-delimited list of 6 values, "X_offset X0 X1 Y_offset Y0 Y1", where X_offset and Y_offset are the grid origin, X0 and Y0 are the grid spacing per data column, and X1 and Y1 are the grid spacing per row. That is, if X1 = Y0 = 0, the grid is axis-aligned.
(some text more carefully crafted than that?)
In your example, the "projection_x_coordinate" and "projection_y_coordinate" are superfluous for clients that work with the "GeoTransform" attribute of the grid mapping, correct?
Yes. They are there for CF-only capable readers. GDAL-aware code will use the geotransform attribute when present.
(some text more carefully crafted than that?)
You probably want to include the equations I put above to remove any ambiguity. https://gdal.org/user/raster_data_model.html#affine-geotransform may also help.
Something that also needs to be specified is the convention for which "part" of a pixel a georeferenced coordinate refers to: pixel-corner vs. pixel-center. Or support both.
Center seems to be the more sensible default, no? Any reason to support both?
Center seems to be the more sensible default, no?
sounds reasonable
Any reason to support both?
Probably not. This is an endless source of confusion. Actually I believe this topic mixes two things, which are often interleaved when this is discussed.
If this basic approach is agreeable, maybe @rouault would be willing to suggest an approach to encode origin/offset/transform metadata as attributes?
I wouldn't mix coordinates with the auxiliary variable, which is meant for another purpose. Instead, I would suggest:
A GeoZarr Coordinates variable is either:
- a one dimensional Zarr Array that indexes a dimension of a GeoZarr DataArray (e.g. latitude, longitude, time, wavelength), or
- an empty Zarr Array that describes a RegularAxis or IrregularAxis for a dimension of a GeoZarr DataArray.
A RegularAxis includes the following attributes:
- lowerBound: lowest coordinate along this grid axis
- upperBound: highest coordinate along this axis
- resolution: grid resolution along this axis
An IrregularAxis includes the following attribute:
- directPositions: ordered sequence of direct positions along this axis
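As a sketch, that proposal might land in a Zarr array's attributes like this. Only the attribute names (lowerBound, upperBound, resolution, directPositions) come from the suggestion above; the "type" discriminator and the surrounding layout are my assumptions:

```python
import json

# Hypothetical attributes for two empty coordinate arrays describing a
# RegularAxis (x) and an IrregularAxis (time). Layout is illustrative only.
x_attrs = {
    "type": "RegularAxis",       # assumed discriminator, not in the proposal
    "lowerBound": 440750.0,      # lowest coordinate along this axis
    "upperBound": 441890.0,      # highest coordinate along this axis
    "resolution": 60.0,          # grid spacing along this axis
}
time_attrs = {
    "type": "IrregularAxis",
    "directPositions": [0.0, 1.5, 3.0, 7.25],  # ordered coordinates
}
print(json.dumps(x_attrs, indent=2))
```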
I was not aware that the Auxiliary data had meaning beyond it being an empty array.
Can you explain why you want to introduce the RegularAxis and IrregularAxis and use lowerBound/upperBound/resolution rather than the geoTransform as I suggested in #19?
It's just one suggestion. I don't know geoTransform and have nothing against it.
Why would you put these six numbers in a character string, and not in a data variable?
It's a fair question. I think because, to do that, you would have to declare a dimension and a whole data variable that would have to be integrated into the data format. What we want is a compact attribute to contain the values.
It's not very common, but you can make numeric vector attributes in NetCDF. I have to admit ignorance about Zarr's handling of attributes, though. Will Zarr support attributes that are a vector of six doubles?
Answering my own question here. Yes, that would work. I was just being a bit lazy with my "as simple as possible" suggestion.
https://zarr.readthedocs.io/en/stable/tutorial.html#user-attributes
Internally Zarr uses JSON to store array attributes, so attribute values must be JSON serializable.
If there is a remote possibility that the conversion binary -> text -> binary doesn't return the exact binary number you started with, I'd opt for not doing that conversion, as a write & read step will result in a slightly differently positioned raster, and downstream software will say these rasters don't match.
The issue with binary->text conversion will necessarily occur with Zarr, JSON being a text format:
{"GeoTransform": "X_offset X0 X1 Y_offset Y0 Y1"}
{"GeoTransform": [X_offset, X0, X1, Y_offset, Y0, Y1]}
It would probably be better to adopt the latter.
@rouault -- So you are saying that we should adopt an attribute that is a length-six double precision vector, and that the space-delimited netCDF attribute currently used by GDAL would be supported but not in the convention?
So you are saying that we should adopt an attribute that is a length six double precision vector
yes
and that the space delimited netcdf attribute currently used by gdal would be supported but not in a convention?
That's a GDAL netCDF specific implementation detail. There's no reason to keep it in GeoZarr.
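For what it's worth, the binary -> text -> binary concern raised earlier is addressable with a JSON number array, provided writers emit shortest round-trip representations of doubles. Python's json module does; a quick check (not a guarantee for every JSON implementation):

```python
import json
import struct

# Doubles survive JSON serialization when written with shortest round-trip
# repr, which is what Python's json module emits for floats.
gt = [440720.0, 59.99999999999999, 0.0, 3751320.0, 0.0, -60.0]
text = json.dumps({"GeoTransform": gt})
restored = json.loads(text)["GeoTransform"]

# bit-for-bit identical after the binary -> text -> binary trip
assert all(struct.pack("<d", a) == struct.pack("<d", b)
           for a, b in zip(gt, restored))
```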
I just made a Pangeo forum post on some of the implications of this issue. I feel I understand it much better now after writing it up: https://discourse.pangeo.io/t/example-which-highlights-the-limitations-of-netcdf-style-coordinates-for-large-geospatial-rasters/4140
We have not settled on the right way to encode binary data in Zarr attrs yet, although there has been plenty of discussion https://github.com/zarr-developers/zarr-specs/issues/229
We should find a way to move forward on this. I gave some ideas on the python side in that forum post.
It would be great for Zarr to have support for capturing an arbitrary linear mapping from pixel to world coordinates. I can't comment on the exact format chosen, the GDAL convention or just an Affine matrix with/without the last row (0, 0, 1), so long as the 6-degrees-of-freedom relationship between pixel and world coordinates can be recorded in Zarr without having to materialize the coordinates of every pixel separately.
One note: make sure to clearly define where (0, 0) is in the pixel plane. My vote is for what GDAL does on the output side: the top-left corner of the top-left pixel is pixel-plane (0, 0), and the mapping to the world is defined accordingly. Please don't go GeoTIFF style with its confusing Pixel-is-Area (edge is 0) vs Pixel-is-Point (center is 0) setup.
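A sketch of that corner-origin convention (geotransform values are illustrative; fractional pixel coordinates address points within pixels):

```python
def to_world(gt, col, row):
    """Map a pixel-plane point to world coordinates.

    (0, 0) is the top-left corner of the top-left pixel; pass
    (col + 0.5, row + 0.5) to get a pixel *center*.
    """
    xoff, x0, x1, yoff, y0, y1 = gt
    return (xoff + x0 * col + x1 * row,
            yoff + y0 * col + y1 * row)

gt = (440720.0, 60.0, 0.0, 3751320.0, 0.0, -60.0)  # illustrative, north-up
print(to_world(gt, 0, 0))      # (440720.0, 3751320.0): raster's top-left corner
print(to_world(gt, 0.5, 0.5))  # (440750.0, 3751290.0): center of first pixel
```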
Over in the Xarray discussion I have proposed a very concrete task that could be used to advance this issue: try to modify RioXarray's generate_spatial_coords to create a custom Range index from the affine parameters instead of explicit x, y coordinate variables.
Is anyone here game to try?
GeoTransform support in GeoZarr aims to bridge the gap with the GDAL community by simplifying coordinate encoding, replacing explicit coordinates with a more efficient representation.
GeoZarr also focuses on being data format-agnostic, aligning with the Open Geospatial Consortium (OGC) specifications that have evolved over 20 years to provide an abstract data representation. While the OGC Coverage Implementation Schema and related specifications are complex and challenging to understand, they offer an open and flexible approach.
I tried to summarize the basics of both ways of describing the link from pixel coordinates to geographic coordinates of gridded data.
In GDAL, GeoTransform is a six-element array that defines the affine georeferencing of a raster image. The projection itself is encoded in a separate spatial reference system (SRS), often provided as a WKT (Well-Known Text) string or a PROJ.4 string in the raster metadata, typically stored with the raster file.
In OGC Coverages, the extent property describes the domain of the coverage for each dimension, including the overall envelope, detailed sub-intervals where data is available, and/or a regular or irregular grid.
Both OGC Coverages and GDAL GeoTransform define the resolution of the data grid, where GDAL uses Pixel Width and Pixel Height, and OGC uses the resolution property.
OGC CIS: ❓ does it support affine transformations?
ℹ️ OGC CIS : Skewed and rotated grids are not modelled explicitly; they can be represented by making the grid’s CRS a concatenation of any given CRS with an Engineering CRS describing, e.g., any affine transformation of the original grid.
Wanted to highlight that @benbovy's new Xarray PR - https://github.com/pydata/xarray/pull/9543 - introduces the general concept of Coordinate Transforms to Xarray in a way that will be very useful for the GeoZarr effort.
We need to add a method for encoding origin / offset coordinate variables where the GeoZarr coordinates are not "...a one dimensional Zarr Array that indexes a dimension of a GeoZarr DataArray (e.g latitude, longitude, time, wavelength)."
It would seem that, in essence, we should encode GeoTIFF metadata in a GeoZarr Auxiliary variable.