rabernat opened 1 year ago
The Pangeo ecosystem has done a lot to move Zarr forward as a cloud native format. Here's what our stack looks like for the common use case of reading Zarr from cloud storage.
If you want to parse CRS information out of your Xarray data that has been loaded in this way, I believe your only option is MetPy's `assign_crs` function, but I could be wrong about that.
```mermaid
flowchart TD
    s3fs --> zarr-python
    zarr-python --> Dask
    Dask --> Xarray
    zarr-python --> Xarray
    Xarray --> MetPy
```
I'm not sure I see what you mean about the CRS. You could parse it with MetPy (and I suppose with rioxarray) if you set it first, right?
> I'm not sure I see what you mean about the CRS. You could parse it with MetPy (and I suppose with rioxarray) if you set it first, right?
What I mean is that Xarray by itself has no inherent understanding of CRS. The metadata are there, but they aren't directly usable. MetPy's `parse_cf` function attaches an actual cartopy CRS object to the dataset, which can then be used for plotting. Here's an example I saw on Twitter today: https://twitter.com/at_dot_Py/status/1627206421950664705
To follow up on this issue, at today's meeting, we played around with trying to parse data from Brianna's example Zarr dataset into rioxarray. Notebook here
https://gist.github.com/rabernat/8c53c380fbb38dcf556e85487960a847
Here's an example of rioxarray (building on rasterio & GDAL) understanding CRS information (via rioxarray's support for CF conventions I believe).
```python
import requests
import xarray as xr
import rioxarray  # noqa: F401  (registers the .rio accessor)

token = requests.get(
    "https://planetarycomputer.microsoft.com/api/sas/v1/token/daymet-daily-hi"
).json()["token"]

ds = xr.open_dataset(
    "abfs://daymet-zarr/daily/hi.zarr",
    engine="zarr",
    decode_coords="all",
    storage_options={"account_name": "daymeteuwest", "credential": token},
)
print(ds.rio.crs)
print("---")
print(ds.rio.transform())
```
which prints out
```
PROJCS["undefined",GEOGCS["undefined",DATUM["undefined",SPHEROID["undefined",6378137,298.257223563]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433]],PROJECTION["Lambert_Conformal_Conic_2SP"],PARAMETER["standard_parallel_1",25],PARAMETER["standard_parallel_2",60],PARAMETER["latitude_of_origin",42.5],PARAMETER["central_meridian",-100],PARAMETER["false_easting",0],PARAMETER["false_northing",0],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Easting",EAST],AXIS["Northing",NORTH]]
---
| 1000.00, 0.00,-5802750.00|
| 0.00,-1000.00,-38500.00|
| 0.00, 0.00, 1.00|
```
In that case, the `.crs` object is a `rasterio.crs.CRS`. I believe it used pyproj to parse the CF metadata into a CRS object.
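For reference, pyproj can build a CRS object directly from CF grid-mapping attributes; a sketch with the same Lambert conformal parameters as the WKT above (the attribute dict is hand-written here, not read from the dataset):

```python
from pyproj import CRS

# CF grid_mapping attributes, written out by hand for illustration.
cf_attrs = {
    "grid_mapping_name": "lambert_conformal_conic",
    "standard_parallel": [25.0, 60.0],
    "latitude_of_projection_origin": 42.5,
    "longitude_of_central_meridian": -100.0,
    "false_easting": 0.0,
    "false_northing": 0.0,
}

# pyproj translates the CF dict into a full CRS object.
crs = CRS.from_cf(cf_attrs)
print(crs.to_wkt())
```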
It doesn't appear that `odc.geo` correctly reads the geospatial metadata for this dataset. I'll share an example later of `odc.geo` correctly interpreting geospatial metadata.
Super useful, Tom! Your issue reveals the problem that different Python libraries use different in-memory representations of CRS / transform. This is separate from, but closely related to, the on-disk representation (e.g. #12).
@TomAugspurger: so the source metadata only includes a CF projection (grid mapping) and rioxarray handles that? I don't remember this working last year on my side. Surprising.
> Your issue reveals the problem that different Python libraries use different in-memory representations of CRS / transform.
Yes, at some point I want to dig into why that is. As long as they're all using the same "protocol" things aren't so bad, but it'd still be nicer to consolidate if possible.
> @TomAugspurger: so the source metadata only includes a CF projection (grid mapping) and rioxarray handles that? I don't remember this working last year on my side. Surprising.
Yes. I think it's some combination of https://github.com/corteva/rioxarray/blob/34c64140ba1f38e020e6d4b2cd0f3e4bb4c36f62/rioxarray/rioxarray.py#L262-L277 and https://github.com/corteva/rioxarray/blob/34c64140ba1f38e020e6d4b2cd0f3e4bb4c36f62/rioxarray/rioxarray.py#L307-L310 in rioxarray. It's important to use `decode_coords="all"` in this specific example; otherwise rioxarray isn't able to infer what to use for the CRS. I haven't looked into why.
As we discussed on today's call, it will be very useful to enumerate the different toolchains used to read Zarr in a geospatial context across languages / package ecosystems. This will help us understand where GeoZarr actually needs to be implemented.
I'll go first in a reply to this issue.