zarr-developers / geozarr-spec

This document aims to provides a geospatial extension to the Zarr specification. Zarr specifies a protocol and format used for storing Zarr arrays, while the present extension defines conventions and recommendations for storing multidimensional georeferenced grid of geospatial observations (including rasters).
106 stars 10 forks source link

ArcGIS/QGIS interoperability #37

Open briannapagan opened 4 months ago

briannapagan commented 4 months ago

Following up on: https://github.com/qgis/QGIS/issues/54240 Using the files listed: https://github.com/zarr-developers/geozarr-spec/issues/36

Using this space to document our findings

katyapotapov commented 4 months ago

I was able to install the latest QGIS with GDAL 3.8 on Macbook by running the following commands (you can probably substitute conda instead of micromamba, and we're using Python 3.11 because zarr doesn't support 3.12 yet):

micromamba create --name qgis-install
micromamba activate qgis-install
micromamba install python==3.11 --channel conda-forge
micromamba install qgis --channel conda-forge

After that you can launch it from the same terminal by calling qgis

We should use GDAL >3.8 since this is the earliest version of GDAL that supports Zarr V3.

katyapotapov commented 4 months ago

On GDAL startup, I ran into a permissions issue with the fault handler (using an M1 Macbook) which seems to cause any crashes in Python to crash QGIS as well.

katyapotapov commented 4 months ago

Here's how I loaded both sets of data in QGIS 3.34.3 (compiled with GDAL version 3.8.1, running with GDAL 3.8.3)

You should see some options show up right after you paste the path, in my case there were 6 different options.

image

Here are some findings from loading the NetCDF-like data in QGIS:

After setting MULTIBAND to "Yes" and adding all layers, I switched the project CRS to EPSG:4326 and was able to see all of the variables. Here are some screenshots:

image image image

After loading the variables with MULTIBAND=Yes, I wasn't able to reproject one of the variables and got the following error:

ERROR 4: ZARR:"/Users/katya/Documents/small-projects/zarr-python/GLDAS_NOAH025_3H.zarr":/Albedo_inst|option:MULTIBAND=YES: No such file or directory
ERROR 4: Failed to open source file ZARR:"/Users/katya/Documents/small-projects/zarr-python/GLDAS_NOAH025_3H.zarr":/Albedo_inst|option:MULTIBAND=YES
Process returned error code 2

I got this error even after setting the source CRS correctly (EPSG:4326).

This seems like an issue with parsing the options as part of the filepath, because when I tried the same thing after loading the variables with all default settings, I was able to reproject it.

wietzesuijker commented 4 months ago

The main issue is about having an interactive viewer on large data. So I'd focus on how QGIS indexes large data, e.g. a {x: 100_000, y: 100_000, band: 50} Zarr dataset in blob storage in some cloud. Would be good to figure out if this is purely a GDAL optimization issue or if we can do something in the data structure or qgis to enable this. This is closely related to the pyramiding topic Max will work on tomorrow.

katyapotapov commented 4 months ago

Loaded the GeoTIFF-like data into an EPSG:4326 project, here's what it looks like:

image

That little strip above the image is the datetime/ variable.

Not sure what happened with the band/ variable - the popup on the warning sign on that layer says the layer data source could not be found.

Even though the sr/ data renders just fine, I wasn't able to double-click on the layer to open the layer info - QGIS hangs every time. In QGIS logs I only see a transformation error which showed up for the previous dataset as well, so it's probably not related to the crash:

GDAL ERROR 1: Unable to compute a transformation between pixel/line and georeferenced coordinates for ZARR:"/Users/katya/Documents/small-projects/zarr-python/GLDAS_NOAH025_3H.zarr":/time_bnds. There is no affine transformation and no GCPs. Specify transformation option SRC_METHOD=NO_GEOTRANSFORM to bypass this check.

Here is an excerpt from the stack trace I get when I kill QGIS after it hangs:

Heaviest stack for the main thread of the target process:
...
  14  QgsRasterLayer::htmlMetadata() const + 7172 (libqgis_core.3.34.3.dylib + 13775912) [0x10be43428]
  14  QgsGdalProvider::hasStatistics(int, int, QgsRectangle const&, int) + 812 (libqgis_core.3.34.3.dylib + 6490164) [0x10b750834]
  14  GDALRasterBand::GetStatistics(int, int, double*, double*, double*, double*) + 660 (libgdal.34.3.8.3.dylib + 13651108) [0x113244ca4]
  14  GDALRasterBand::ComputeStatistics(int, double*, double*, double*, double*, int (*)(double, char const*, void*), void*) + 2252 (libgdal.34.3.8.3.dylib + 13653896) [0x113245788]
  14  GDALRasterBand::GetLockedBlockRef(int, int, int) + 512 (libgdal.34.3.8.3.dylib + 13639368) [0x113241ec8]
  14  GDALRasterBandFromArray::IReadBlock(int, int, void*) + 172 (libgdal.34.3.8.3.dylib + 14272328) [0x1132dc748]
  14  GDALRasterBandFromArray::IRasterIO(GDALRWFlag, int, int, int, int, void*, int, int, GDALDataType, long long, long long, GDALRasterIOExtraArg*) + 652 (libgdal.34.3.8.3.dylib + 14273028) [0x1132dca04]
  14  GDALMDArray::Read(unsigned long long const*, unsigned long const*, long long const*, long long const*, GDALExtendedDataType const&, void*, void const*, unsigned long) const + 1140 (libgdal.34.3.8.3.dylib + 14190684) [0x1132c885c]
  14  ZarrArray::IRead(unsigned long long const*, unsigned long const*, long long const*, long long const*, GDALExtendedDataType const&, void*) const + 2548 (libgdal.34.3.8.3.dylib + 6954632) [0x112be1e88]
  14  ZarrV2Array::LoadTileData(unsigned long long const*, bool, CPLCompressor const*, ZarrByteVectorQuickResize&, ZarrByteVectorQuickResize&, ZarrByteVectorQuickResize&, bool&) const + 1244 (libgdal.34.3.8.3.dylib + 6994580) [0x112beba94]
  14  CPLBloscDecompressor(void const*, unsigned long, void**, unsigned long*, char const* const*, void*) + 408 (libgdal.34.3.8.3.dylib + 897988) [0x11261b3c4]
  14  blosc_decompress_ctx + 68 (libblosc.1.21.5.dylib + 22484) [0x1050b57d4]
  14  blosc_run_decompression_with_context + 244 (libblosc.1.21.5.dylib + 22800) [0x1050b5910]
  14  do_job + 356 (libblosc.1.21.5.dylib + 18248) [0x1050b4748]
  8   blosc_d + 432 (libblosc.1.21.5.dylib + 24776) [0x1050b60c8]
  5   blosc_internal_bshuf_untrans_bit_elem_scal + 180 (libblosc.1.21.5.dylib + 42920) [0x1050ba7a8]
briannapagan commented 4 months ago

@wietzesuijker there are two issues here, pyramiding support and potentially an underlying gdal error, for your issue did you get an error or it's just slow?

ckfisher commented 4 months ago

For our comparison, the same test in ArcGIS Pro (v3.2.1) with GLDAS_NOAH025_3H.zarr can be done by using the "Multidimensional Raster Layer" toolbox described here.

Add Data:

Screenshot 2024-02-07 at 2 53 21 PM

Select layers/bands (you can also see the StdTime dimension here):

Screenshot 2024-02-07 at 2 53 37 PM

With seemingly complete metadata:

Screenshot 2024-02-07 at 3 24 48 PM

Overall, pretty simple usage.

katyapotapov commented 4 months ago

Trying the NetCDF 4D data in QGIS:

image

Looks like everything loaded correctly. There was a new band created for each time/pressure combination which is to be expected, although it's not super intuitive to navigate (there are 96 bands under this layer):

image
ckfisher commented 4 months ago

And the same for ArcGIS. Interprets time and level dimensions correctly (with fun sliders!).

Screenshot 2024-02-07 at 3 59 01 PM Screenshot 2024-02-07 at 3 51 44 PM Screenshot 2024-02-07 at 3 52 25 PM
wietzesuijker commented 4 months ago

@wietzesuijker there are two issues here, pyramiding support and potentially an underlying gdal error, for your issue did you get an error or it's just slow?

I remember receiving a few errors as well when loading into qgis. I did manage to inspect the data, e.g., using the value tool plugin, though the UX didn't feel very optimized.