Closed: laurenkwick closed this issue 1 month ago
I think that with the right URL, GDAL will be able to read it. There is an msft:https-url for each asset, like https://rhgeuwest.blob.core.windows.net/cil-gdpcir/ScenarioMIP/INM/INM-CM5-0/ssp585/r1i1p1f1/day/pr/v1.1.zarr, that could be used with /vsicurl?
Hey, thanks for the quick response. I tried using the msft:https-url and I'm still running into some issues:
> dsn <- 'ZARR:"/vsicurl/https://rhgeuwest.blob.core.windows.net/cil-gdpcir/CMIP/INM/INM-CM5-0/historical/r1i1p1f1/day/tasmax/v1.1.zarr"'
> d <- read_mdim(dsn)
Error: file not found
In addition: Warning message:
In CPL_read_mdim(file, array_name, options, offset, count, step, :
GDAL Error 4: `ZARR:"/vsicurl/https://rhgeuwest.blob.core.windows.net/cil-gdpcir/CMIP/INM/INM-CM5-0/historical/r1i1p1f1/day/tasmax/v1.1.zarr"' does not exist in the file system, and is not recognized as a supported dataset name.
I also tested without the 'ZARR:' prefix and got the same error.
I'm wondering if setting credentials through environment variables is required, as documented here.
Thanks for the tip about the ZARR prefix. I think it might be a conflict between how we do auth (as a query parameter) and how the GDAL Zarr driver discovers the files under that prefix.
(gdal) taugspurger@DESKTOP-D37TN6N:~$ export TOKEN=$(curl --silent "https://planetarycomputer.microsoft.com/api/sas/v1/token/cil-gdpcir-cc0" | jq -r .token)
(gdal) taugspurger@DESKTOP-D37TN6N:~$ gdalmdiminfo 'ZARR:"/vsicurl/https://rhgeuwest.blob.core.windows.net/cil-gdpcir/ScenarioMIP/BCC/BCC-CSM2-MR/ssp126/r1i1p1f1/day/pr/v1.1.zarr?"'$TOKEN'""'
Warning 1: HTTP response code on https://rhgeuwest.blob.core.windows.net/cil-gdpcir/ScenarioMIP/BCC/BCC-CSM2-MR/ssp126/r1i1p1f1/day/pr/v1.1.zarr?st=2024-06-30T18%3A51%3A38Z&se=2024-07-01T19%3A36%3A38Z&sp=r...&sig=...%3D/.zarray: 403
Notice that it's looking for the .zarray file at https://.../v1.1.zarr?$TOKEN/.zarray instead of https://.../v1.1.zarr/.zarray?$TOKEN. I'm not sure if there's another way to tell GDAL what query parameters to use.
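To make the failure mode concrete, here is a minimal Python sketch (an illustration of the two URL shapes, not GDAL internals): appending the SAS token to the .zarr URL means the sub-file suffix /.zarray gets tacked on after the query string, leaving the token in the middle of the request path. The token value below is a placeholder.

```python
# Illustration only (not GDAL internals): why appending the SAS token to
# the .zarr URL breaks discovery of sub-files such as .zarray.
base = (
    "https://rhgeuwest.blob.core.windows.net/cil-gdpcir/"
    "ScenarioMIP/BCC/BCC-CSM2-MR/ssp126/r1i1p1f1/day/pr/v1.1.zarr"
)
token = "st=2024-06-30&se=2024-07-01&sp=r&sig=PLACEHOLDER"  # fake SAS query string

# What happens: the whole DSN is treated as a directory path and /.zarray is
# appended to it, so the token ends up inside the request path.
requested = f"{base}?{token}/.zarray"

# What Azure expects: the query string after the complete object path.
expected = f"{base}/.zarray?{token}"

print(requested)  # ...v1.1.zarr?st=...&sig=PLACEHOLDER/.zarray  (403)
print(expected)   # ...v1.1.zarr/.zarray?st=...&sig=PLACEHOLDER
```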
Thanks! That definitely explains my trouble with accessing the dataset, and it's also helping me dive deeper into this rabbit hole...
I was digging around to see if there is a way to tell GDAL which query parameters to use and just wanted to document this finding here. I am still trying to get it to work, but I saw that "options can be passed in the filename with the following syntax: /vsicurl?[option_i=val_i&]*url=http://... where each option name and value (including the value of "url") is URL-encoded." (from the /vsicurl/ (http/https/ftp files: random access) documentation)
These supported options stand out to me, but I haven't quite gotten them to work yet:
pc_url_signing=yes/no: whether to use the URL signing mechanism of Microsoft Planetary Computer (https://planetarycomputer.microsoft.com/docs/concepts/sas/). (GDAL >= 3.5.2). Note that starting with GDAL 3.9, this may also be set with the path-specific option (cf. VSISetPathSpecificOption()) VSICURL_PC_URL_SIGNING set to YES.
pc_collection=name: name of the collection of the dataset for Planetary Computer URL signing. Only used when pc_url_signing=yes. (GDAL >= 3.5.2)
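As a sketch of that /vsicurl?option_i=val_i&url=... syntax, this is how one might assemble a DSN carrying those two options in Python. The option names come from the GDAL docs quoted above, but the collection name "cil-gdpcir-cc0" is my assumption, and whether this combination actually resolves the auth problem with the Zarr driver is untested.

```python
from urllib.parse import quote

# Sketch only: assembling a /vsicurl DSN with inline options, per the GDAL
# /vsicurl docs quoted above. pc_url_signing and pc_collection are documented
# GDAL options (>= 3.5.2); the collection name "cil-gdpcir-cc0" is my guess,
# and this combination is untested against the Zarr driver.
url = (
    "https://rhgeuwest.blob.core.windows.net/cil-gdpcir/"
    "ScenarioMIP/BCC/BCC-CSM2-MR/ssp126/r1i1p1f1/day/pr/v1.1.zarr"
)
options = [
    ("pc_url_signing", "yes"),
    ("pc_collection", "cil-gdpcir-cc0"),
    ("url", url),  # the docs say the url value itself must be URL-encoded
]
query = "&".join(f"{quote(k, safe='')}={quote(v, safe='')}" for k, v in options)
dsn = f'ZARR:"/vsicurl?{query}"'
print(dsn)
```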
In this Stack Exchange post, it looks like they ended up using the href URL instead of the msft:https-url. It's not quite the same workflow as what I have here, but it seems promising?
I'll post an update if I get somewhere with this! Just wanted to update where I'm at now.
Closing, feel free to re-open if there are any updates!
Hi!
I want to start off by saying how amazing it is to work with the datasets available through Microsoft PC.
I'm currently having some trouble figuring out how to access the CMIP6 Public Domain Collection using R. In a tutorial that I am referencing from the r-spatial blog, it is possible to read Zarr files using the R package stars, because GDAL has a Zarr driver that can read these data through its multidimensional array API. However, I don't know if the GDAL driver is able to implicitly handle the connection to Azure Blob Storage in the same way that xarray does in Python.
For example, in the example notebook for CMIP6 CC0-1.0, the STAC item asset property xarray:open_kwargs informs how the dataset should be opened. The xarray:open_kwargs dictionary defines the chunks, the engine to use, the consolidation method, and the storage options account name. When looking at the GDAL Zarr driver open options, I don't see a way to include all of these components (specifically the storage options account name), so I end up receiving a "File Not Found" error. I'm not quite sure how xarray is able to open the Zarr dataset located in Azure Blob Storage, and whether it is possible to replicate this in R through the GDAL Zarr driver. Below is the code I was using to test out CMIP6 access with the rstac and stars libraries.
I understand that Zarr was developed in the Python numpy/xarray communities; however, I am hoping to work with this data in R because the rest of my workflow is developed using it. Is it possible to do this?
Thank you!