✨ This code is highly experimental! Let the buyer beware ⚠️ ;) ✨
A proxy for Zarr stores that allows for chunking overrides. This is useful for clients that want to request data in a specific chunking scheme, but the data is stored in a different chunking scheme (e.g. a dataset stored in a chunking scheme that is optimized for fast reading, but the client wants to request data in a chunking scheme that is optimized for fast rendering). One advantage of using a proxy is that we don't need to persistently store the data in multiple chunking schemes. Instead, we can simply request the data in the desired chunking scheme on the fly.
The proxy is a simple FastAPI application. It can be run locally using the following command:

```bash
uvicorn zarr_proxy.main:app --reload
```
Once the proxy is running, you can access a Zarr store through it using the URL pattern `http://{PROXY_ADDRESS}/{ZARR_STORE_ADDRESS}`. For example, if the proxy is running on `localhost:8000` and you want to access the Zarr store at `https://my.zarr.store`, you would use the URL `http://localhost:8000/my.zarr.store`. Note that the store's scheme (`https://`) is omitted from the proxied URL.
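As a rough sketch, assuming the store address is simply the store URL with its scheme removed (as in the example above), a client could derive the proxied URL like this. The helper name `to_proxy_url` is hypothetical and not part of zarr-proxy.

```python
from urllib.parse import urlsplit


def to_proxy_url(proxy_address: str, store_url: str) -> str:
    """Build a proxied store URL (hypothetical helper, not part of zarr-proxy).

    Assumes the proxy expects the store address without its scheme,
    e.g. https://my.zarr.store -> http://localhost:8000/my.zarr.store
    """
    parts = urlsplit(store_url)
    # keep host and path, drop the scheme (https://) and any trailing slash
    store_address = (parts.netloc + parts.path).rstrip('/')
    return f'http://{proxy_address}/{store_address}'


print(to_proxy_url('localhost:8000', 'https://my.zarr.store'))
# http://localhost:8000/my.zarr.store
```

Query strings and fragments are dropped by this sketch; stores that need them would require a different mapping.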
The proxy supports the following HTTP headers:

- `chunks`: A comma-separated list of chunk overrides. Each override has the form `{variable}={shape}`, where `variable` is the name of the variable to override and `shape` is the comma-separated chunk shape to use for that variable. For example, a `chunks` header with the value `temperature=256,256,30,pressure=256,256,30` would override the chunking of both the `temperature` and `pressure` variables to 256x256x30. If a variable is not specified in the `chunks` header, its chunking is not overridden.

Before constructing the `chunks` header, a Python client might inspect the dataset's `.zmetadata` to determine the existing chunking of each variable. This can be done with the `requests` library:
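```python
import requests

proxy_zarr_store = 'http://localhost:8000/my.zarr.store'

# fetch the consolidated metadata (.zmetadata) through the proxy
response = requests.get(f'{proxy_zarr_store}/.zmetadata')
response.raise_for_status()
zmetadata = response.json()
print(zmetadata)
```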
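The `.zmetadata` document maps keys such as `temperature/.zarray` to array metadata, whose `chunks` field holds the stored chunk shape. A minimal sketch of pulling those shapes out, assuming Zarr v2 consolidated metadata; the `existing_chunks` helper and the example values are illustrative only:

```python
def existing_chunks(zmetadata: dict) -> dict[str, tuple[int, ...]]:
    """Map each array in a consolidated .zmetadata document to its chunk shape."""
    chunks = {}
    for key, meta in zmetadata['metadata'].items():
        # array metadata lives under '<variable>/.zarray' (Zarr v2 layout)
        if key.endswith('/.zarray'):
            variable = key[: -len('/.zarray')]
            chunks[variable] = tuple(meta['chunks'])
    return chunks


# illustrative metadata, not taken from a real store
example = {
    'metadata': {
        '.zgroup': {'zarr_format': 2},
        'temperature/.zarray': {'chunks': [1024, 1024, 1], 'shape': [4096, 4096, 365]},
        'temperature/.zattrs': {'_ARRAY_DIMENSIONS': ['y', 'x', 'time']},
    },
    'zarr_consolidated_format': 1,
}
print(existing_chunks(example))  # {'temperature': (1024, 1024, 1)}
```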
Once the `.zmetadata` has been retrieved, the client can construct the `chunks` header. For example, the following code constructs a `chunks` header value that overrides the chunking of the `temperature` and `pressure` variables (arrays) to be 256x256x30:

```python
chunks = 'temperature=256,256,30,pressure=256,256,30'
```
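Because shapes themselves contain commas, the header value is only unambiguous if entries are split where a new variable name begins. The sketch below shows one way to build and parse such a value; both `format_chunks_header` and `parse_chunks_header` are hypothetical helpers, not zarr-proxy's own parser, and the splitting rule assumes variable names never start with a digit.

```python
import re


def format_chunks_header(overrides: dict[str, tuple[int, ...]]) -> str:
    """Serialize {variable: shape} into a chunks header value (hypothetical helper)."""
    return ','.join(
        f"{variable}={','.join(str(n) for n in shape)}"
        for variable, shape in overrides.items()
    )


def parse_chunks_header(value: str) -> dict[str, tuple[int, ...]]:
    """Parse a chunks header value back into {variable: shape} (hypothetical helper).

    Shapes also contain commas, so only split on commas followed (after
    optional whitespace) by a character that cannot start a number,
    i.e. the first character of the next variable name.
    """
    overrides = {}
    for entry in re.split(r',\s*(?=[^\d\s,])', value.strip()):
        variable, shape = entry.split('=', 1)
        overrides[variable.strip()] = tuple(int(n) for n in shape.split(','))
    return overrides


chunks = format_chunks_header({'temperature': (256, 256, 30), 'pressure': (256, 256, 30)})
print(chunks)  # temperature=256,256,30,pressure=256,256,30
print(parse_chunks_header(chunks))
```

Tolerating whitespace after the separating comma matters if several `chunks` headers are ever merged by an HTTP client, which typically joins repeated values with `", "`.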
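We can then construct a Zarr store that sends the `chunks` header with every request by passing it to the `client_kwargs` argument of the `zarr.storage.FSStore` constructor:

```python
import zarr

# headers in client_kwargs are forwarded to the underlying HTTP client,
# so every request the store makes to the proxy carries the chunks header
store = zarr.storage.FSStore(proxy_zarr_store, client_kwargs={'headers': {'chunks': chunks}})
```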
This store can then be opened with Xarray:

```python
import xarray as xr

ds = xr.open_dataset(store, engine='zarr', chunks={})
```
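With this setup, each key the store reads (`.zmetadata`, `temperature/.zarray`, chunk keys such as `temperature/0.0.0`) becomes an HTTP GET against the proxy carrying the `chunks` header. To inspect a single request by hand, a standard-library sketch might look like the following. `chunk_request` is a hypothetical helper; the `.`-separated chunk key assumes the Zarr v2 default dimension separator, and, assuming the proxy reports the overridden chunk shape in `.zarray`, the chunk indices refer to the overridden chunk grid.

```python
from urllib.request import Request


def chunk_request(
    proxy_store_url: str, variable: str, index: tuple[int, ...], chunks_header: str
) -> Request:
    """Build (but do not send) a GET request for one chunk through the proxy.

    Hypothetical helper: assumes Zarr v2 chunk keys joined with '.'.
    """
    chunk_key = '.'.join(str(i) for i in index)
    url = f'{proxy_store_url}/{variable}/{chunk_key}'
    return Request(url, headers={'chunks': chunks_header}, method='GET')


req = chunk_request(
    'http://localhost:8000/my.zarr.store',
    'temperature',
    (0, 0, 0),
    'temperature=256,256,30,pressure=256,256,30',
)
print(req.full_url)  # http://localhost:8000/my.zarr.store/temperature/0.0.0
# urllib.request.urlopen(req) would send it; this requires the proxy to be running
```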
A web-based client might prefetch and inspect the dataset's `.zmetadata` before constructing a `Headers` object with the desired `chunks` header(s) to pass on to a Zarr client. In this example, the `getHeaders()` function adds a `chunks` header for every variable whose existing chunking exceeds use-case-specific chunk size caps:
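```javascript
// Hypothetical implementation of getChunkShapeOverride (assumed behavior):
// caps each chunk length at its limit; returns null when no override is needed.
const getChunkShapeOverride = (existingChunks, limits) => {
  const capped = existingChunks.map((c, i) => Math.min(c, limits[i]));
  return capped.some((c, i) => c !== existingChunks[i]) ? capped : null;
};

// axes maps each variable to its spatial dimension names, e.g. { temperature: { X: "x", Y: "y" } }
const getHeaders = (variables, zmetadata, axes) => {
  const headers = [];
  variables.forEach((variable) => {
    const existingChunks = zmetadata.metadata[`${variable}/.zarray`].chunks;
    const dims = zmetadata.metadata[`${variable}/.zattrs`]["_ARRAY_DIMENSIONS"];
    const { X, Y } = axes[variable];
    // cap spatial dimensions at length 256, cap non-spatial dimensions at length 30
    const limits = dims.map((d) => ([X, Y].includes(d) ? 256 : 30));
    const override = getChunkShapeOverride(existingChunks, limits);
    if (override) {
      // one chunks header entry per variable that needs an override
      headers.push(["chunks", `${variable}=${override.join(",")}`]);
    }
  });
  return new Headers(headers);
};
```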
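The same capping rule can be sketched in Python for the `requests`/`zarr` client, assuming the override simply caps each existing chunk length at its limit (one plausible reading of `getChunkShapeOverride`). Unlike the JavaScript version, this hypothetical `chunk_overrides` helper combines all overrides into a single `chunks` value, matching the Python example earlier; the caps and example metadata are illustrative.

```python
def chunk_overrides(zmetadata, variables, axes, spatial_cap=256, other_cap=30):
    """Python sketch of the capping logic in getHeaders (hypothetical helper).

    Returns a chunks header value covering only the variables whose stored
    chunks exceed the caps, or None if no override is needed.
    """
    entries = []
    for variable in variables:
        existing = zmetadata['metadata'][f'{variable}/.zarray']['chunks']
        dims = zmetadata['metadata'][f'{variable}/.zattrs']['_ARRAY_DIMENSIONS']
        spatial = {axes[variable]['X'], axes[variable]['Y']}
        # cap spatial dimensions at spatial_cap, all other dimensions at other_cap
        limits = [spatial_cap if d in spatial else other_cap for d in dims]
        capped = [min(c, limit) for c, limit in zip(existing, limits)]
        if capped != list(existing):
            entries.append(f"{variable}={','.join(map(str, capped))}")
    return ','.join(entries) or None


# illustrative metadata: temperature exceeds the caps, pressure does not
zmetadata = {
    'metadata': {
        'temperature/.zarray': {'chunks': [1024, 1024, 1]},
        'temperature/.zattrs': {'_ARRAY_DIMENSIONS': ['y', 'x', 'time']},
        'pressure/.zarray': {'chunks': [128, 128, 10]},
        'pressure/.zattrs': {'_ARRAY_DIMENSIONS': ['y', 'x', 'time']},
    }
}
axes = {'temperature': {'X': 'x', 'Y': 'y'}, 'pressure': {'X': 'x', 'Y': 'y'}}
print(chunk_overrides(zmetadata, ['temperature', 'pressure'], axes))  # temperature=256,256,1
```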