zarr-developers / VirtualiZarr

Create virtual Zarr stores from archival data files using xarray syntax
https://virtualizarr.readthedocs.io/en/stable/api.html
Apache License 2.0

Rewrite manifest logic in Rust? #23

Open TomNicholas opened 8 months ago

TomNicholas commented 8 months ago

The code in `manifests/manifest.py` does three things: it validates input, creates an immutable data structure (the `ChunkManifest`), and merges (i.e. concatenates) multiple `ChunkManifest`s efficiently. The validation is done using pydantic, which already uses Rust internally.

We might imagine re-implementing the core manifest logic in Rust, then exposing it through Python bindings as an array-like object that xarray can wrap. I don't know whether the manifest logic is actually a performance bottleneck, but it might be.
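To make the scope concrete, here is a minimal pure-Python sketch of the three responsibilities described above: validation, immutability, and concatenation. It is not VirtualiZarr's actual code. `ChunkEntry`, `SimpleManifest`, and `concat_along_first_axis` are hypothetical names, and the dot-separated chunk keys (e.g. `"0.1"`) are an assumption made for illustration.

```python
from dataclasses import dataclass
from types import MappingProxyType
from typing import Mapping


@dataclass(frozen=True)
class ChunkEntry:
    """Hypothetical reference to one chunk: a byte range inside a file."""

    path: str
    offset: int
    length: int

    def __post_init__(self) -> None:
        # Validation step: byte ranges cannot be negative.
        if self.offset < 0 or self.length < 0:
            raise ValueError("offset and length must be non-negative")


class SimpleManifest:
    """Hypothetical immutable mapping from chunk keys like "0.1" to entries."""

    def __init__(self, entries: Mapping[str, ChunkEntry]) -> None:
        for key in entries:
            # Validation step: every key part must be a non-negative integer.
            if not all(part.isdigit() for part in key.split(".")):
                raise ValueError(f"invalid chunk key: {key!r}")
        # Copy, then expose a read-only view so callers cannot mutate it.
        self._entries = MappingProxyType(dict(entries))

    @property
    def entries(self) -> Mapping[str, ChunkEntry]:
        return self._entries


def concat_along_first_axis(a: SimpleManifest, b: SimpleManifest) -> SimpleManifest:
    """Concatenate two manifests by shifting b's first chunk index past a's."""
    # Number of chunks a occupies along axis 0 (0 if a is empty).
    n = 1 + max((int(k.split(".")[0]) for k in a.entries), default=-1)
    merged = dict(a.entries)
    for key, entry in b.entries.items():
        first, *rest = key.split(".")
        merged[".".join([str(int(first) + n), *rest])] = entry
    return SimpleManifest(merged)
```

The string-key manipulation in the concatenation step is the kind of per-chunk Python work that a compiled implementation could plausibly speed up, though whether it matters in practice would need profiling.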

TomNicholas commented 1 month ago

Since we merged #107, the context of this suggestion has changed. Now, in order to replace the ChunkManifest class one would need to write something that could replicate these features of the current triple-numpy-array solution:

The reasons why this idea might still be of interest are:

  1. To make ingestion of references from Rust-based readers (e.g. hypergrib) more efficient (see https://github.com/zarr-developers/VirtualiZarr/issues/238).
  2. As another way to make the representation of references configurable (again, see https://github.com/zarr-developers/VirtualiZarr/issues/238 for a use case).
  3. To make handing off the references to another Rust application more efficient (cc @mpiannucci).

A full rewrite in Rust is probably overkill though; we can likely achieve the above by staying in Python and using Rust bindings for numpy.
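As a rough illustration of the "triple-numpy-array" idea, the sketch below stores paths, offsets, and lengths as three parallel arrays shaped like the chunk grid. The function names, dtypes, and read-only flags are my assumptions for illustration, not the actual `ChunkManifest` implementation.

```python
import numpy as np


def build_manifest_arrays(paths, offsets, lengths):
    """Hypothetical helper: build three parallel, read-only arrays."""
    # np.array copies its input, so freezing below never affects the caller.
    p = np.array(paths, dtype=str)
    o = np.array(offsets, dtype=np.uint64)
    n = np.array(lengths, dtype=np.uint64)
    # Validation step: all three arrays must share the chunk-grid shape.
    if not (p.shape == o.shape == n.shape):
        raise ValueError("paths, offsets and lengths must have the same shape")
    for arr in (p, o, n):
        arr.setflags(write=False)  # approximate immutability
    return p, o, n


def concat_manifest_arrays(a, b, axis=0):
    """Concatenate two array triples along one axis of the chunk grid."""
    return tuple(np.concatenate([x, y], axis=axis) for x, y in zip(a, b))


# Two manifests, each covering a 1x2 chunk grid, stacked along axis 0.
first = build_manifest_arrays([["a.nc", "a.nc"]], [[0, 100]], [[100, 100]])
second = build_manifest_arrays([["bb.nc", "bb.nc"]], [[0, 50]], [[50, 50]])
all_paths, all_offsets, all_lengths = concat_manifest_arrays(first, second)
```

With this layout, concatenation is a single vectorized call per array, and the offset and length arrays are plain contiguous buffers, which is what would make handing them to or from Rust code cheap.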

cc @jackkelly @emfdavid