openclimatefix / ocf_datapipes

OCF's DataPipe based dataloader for training and inference
MIT License

Add support for TensorStore for Zarr opening #224

Closed jacobbieker closed 9 months ago

jacobbieker commented 1 year ago

TensorStore looks like it might be able to open our large Zarr stores faster. @dfulu mentioned it was up to twice as fast for some of our NWP data. This could also help speed up satellite loading, which is one of the slowest parts of data creation right now.

Context

Zarr-Python is not very fast at opening Zarr stores, and slow reading from Zarr is a bottleneck for us. Zarr-rust is being worked on and could possibly speed things up a lot more, but in the meantime this could be a relatively small change for a significant gain in speed.

Possible Implementation

https://github.com/google/xarray-tensorstore should be a drop-in replacement, although we may need to open each NWP file separately; that needs looking into. It would be a relatively small change to make.
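
A minimal sketch of what the swap could look like, assuming the `open_zarr` function from xarray-tensorstore is used in place of `xarray.open_zarr`. The helper name `open_zarr_dataset` and its `prefer_tensorstore` flag are hypothetical and not part of ocf_datapipes; the fallback to Zarr-Python is an assumption about how we might want to roll this out, not something already decided.

```python
import importlib


def open_zarr_dataset(path, prefer_tensorstore=True):
    """Open a single Zarr store, using xarray-tensorstore when available.

    Hypothetical helper for illustration only. Each store is opened on its
    own, since opening several NWP files at once may not be supported.
    """
    if prefer_tensorstore:
        try:
            # Imported lazily so the helper still works without the package.
            xarray_tensorstore = importlib.import_module("xarray_tensorstore")
        except ImportError:
            xarray_tensorstore = None
        if xarray_tensorstore is not None:
            # Returns an xarray.Dataset backed by TensorStore arrays.
            return xarray_tensorstore.open_zarr(path)

    # Fallback: the existing Zarr-Python path through xarray.
    xarray = importlib.import_module("xarray")
    return xarray.open_zarr(path)


# Example usage (the path is a placeholder, not a real OCF dataset):
# ds = open_zarr_dataset("path/to/nwp.zarr")
```

With the TensorStore path, indexing stays lazy, and the xarray-tensorstore README describes calling `xarray_tensorstore.read(...)` on the selected subset to start loading it. Whether several NWP stores opened this way can then be combined without loading them eagerly is the part that still needs looking into.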

jacobbieker commented 1 year ago

Forgot to search beforehand; this is (mostly) a duplicate of https://github.com/openclimatefix/ocf_datapipes/issues/198