gunnhildsp opened this issue 8 months ago
From the zarr-python perspective, minimizing memory usage is easy to say -- "load fewer chunks, make your chunks smaller". But your problem seems like an xarray issue: I'm not familiar with what happens inside xarray.open_mfdataset. It might be helpful to open an issue or discussion over at https://github.com/pydata/xarray/
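The "make your chunks smaller" advice can be sketched with xarray's Dask-backed chunking. This is illustrative only: the variable names, dimensions, and chunk sizes below are assumptions, not from the issue.

```python
# Sketch: rechunking before writing lowers peak memory per task,
# at the cost of more (smaller) write tasks. Requires dask installed.
import numpy as np
import xarray as xr

# Synthetic stand-in for two days of hourly data on a small grid.
ds = xr.Dataset(
    {"temp": (("time", "y", "x"), np.random.rand(48, 8, 8))}
)

# Rechunk so each task touches only six hours at a time.
small = ds.chunk({"time": 6})
print(small["temp"].chunks[0])  # chunk sizes along the time dimension
```

Writing `small.to_zarr(...)` then streams one small chunk at a time instead of materializing large blocks in memory.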
Zarr version
2.16.1
Numcodecs version
0.12.1
Python Version
3.12.1
Operating System
Linux
Installation
using poetry in a virtual environment
Description
I am trying to write a Zarr dataset from netCDF files. To limit memory usage, I first create daily Zarr directories from hourly netCDF files using xarray, then combine the daily directories into a monthly Zarr directory. Finally, I want to write the monthly Zarr to Azure Blob Storage. However, when combining the daily Zarrs into the monthly one, the process is killed with no stack trace, which I assume means it ran out of memory. If I create a smaller final directory, for example by combining only two daily Zarrs into one, it works fine. I am using xarray version 2024.2.0.
Steps to reproduce
Additional output
No response