xarray-contrib / datatree

WIP implementation of a tree-like hierarchical data structure for xarray.
https://xarray-datatree.readthedocs.io
Apache License 2.0
161 stars, 43 forks

open_datatree() keeps the hdf file open preventing writes #325

Open · KareemShalabi opened this issue 3 months ago

KareemShalabi commented 3 months ago

Consider this analysis pipeline: multiple arrays for the same data variable are organized in a group hierarchy inside an HDF file according to some attributes. A datatree is a perfect container for that. I can read all the arrays as chunked dask-backed datasets, then map a function over the datatree, collecting the results along the way.
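The mapping step of this pipeline can be sketched without any file I/O. This is a minimal, library-free illustration under stated assumptions: the tree is a hypothetical dict mapping group paths to plain Python lists (standing in for the chunked, dask-backed arrays), and `map_over_tree` and `mean` are hypothetical helpers. In the datatree package itself the comparable tools are `DataTree.from_dict` and `map_over_subtree`, whose exact signatures are not reproduced here.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for a datatree built "from a dict of group paths":
# keys are group paths, values are plain lists of floats standing in for
# the chunked, dask-backed arrays described in the issue.
Tree = Dict[str, List[float]]


def map_over_tree(func: Callable[[List[float]], float], tree: Tree) -> Dict[str, float]:
    """Apply ``func`` to every node, collecting one result per group path."""
    results: Dict[str, float] = {}
    for path, data in tree.items():
        results[path] = func(data)  # collect results along the way
    return results


def mean(data: List[float]) -> float:
    # Example per-group reduction; any function of one node's data would do.
    return sum(data) / len(data)


tree: Tree = {
    "/run1/sensorA": [1.0, 2.0, 3.0],
    "/run1/sensorB": [4.0],
    "/run2/sensorA": [5.0, 7.0],
}

results = map_over_tree(mean, tree)
for path in sorted(results):
    print(path, results[path])
```

Keeping the group path as the key means every result can later be written back to the group it came from, which is what the next step of the pipeline relies on.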

Because the final result of the function is far too large to fit in memory, I tried saving the intermediate results (the result of the computation in a single iteration) to the same file and group path, and returning the new chunked DataArray after reloading it. An exception is thrown because the file is held open by the datatree object. This does not happen when I create the datatree object myself (from a dict of group paths and DataArray objects).

TomNicholas commented 3 months ago

Thanks for raising this. I think this issue is a duplicate of #93. A PR was opened to fix it, but realistically, given that we're currently integrating datatree into Xarray main, we'll probably prioritize fixing it there instead of in this package.