xarray-contrib / datatree

WIP implementation of a tree-like hierarchical data structure for xarray.
https://xarray-datatree.readthedocs.io
Apache License 2.0
162 stars 43 forks source link

`to_zarr()` overwrites by default #274

Closed nicrie closed 7 months ago

nicrie commented 7 months ago

Currently, the to_zarr() method's mode parameter defaults to w which is (imo) not ideal.

Picture this: an optimistic user (who's totally not me) thinks, "Hey, to_zarr(./path/to/everything/I/hold/dear/) will just add my zarr data to my collection of digital life achievements." Instead, it ruthlessly purges all within the specified directory, leaving nothing but the zarr dataset in its wake.

Suggestion: Switch the default to mode="w-". Yes, one could still choose to unleash chaos upon their files, but it should be a conscious choice - not an "oops" moment. ;)

slevang commented 7 months ago

Will second this!

Context: I just implemented a method for serializing models in xeofs, which involved a lot of complex nested structures, for which datatree and zarr were perfect tools. Thank you for the great package!

However, was very surprised to realize datatree breaks with xarray on the default zarr write mode. I don't see any obvious reason for it, and it may not even be intentional. to_zarr(mode="w") is potentially much more destructive than to_netcdf(mode="w"), because the former can rm -r an entire directory, whereas the latter can only remove a single file. Hence the safer default of to_zarr(mode="w-").

TomNicholas commented 7 months ago

Instead, it ruthlessly purges all within the specified directory, leaving nothing but the zarr dataset in its wake.

I'm sorry that happened @nicrie !

it may not even be intentional

I don't remember making an active choice about this myself, and the proposed change in #275 seems fine to me. @jhamman is there any subtlety here I'm missing?

nicrie commented 7 months ago

No problem @TomNicholas , nothing much happened! Thanks for the cool package, by the way! :)