zarr-developers / zarr-python

An implementation of chunked, compressed, N-dimensional arrays for Python.
http://zarr.readthedocs.io/
MIT License
1.41k stars 267 forks source link

fix file modes #2000

Open brokkoli71 opened 2 weeks ago

brokkoli71 commented 2 weeks ago

fixes https://github.com/zarr-developers/zarr-python/issues/1978 according to docstrings of zarr.open the modes should do:

mode : {'r', 'r+', 'a', 'w', 'w-'}, optional
        Persistence mode: 'r' means read only (must exist); 'r+' means
        read/write (must exist); 'a' means read/write (create if doesn't
        exist); 'w' means create (overwrite if exists); 'w-' means create
        (fail if exists).

this PR implements and tests this behavior, so https://github.com/zarr-developers/zarr-python/issues/1978 gets fixed

jhamman commented 2 weeks ago

Thanks @brokkoli71 for picking this up. What is the expected behavior of mode='w'? Should we have a Store.clear() method that we call in these cases?

brokkoli71 commented 2 weeks ago

Thanks @brokkoli71 for picking this up. What is the expected behavior of mode='w'? Should we have a Store.clear() method that we call in these cases?

good point, according to the docstrings: 'w' means create (overwrite if exists). So yes, would be probably a good idea to have a generalized Store.clear. I will have a look at that

d-v-b commented 1 week ago

I agree that a method for clearing a store is a good idea. Just bear in mind that this can be a performance bottleneck when deleting arrays with a lot of chunks, especially on remote storage. We should think about performance at some point.

brokkoli71 commented 1 week ago

I agree that a method for clearing a store is a good idea. Just bear in mind that this can be a performance bottleneck when deleting arrays with a lot of chunks, especially on remote storage. We should think about performance at some point.

I was wondering, if supporting mode='w' as create (overwrite if exists) is a good idea concerning the performance? what is the usecase of this mode? maybe it would be better to explicitly have a delete or clear method? or we should have a warning in the docstrings?

d-v-b commented 1 week ago

I agree that a method for clearing a store is a good idea. Just bear in mind that this can be a performance bottleneck when deleting arrays with a lot of chunks, especially on remote storage. We should think about performance at some point.

I was wondering, if supporting mode='w' as create (overwrite if exists) is a good idea concerning the performance? what is the usecase of this mode? maybe it would be better to explicitly have a delete or clear method? or we should have a warning in the docstrings?

There's a lot of usage for create + overwrite -- if users are rapidly iterating on something, they already created an old array, but they want to save a new array with the same name + different datatype, then the easiest way to do this is via something like an overwrite option. Handling this via array the creation flow is more ergonomic than forcing users to go through a separate API.

Don't worry too much about the performance concerns with bulk deletion for now. We should make the API convenient, then worry about performance later. I only brought delete performance up because it's something that zarr python v2 didn't deal with, which caused me some headaches.