Open CodyCBakerPhD opened 11 months ago
Thanks for issue and helpful example. Using the same file with different backends (here creating test_hdf5.nwb
with HDF5 and then overwriting with Zarr) is not possible in most cases. I think the best we can do here is to check if the existing file is readable with the I/O backend and raise a ValueError to indicate that the existing file uses a different backend. What do you think?
I don't plan on using the same file with a different backend; the mode="w"
to me, indicates the behavior 'delete if existing' which shouldn't matter if it is one backend or another
Example: I tried writing my NWB file using DANDI filename convention ./sub-{subject_id}_ses-{session_id}.nwb
first using the HDF5 backend, then decided I wanted to switch to Zarr
Slightly annoying if I have to manually delete the file file if it exists before writing it again; I've always utilized the auto-overwrite of mode="w"
to achieve this
the
mode="w"
to me, indicates the behavior 'delete if existing' which shouldn't matter if it is one backend or another
In this case we would need to manually delete the file if it from a different backend before opening it. I.e., the logic would need to be something like
if self.mode == 'w':
if os.path.exists(filename) and not self.can_read(filename):
if os.path.isdir(filename):
shutil.rmtree(filename)
else:
os.remove(filename)
This would probably need to be done in all backends (HDF5 and Zarr). Since this is deleting files that are not from the particular backend, I think it would be best to make this configurable, e.g., by specifing and additional parameter, e.g., force_overwrite=True
, to make sure that this is something the user explicitly decides to do and avoid users accidentally deleting files.
This is ultimately an issue with Zarr (and h5py) -- both libraries raise an error and do not overwrite if the filepath is of a different type than expected (see below). But we can improve on that user experience. I think @oruebel 's suggested fix above is good. By the way, the TAB is thinking of recommending using the ".nwb.zarr" suffix for NWB Zarr files. Alessio/AIND is already doing that.
When Zarr tries to write to a directory that exists as a non-zarr directory (OK):
>>> import zarr
>>> import os
>>> os.mkdir("temp")
>>> z1 = zarr.open("temp", mode="w", shape=(2, 2), dtype="i4")
>>> z1[:] = 42 # creates zarr structure with data
When Zarr tries to write to a directory that exists as a file (error):
>>> import zarr
>>> with open("temp.txt", "w"):
... pass
...
>>> zarr.open("temp.txt", "w")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/rly/mambaforge/envs/hdmfx/lib/python3.10/site-packages/zarr/convenience.py", line 96, in open
_store: BaseStore = normalize_store_arg(
File "/Users/rly/mambaforge/envs/hdmfx/lib/python3.10/site-packages/zarr/storage.py", line 181, in normalize_store_arg
return normalize_store(store, storage_options, mode)
File "/Users/rly/mambaforge/envs/hdmfx/lib/python3.10/site-packages/zarr/storage.py", line 163, in _normalize_store_arg_v2
return DirectoryStore(store)
File "/Users/rly/mambaforge/envs/hdmfx/lib/python3.10/site-packages/zarr/storage.py", line 1036, in __init__
raise FSPathExistNotDir(path)
zarr.errors.FSPathExistNotDir: path exists but is not a directory: %r
When h5py tries to write to a file that exists as a non-hdf5 file (OK):
>>> import h5py
>>> import os
>>> os.mkdir("temp")
>>> with open("temp.txt", "w"):
... pass
...
>>> with h5py.File("temp.txt", "w"): # creates empty hdf5 file
... pass
...
When h5py tries to write to a file that exists as a directory (error):
>>> import h5py
>>> import os
>>> os.mkdir("temp")
>>> h5py.File("temp", "w")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/rly/mambaforge/envs/hdmfx/lib/python3.10/site-packages/h5py/_hl/files.py", line 567, in __init__
fid = make_fid(name, mode, userblock_size, fapl, fcpl, swmr=swmr)
File "/Users/rly/mambaforge/envs/hdmfx/lib/python3.10/site-packages/h5py/_hl/files.py", line 237, in make_fid
fid = h5f.create(name, h5f.ACC_TRUNC, fapl=fapl, fcpl=fcpl)
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py/h5f.pyx", line 126, in h5py.h5f.create
IsADirectoryError: [Errno 21] Unable to create file (unable to open file: name = 'temp', errno = 21, error message = 'Is a directory', flags = 13, o_flags = 602)
What happened?
Ran into this during testing on NeuroConv
Steps to Reproduce
Traceback
Operating System
Windows
Python Executable
Conda
Python Version
3.10
Package Versions
No response
Code of Conduct