h-mayorquin opened this issue 1 month ago.
Thanks for reporting this issue. Unfortunately, I don't think there is much we can do about this. ":" is not allowed in file names on Windows, and since Zarr translates the TimeSeries to a folder on disk, the name must be a valid folder name.
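To see where the failure comes from, here is a minimal sketch. It assumes a DirectoryStore-style layout in which every "/"-separated component of a store key becomes a directory or file name on disk. The helper `windows_invalid_components` and the character set are illustrative assumptions based on the Windows file-naming rules. They are not part of hdmf-zarr, and they ignore Windows reserved device names such as CON or NUL.

```python
# Assumption: characters Windows rejects in a file or directory name.
# "/" is left out because it is the separator between store-key components.
WINDOWS_RESERVED_CHARS = set('<>:"\\|?*')


def windows_invalid_components(store_key: str) -> list[str]:
    """Return the components of a '/'-separated store key that Windows would reject.

    Hypothetical helper: each component stands for a directory or file that a
    DirectoryStore-like backend would create on disk.
    """
    return [
        component
        for component in store_key.split("/")
        if any(char in WINDOWS_RESERVED_CHARS for char in component)
    ]


# The key from the error trace: the TimeSeries name becomes its own directory.
print(windows_invalid_components("acquisition/Name: name/.zgroup"))  # prints ['Name: name']
```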
Perhaps we can make the user experience better?
A similar error comes from the HDF5 limitation on slashes in names:
from pynwb import NWBHDF5IO, TimeSeries
from pynwb.testing.mock.file import mock_NWBFile

nwbfile = mock_NWBFile()
time_series = TimeSeries(name="Name/name", rate=1.0, data=[1, 2, 3], unit="unit")
nwbfile.add_acquisition(time_series)

nwbfile_path = "test.nwb"
with NWBHDF5IO(path=nwbfile_path, mode="w") as io:
    io.write(nwbfile)
This raises a ValueError:
ValueError: name 'Name/name' cannot contain '/'
Perhaps we could have something similar for ":" in Zarr?
More deeply, this seems like a leaky abstraction: an implementation detail (the backend's naming scheme) leaks into something the user controls directly. It is an interesting coupling.
Adding an error check in ZarrIO to improve reporting sounds reasonable. We would need to add a check for both groups and datasets, so we'd probably need a helper function to check if a path-name is permitted.
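A minimal sketch of such a helper, assuming the hierarchy to be written is available as nested dictionaries (a dict value is a group, anything else is a dataset). The names `check_name_permitted`, `validate_hierarchy`, and `FORBIDDEN_CHARS` are hypothetical and not part of the ZarrIO API; the error message simply mirrors the existing "cannot contain '/'" ValueError.

```python
from collections.abc import Mapping
from typing import Any

# Hypothetical rule: characters a Zarr DirectoryStore name must not contain
# if the resulting folder is to be usable on Windows.
FORBIDDEN_CHARS = (":",)


def check_name_permitted(name: str) -> None:
    """Raise ValueError if `name` cannot be used as a group or dataset name."""
    for char in FORBIDDEN_CHARS:
        if char in name:
            raise ValueError(f"name '{name}' cannot contain '{char}'")


def validate_hierarchy(node: Mapping[str, Any], prefix: str = "") -> None:
    """Check every group (Mapping) and dataset (any other value) below `node`."""
    for name, child in node.items():
        path = f"{prefix}/{name}" if prefix else name
        try:
            check_name_permitted(name)
        except ValueError as err:
            # Report the full path so the user can locate the offending object.
            raise ValueError(f"{err} (at '{path}')") from err
        if isinstance(child, Mapping):
            validate_hierarchy(child, path)


# A group "acquisition" holding a TimeSeries group whose name contains ":".
layout = {"acquisition": {"Name: name": {"data": [1, 2, 3]}}}
try:
    validate_hierarchy(layout)
except ValueError as err:
    print(err)
```

Because datasets and groups share the same name check, the rule only has to be defined once.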
There are certain limitations of storage backends that are tricky to work around. One approach could be to have docval validate names when creating new objects (i.e., before we call write) to prevent creation of bad names.
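The construction-time idea could look roughly like the sketch below: the name is validated in the constructor, so a bad name fails when the object is created rather than when `write` is called. `ValidatedContainer` and `PORTABLE_FORBIDDEN_CHARS` are hypothetical stand-ins; the sketch does not use the real docval decorator.

```python
# Hypothetical rule set: "/" separates path components in HDF5 and Zarr, and
# ":" is not allowed in Windows file names (relevant for Zarr directory stores).
PORTABLE_FORBIDDEN_CHARS = ("/", ":")


class ValidatedContainer:
    """Hypothetical container that rejects bad names at construction time."""

    def __init__(self, name: str):
        bad = [char for char in PORTABLE_FORBIDDEN_CHARS if char in name]
        if bad:
            raise ValueError(f"name '{name}' cannot contain {bad!r}")
        self.name = name


ValidatedContainer("name")  # accepted
try:
    ValidatedContainer("Name: name")  # rejected before any write is attempted
except ValueError as err:
    print(err)
```

Failing at construction means the same name is rejected no matter which backend is used later.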
Makes sense.
Thinking about this, it raises other questions:
1. Should we disallow ":" in names on other OSes as well (not just Windows)? Currently, you can build a Zarr file on Linux that can't be moved to Windows, right? That seems undesirable.
2. Should it be disallowed for other backends too? Otherwise, you can build an NWBFile with the HDF5 backend that can't be repacked to Zarr.
@rly @bendichter thoughts on this?
For 1, yes: I think we should prohibit ":" on other operating systems too, since I would argue cross-OS compatibility is a core feature of NWB.
For 2, I think maybe not. I don't think interoperability between backends is necessarily a core feature, though perhaps we should add this as an NWB Best Practice.
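One way to encode these two positions (a hard error for the cross-OS problem, a best-practice-style warning for the cross-backend one) is sketched below. `check_name_for_backend` and the backend strings are hypothetical, and the warning only stands in for what a Best Practice check might report.

```python
import warnings

# Hypothetical backend identifiers used only for this sketch.
KNOWN_BACKENDS = ("hdf5", "zarr")


def check_name_for_backend(name: str, backend: str) -> None:
    """Hypothetical policy: hard error for Zarr, best-practice warning for HDF5."""
    if backend not in KNOWN_BACKENDS:
        raise ValueError(f"unknown backend {backend!r}")
    if ":" not in name:
        return
    if backend == "zarr":
        # Point 1: reject ':' on every OS so Zarr files can be moved to Windows.
        raise ValueError(f"name '{name}' cannot contain ':' when writing with Zarr")
    # Point 2: HDF5 accepts the name; only flag that a Zarr copy would need renaming.
    warnings.warn(
        f"name '{name}' contains ':', so this file cannot be repacked to Zarr "
        "without renaming the object",
        UserWarning,
        stacklevel=2,
    )


# Usage: the HDF5 case warns, the Zarr case raises.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    check_name_for_backend("Name: name", "hdf5")
print(len(caught))  # 1

try:
    check_name_for_backend("Name: name", "zarr")
except ValueError as err:
    print(err)
```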
There's another question that comes up for me:
I agree with @bendichter on 1 and 3 and am ambivalent on 2. Allowing but discouraging people from putting ":" in the name feels a little in conflict with preventing people from requiring the name to have a ":". I understand the reason.
A simpler rule is to just not allow ":" in the name for all backends moving forward. I would say most, if not all, people do not care much about the names of their NWB objects.
I think one advantage of disallowing ":" and "/" in general is that it would be easier to maintain and communicate: one plain rule instead of differences and code spread across the two backends.
My thoughts on (2): I'm thinking of a use case where the user wants to write HDF5-based NWB files and wants object names that reflect their source data structure in some specific way that includes the ":" character. Then we say they can't, they ask why, and we say "because it won't work for Zarr." They might be frustrated by that answer because they don't care about using Zarr anyway. I would consider cross-language and cross-OS support to be core features of NWB, so it makes sense to constrain names for those, but cross-backend support doesn't seem as essential.
On the other hand, we may down the line decide it's better to store all NWB data on DANDI as Zarr objects, and it might then be much nicer if we already had constraints in place that allowed us to do that more easily.
I see some advantages and disadvantages. I tend to err on the side of freedom, since you can never fully anticipate every use-case and you are potentially causing a big headache for someone by making constraints you don't need to.
This produces an error on Windows (see the name of the TimeSeries):
Error trace:
KeyError: 'acquisition/Name: name/.zgroup'
---------------------------------------------------------------------------
NotADirectoryError                        Traceback (most recent call last)
File c:\Users\heberto\miniconda3\envs\neuro\Lib\site-packages\zarr\storage.py:1136, in DirectoryStore.__setitem__(self, key, value)
   1135 try:
-> 1136     os.makedirs(dir_path)
   1137 except OSError as e:
Whereas this works perfectly: