napari / napari

napari: a fast, interactive, multi-dimensional image viewer for python
https://napari.org
BSD 3-Clause "New" or "Revised" License

Can't open a zarr container which contains any files other than the expected ones #6850

Closed imagejan closed 2 weeks ago

imagejan commented 1 month ago

🐛 Bug Report

Originally reported by @tibuch here: https://github.com/ome/napari-ome-zarr/issues/106

💡 Steps to Reproduce

I have a zarr container with the following structure:

container.zarr
    - 0
    - .zattrs
    - .zgroup

I can drag and drop this on napari and the data is opened.

However, if I add just another file like:

container.zarr
    - 0
    - test.txt
    - .zattrs
    - .zgroup

It fails with:

ValueError: Not a zarr dataset or group: /path/to/container.zarr/test.txt

In my opinion the plugin should only look at the files which are defined by the spec and ignore any other files.

💡 Expected Behavior

No response

🌎 Environment

napari: 0.4.19

💡 Additional Context

No response

psobolewskiPhD commented 1 month ago

Thanks for reporting. I agree with the suggested expected behavior. There are a couple of open issues with the builtins, so it looks like that part of napari needs a closer look.

DragaDoncila commented 1 month ago

The problem code is here. It tries to read everything in the directory that doesn't start with a '.'. We should either wrap this in a try/except and only raise later if the image is empty, or find a better way to identify the arrays than paths that don't start with a '.'. @jni thoughts?
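The first option (try/except, raise only if nothing could be read) might look roughly like the sketch below. This is not napari's actual reader: `read_children` and the `read_one` callable are hypothetical names standing in for whatever function loads a single array, and the sketch assumes that function raises ValueError for non-zarr paths, as in the error message from the report.

```python
import os


def read_children(path, read_one):
    """Read every non-hidden entry under `path`, skipping entries that fail.

    `read_one` is a hypothetical stand-in for the function that loads a
    single zarr array; it is assumed to raise ValueError for anything that
    is not a zarr dataset or group.
    """
    images = []
    for subpath in sorted(os.listdir(path)):
        if subpath.startswith('.'):
            # hidden metadata files such as .zattrs and .zgroup
            continue
        try:
            images.append(read_one(os.path.join(path, subpath)))
        except ValueError:
            # stray entry such as test.txt: skip it instead of failing
            continue
    if not images:
        # only complain once we know nothing at all could be read
        raise ValueError(f'Not a zarr dataset or group: {path}')
    return images
```

The trade-off is that a genuinely corrupt array is skipped silently as long as at least one sibling loads, which is one reason filtering the paths up front may be preferable.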

jni commented 4 weeks ago

> thoughts?

I actually have no memory of supporting multi-arrays at all, that's neat! 😜

I think we can be a little more proactive in the iteration — we should only iterate directories in that list comprehension. So:

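# filter clause of the list comprehension; `path` is the zarr container
for subpath in sorted(os.listdir(path))
if not subpath.startswith('.')
and os.path.isdir(os.path.join(path, subpath))
and (os.path.exists(os.path.join(path, subpath, '.zarray'))
     or os.path.exists(os.path.join(path, subpath, '.zgroup')))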

At this point I think we should wrap path in a pathlib.Path to make that whole thing less of a mouthful, but you get my drift. Basically, only recurse into subdirectories, and only if they are actually zarr arrays or nested groups.
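The pathlib variant mentioned here could be sketched as follows. The function name `zarr_children` is made up for illustration and is not part of napari; the sketch only covers the zarr v2 metadata file names discussed in this comment.

```python
from pathlib import Path


def zarr_children(path):
    """Return sorted subdirectories of `path` that look like zarr v2 nodes.

    A subdirectory qualifies only if it is not hidden and contains a
    .zarray (array) or .zgroup (nested group) metadata file.
    """
    root = Path(path)
    return [
        child
        for child in sorted(root.iterdir())
        if not child.name.startswith('.')
        and child.is_dir()
        and ((child / '.zarray').exists() or (child / '.zgroup').exists())
    ]
```

With the container from the report, a stray test.txt is dropped by the `is_dir()` check, and a plain directory without zarr metadata is dropped by the last condition.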

DragaDoncila commented 4 weeks ago

Great, well, that should be an easy PR!

@imagejan if you are at all interested in contributing this fix yourself, let us know and we can guide you through it. Otherwise one of the core team will get to it as soon as we can!

d-v-b commented 3 weeks ago

Is there any reason to not use the zarr-python api for listing groups and arrays here?

In zarr v3, .zarray and .zgroup are gone, and so this code will need to handle the name of the new metadata document (zarr.json). But if instead this code uses zarr-python to abstract over the names of the metadata documents, things are a lot simpler for napari, IMO.
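To make the maintenance cost concrete: if napari kept detecting nodes by file name, the check would have to track every metadata document name itself, roughly as in this sketch. The names `V2_METADATA`, `V3_METADATA`, and `is_zarr_node` are illustrative only and do not exist in napari or zarr-python.

```python
from pathlib import Path

# metadata documents that mark a directory as a zarr node
V2_METADATA = ('.zarray', '.zgroup')  # zarr v2
V3_METADATA = ('zarr.json',)          # zarr v3


def is_zarr_node(directory):
    """Return True if `directory` holds a zarr v2 or v3 metadata document."""
    directory = Path(directory)
    return directory.is_dir() and any(
        (directory / name).exists() for name in V2_METADATA + V3_METADATA
    )
```

Every future change to the format would mean editing these tuples by hand, which is the burden that delegating to zarr-python would avoid.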

jni commented 3 weeks ago

> Is there any reason to not use the zarr-python api for listing groups and arrays here?

Ignorance! 😂 Since @imagejan already opened #6857, which is strictly better than the current situation, I suggest we merge that, then iterate. Thanks for the suggestion @d-v-b, that's indeed the right path forward.

d-v-b commented 3 weeks ago

For reference, in zarr-python right now, contains_group and contains_array both do what you would expect: they check whether a path denotes a group or an array.

Once you have a handle to a group, you can iterate over its sub-groups / sub-arrays with Group.items() (because Group is a MutableMapping), or just sub-arrays with Group.arrays(), or just sub-groups with Group.groups().

Note that some of this will change in v3, but not too drastically -- e.g., we might make contains_array / contains_group take a zarr_format keyword argument, since zarr v3 and zarr v2 arrays can co-exist in the same directory.