saalfeldlab / n5-zarr

Zarr filesystem backend for N5
BSD 2-Clause "Simplified" License
12 stars 14 forks source link

Return null in getZarrayAttributes if array attributes are not available #13

Closed hanslovsky closed 1 year ago

hanslovsky commented 1 year ago

Currently, calling N5ZarrReader.getDatasetAttributes(notADataset) is called for a path that is not a dataset throws a NullPointerException:

java.lang.NullPointerException: Cannot invoke "com.google.gson.JsonElement.getAsInt()" because the return value of "java.util.HashMap.get(Object)" is null

        at org.janelia.saalfeldlab.n5.zarr.N5ZarrReader.getZArrayAttributes(N5ZarrReader.java:256)
        at org.janelia.saalfeldlab.n5.zarr.N5ZarrReader.getDatasetAttributes(N5ZarrReader.java:270)
        at org.janelia.saalfeldlab.n5.zarr.N5ZarrTest.testGetDatasetAttributesNull(N5ZarrTest.java:162)

This PR addresses this:

  1. Return null if not/a/dataset/path/.zarray does not exist (Previously, it would only log System.out.println(path.toString() + " does not exist.");)
  2. Fix typo getZArraryAttributes -> getZArrayAttributes

I can revert (2) if you want to preserve public API

mkitti commented 1 year ago

Is a missing .zarray valid Zarr? Perhaps this should throw an IOException rather return null which kicks the NPE down the road.

My reading of the specification js that .zarray has values which MUST be present. https://zarr.readthedocs.io/en/stable/spec/v2.html#arrays

@d-v-b what does zarr-python do in this case?

hanslovsky commented 1 year ago

Yes, groups don't need a .zarray, they have a .zgroup:

In [1]: import zarr

In [2]: g = zarr.open_group('.', 'a')

In [3]: g.create_dataset('ds', data=[123])
Out[3]: <zarr.core.Array '/ds' (1,) int64>

In [4]: g.create_group('g')
Out[4]: <zarr.hierarchy.Group '/g'>

In [5]: !tree -a .
.
├── ds
│   ├── 0
│   └── .zarray
├── g
│   └── .zgroup
└── .zgroup

3 directories, 4 files

[getDatasetAttributes and datasaetExists] (https://github.com/saalfeldlab/n5-zarr/blob/3538c09dc858df647e939073f4fc48ad0c4608ad/src/main/java/org/janelia/saalfeldlab/n5/zarr/N5ZarrReader.java#L267-L279) also expect null return values, assuming that means that .zarray does not have valid dataset attributes.

Note:

I will also check my code to see if I called getDatasetAttributes without checking for datasetExists first.

hanslovsky commented 1 year ago

Here is a minimum working example using kscript that shows the error, without ever calling getDatasetAttributes explicitly:

@file:Repository("https://maven.scijava.org/content/groups/public")
@file:DependsOn("org.janelia.saalfeldlab:n5-zarr:0.0.7")

import org.janelia.saalfeldlab.n5.N5Reader
import org.janelia.saalfeldlab.n5.zarr.N5ZarrWriter

class ReaderDelegate(delegate: N5Reader): N5Reader by delegate

val zarr = N5ZarrWriter("npe.zarr")

val pathName = "not-a-group"
zarr.createGroup(pathName)

println("isDataset? ${ReaderDelegate(zarr).datasetExists(pathName)}")

ReaderDelegate.datasetExists uses the default implementation, which relies on receiving null if DatasetAttributes do not exist at pathName, without checking for .zarray. This could easily be fixed in the example by explicitly overwriting datasetExists to use delegate.datasetExists but overwriting the default implementation should not be required, unless functionally different.

bogovicj commented 1 year ago

Thanks @hanslovsky ,

It's worth it to fix the typo imo, some more big changes are coming