hdmf-dev / hdmf-zarr

Zarr I/O backend for HDMF
https://hdmf-zarr.readthedocs.io/
Other
7 stars 6 forks source link

Update for zarr 2.18.0 and zarr 2.18.1 #195

Closed mavaylon1 closed 6 months ago

mavaylon1 commented 6 months ago

Motivation

What was the reasoning behind this change? Please explain the changes briefly. This PR is to update hdmf-zarr for changes that came n 2.18.0 and new changes in 2.18.0

2.18.0 This is in response to the failing workflows using the latest version of zarr, specifically the failing test test_fsspec_streaming.

The issue is how we are reading zarr scalar arrays. We are currently trying to directly index and access the scalar. From looking at the documentation, zarr accessed a scalar array as z[:], which seems to solve the problem. I also saw that z[()] was another syntax that works.

Why? Not completely clear. When we access a scalar array in numpy we do so with notation [()]. Supposedly, Zarr used to support indexing scalar arrays as z[0], but that was updated to numpy standard earlier in one of the zarr version 2.#. Maybe this was a functionality that finally got flushed out fully.

2.18.1 To ensure that the data being assigned is in a format that the Zarr array can handle efficiently, Zarr arrays seem to require that we set with numpy arrays, i.e., dset[:] = np.array(data).

Another weird behavior is when setting a dataset of references, the current method puts the entire dataset in each index vs matching the individual reference dictionary to the index (which is what we expect).

The solution to both is make sure the data is in an array first.

How to test the behavior?

Show how to reproduce the new behavior (can be a bug fix or a new feature)

Checklist

codecov-commenter commented 6 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 86.05%. Comparing base (72ff80f) to head (e9cfb84).

Additional details and impacted files ```diff @@ Coverage Diff @@ ## dev #195 +/- ## ======================================= Coverage 86.05% 86.05% ======================================= Files 5 5 Lines 1162 1162 Branches 287 287 ======================================= Hits 1000 1000 Misses 107 107 Partials 55 55 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

mavaylon1 commented 6 months ago

Hey @oruebel can you give me your thoughts. I believe this is related to #192

oruebel commented 6 months ago

Using the [()] syntax to access scalars makes sense to me

mavaylon1 commented 6 months ago

@oruebel There are also some deprecation warnings for zarr 3.0. I will add filters in this PR, but I'll also make a ticket for those deprecations to be resolved to whatever zarr will be pivoting to.

mavaylon1 commented 6 months ago

Review Notes: I need to ping to see if this now resolved with this update or to see what changes: https://github.com/hdmf-dev/hdmf-zarr/issues/192