hdmf-dev / hdmf-zarr

Zarr I/O backend for HDMF
https://hdmf-zarr.readthedocs.io/

Zarr datasets info lack compression data #186

Open h-mayorquin opened 3 months ago

h-mayorquin commented 3 months ago

So this:

import zarr
from numcodecs import Blosc

# Create a Zarr array
data = zarr.zeros((1000, 1000), chunks=(10, 10), dtype='float32')

# Set compression options
compressor = Blosc(cname='zstd', clevel=3, shuffle=Blosc.SHUFFLE)

# Create a DirectoryStore (the constructor takes no mode argument)
store = zarr.DirectoryStore("./zarr_test.zarr")

# Create a Zarr group and store the array
group = zarr.group(store)
group.create_dataset('data', data=data, compressor=compressor)

# Reopen the group read-only and inspect the array's info
group_reloaded = zarr.open("./zarr_test.zarr", mode='r')
group_reloaded["data"].info

Contains compression data (see the end of the image):

image

But if I create data through the package and then re-read it:

import os

import numpy as np
from numcodecs import Blosc, Delta
from hdmf_zarr import ZarrDataIO
from hdmf_zarr.nwb import NWBZarrIO
from pynwb.testing.mock.file import mock_NWBFile
from pynwb.testing.mock.ecephys import mock_ElectricalSeries

# Optional filter; only used if the `filters` argument below is uncommented
filters = [Delta(dtype="i4")]

data_with_zarr_data_io = ZarrDataIO(
    data=np.arange(100000000, dtype='i4').reshape(10000, 10000),
    chunks=(1000, 1000),
    compressor=Blosc(cname='zstd', clevel=3, shuffle=Blosc.SHUFFLE),
    # filters=filters,
)

timestamps = np.arange(10000)

data = data_with_zarr_data_io

nwbfile = mock_NWBFile()
electrical_series_name = "ElectricalSeries"
rate = None
electrical_series = mock_ElectricalSeries(name=electrical_series_name, data=data, nwbfile=nwbfile, timestamps=timestamps, rate=None)

path = "zarr_test.nwb.zarr"
absolute_path = os.path.abspath(path)
with NWBZarrIO(path=path, mode="w") as io:
    io.write(nwbfile)

from hdmf_zarr.nwb import NWBZarrIO

path = "zarr_test.nwb.zarr"

io = NWBZarrIO(path=path, mode="r")
nwbfile = io.read()
nwbfile

electrical_series_name = "ElectricalSeries"
electrical_series = nwbfile.acquisition[electrical_series_name]
electrical_series.data.info

Then that type of information is somehow not available:

image

I have no idea why this is the case.

mavaylon1 commented 3 months ago

@h-mayorquin I'm looking at the pictures and can't see the issue. The type is present in both images.

h-mayorquin commented 3 months ago

Sorry @mavaylon1, I was not precise enough when I said "info". What is missing are the "Storage ratio" and "No. bytes stored" fields. Both are present when I save with zarr directly, but not when I save through hdmf-zarr.
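
Until the two fields show up in `.info`, the same numbers can be approximated by hand for a directory-backed store. The following is a minimal sketch, not hdmf-zarr or zarr API: `directory_nbytes` and `storage_ratio` are hypothetical helper names, and the sketch assumes the dataset lives in a plain directory on disk whose files are the chunks plus metadata.

```python
import os


def directory_nbytes(path):
    """Sum the sizes of all regular files under `path` (chunks and metadata)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def storage_ratio(nbytes, nbytes_stored):
    """Uncompressed size divided by on-disk size; None if the stored size is unknown."""
    if nbytes_stored <= 0:
        return None
    return nbytes / nbytes_stored
```

For the example above, the uncompressed size would come from `electrical_series.data.nbytes`, and the directory to measure is presumably something like `zarr_test.nwb.zarr/acquisition/ElectricalSeries/data` (the exact layout is an assumption). It may also be worth checking `electrical_series.data.nbytes_stored`: as far as I recall, zarr v2 only prints "No. bytes stored" and "Storage ratio" when that value is positive, so a non-positive value there would explain the missing fields.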

mavaylon1 commented 2 months ago

I see. Do you want to take this on, since it is related to improving the HTML representation of datasets? I could look into it, but it won't be until sometime next week.