Open alejoe91 opened 4 months ago
from xarray import DataArray
import numpy as np
from pynwb.testing.mock.ecephys import mock_ElectricalSeries
from h5py import Dataset
from pynwb import get_type_map

dset_types = (np.ndarray, Dataset)  # etc.


def get_dimension_labels(cls, ndims, dset_name):
    # Look up the schema spec for this neurodata type and find the named dataset.
    spec = get_type_map().namespace_catalog.get_spec(cls.namespace, cls.neurodata_type)
    data = next(x for x in spec["datasets"] if x["name"] == dset_name)
    dims = data["dims"]
    if isinstance(dims[0], str):  # only one shape spec
        return dims
    # Several allowed shapes: return the one matching the number of dimensions.
    for i_dims in dims:
        if len(i_dims) == ndims:
            return i_dims


def load_dset_as_xarray(obj, dset_name):
    dset = obj.fields[dset_name]
    cls = obj.__class__
    ndims = len(dset.shape)
    dim_labels = get_dimension_labels(cls, ndims, dset_name)
    # Coordinates are specific to ElectricalSeries-like objects here.
    coords = dict(num_channels=obj.electrodes.data)
    if obj.timestamps is not None:
        coords.update(num_times=obj.timestamps)
    # Keep every non-dataset field as metadata on the DataArray.
    attrs = {k: v for k, v in obj.fields.items() if not isinstance(v, dset_types)}
    return DataArray(dset, dims=dim_labels, coords=coords, attrs=attrs)


electrical_series = mock_ElectricalSeries(timestamps=np.arange(10), rate=None)
load_dset_as_xarray(electrical_series, "data")
^ Code that solves a related problem and might be helpful
I think this could be useful to have in some form available in PyNWB. Maybe as a utility method and/or as a method on TimeSeries, since a common use-case for this is probably representing TimeSeries.data as xarray.
It would be great to add these attributes as default for known data types (e.g. ElectricalSeries)
Adding the _ARRAY_DIMENSIONS attribute for cases where we know the dimensions seems like a good idea 👍
In terms of implementation, I think this will require changes in HDMF as well. Here is a rough plan of how this could be implemented:
- Add dimension_labels as an attribute on the DatasetBuilder (which may be None if the labels are unknown) https://github.com/hdmf-dev/hdmf/blob/5c8506216995f995b891da1e6b596ee42b7dd948/src/hdmf/build/builders.py#L321
- Update BuildManager.build to set the dimension_labels for DatasetBuilders https://github.com/hdmf-dev/hdmf/blob/5c8506216995f995b891da1e6b596ee42b7dd948/src/hdmf/build/manager.py#L148
- Update ZarrIO.write_dataset to add the _ARRAY_DIMENSIONS to the attributes of the dataset if builder.dimension_labels are present.
@rly does that plan sound reasonable, or would this also require changes in the ObjectMapper to determine the dimensions there instead of in the BuildManager?
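The three steps of the plan can be sketched in plain Python as follows. This is only an illustration and does not use HDMF's real classes: SketchDatasetBuilder, resolve_dimension_labels, and zarr_attributes are hypothetical names, and the dims list is an illustrative spec entry rather than one read from the schema.

```python
from dataclasses import dataclass, field
from typing import Any, Optional, Sequence


@dataclass
class SketchDatasetBuilder:
    """Hypothetical stand-in for hdmf's DatasetBuilder (not the real class)."""
    name: str
    data: Any
    attributes: dict = field(default_factory=dict)
    # Step 1: new attribute; None when the labels are unknown.
    dimension_labels: Optional[Sequence[str]] = None


def resolve_dimension_labels(dims, ndims):
    """Step 2: pick the labels from a spec 'dims' entry that match ndims.

    'dims' is either a flat list of names (one allowed shape) or a list
    of such lists (several allowed shapes). Returns None if nothing matches.
    """
    if not dims:
        return None
    if isinstance(dims[0], str):  # only one shape spec
        return list(dims) if len(dims) == ndims else None
    for option in dims:
        if len(option) == ndims:
            return list(option)
    return None


def zarr_attributes(builder):
    """Step 3: attributes a Zarr writer could store for this dataset."""
    attrs = dict(builder.attributes)
    if builder.dimension_labels is not None:
        attrs["_ARRAY_DIMENSIONS"] = list(builder.dimension_labels)
    return attrs


# Illustrative spec entry with several allowed shapes (assumed, not read from the schema).
dims_spec = [
    ["num_times"],
    ["num_times", "num_channels"],
    ["num_times", "num_channels", "num_samples"],
]
builder = SketchDatasetBuilder(
    name="data", data=[[0.0, 1.0], [2.0, 3.0]], attributes={"unit": "volts"}
)
builder.dimension_labels = resolve_dimension_labels(dims_spec, ndims=2)
print(zarr_attributes(builder))
```

Keeping the label lookup separate from the writer means the same dimension_labels value could be reused by other backends, while only the Zarr writer translates it into _ARRAY_DIMENSIONS.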
@rly does that plan sound reasonable, or would this also require changes in the ObjectMapper to determine the dimensions there instead of in the BuildManager?
That sounds reasonable, except that all the building / creation of DatasetBuilder objects happens in the ObjectMapper. We'd probably do it in __add_attributes or the constructor.
@mavaylon1 could you take a look at this?
I'll work on the HDMF side
Can do
What would you like to see added to HDMF-ZARR?
Xarray supports the Zarr backend, but requires the _ARRAY_DIMENSIONS attribute to be set with a list of names for the array dimensions (e.g. [samples, channels]) - see https://docs.xarray.dev/en/stable/internals/zarr-encoding-spec.html#zarr-encoding
It would be great to add these attributes as default for known data types (e.g. ElectricalSeries)
@jsiegle
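For illustration, here is a minimal sketch of what the array attributes would need to contain. It assumes a Zarr v2 store, where the attributes of an array are kept as JSON in a .zattrs file next to it; zattrs_with_dimensions is a hypothetical helper, not part of any library.

```python
import json


def zattrs_with_dimensions(dim_names, extra_attrs=None):
    """Return the JSON text of a .zattrs file that names the array dimensions.

    Hypothetical helper: it only builds the JSON and does not touch a store.
    """
    attrs = dict(extra_attrs or {})
    # One name per array axis, in axis order.
    attrs["_ARRAY_DIMENSIONS"] = list(dim_names)
    return json.dumps(attrs, indent=4)


print(zattrs_with_dimensions(["samples", "channels"]))
```

The list needs one entry per axis of the array, so a 2D dataset with shape (samples, channels) gets exactly two names.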
Is your feature request related to a problem?
NWB-Zarr files cannot be opened by xarray.open_zarr
What solution would you like?
Adding the _ARRAY_DIMENSIONS attributes to all "known" neurodata_types
Do you have any interest in helping implement the feature?
Yes.