Open h-mayorquin opened 5 months ago
@h-mayorquin I think this was a point of a bit of confusion during the meeting. I believe the way @rly was using the terms is:
nwbfile.acquisition["ElectricalSeries"].data
is the "api path"
/acquisition/ElectricalSeries/data
would be something else. Maybe a "hierarchy path"?
I want to differentiate three things:
1. nwbfile.electrodes
2. /general/extracellular_ephys/electrodes
3. /general/extracellular_ephys/electrodes
I don't know of good terminology to differentiate between them. I think we can use 2 for 3. Right now, do both Zarr and HDF5 have the same "file-like" structure? If so, I feel that would be the simplest thing to do.
@h-mayorquin can you clarify what the difference between 3 and 2 is, and why it is needed?
@oruebel I expect that we don't need a distinction between 2 and 3, but we might have a backend that does not have a file structure like HDF5 and Zarr. Could the path of an object within the Zarr and HDF5 backends differ for some objects? I want to emphasize that it should be a backend-independent concept, hence the distinction.
Does that make sense?
@h-mayorquin we may have a backend that does not internally use the "/" syntax for Group membership in its Python API, but any backend must enable the HDMF primitives, which means it must have the concept of a Group, so it would be mappable to this syntax. Unless there is a good reason not to, I would like to propose we use the HDF5/Zarr path as the unique identifier.
@bendichter
> Unless there is a good reason not to, I would like to propose we use the HDF5/Zarr path as the unique identifier.
Totally agree with that.
> I would like to propose we use the HDF5/Zarr path as the unique identifier.
The real reference here is the mapping to the schema, i.e., the path the object will have in the Builder structure. For HDF5/Zarr, the path in the file and the path in the Builder hierarchy are identical. All I'm trying to say is that, even for non-hierarchical backend stores, we can determine that path from the schema.
Ah, got you, thanks for the explanation! That's great to hear.
It would be great to have something that can be used to specify an object within the nwbfile that is both unique and independent of the backend. One abstraction that can be used is that of paths, so I am imagining an API that could look like this:
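A minimal sketch of what a path-based lookup could look like, using nested dicts as a stand-in for the file hierarchy. The helper name `get_object_by_location` and the `fake_file` structure are hypothetical and are not part of pynwb or HDMF; only the two paths are taken from the discussion.

```python
from typing import Any

# Stand-in for a file hierarchy (assumption: a real NWB file would be read via pynwb).
fake_file = {
    "acquisition": {"ElectricalSeries": {"data": [1, 2, 3]}},
    "general": {"extracellular_ephys": {"electrodes": ["e0", "e1"]}},
}


def get_object_by_location(root: dict, location: str) -> Any:
    """Hypothetical helper: resolve a '/'-separated location against nested dicts."""
    node: Any = root
    for part in location.strip("/").split("/"):
        if not isinstance(node, dict) or part not in node:
            raise KeyError(f"No object at location {location!r} (failed at {part!r})")
        node = node[part]
    return node


# The same string identifies the same object regardless of how it is stored.
data = get_object_by_location(fake_file, "/acquisition/ElectricalSeries/data")
```

Because the location is a plain string, it can be stored in a configuration file or compared across files, which an in-memory object reference cannot.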
Use cases
In contrast to the object_id, which uniquely specifies the object within the NWBFile, the location identifies an object in an NWB file in a way that remains the same across different sessions. This can be used for:
Previous or Similar Art
This function was implemented in neuroconv:
https://github.com/catalystneuro/neuroconv/blob/47a066ca8c58b88064bfecee90cfcfc70409d135/src/neuroconv/tools/nwb_helpers/_configuration_models/_base_dataset_io.py#L28-L44
And it produces output like this:
Then the function was ported to pynwb:
https://github.com/NeurodataWithoutBorders/pynwb/blob/2259bede338f2f202229bda0af15d7e3cea47369/src/pynwb/base.py#L290-L324
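The linked implementations are not reproduced here. As a rough illustration of one way such a location string could be derived, the sketch below walks parent links upward and joins the names. The `Node` class and `find_location` function are hypothetical stand-ins, not the neuroconv or pynwb code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Stand-in for a container that knows its name and its parent."""
    name: str
    parent: Optional["Node"] = None


def find_location(obj: Node) -> str:
    """Join the names from the outermost ancestor down to obj (root name excluded)."""
    parts = []
    current = obj
    while current.parent is not None:
        parts.append(current.name)
        current = current.parent
    return "/".join(reversed(parts))


# Build a tiny hierarchy mirroring the ElectricalSeries example from the discussion.
root = Node("root")
acquisition = Node("acquisition", parent=root)
series = Node("ElectricalSeries", parent=acquisition)
data = Node("data", parent=series)

location = find_location(data)
```

A walk like this follows the in-memory parent links, so its result may not match the path of the object in the HDF5 or Zarr file whenever the two hierarchies differ.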
Complexities
The fact that HDF5 and Zarr might have different paths than the pynwb API can be confusing. An example that @rly pointed out is the electrical series.
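To make the mismatch concrete, here is a small sketch based on the electrodes pair mentioned in the comments: the API exposes a short accessor, while the object sits at a deeper path in the file. `FakeNWBFile` and `read_path` are hypothetical stand-ins, not the pynwb classes.

```python
class FakeNWBFile:
    """Hypothetical stand-in for an NWB file object (not pynwb.NWBFile)."""

    def __init__(self) -> None:
        # Layout as it would appear in the file hierarchy.
        self._groups = {
            "general": {"extracellular_ephys": {"electrodes": ["e0", "e1"]}}
        }

    @property
    def electrodes(self):
        # API shortcut: one attribute hides three levels of nesting.
        return self._groups["general"]["extracellular_ephys"]["electrodes"]

    def read_path(self, path: str):
        # Resolve a '/'-separated file path against the nested layout.
        node = self._groups
        for part in path.strip("/").split("/"):
            node = node[part]
        return node


nwbfile = FakeNWBFile()
api_object = nwbfile.electrodes
file_object = nwbfile.read_path("/general/extracellular_ephys/electrodes")
same_object = api_object is file_object
```

Both routes reach the same object, but the strings a user would write for them differ, which is the source of the confusion described above.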
Other considerations
I probably missed some subtleties from today's discussion, so I am tagging people here so they can correct my mistakes: @rly @bendichter @CodyCBakerPhD