hdmf-dev / hdmf

The Hierarchical Data Modeling Framework
http://hdmf.readthedocs.io

[Feature]: Add an object path as a way to uniquely identify an object in the API #1108

Open h-mayorquin opened 5 months ago

h-mayorquin commented 5 months ago

It would be great to have a way to specify an object within the nwbfile that is both unique and independent of the backend. Paths are a natural abstraction for this, so I am imagining an API that could look like this:

electrical_series = nwbfile.get_object_by_path("acquisition/ElectricalSeries")
electrical_series.get_api_path() == "acquisition/ElectricalSeries"
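
A minimal sketch of how such a round trip could behave, using a toy container tree rather than the real pynwb/hdmf classes. `Node` is a hypothetical stand-in, and `get_object_by_path` and `get_api_path` are the names proposed above, not an existing API:

```python
from typing import Dict, Optional


class Node:
    """Toy stand-in for a container with a name, a parent, and named children."""

    def __init__(self, name: str, parent: Optional["Node"] = None):
        self.name = name
        self.parent = parent
        self.children: Dict[str, "Node"] = {}
        if parent is not None:
            parent.children[name] = self

    def get_api_path(self) -> str:
        # Walk up to the root, collecting names; the root itself is omitted.
        parts = []
        node = self
        while node.parent is not None:
            parts.append(node.name)
            node = node.parent
        return "/".join(reversed(parts))

    def get_object_by_path(self, path: str) -> "Node":
        # Walk down from this node, one path component at a time.
        node = self
        for part in path.strip("/").split("/"):
            node = node.children[part]  # raises KeyError if the path does not exist
        return node


# Hypothetical file layout, only for illustration.
nwbfile = Node("root")
acquisition = Node("acquisition", parent=nwbfile)
electrical_series = Node("ElectricalSeries", parent=acquisition)

found = nwbfile.get_object_by_path("acquisition/ElectricalSeries")
assert found is electrical_series
assert found.get_api_path() == "acquisition/ElectricalSeries"
```

The path depends only on names and parent links, so nothing in this sketch is tied to a particular backend.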

Use cases

Unlike the object_id, which uniquely identifies an object within a single NWBFile, the location identifies an object in a way that remains the same across NWB files from different sessions. This can be used for:

Previous or Similar Art

This function was implemented in neuroconv:

https://github.com/catalystneuro/neuroconv/blob/47a066ca8c58b88064bfecee90cfcfc70409d135/src/neuroconv/tools/nwb_helpers/_configuration_models/_base_dataset_io.py#L28-L44

And it produces output like this:

acquisition/TestDynamicTable/TestColumn/data
acquisition/NewTimeSeries/data
acquisition/TestElectricalSeries/data

Then the function was ported to pynwb:

https://github.com/NeurodataWithoutBorders/pynwb/blob/2259bede338f2f202229bda0af15d7e3cea47369/src/pynwb/base.py#L290-L324

Complexities

The fact that hdf5 and zarr might have different paths than the pynwb API can be confusing. An example that @rly pointed out is the electrical series.

Other considerations

I probably missed some subtleties from today's discussion, so I am tagging people here so they can correct my mistakes: @rly @bendichter @CodyCBakerPhD

bendichter commented 5 months ago

@h-mayorquin I think this was a point of a bit of confusion during the meeting. I believe the way @rly was using the terms is:

nwbfile.acquisition["ElectricalSeries"].data is the "api path"

/acquisition/ElectricalSeries/data would be something else. Maybe a "hierarchy path"?

h-mayorquin commented 5 months ago

I want to differentiate three things:

  1. The set of code that you use to access something in the API: nwbfile.electrodes
  2. The path the object will have in the backend, since zarr and hdf5 are file-like (in both hdf5 and zarr: /general/extracellular_ephys/electrodes)
  3. A unique string that looks like a path and characterizes the object. The natural candidate is /general/extracellular_ephys/electrodes.

I don't know of good terminology to differentiate between them. I think we can use 2 for 3. Do zarr and hdf5 currently have the same "file-like" structure? If so, I feel that would be the simplest thing to do.

oruebel commented 5 months ago

@h-mayorquin can you clarify what the difference between 3 and 2 is, and why it is needed?

h-mayorquin commented 5 months ago

@oruebel I expect that we don't need a distinction between 2 and 3, but we might have a backend that does not have a file-like structure the way hdf5 and zarr do. Or the paths within the zarr and hdf5 backends might differ for some objects. I want to emphasize that it should be a backend-independent concept, hence the distinction.

Does that make sense?

bendichter commented 5 months ago

@h-mayorquin we may have a backend that does not internally use the "/" syntax for Group membership in its Python API, but any backend must support the HDMF primitives, which means it must have the concept of a Group, so it would be mappable to this syntax. Unless there is a good reason not to, I would like to propose we use the HDF5/Zarr path as the unique identifier.

h-mayorquin commented 5 months ago

@bendichter

> Unless there is a good reason not to, I would like to propose we use the HDF5/Zarr path as the unique identifier.

Totally agree with that.

oruebel commented 5 months ago

> I would like to propose we use the HDF5/Zarr path as the unique identifier.

The real reference here is the mapping to the schema, i.e., the path the object will have in the Builder structure. For HDF5/Zarr, the path in the file and the path in the Builder hierarchy are identical. All I'm trying to say is that, even for non-hierarchical backend stores, we can determine that path from the schema.
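
To illustrate how a "/"-style path can still be derived when the store itself is not hierarchical, here is a small sketch. It uses plain dictionaries as a stand-in for a Builder-like hierarchy and for a flat key-value backend; `iter_paths` and the sample values are hypothetical and not part of HDMF:

```python
from typing import Any, Dict, Iterator, Tuple


def iter_paths(group: Dict[str, Any], prefix: str = "") -> Iterator[Tuple[str, Any]]:
    """Yield (path, value) for every leaf in a nested dict of groups."""
    for name, value in group.items():
        path = f"{prefix}/{name}"
        if isinstance(value, dict):
            # A nested dict plays the role of a Group: recurse into it.
            yield from iter_paths(value, path)
        else:
            # Anything else plays the role of a Dataset.
            yield path, value


# Hierarchical view (what a Builder-like structure would describe).
hierarchy = {
    "general": {"extracellular_ephys": {"electrodes": "electrodes-table"}},
    "acquisition": {"ElectricalSeries": {"data": "raw-samples"}},
}

# A flat, non-hierarchical store keyed by the same paths.
flat_store = dict(iter_paths(hierarchy))

assert flat_store["/general/extracellular_ephys/electrodes"] == "electrodes-table"
assert sorted(flat_store) == [
    "/acquisition/ElectricalSeries/data",
    "/general/extracellular_ephys/electrodes",
]
```

The flat store has no notion of groups, yet every entry keeps the same path it would have in the hierarchy, which is the property the identifier relies on.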

h-mayorquin commented 5 months ago

Ah, got you, thanks for the explanation! That's great to hear.