talmolab / sleap

A deep learning framework for multi-animal pose tracking.
https://sleap.ai
Other
422 stars 98 forks source link

Enhancing NWB support #853

Closed talmo closed 2 years ago

talmo commented 2 years ago

We should support the features mentioned by @bendichter in our NWB I/O.

  1. Make the adapter more flexible. https://github.com/talmolab/sleap/blob/d522f1d9df1f65d0911fa07e8b2636b182d03c0c/sleap/io/format/ndx_pose.py#L167

    This should accept a couple more things to look like:

    write(
        self,
        filename: str,
        overwrite: bool = False,
        session_description: str = "sleap",
        identifier: Optional[str] = None,  # if None, use a GUID like str(uuid.uuid4())
        session_start_time: Optional[datetime.datetime] = None,  # if None, use datetime.datetime.now()
    ):
        if session_start_time is None:
            session_start_time = datetime.datetime.now(datetime.timezone.utc)
        if identifier is None:
            identifier = uuid.uuid4()
        if Path(filename).exists() and not overwrite:
            with NWBHDF5IO(filename, mode="r") as io:
                nwb_file = io.read()
        else:
                nwb_file = NWBFile(
                    session_description=session_description,
                    identifier=identifier,
                    session_start_time=session_start_time,
            )
        # ...
        with NWBHDF5IO(path, mode="w" if overwrite else "a") as io:
            io.write(nwb_file)
  2. Create a high-level export API as a instance-level method on Labels with signature:

    def export_nwb(self, filename: str, overwrite: bool = False, session_description: str = "sleap", identifier: Optional[str] = None, session_start_time: Optional[datetime.datetime] = None):
        # Call the adapter to write self

Discussed in https://github.com/talmolab/sleap/discussions/851

Originally posted by **bendichter** July 21, 2022 I see you have made a lot of progress in supporting NWB lately! It looks like you are using the ndx-pose extension as intended. A few points I want to bring up 1. Add append feature Currently, a user is not able to set required metadata such as the `session_start_time`, `identifier`, and `session_description`. Instead, placeholder values are used here: https://github.com/talmolab/sleap/blob/d522f1d9df1f65d0911fa07e8b2636b182d03c0c/sleap/io/format/ndx_pose.py#L218-L222 with no way to overwrite them. This creates NWB files that are valid, but contains incorrect data. There are two ways to solve this. The first is to allow a user to pass in this metadata to the `write` function. The second would be to allow a user to create an NWB file on their own with the proper metadata and then append the sleap data to that file. The second approach, allowing a user to append to an existing NWB file, also solves a different problem. NWB files are meant to store all of the streams of data in a single file, e.g. behavior + optical physiology or electrophysiology. If I wanted to put sleap and ephys data in an NWB file, since this is currently written without an append option, I'd be forced to first write the sleap data and then append the ephys data. This is technically doable but becomes a problem if any other data writer or converter has the same limitation of not being able to append. Because of this, it is best practice to allow for appending to an existing NWB file. 4. Adding sleap to NWB software documentation We keep track of software that can read and write NWB [here](https://nwb-overview.readthedocs.io/en/latest/tools/tools_home.html). Would you mind submitting a PR [here](https://github.com/NeurodataWithoutBorders/nwb-overview/tree/main/docs/source/tools) next to the entry for DeepLabCut so our users know that sleap is part of the NWB ecosystem? 5. NWB Conversion Tools We are developing [NWB Conversion Tools](https://nwb-conversion-tools.readthedocs.io/en/main/index.html) which provide automated conversions from a large number of proprietary formats. It would be great if we could use this library to convert data that has already been written to NWB. It seems like your adapter system with read and write for several formats could be helpful. Does sleap have a standard output format?
CodyCBakerPhD commented 2 years ago

Sounds great, and it was nice meeting you all today!

I just wanted to mention some technical details about operating on an NWBFile in append mode that we've learned from many past experiences...

The approach of

    if Path(filename).exists() and not overwrite:
        with NWBHDF5IO(filename, mode="r") as io:
            nwb_file = io.read()
    else:
            nwb_file = NWBFile(
                session_description=session_description,
                identifier=identifier,
                session_start_time=session_start_time,
        )
    # ...
    with NWBHDF5IO(path, mode="w" if overwrite else "a") as io:
        io.write(nwb_file)

when Path(filename).exists() = True and overwrite = False, the modified nwb_file object after the # ... stages will be pointing to the now-closed io from when it was first read from with NWBHDF5IO(filename, mode="r") as io: and this could cause issues when you finally hit the final io.write in the new io from with NWBHDF5IO(path, mode="w" if overwrite else "a") as io: in the final step.

Instead, the way we've successfully performed appending is to do so from within the respective context that that nwb_file object that was read.

Here are some code snippets that have evolved over the years for this purpose...

The oldest, probably least fancy way: https://github.com/catalystneuro/neuroconv/blob/cc1c2fb9b175eeabe026066901d2a8344585343a/src/nwb_conversion_tools/nwbconverter.py#L168-L181

which is basically what you have now except everything is combined into a single context and the # ... steps are performed inside that context.

The newer/fancier way we have is through our own custom designed context for making the 'append' vs. 'make a fresh NWBFile' decision behind the scenes: https://github.com/catalystneuro/neuroconv/blob/main/src/neuroconv/nwbconverter.py#L173-L184 + https://github.com/catalystneuro/neuroconv/blob/main/src/neuroconv/tools/nwb_helpers.py#L118-L177

Of course another alternative is to simply not use a context at all and just remember to call io.close() either at the end of the export or in an except clause wrapper in the event that there was an error anywhere in the process. I personally prefer contexts for maximum safety, though.

Oh, and another design principle I'd strongly suggest is making whatever the steps are within that # ... block an external function like add_content_to_nwbfile_object(nwbfile: pynwb.NWBFile, ...) so that all the SLEAP/ndx-pose details can be added to an in-memory NWBFile object separate from all the I/O handling. Such a helper function is probably what we would integrate directly with the corresponding DataInterface in NeuroConv since we typically like to handle the integration of multiple data streams all in a single io.write() call at the end of our process.

Anyway, looking forward to meeting some of you at Janelia next week! Cheers, Cody

CodyCBakerPhD commented 2 years ago

Yet another detail that I just recalled - any call to NWBHDF5IO(filename, mode="r") will also want to include the keyword argument load_namespaces=True otherwise it might fail to open an existing file that has extensions used within it.

talmo commented 2 years ago

Excellent, thanks for the pointers @CodyCBakerPhD!! We'll be mindful of how we work with the IO handles.

talmo commented 2 years ago

Added in v1.2.5.