mne-tools / mne-python

MNE: Magnetoencephalography (MEG) and Electroencephalography (EEG) in Python
https://mne.tools
BSD 3-Clause "New" or "Revised" License
2.7k stars 1.31k forks source link

Add Support for NWB #10427

Open nmarkowitz opened 2 years ago

nmarkowitz commented 2 years ago

Hi,

I've been working on creating a new subclass of BaseRaw that can read from a NWB file. These are organized hdf files that can store multiple types of data streams including those with different sampling rates, sizes from different sources (such as eye-tracking, seeg, scalp eeg, audio etc.). Because they're hdf files they have lazy loading. It would be nice to have a RawNWB object and/or a read_raw_nwb() function so that MNE can work with these types of files. Attached is a link to a gist I made with some example code for reading/writing NWB files (Part 1) and some code snippets I've made to get started with creating a RawNWB object (Part 2). The thing I mostly struggling with is creating the read_segment_file() function. Any insight or help would be greatly appreciated.

https://gist.github.com/sportnoah14/b74f1226c09bd79aa7f7ec7b11af80c9

pynwb docs

welcome[bot] commented 2 years ago

Hello! 👋 Thanks for opening your first issue here! ❤️ We will try to get back to you soon. 🚴🏽‍♂️

rob-luke commented 2 years ago

There is also an extension for NWB-NIRS that would be nice to support if this is implemented. See https://github.com/agencyenterprise/ndx-nirs Also ping @dsleiter

nmarkowitz commented 2 years ago

@rob-luke if that extension is a subclass of a TimeSeries then it should work the same. It would be hard to encompass all the extensions. I think just having a single reader for a TimeSeries container is a great start

dsleiter commented 2 years ago

ndx-nirs stores NIRS data in a class that inherits from TimeSeries, so it should be possible to read the raw data. However, to read channel metadata, I think RawNWB might need to know about specific types of timeseries (eg. ElectricalSeries, SpikeEventSeries, NIRSSeries, etc).

I'm not familiar with the usual implementation details of _read_segment_file for other Raw classes, but all pynwb timeseries objects should have a .data attribute for access to the raw data. https://nwb-schema.readthedocs.io/en/latest/format.html#timeseries

Would the following help in extracting the desired data?

timeseries = nwb.get_acquisition('timeseries_name')
timeseries.data[start:stop+1, idx]

Or was the difficulty with something else about the nwb files?

nmarkowitz commented 2 years ago

@dsleiter I was just meant that support for all TimeSeries subclasses may be difficult. Not impossible though. I've already worked out a way to extract channel information for an ElectricalSeries object. RawNWB may just an up being an object with multiple if statements in its declaration (or a function it calls in its declaration). The biggest issue for this may be if there's significant differences (such as attributes added, deprecated or changed) between versions of NWB. But we can figure that out after getting an initial version running.

dsleiter commented 2 years ago

Yeah, I think that separate logic might be needed for each TimeSeries subclass in order to get channel information. That sounds good to start with ElectricalSeries and add support for others later. There probably aren't too many cases that would need to be implemented in order to support the majority of use cases for mne users.

I believe the NWB team has been pretty good about ensuring backwards compatibility within major versions, so I'd expect the pyNWB 2.0 API to be pretty stable at least until version 3.0. It might be worth adding a check in the code to verify the major version of the installed pynwb library and raise a warning or error if != 2.

Due to NWB's flexibility, other challenges might be handling situations with multiple acquisitions within a single file or ensuring a consistent label for ElectricalSeries channels across different user-created files (looks like you explicitly add a 'label' column, which might not exist for most NWB files).

nmarkowitz commented 2 years ago

@dsleiter I agree, we would probably need separate logic for each case. If it's restricted to the TimeSeries subclasses that are part of primary pynwb package and the major extensions then that should be fine.

Beecause there can be many acquisitions, my thinking is that the RawNWB object would interact with a single TimeSeries as opposed to the whole file. So the function would work more like this:

from pynwb import NWBHDF5IO
io = NWBHDF5IO('myfile.nwb')
nwb = io.read()
raw = RawNWB(nwb.get_acquisition('eeg'))

For other specifications maybe a bunch of kwargs can be added such as ch_names as an argument that points to the column in the ElectrodeTable object that specifies a name for a channel. Otherwise it can simply name channels as "chn1, chn2, chn3" like it would for another TimeSeries object.

Before any of this though we need to get the _read_segment_file() function working to support lazy loading. This is the part that I haven't figured out yet and am still working on. Once this, the most essential, aspect is done then we can go crazy with supporting all the subclasses of TimeSeries

larsoner commented 2 years ago

@sportnoah14 still interested in this?

Before any of this though we need to get the _read_segment_file() function working to support lazy loading. This is the part that I haven't figured out yet and am still working on. Once this, the most essential, aspect is done then we can go crazy with supporting all the subclasses of TimeSeries

FWIW a first version doesn't absolutely have to have this. You can just always preload data (then _read_segment_file does not need to be defined at all). It's not ideal but it's a good start, and might be good enough forever if these files are never expected to be very large...

nmarkowitz commented 2 years ago

@larsoner im still interested in this. I could start with having data preloaded but ideally I, and I think everyone using NWB files, would like to have lazy loading as these files can get very large. I'm currently on holiday and in the midst of moving but I'll be resuming this once I settle a bit more. Hopefully this coming week or the week after.

@dsleiter and @rob-luke I'd love your help with extensions once I get an initial version for preloaded data working for TimeSeries objects. Maybe some of the core NWB developers would be interested in helping too.