holoviz-topics / neuro

HoloViz+Bokeh for Neuroscience
BSD 3-Clause "New" or "Revised" License

GOAL: Large-data-handling #32

droumis closed 4 months ago

droumis commented 1 year ago

UPDATE: This initiative has been superseded by other, more targeted efforts

Summary and Links

Note: each domain section below starts with some important 'Context'.

Task Planning:

Electrophysiology (Ephys)

**Context:** Although the continuous raw data is streamed and viewed during data acquisition, it's not that critical to look at the full-band 30 kHz version during processing/analysis. Instead, the raw-ish displays are of the low-pass filtered (<1000 Hz) continuous data (like a filtered version of the [ephys viewer workflow](https://github.com/holoviz-topics/neuro/blob/main/workflows/ephys-viewer/workflow_ephys-viewer.ipynb)), a stacked 'spike raster' of action potential events (see [spike raster workflow](https://github.com/holoviz-topics/neuro/blob/main/workflows/spike-raster/workflow_spike-raster.ipynb)), and a view of the spike waveforms (see [waveform workflow](https://github.com/holoviz-topics/neuro/blob/main/workflows/waveform/workflow_waveform.ipynb)). These three workflows pose different challenges for large-data handling and may require specific approaches. Although there is a lot of heterogeneity in technique and equipment in electrophysiology, we are focusing below on the Allen Institute data. This is advantageous because they have a well-funded group maintaining their [SDK](https://github.com/alleninstitute/allensdk/), they use Neuropixels probes, which have a relatively high channel count (and therefore represent a more difficult use case), and their data are available in the NWB 2.0 file format (fancy HDF5), which is becoming increasingly common in neuroscience. Demetris has some contacts at the Allen Institute, but we haven't yet engaged with them for feedback/collaboration; that will happen once we have something to show them that is demonstrably better than their current approach. We are also collaborating with one of Jim's former colleagues, who works primarily with relatively small spike-time datasets (some real, some synthetic) and is mainly interested in spike-raster-type workflows, so the work below will benefit his group as well, even though we will focus on Allen Institute data.
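To put rough numbers on why the display uses the low-pass filtered data and not the full-band signal, here is a back-of-the-envelope sketch of uncompressed data volume. Only the 30 kHz full-band rate and the <1000 Hz cutoff come from the context above. The channel count (384), the 16-bit sample size, and the 2.5 kHz display rate are assumptions chosen for illustration, and `bytes_per_hour` is a hypothetical helper, not part of any SDK.

```python
def bytes_per_hour(n_channels: int, sample_rate_hz: int, bytes_per_sample: int = 2) -> int:
    """Uncompressed size of one hour of continuous multichannel data."""
    return n_channels * sample_rate_hz * bytes_per_sample * 3600


# Assumed values for illustration (not stated in this issue).
N_CHANNELS = 384        # assumed channel count for a single probe
BYTES_PER_SAMPLE = 2    # assumed 16-bit integer samples

FULL_BAND_HZ = 30_000   # full-band rate, from the context above
DISPLAY_HZ = 2_500      # assumed display rate; above the 2000 Hz Nyquist
                        # minimum for data low-pass filtered below 1000 Hz

full = bytes_per_hour(N_CHANNELS, FULL_BAND_HZ, BYTES_PER_SAMPLE)
display = bytes_per_hour(N_CHANNELS, DISPLAY_HZ, BYTES_PER_SAMPLE)
reduction = full / display

print(f"full band:    {full / 1e9:.1f} GB/hour")
print(f"display band: {display / 1e9:.1f} GB/hour")
print(f"reduction:    {reduction:.0f}x")
```

Under these assumptions the filtered, downsampled stream is about an order of magnitude smaller, but still several gigabytes per hour for a single probe, so the viewer workflows would still need a large-data strategy.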

Ephys Phase 1: Understanding the ecosystem, problems, and foundations for the solution

Ephys Phase 2: Building an MVP

Ephys Phase 3: Benchmarking the MVP

Ephys Phase 4: Advanced Visualization Techniques

Ephys Phase 5: Minimap/multi-scale

PROBABLY SKIP THIS: Ephys Phase 6: Exploring Direct HDF5 Access with Kerchunk

Ephys Phase 7: Adapt Progress to Waveform Workflow


1-Photon Calcium Imaging (1P-Imaging)

Primarily regarding the Miniscope device and associated Minian software

**Context:** The Minian work so far uses many SOSA tools (zarr, dask, xarray, holoviews, panel, bokeh, etc.), which is great, and we want to help improve their pipeline, especially since there are parts (like the CNMF app) that are reportedly unusable with large data. If we could streamline their pipeline, that would be a massive win for everyone. Demetris is trying to engage with the primary developer of Minian to see if they would consider accepting PRs (the project hasn't been updated since June 2022, old versions of most packages are pinned, and it doesn't have a build for osx_arm64). Otherwise, we'd need to find a solution that has visibility in the community, which gets more complicated. The developer is now working with a company called Metacell, which is facilitating imaging analysis platforms, so this could either be an opportunity for accelerated adoption or something less good if we can't improve things and show that a bokeh-based workflow is the best approach. There are also some potentially competing/complementary solutions in the works from the fastplotlib folks, who already have a collab going with the popular 2-Photon analysis suite 'CaImAn', which could potentially absorb 1-Photon workflows in the future (unless our solution and community support are demonstrably better).

1P-Imaging Phase 1: Understanding the ecosystem, problems, and foundations for the solution

1P-Imaging Phase 2: Building from the existing MVP

1P-Imaging Phase 3: Benchmarking the improvements


EEG

Primarily regarding the MNE software

**Context:** The MNE software is well-maintained, well-documented, and widely used. We have established a friendly collaboration with one of their developers, and a successful end result would be a HoloViz/Bokeh approach to EEG visualization that they advertise to their users. The extent of actual integration into the MNE software is yet to be determined, but the best possible outcome is that the HoloViz/Bokeh backend is shipped with their package so users can easily switch to it with an argument. The next-best outcome is that they advertise the HoloViz/Bokeh approach in some way, but it remains outside of their package. Either way, we want to design our solution so that it can integrate with and complement their tooling. This has implications for the data-access approach, as we want to use their existing data readers and formats as much as possible. In the future, a possible grant extension could involve working with MNE developers to adopt a data-access approach that uses zarr, dask, xarray, etc., if there were hints that this approach would be more promising.

EEG Phase 1: Understanding the ecosystem, problems, and foundations for the solution

EEG Phase 2: Benchmark the EEG viewer workflow version that uses MNE I/O

EEG Phase 3: Advanced Visualization Techniques (common to Ephys)

EEG Phase 4: Minimap/multi-scale (common to Ephys)

droumis commented 1 year ago

I'm creating a separate benchmarking goal issue.