magland / sortingview

Web app for viewing results of ephys spike sorting
Apache License 2.0

Info / requests for phy-like features #233

Open: jonahpearl opened this issue 1 month ago

jonahpearl commented 1 month ago

Hi Jeremy — this is a great project, and I really appreciate the close integration with spikeinterface, which I've been trying out for a little bit now. I do most of my spike sorting on a computing cluster, so using sortingview instead of phy would be a big time saver! I played with it a bit this morning, and have a few questions regarding some phy-like features that I was looking for but didn't find. I'm wondering if 1) I've missed them, 2) there's a way to get them but the default SI implementation doesn't have them, or 3) they aren't in the app yet. If 3), then consider this a feature request to make sortingview a fully functioning replacement for phy :)

(In order of subjectively determined importance...)

So to summarize, if I could 1) quickly view the waveform templates for each unit I selected, 2) work faster with keyboard shortcuts, and 3) split units via the amplitude / PC view, I would consider this a working replacement for phy. What with Copilot being so good these days, I'm tempted to try building some of these myself, but I'd like to know whether they're in progress or have been deemed impossible due to technical quirks, etc.

Thank you!


magland commented 1 month ago

Hi @jonahpearl, thanks for reaching out with the detailed feature requests! I had no plans for splitting clusters, but maybe you'll convince me, who knows.

I'm mostly unavailable for the rest of the week, but maybe we can have a zoom call next week. Feel free to reach out by email if you can find it. :)

alejoe91 commented 1 month ago

I'll copy my comment here:

@jonahpearl let me chime in here, since we have discussed this several times.

Sortingview works by pushing data to the cloud. To make this efficient (and cheap), we have to minimize the amount of data used for visualization. For example, the amplitude scatterplots use a decimated version of the amplitudes and spike times (e.g., 1 out of every 10 spikes). To split, you need the full array of everything you're plotting, because the split needs the index of every spike that belongs to each resulting cluster. Another example is that there is no waveform view, just a templates view!
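To make the decimation argument concrete, here is a minimal NumPy sketch on synthetic data. The 1-in-10 step, the amplitude threshold standing in for a hand-drawn split, and the helper names `decimate` and `split_by_threshold` are illustrative assumptions, not sortingview's actual code. It shows that a split drawn on the decimated view can only label the spikes that were displayed, while assigning every spike requires the full amplitude array.

```python
import numpy as np

# Synthetic data for one unit: 1000 spikes whose amplitudes form two clumps,
# the kind of unit a user might want to split.
rng = np.random.default_rng(0)
n_spikes = 1000
spike_times = np.sort(rng.uniform(0.0, 600.0, n_spikes))  # seconds
amplitudes = np.concatenate([
    rng.normal(-80.0, 5.0, n_spikes // 2),
    rng.normal(-120.0, 5.0, n_spikes // 2),
])

def decimate(times, amps, step=10):
    """Keep every `step`-th spike (assumed decimation scheme, e.g. 1 in 10)."""
    kept = np.arange(0, len(times), step)
    return kept, times[kept], amps[kept]

def split_by_threshold(amps, threshold):
    """Return indices (into `amps`) below, and at or above, an amplitude threshold."""
    below = np.flatnonzero(amps < threshold)
    above = np.flatnonzero(amps >= threshold)
    return below, above

# A split drawn on the decimated view only labels the spikes that were shown.
kept, _, amps_shown = decimate(spike_times, amplitudes, step=10)
below_shown, above_shown = split_by_threshold(amps_shown, -100.0)
labeled_from_view = kept[below_shown].size + kept[above_shown].size

# Assigning every spike to one of the new units needs the full amplitude array.
below_full, above_full = split_by_threshold(amplitudes, -100.0)

print(f"spikes in unit: {n_spikes}")
print(f"labeled from decimated view: {labeled_from_view}")
print(f"labeled from full data: {below_full.size + above_full.size}")
```

With a 1-in-10 step, only 100 of the 1000 spikes get a label from the view; the other 900 stay unassigned unless the full data is available.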

So I think it could be added as an option, but in that case we should expect a clear slowdown in performance and an increase in storage costs.

In addition, we recently added a curation format that we will need to extend to support splits. I think we should do this anyway, because we will implement splitting in SpikeInterface-GUI, which works directly off the sorting analyzer.

jonahpearl commented 1 month ago

Thanks AB! Those limitations make sense. I'm also sympathetic to not wanting to develop two curation GUIs in parallel (i.e., SpikeInterface-GUI and sortingview), and to the fact that more people are probably curating on a local desktop and will be fine with a local app, though I don't think anything keeps local users from using a well-designed web app.

With the caveat that I know very little about web development, those limitations don't seem fatal. I see that sortingview is powered by kachery and built around a "figure sharing" model, where you can immortalize views of your data to share with people, and for that it makes sense to upload small, permanent datasets. For spike curation, though, why not 1) run a local server, more akin to Jupyter; 2) upload more data to the cloud, accepting that it might take a few minutes, and impose a deletion timer (e.g., 7 days) after which the user would need to re-upload the data; or 3) ask users to set up their own storage instead of using whatever the current default is? Given the bugs people are willing to wade through with phy, I suspect they'd be willing to do some setup for a bug-free version :)
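Option 2 boils down to a simple retention check on each upload. A minimal sketch, assuming every upload records a timezone-aware timestamp; `is_expired` and `RETENTION` are hypothetical names, and nothing here is taken from kachery or sortingview.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical retention policy: the 7-day example from the comment above.
RETENTION = timedelta(days=7)

def is_expired(uploaded_at: datetime, now: Optional[datetime] = None,
               retention: timedelta = RETENTION) -> bool:
    """True once an uploaded curation dataset has outlived its retention window."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - uploaded_at >= retention

uploaded = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(is_expired(uploaded, now=uploaded + timedelta(days=6)))  # False: still available
print(is_expired(uploaded, now=uploaded + timedelta(days=7)))  # True: user must re-upload
```

A periodic cleanup job could delete any dataset for which this returns `True`, so storage costs stay bounded no matter how much full-resolution data users upload.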

Re performance, maybe there could be an option controlling what fraction of the data is shown, trading performance against completeness. Then you could, e.g., have an idea for a split, double-check it against the full dataset (or at least more of it), do the split, and then switch back.
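A minimal sketch of that trade-off, assuming a hypothetical `fraction` setting that controls how many spikes are drawn, while the split itself is always applied to the full amplitude array. The function names `view_subset` and `apply_split`, and the use of an amplitude threshold as the split rule, are illustrative assumptions rather than an existing sortingview feature.

```python
import numpy as np

def view_subset(n_spikes, fraction, seed=0):
    """Return sorted indices of a random fraction of spikes to display (hypothetical setting)."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    k = max(1, int(round(n_spikes * fraction)))
    rng = np.random.default_rng(seed)
    return np.sort(rng.choice(n_spikes, size=k, replace=False))

def apply_split(amplitudes, threshold):
    """Label every spike 0 or 1 by amplitude, independent of what was displayed."""
    return np.where(amplitudes < threshold, 0, 1)

rng = np.random.default_rng(1)
amplitudes = np.concatenate([rng.normal(-80.0, 5.0, 500), rng.normal(-120.0, 5.0, 500)])

quick_look = view_subset(amplitudes.size, fraction=0.05)  # fast: 50 spikes to draw
full_check = view_subset(amplitudes.size, fraction=1.0)   # complete: all 1000 spikes
labels = apply_split(amplitudes, threshold=-100.0)        # applied to every spike
```

The design choice is to keep the displayed subset and the split decoupled: the fraction only affects what is drawn, so lowering it for speed never drops spikes from the resulting units.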