kwikteam / phy-contrib

[This repository is archived and will be deprecated after the release of phy 2.0]

[documentation] clarification on proper way to extract waveform of cluster #138

Closed cxrodgers closed 5 years ago

cxrodgers commented 6 years ago

I would like to classify sorted units into narrow-spiking and broad-spiking, and I want to make sure I'm extracting the cluster's scaled "waveform" properly. This issue is covered in the FAQ on this page (https://github.com/kwikteam/phy-contrib/blob/master/docs/template-gui.md), but I think there is a simpler way than the one in the FAQ, and I want to make sure this simpler way is correct.

First off, I understand that option #1 is to take the mean of the filtered data around every spike occurrence, but I'd rather work directly with the templates, because if two clusters fire very synchronously this would affect the mean but not the templates.

Option #2 is to unwhiten and scale the templates. The FAQ links to this Matlab snippet (https://github.com/cortex-lab/spikes/blob/master/analysis/findTempForEachClu.m), but I found this comment (https://github.com/cortex-lab/KiloSort/issues/35#issuecomment-262824645) from Marius, which suggests an easier way is simply to load "Wraw" directly from "rez.mat". Then of course I would multiply by my raw-data int16 scaling factor (0.195 uV).
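For concreteness, the same unwhiten-and-scale step could also be done from the documented npy files instead of rez.mat, something like the sketch below (file names and array shapes are my assumptions based on template-gui.md, and the 0.195 uV factor is specific to my recording system):

```python
import numpy as np

# Sketch of the unwhiten-and-scale step from the documented phy output files.
# Assumed shapes: templates.npy is (n_templates, n_samples, n_channels) in
# whitened space; whitening_mat_inv.npy is (n_channels, n_channels).
templates = np.load('templates.npy')
winv = np.load('whitening_mat_inv.npy')
uV_per_bit = 0.195  # my raw-data int16 scaling factor, not a general constant

# Unwhiten across channels, then convert to microvolts
templates_unwhitened_uV = (templates @ winv) * uV_per_bit
```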

I would just like to know if this Wraw method is indeed a recommended way to extract "the waveform of the cluster" in the sense that would be appropriate for waveform classification. If it is, I might recommend changing the FAQ at https://github.com/kwikteam/phy-contrib/blob/master/docs/template-gui.md to reference this easier way.

Thanks!!

cxrodgers commented 6 years ago

Oh, and I forgot to mention a related but slightly different issue. If a cluster has been manually reclustered and now includes multiple templates, the FAQ recommends taking the template with the most spikes. Instead, I'm thinking I will take the average of all included templates, each weighted by the proportion of spikes assigned to it. Does that sound reasonable?
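Concretely, I'm picturing something like this (file names are the documented per-spike assignments from the phy output; the helper name and the cluster id are just placeholders):

```python
import numpy as np

# Sketch: per-cluster waveform as a weighted average of its constituent
# templates, weighted by each template's share of the cluster's spikes.
spike_clusters = np.load('spike_clusters.npy').squeeze()
spike_templates = np.load('spike_templates.npy').squeeze()

def cluster_mean_template(cluster_id, templates):
    # templates: (n_templates, n_samples, n_channels), e.g. the unwhitened
    # templates from my earlier snippet
    in_cluster = spike_clusters == cluster_id
    temp_ids, counts = np.unique(spike_templates[in_cluster], return_counts=True)
    weights = counts / counts.sum()
    # Weighted average over the templates that contribute spikes to this cluster
    return np.tensordot(weights, templates[temp_ids], axes=1)
```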

nsteinme commented 6 years ago

Hi Chris - your understanding about Wraw is correct, I believe. We haven't emphasized using that method just because "rez.mat" is completely undocumented whereas we have carefully documented the phy files at the link you pointed out. The intent is for the "phy" npy files to be the user-facing part. So a better solution might be updating rezToPhy to also output the Wraw into a file with a sensible name.

Your idea to average the template waveforms in proportion to their included spikes sounds good - picking the one with the greatest count is certainly an approximation, but, well, if your assignment to narrow/broad depends on this then perhaps it is not such a reliable assignment :)

Sorry to take so long to respond...

cxrodgers commented 6 years ago

Thanks Nick! After more thought, I realized the advantage of using your findTempForEachClu over rez.mat (in addition to the documentation and interface-related advantages you just mentioned). I think rez.mat just stores the dewhitened and scaled templates, whereas your snippet allows the user to reconstruct the actual dewhitened and scaled version of every single spike. The main use case for this is something we spoke about offline: calculating the actual amplitude of every spike over the course of the session and checking for drift. As you mentioned, the amplitudes from KiloSort are not comparable across clusters, so it's necessary to dewhiten and scale each spike before doing this.
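For the record, the kind of per-spike drift check I mean is roughly the sketch below (file names follow the documented phy output; the absolute uV scale depends on how KiloSort normalizes its templates, so I'd treat it as approximate):

```python
import numpy as np

# Rough per-spike amplitude in uV, for plotting against spike time to check
# for drift over the session.
templates = np.load('templates.npy')                        # whitened templates
winv = np.load('whitening_mat_inv.npy')
spike_times = np.load('spike_times.npy').squeeze()          # in samples
spike_templates = np.load('spike_templates.npy').squeeze()  # template id per spike
amplitudes = np.load('amplitudes.npy').squeeze()            # per-spike template scaling
uV_per_bit = 0.195                                          # my recording-specific factor

templates_unwhitened = templates @ winv

# Peak-to-peak size of each unwhitened template on its largest channel
temp_ptp = templates_unwhitened.max(axis=1) - templates_unwhitened.min(axis=1)
temp_size = temp_ptp.max(axis=1)

# Approximate amplitude of every spike in uV; plot per cluster against
# spike_times to look for drift
spike_amp_uV = amplitudes * temp_size[spike_templates] * uV_per_bit
```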