narendramukherjee / blech_clust

GNU General Public License v3.0
5 stars 15 forks source link

Remove linear interpolation for dejittering? #15

Open abuzarmahmood opened 3 years ago

abuzarmahmood commented 3 years ago

I just realized that the interpolation used for dejittering is linear. Wouldn't that mean that it can't EXTEND the waveform at it's lowest/highest point? I've attempted replicating this issue by trying to interpolate abs(x) with some missing values around 0. What do you guys think? If this is the case and blech_clust has been working fine so far, we can take out the interpolation, which will speed up downstream processing and save significantly on storage space and I/O.

linear_interpolation_issue

narendramukherjee commented 3 years ago

Is it possible to try this on the spike waveforms themselves and see how it looks? They might have less "missing data" than the abs(x) case you show above, and might be doing a good job despite being linear. I think that is the case as what interpolation is doing is actually filling in data points all along the curve, and not trying to fill a gaping hole in the curve at a specific point.

That being said, I would suggest being very cautious before taking out interpolation: reason being that I have a vague memory of doing a deep dive briefly to understand what interpolation was doing many years ago. What I found is that the interpolation-related condition (i.e, the minima of the interpolated spike waveform should not be more than 3 time steps away from the minima of the un-interpolated waveform) got rid of all the bad, wonky spikes esp those that had more than 1 trough. It was a very useful "noise-cancelling" step I felt. But that was many years ago and I would totally encourage you to investigate :D

narendramukherjee commented 3 years ago

You can also change the kind parameter here to "quadratic" to get a more "spliny" sort of interpolation, don't remember what we were using: https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.interp1d.html

abuzarmahmood commented 3 years ago

Cool. I figured I might be oversimplifying the situation. I'll try to see what it does to actual spikes, and whether removing it breaks something or changing the interpolation type makes things better. Will keep you posted. Thanks :D

abuzarmahmood commented 3 years ago

Just a quick update on this front. I tried sorting with and without interpolation and didn't find a difference. There was also a significant difference in time (I ran it twice, same results both times): No interpolation real 9m22.740s user 50m26.992s sys 11m9.336s

Interpolation real 17m18.859s user 77m48.799s sys 28m18.176s

I'll keep testing it on my copy of the code for now and see if anything breaks. Can you think of a specific test I should try? Like make some spikes with double minima next to eachother (within 3 time-steps) and see how they are dealt with?

Thanks, Abu

image

narendramukherjee commented 3 years ago

Yeah, that's great to see! Might be that those double spiked waveforms actually aren't that common and whether you drop them or not doesn't change things much. If it is working ok and saving you running time, I think it might be fine to remove the interpolation. Depends on how big of a time savings it is. I wouldn't spend too much more time in trying to compare interpolation vs non-interpolation - its the final spike sorting that matters. As long as you get approximately the right answer there, no use investing too much time in debugging ;)

abuzarmahmood commented 3 years ago

Great :) I have a couple more files to sort. I'll use this and see if anything weird happens. If all is good, I'll send a PR