jniediek / combinato

Automatic spike sorting, cluster visualization, spike sorting GUIs
MIT License

Issue: Mismatch Between Timestamps of Extracted Spikes and Raw Signal Peaks in extract_spikes.py #83

Status: Open (opened by NeuroGuth 2 months ago)

NeuroGuth commented 2 months ago

Hello everyone,

We've identified a discrepancy between the timestamps of spikes extracted using extract_spikes.py and the timestamps of the corresponding peaks in the raw signal. The issue arises because spike maxima are detected in the 300-1000 Hz bandpass-filtered signal, but these maxima do not always coincide with the maxima in the actual data.

Example

In the example below, the maximum of the spike is detected at index 318,812 in the 300-1000 Hz bandpass-filtered signal (orange), whereas the maximum of the extracted spike appears at index 3,128,808 in the 300-3000 Hz bandpass-filtered signal (blue). This is a mismatch of four samples (sampling rate = 32768 Hz):

[Images: two screenshots of the bandpass-filtered signals described above]

Cause

The script extract_spikes.py currently aligns the spikes but does not adjust the corresponding timestamps, leading to this mismatch.

Proposed Solution

To resolve this, I propose modifying the script to also align the timestamps of the spikes. Below is the suggested code, which could replace the code after line 110:

        # upsample spikes
        spikes = upsample(spikes, factor)

        # adjust maxima
        center = (pre_indices + 5) * factor
        start = center - 5 * factor
        end = center + 5 * factor
        alignment_shifts = np.round((spikes[:, start:end].argmax(1) - 5 * factor) / factor).astype(int)
        adjusted_maxima = maxima + alignment_shifts

        # adjust spikes
        spikes, index_maximum = align(spikes, center, factor, factor)

        # get time stamps of spikes
        timestamps = times[adjusted_maxima]

        # remove spikes that could not be aligned
        spikes, removed_indices = clean(spikes, index_maximum)
        timestamps = timestamps[~removed_indices]

        # downsample spikes
        spikes, new_length = downsample(spikes, index_maximum, factor, pre_indices, indices_per_spike)

        if sign == 1:
            spikes *= -1

        result.append((spikes, timestamps))
    result.append([(times[0], times[-1], threshold)])

    return result
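The shift arithmetic in the snippet above can be tried in isolation. The following is only a minimal sketch: it does not call Combinato's upsample() or align(), np.repeat serves as a crude stand-in for upsampling, and the values of factor and pre_indices, the name width, the toy spikes, and the detected indices in maxima are all assumptions made for illustration.

```python
import numpy as np

factor = 3        # assumed upsampling factor (illustrative value)
pre_indices = 5   # assumed number of samples before the detected maximum
width = 5         # search window of +/- 5 original samples, as in the snippet

# Two toy spikes of 16 samples each. The detected maximum is expected at
# sample pre_indices + width = 10 in the original sampling.
toy_spikes = np.zeros((2, 16))
toy_spikes[0, 10] = 1.0   # peak exactly where it was detected
toy_spikes[1, 12] = 1.0   # peak two samples later than detected

# Crude stand-in for upsampling: repeat every sample `factor` times.
upsampled = np.repeat(toy_spikes, factor, axis=1)

center = (pre_indices + width) * factor
start = center - width * factor
end = center + width * factor

# Offset of the window maximum from the window centre (in upsampled samples),
# converted back to whole samples of the original signal.
offsets = upsampled[:, start:end].argmax(1) - width * factor
alignment_shifts = np.round(offsets / factor).astype(int)

maxima = np.array([1000, 2000])          # hypothetical detected indices
adjusted_maxima = maxima + alignment_shifts

print(alignment_shifts, adjusted_maxima)
```

In this toy case the first spike needs no correction and the second is shifted by two samples, so only the second timestamp index changes.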

Additional question

The proposed code activates the clean() function, which removes spikes that could not be properly aligned. Are there any significant downsides to enabling this function? If it is deactivated, there seems to be a risk that some misaligned spikes may be included.
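The bookkeeping behind this question can be sketched as follows. This is not Combinato's clean(): the helper drop_misaligned, its tolerance argument, and the removal criterion (peak further than tolerance samples from the expected index) are assumptions, chosen only to show that waveforms and timestamps have to be filtered with the same mask.

```python
import numpy as np

def drop_misaligned(spikes, timestamps, index_maximum, tolerance=1):
    """Hypothetical helper: drop spikes whose peak is not near index_maximum."""
    peak_positions = spikes.argmax(axis=1)
    removed = np.abs(peak_positions - index_maximum) > tolerance
    # Apply the same boolean mask to both arrays so they stay in sync.
    return spikes[~removed], timestamps[~removed], removed

spikes = np.zeros((3, 8))
spikes[0, 3] = 1.0   # peak at the expected index
spikes[1, 6] = 1.0   # peak far from the expected index
spikes[2, 4] = 1.0   # peak one sample off, still within tolerance
timestamps = np.array([10.0, 20.0, 30.0])

kept_spikes, kept_timestamps, removed = drop_misaligned(
    spikes, timestamps, index_maximum=3
)
print(removed, kept_timestamps)
```

Returning the mask as well makes it possible to count, or later inspect, the spikes that were dropped.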

Looking forward to your response!

Best regards,

Tim

jniediek commented 2 months ago

Hey Tim! Thank you for pointing this out, and great that you made the effort to investigate the maxima timestamp topic! Your explanation of the issue and code suggestion are very clear. However, I am undecided whether it is a good idea to include the change that you suggest as a default. Imagine a situation where the maximum in the raw data is the result of high-frequency noise added on top of the "real" signal. So there is reason to believe that the bandpass filter would actually reduce the influence of the noise and locate the "real" maximum more precisely, right?
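A toy illustration of this argument, under strong assumptions: the waveform is a synthetic Gaussian bump whose true peak lies nearest to sample 20, the noise is a ripple at the highest representable frequency, and a three-tap smoothing kernel stands in for the bandpass filter. None of this is Combinato code, and a real filter will behave differently on real data.

```python
import numpy as np

n = np.arange(41)

# Smooth "real" waveform; the peak lies at 20.3, so sample 20 is nearest.
clean_signal = np.exp(-((n - 20.3) / 6.0) ** 2)

# High-frequency ripple: +0.2 on odd samples, -0.2 on even samples.
noise = 0.2 * np.where(n % 2 == 1, 1.0, -1.0)
noisy_signal = clean_signal + noise

# Three-tap low-pass kernel (assumed stand-in for the bandpass filter).
# It cancels the alternating ripple and is symmetric, so it adds no delay.
kernel = np.array([0.25, 0.5, 0.25])
smoothed_signal = np.convolve(noisy_signal, kernel, mode="same")

raw_peak = int(noisy_signal.argmax())
smoothed_peak = int(smoothed_signal.argmax())
print(raw_peak, smoothed_peak)
```

Here the maximum of the noisy signal lands one sample away from the underlying peak, while the maximum of the smoothed signal falls on the sample nearest to it.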

With regards to the clean() function, this is also debatable. Probably a systematic analysis would be necessary to decide whether the removal of misaligned spikes has more advantages than disadvantages.

What do you think?

NeuroGuth commented 2 months ago

Hey Johannes,

Thank you very much for your quick reply! I agree that it's challenging to identify the "real" maxima of the spikes. Do we know to what extent the high-frequency components are noise versus actual parts of the spikes?

In either case, aligning the timestamps with the peaks of the extracted spikes could be useful for several purposes, such as analyzing spike-field relationships with high temporal resolution, extracting spike shapes from the raw signal, or removing mean spikes from the LFP before analyzing the spike-field relationships. In these situations, it might be confusing if the spikes are realigned without adjusting the timestamps. Would it be possible to add an option to realign the timestamps in extract_spikes.py? Perhaps this could be made optional in the Combinato settings?
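One way such an option could look, as a rough sketch only: the function spike_timestamps and the flag realign_timestamps are hypothetical names, not existing Combinato settings, and the arrays are made up. The sampling rate is the 32768 Hz mentioned in the example.

```python
import numpy as np

def spike_timestamps(times, maxima, alignment_shifts, realign_timestamps=False):
    """Return one timestamp per spike, optionally moved to the aligned peak."""
    indices = maxima + alignment_shifts if realign_timestamps else maxima
    # Keep indices inside the recording in case a shift points past either end.
    indices = np.clip(indices, 0, len(times) - 1)
    return times[indices]

sampling_rate = 32768.0                  # Hz
times = np.arange(100) / sampling_rate   # toy time axis in seconds
maxima = np.array([10, 50])              # hypothetical detected indices
alignment_shifts = np.array([0, 4])      # hypothetical shifts in samples

default_ts = spike_timestamps(times, maxima, alignment_shifts)
realigned_ts = spike_timestamps(times, maxima, alignment_shifts,
                                realign_timestamps=True)
print(realigned_ts - default_ts)
```

With the flag off, the current behaviour is unchanged; with it on, the second timestamp moves by four samples, which is about 0.12 ms at this sampling rate.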

I also thought again about the clean() function. I agree that it would probably require extensive testing to check the potentially undesired effects of removing some spikes before clustering. Probably, keeping some misaligned spikes in the data is not a big issue, since they likely won't be assigned to any real cluster in the final spike sorting results. What do you think?

Best,

Tim

jniediek commented 2 months ago

Hey Tim, I agree with all points. I'd be happy if you could create a pull request that introduces an option to have the spike times aligned together with the spikes. Regarding the cleaning, you could also introduce this as an option?

Best, Johannes

NeuroGuth commented 1 month ago

Hey Johannes,

Great! I have created a pull request for these two options.

Best, Tim