petersenpeter / CellExplorer

CellExplorer is a graphical user interface, a standardized processing module and data structure for exploring and classifying single cells acquired using extracellular electrodes.
https://cellexplorer.org/
BSD 3-Clause "New" or "Revised" License
96 stars 59 forks source link

Indexing error, maybe mismatch in channel number #50

Closed YangfanPeng closed 2 years ago

YangfanPeng commented 2 years ago

Hello, sorry for posting here again (in addition to the googlegroups), but I think I can now provide a more detailed assessment of my error.

I am trying to run the ProcessCellMetrics pipeline with neuropixel 1.0 data that has been spike sorted using Kilosort3. The data has been recorded using openEphys and nothing else is being done to the data.

In the validation of the session metadata, it tells me that "electrodeGroups channel count is not equal to extracellular.nChannels".

When I run the cell metrics pipeline, it does run and imports waveforms nicely (I am seeing the plots). However, midway, it stops and I receive the following error

loadSpikes: Loading Phy data
Importing 638/1269 clusters from phy
Saved variable 'noiseLevel' to Y:\Staging\Yangfan\yp012\yp012_220211\yp012_2022-02-11_14-26-53\Record Node 102\experiment1\recording1\continuous\Neuropix-PXI-100.2\continuous.noiseLevel.channelInfo.mat
Getting waveforms from dat file (nPull=600). Bandpass filter applied: 500 - 8000 Hz. No bad channels detected.
Index exceeds the number of array elements. Index must not exceed 85.

Error in getWaveformsFromDat (line 225)
    nChannelFit = min([16,length(goodChannels),length(electrodeGroups{spikes.shankID(ii)})]);

Error in loadSpikes (line 855)
        spikes = getWaveformsFromDat(spikes,session,'showWaveforms',parameters.showWaveforms);

Error in ProcessCellMetrics (line 271)
    spikes{1} =
    loadSpikes('session',session,'labelsToRead',preferences.loadSpikes.labelsToRead,'getWaveformsFromDat',parameters.getWaveformsFromDat,'showWaveforms',parameters.showWaveforms);

Error in CellExplorer_Tutorial (line 19)
cell_metrics = ProcessCellMetrics('session', session);

After diving into it, I suspect that it has something to do with Kilosort3 putting out fewer channels than 384. I only have 374 channels in the rez structure. I am not sure why it does that and I have opened a report for this on the kilosort repo: https://github.com/MouseLand/Kilosort/issues/482

The extracellular tab indicates that there are only 374 spikeGroups and 374 electrodeGroups. However, when I set the number of channels to 374 in the GUI, it messes up the import of the waveforms: all the waveform are just noise, I think the channel mapping gets shifted.

So my question is:

  1. Is this error due to there being only 374 channels instead of 384? Or is it unrelated?
  2. Is there a way to compensate for this? I know the mising channels [37 76 113 152 189 228 165 304 341 380]. Can I somehow parse this into the Cellexplorer?

Thanks so much for your help! Very eager to use this amazing tool!

Best Yangfan

Matlab version 2021b

petersenpeter commented 2 years ago

Hi Yangfan

Happy to help. Could you start by sending me your session struct/mat file? This will make troubleshooting easier (petersen.peter@gmail.com).

I have used CellExplorer with both KiloSort 1 and 2 and Neuropixel recordings (not KiloSort version 3), so I think it is just a matter of providing the correct parameters in the session struct. Are all 384 channels saved to the dat file?

Alternatively, there is a bug in loadSpikes for KiloSort 3...

/ Peter

YangfanPeng commented 2 years ago

Hi Peter, thanks for the quick reply! Sorry for the late answer, I wanted to try a few more things. I have sent you the session file. The dat file has 384 channels, but I assume that there is an issue in my kilosort pipeline that results in fewer channels in the rez.mat file (e.g. 374). I am currently using the Phy mode to import the session struct (was set as default). If I choose kilosort as import method, it comes up with another error:

Index in position 1 exceeds array bounds.
Error in loadSpikes (line 810)
                [~,spikes.maxWaveformCh1(UID)] = max(abs(rez.U(:,rez.iNeigh(1,spike_clusters(i)),1)));

I think this is due to rez.iNeigh being empty. I do not know why this is outputted this way by Kilosort.

I traced the other error (using Phy import method) back a bit and think that somewhere here, it goes awry due to a mismatch in goodChannels and actual number of channels (line 178 in getWaveformsFromDat.m):

    spikes.maxWaveformCh1(ii) = goodChannels(maxWaveformCh1);
    spikes.maxWaveformCh(ii) = spikes.maxWaveformCh1(ii)-1;

    % Assigning shankID to the unit
    for jj = 1:nElectrodeGroups
        if any(electrodeGroups{jj} == spikes.maxWaveformCh1(ii))
            spikes.shankID(ii) = jj;
        end
    end

Is there a way to parse the identified missing channels as "bad channels", so they are being ignored? Or can the pipeline get the valid channels from rez.ops.chanMap? I mean the spikeGroups and electrodeGroups has the correct number of 374 channels... Thanks for the help!!

Best Yangfan image

YangfanPeng commented 2 years ago

Update: I managed to get pass the waveform import with the following hack before the cell metrics pipeline.

session.channelTags.Bad.channels = find(~ismember(1:384,rez.ops.chanMap));
%% 3.1.1 Run the cell metrics pipeline 'ProcessCellMetrics' using the session struct as input
cell_metrics = ProcessCellMetrics('session', session);

However, I now get another error when calculating the waveform metrics:

Index exceeds the number of array elements. Index must not exceed 374.
Error in ProcessCellMetrics (line 532)
                trilat_pos = trilat(cell_metrics.general.chanCoords.x(bestChannels),cell_metrics.general.chanCoords.y(bestChannels),peakVoltage(idx(1:trilat_nChannels)),beta0,0); % ,1,cell_metrics.waveforms.filt_all{j}(bestChannels,:)

Apparently, it has bestChannels that are > 374. Any ideas how I could solve this? Thanks!

petersenpeter commented 2 years ago

Hi Yangfan. Could you check that you send the session struct to the correct email address: petersen.peter@gmail.com, as I did not receive it. Could you perhaps also share your rez, phy and maybe your raw data?

petersenpeter commented 2 years ago

You can define/parse bad channels in the session struct: session.channelTags.Bad.channels: session.channelTags.Bad.channels = [37 76 113 152 189 228 165 304 341 380] This has to be 1-indexed (Phy uses 0-indexing)

YangfanPeng commented 2 years ago

Thanks. I have used the last the last tip already and it worked for the import of waveforms. But unfortunately, the next step "Calculating waveform metrics Using existing channel coordinates" did not work, see above. I also retried with setting the nChannels in sessions to 374 using the GUI which also did not help. I have replied to your email and provided a link to all raw and processed data. Thanks so much!

Do you mean by 1-indexed that it should be like this? session.channelTags.Bad.channels = [36 75 112 151 188 227 164 303 340 379]

YangfanPeng commented 2 years ago

Another update: I was able to run through the trilateralization by introdcunig these lines of code in line 532 of ProcessCellMetrics.m. Do you think this hotfix is valid? With this I managed to run through the whole pipeline and visualize the data (it is so awesome, incredibly thankful for this wonderful tool!!).

%hotfix
bestChannels(bestChannels > height(chanCoords.x))=[]; %height of chanCoords.x gives out actual number of channels
peakVoltage_idx = idx(1:trilat_nChannels); %introduced to remove indexing of missing channels
peakVoltage_idx(peakVoltage_idx > height(chanCoords.x))=[];

%original code continued (line 532 in ProcessCellMetrics.m)
beta0 = [cell_metrics.general.chanCoords.x(bestChannels(1)),cell_metrics.general.chanCoords.y(bestChannels(1))]; % initial position
trilat_pos = trilat(cell_metrics.general.chanCoords.x(bestChannels),cell_metrics.general.chanCoords.y(bestChannels),peakVoltage(peakVoltage_idx),beta0,0); % ,1,cell_metrics.waveforms.filt_all{j}(bestChannels,:) %changed index of peakVoltage in hotfix
petersenpeter commented 2 years ago

Hi Yangfan

I found the issue, which was caused by a bug when the sessionTemplate imported the channel coordinates from the rez.mat file. Below is a working session file for your dataset (I had to zip it to share it). The channel coordinates (session.extracellular.chanCoords.x and session.extracellular.chanCoords.y) must have the same length as the number of channels, which in your case is 384. KiloSort only works with a subset of the channels, which I had not taken into account.

I pushed a new version of CellExplorer to GitHub, to fix the bug, so make sure to update your local repo before running it again. Make sure to delete any derived files before running the pipeline again (any basename.*.mat files).

continuous.session.mat.zip

Let me know how it goes!

/ Peter

YangfanPeng commented 2 years ago

Hi Peter. Sorry for the late reply. I just tested your new version and it works out of the box for newly sorted data! This is really amazing, thanks so much for the quick fix! 👍 I really appreciate this powerful tool.

Best Yangfan

petersenpeter commented 2 years ago

Great - Happy that it worked!