Incorrect similarity scores for auto-merged clusters

jsiegle commented 7 years ago

When I use Kilosort's posthoc merge feature, which creates new clusters out of similar templates, the similarity scores no longer match when I load the data in the phy TemplateGUI. Clusters on entirely different parts of the probe have high similarity scores, which never happens without posthoc merges enabled.

I checked the similarity scores manually, and they look OK, so it seems like it's an issue with phy finding the appropriate scores when the cluster ids have been updated by another piece of software.

Any thoughts on how to fix this?

rossant commented 7 years ago

Have you tried deleting the .phy subfolder in the data directory, and reloading the dataset?

nsteinme commented 7 years ago

Also, the best way to check if it's really a bug would be to start with the non-posthoc-merged dataset and manually make one of the exact same merges that was made by kilosort, to see whether the similarity score you get now between that cluster and another is different from when you start directly with the posthoc-merged results. The reason I suggest this is because it's certainly not impossible to get high sim scores for neurons that appear to be on different parts of the probe, since sometimes kilosort finds "double" templates that exist on two parts of the probe - they will have good similarity with templates in both places.

On Wed, Oct 4, 2017 at 10:30 AM, Cyrille Rossant notifications@github.com wrote:

Have you tried deleting the .phy subfolder in the data directory, and reloading the dataset?

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/kwikteam/phy-contrib/issues/132#issuecomment-334100650, or mute the thread https://github.com/notifications/unsubscribe-auth/AHPUP-BVQtOSKId1hBRF0-ggvS19l9axks5so1A7gaJpZM4PseTf .

greggheller commented 7 years ago

Thanks for the quick reply!

Deleting the .phy subfolder didn't change anything.

I pulled up an old data set that had not been auto-merged and the problem was happening there as well, even though it wasn't observed earlier when sorting manually.

For troubleshooting I replaced the similar_templates.npy with an array that contains the indicies instead of similarity values: 10001 , 10002, ............, 11023, 11024 20001, ............, ............ ............ 21024 ............ ............ ............ ............ ............ 10230001, ............ ............ ............ 10231024 10240001, 10240002 ............10241023, 10241024

It looks like our version of phy (recently installed developer version) is pulling the correct row from the similarity matrix, but the column index is off by one when it matches a similarity score to a pair. The selected cluster looks correct but the compared cluster is off by 1. (See screenshot). I checked that this seems to be what is happening by finding which element of the real similarity array corresponds to a given score, and it is displaying the score in the adjacent column to what I would've expected phy to display.

Editing the similar_templates.npy array so that the first column is zeros and deleting the last column so the size is appropriate (essentially shifting it one column to the right) seems to fix the problem.

id offset example

rossant commented 7 years ago

Could this be related to MATLAB using 1-based indexing (i.e. the first template is number 1), whereas Python uses 0-based indexing (it expects the first template to be number 0) ?

greggheller commented 7 years ago

That seems likely. But the problem is happening when pulling up data that was sorted 2 months ago with no issues back then. No changes have been made to the similar_templates.npy since 7/13, just opening it on a new installation of the template-gui.

rossant commented 7 years ago

That's weird. Would you be able to git-bisect phycontrib to find the commit that introduced the bug?

lshaheen commented 7 years ago

We had a similar problem when we were first setting up Kilosort and Phy. I think it may be due to a bug in the merge_posthoc2 function. I don't quite remember because I did this awhile ago, but this is what I think happens: Before this function the spike clusters ids ( rez.st3(:,2) before merging, rez.st3(:,5) after merging ) don't include 0, but after merging by similarity, merge_psothoc2 merges all remaining clusters with less than 500 spikes and sets the id of this merged cluster to 0. As far as I remember that's not what phy expects so it somehow messes up the similarity scores. I fixed the problem by adding the following section of code in the main sorting script after merge_posthoc2:

` rez = merge_posthoc2(rez);

%Fix merge problems: %the automerge function merges similar clusters and then also merges clusters %with less than a certain number of spikes and sets their IDs to 0. %This ends up misaligning all the similarity values. This code sets this cluster's %ids to an empty slot, and shifts the other ids down by 1 to start at 0. if(~isequal(rez.st3(:,2),rez.st3(:,5))) %ids are shifted off by one in the merged version
%rez.st3(:,5) contains the cluster identities for each spike %shift indexes down 1 rez.st3(:,5)=rez.st3(:,5)-1;

%find empty slots
un=unique(rez.st3(:,5));
empties=setdiff(1:max(un),un);
if isempty(empties), error('why empty?'); end

%set cluster that did have id=0 to an empty slot
rez.st3(rez.st3(:,5)==-1,5)=empties(1);

end ` This is probably not the ideal way to do this but as far as I can tell it works.

-Luke

greggheller commented 7 years ago

So it looks like the data set I randomly chose to test from a few months ago was also auto-merged in kilosort. The problem doesn't happen elsewhere and Luke's solution does the trick.

Sorry for my mistake Cyrille, and thanks for the help!

kwikteam / phy-contrib

Incorrect similarity scores for auto-merged clusters #132