greglandrum / rdkit-blog

RDKit blog
https://greglandrum.github.io/rdkit-blog/

rdkit-blog/posts/2020-11-18-sphere-exclusion-clustering #14

utterances-bot opened 1 year ago

utterances-bot commented 1 year ago

RDKit blog - Sphere exclusion clustering with the RDKit

Very fast clustering for larger datasets

https://greglandrum.github.io/rdkit-blog/posts/2020-11-18-sphere-exclusion-clustering.html

Srilok commented 1 year ago

Thank you for the blog!

In the code snippet for the assignPointsToClusters() function, line 11, shouldn't it be sim[i,pick]=0 instead of sim[i,i]=0, or am I missing something?

Thank you!
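Here is a minimal sketch of the assignment step the question is about. It assumes, as the comment implies, that the snippet builds a matrix of similarities from each picked centroid to every molecule and then assigns each non-centroid molecule to its most similar centroid. Fingerprints are represented as Python sets of on-bit indices rather than RDKit bit vectors, and `tanimoto` and `assign_points_to_clusters` are hypothetical names, so this is not the blog's code. The centroid's self-similarity is zeroed at column `pick`. Zeroing column `i` would instead wipe centroid `i`'s similarity to whichever molecule happens to have index `i`, which can be an ordinary, non-centroid molecule.

```python
from collections import defaultdict


def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints stored as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def assign_points_to_clusters(picks, fps):
    """Assign every non-picked molecule to its most similar centroid.

    picks: indices (into fps) of the cluster centroids, e.g. from sphere exclusion.
    fps:   fingerprints as sets of on-bit indices (an assumption standing in
           for RDKit bit vectors).
    Returns {cluster_index: [molecule indices]}, centroid first.
    """
    clusters = defaultdict(list)
    for i, pick in enumerate(picks):
        clusters[i].append(pick)  # each centroid seeds its own cluster

    # sims[i][j] is the similarity of centroid i (molecule picks[i]) to molecule j
    sims = []
    for pick in picks:
        row = [tanimoto(fps[pick], fp) for fp in fps]
        row[pick] = 0.0  # zero the centroid's self-similarity: column `pick`, not `i`
        sims.append(row)

    pick_set = set(picks)
    for j in range(len(fps)):
        if j in pick_set:
            continue  # centroids are already placed in their own clusters
        # index of the most similar centroid (first one wins on ties)
        best = max(range(len(picks)), key=lambda i: sims[i][j])
        clusters[best].append(j)
    return dict(clusters)
```

For example, with `picks = [2, 1]` and `fps = [{1, 2, 3}, {7, 8, 9}, {1, 2, 3, 4}, {7, 8}]`, the function returns `{0: [2, 0], 1: [1, 3]}`.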

PatrickPenner commented 3 months ago

Hi Greg, just to be clear about the distance threshold, you are using 0.65 because 99% of MFP2 similarity edges in the linked analysis on fingerprint thresholds had a similarity less than ~0.35?

So if I wanted to generalize the way you chose the threshold, I could look at a set of molecules using a particular fingerprint and choose 1 - the 99th percentile of the similarity distribution?

greglandrum commented 3 months ago

> Hi Greg, just to be clear about the distance threshold, you are using 0.65 because 99% of MFP2 similarity edges in the linked analysis on fingerprint thresholds had a similarity less than ~0.35?

Yeah, when comparing random pairs of molecules, 99% of the pairs had a similarity < ~0.35. The most recent version of that blog post is here: https://greglandrum.github.io/rdkit-blog/posts/2021-05-18-fingerprint-thresholds1.html

> So if I wanted to generalize the way you chose the threshold, I could look at a set of molecules using a particular fingerprint and choose 1 - the 99th percentile of the similarity distribution?

That's what I would do if I wanted a threshold for "these molecules are more similar to each other than you would expect if they were randomly picked".
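The threshold recipe discussed in this exchange can be sketched like this: sample random pairs of molecules, compute their similarities, take the 99th percentile, and use 1 minus that value as the distance threshold. With the ~0.35 reported for MFP2, that gives the 0.65 used in the post. The sketch makes several assumptions. Fingerprints are sets of on-bit indices, the percentile uses the nearest-rank method, and `random_pair_distance_threshold`, with its defaults of 10,000 pairs and a fixed seed, is a hypothetical helper and not an RDKit function.

```python
import math
import random


def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints stored as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def percentile(values, q):
    """Nearest-rank percentile, q in (0, 100]."""
    ordered = sorted(values)
    rank = math.ceil(q * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def random_pair_distance_threshold(fps, n_pairs=10000, q=99.0, seed=42):
    """Distance threshold = 1 - (q-th percentile of random-pair similarities).

    Hypothetical helper: fps are sets of on-bit indices (an assumption
    standing in for RDKit fingerprints); needs at least two molecules.
    """
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    sims = []
    for _ in range(n_pairs):
        i, j = rng.sample(range(len(fps)), 2)  # two distinct molecules
        sims.append(tanimoto(fps[i], fps[j]))
    return 1.0 - percentile(sims, q)
```

The estimate depends on how representative the molecule set is. Sampling random pairs, rather than computing all pairs, keeps the cost linear in `n_pairs` instead of quadratic in the number of molecules.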