Thank you for the blog!
In the code snippet for the assignPointsToClusters() function, line 11, shouldn't it be sim[i,pick]=0 instead of sim[i,i]=0, or am I missing something?
Thank you!
Hi Greg, just to be clear about the distance threshold, you are using 0.65 because 99% of MFP2 similarity edges in the linked analysis on fingerprint thresholds had a similarity less than ~0.35?
So if I wanted to generalize the way you chose the threshold, I could look at a set of molecules using a particular fingerprint and choose 1 - the 99th percentile of the similarity distribution?
> Hi Greg, just to be clear about the distance threshold, you are using 0.65 because 99% of MFP2 similarity edges in the linked analysis on fingerprint thresholds had a similarity less than ~0.35?
Yeah, when comparing random pairs of molecules, 99% of the pairs had a similarity < ~0.35. The most recent version of that blog post is here: https://greglandrum.github.io/rdkit-blog/posts/2021-05-18-fingerprint-thresholds1.html
> So if I wanted to generalize the way you chose the threshold, I could look at a set of molecules using a particular fingerprint and choose 1 - the 99th percentile of the similarity distribution?
That's what I would do if I wanted a threshold for "these molecules are more similar to each other than you would expect if they were randomly picked".
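For anyone who wants to try the recipe discussed above, here is a minimal sketch of it in plain Python. It is not code from the blog post. The function names (tanimoto, percentile, distance_threshold) are hypothetical, and it assumes each fingerprint is given as a set of on-bit indices (with RDKit bit vectors, something like set(fp.GetOnBits()) should produce that). It samples random pairs, takes the 99th percentile of their Tanimoto similarities, and returns 1 minus that value as the distance threshold.

```python
import random


def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def percentile(values, q):
    """q-th percentile (0-100), linearly interpolated between sorted values."""
    if not values:
        raise ValueError("need at least one value")
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def distance_threshold(fps, n_pairs=10000, q=99.0, seed=0):
    """Return 1 - (q-th percentile of similarity over random pairs of fps).

    Assumption: fps is a list of at least two sets of on-bit indices.
    """
    if len(fps) < 2:
        raise ValueError("need at least two fingerprints")
    rng = random.Random(seed)
    sims = []
    for _ in range(n_pairs):
        # Pick two distinct molecules at random.
        i, j = rng.sample(range(len(fps)), 2)
        sims.append(tanimoto(fps[i], fps[j]))
    return 1.0 - percentile(sims, q)
```

If the 99th percentile of the random-pair similarities comes out near 0.35, this returns a distance threshold near 0.65, which matches the value discussed above. The number of sampled pairs and the percentile are parameters, so the same function can be reused for other fingerprints or stricter cutoffs.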
RDKit blog - Sphere exclusion clustering with the RDKit
Very fast clustering for larger datasets
https://greglandrum.github.io/rdkit-blog/posts/2020-11-18-sphere-exclusion-clustering.html