rdkit / rdkit

The official sources for the RDKit library
BSD 3-Clause "New" or "Revised" License

faster rmsd calculations in rdkit #1460

Open UnixJunkie opened 7 years ago

UnixJunkie commented 7 years ago

I think Douglas Theobald's QCP (Quaternion Characteristic Polynomial) method is the fastest one around. We should use that. He has some code in C++ or C, I don't remember which. I can send it if needed.

gedeck commented 7 years ago

Code and further information are here: http://theobald.brandeis.edu/qcp/

bp-kelley commented 7 years ago

I implemented the QCP code from their paper for OEChem a while back. It is substantially faster, and I could do it again here as well.

The downside is that it quite often doesn't return an RMSD of exactly 0 for identical conformations, but rather some quite small number < 10e-6. We deemed this sufficient, but it was kind of annoying.

There are two versions, one that computes the RMSD without the transform and one that computes the RMSD and the transform. The latter is more complicated.
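For reference, here is a minimal pure-Python sketch of the simpler variant, the one that computes the RMSD without the transform. It is not Theobald's reference code and not an RDKit API: the names `qcp_rmsd`, `_det` and `_centered` are made up for illustration, and the polynomial coefficients are obtained from generic determinants (cofactor expansion) rather than the hand-expanded expressions used in optimized implementations. Python floats are double precision, so this does not reproduce single-precision behaviour.

```python
import math


def _det(m):
    """Determinant by cofactor (Laplace) expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for j, a in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * a * _det(minor)
    return total


def _centered(coords):
    """Return the coordinates translated so that their centroid is the origin."""
    n = len(coords)
    c = [sum(p[k] for p in coords) / n for k in range(3)]
    return [[p[k] - c[k] for k in range(3)] for p in coords]


def qcp_rmsd(a, b, tol=1e-11, max_iter=50):
    """Minimum RMSD between two equally sized lists of 3D points (no rotation returned)."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    a, b = _centered(a), _centered(b)
    n = len(a)
    # g = sum of squared norms of both centered structures
    g = sum(v * v for p in a for v in p) + sum(v * v for p in b for v in p)
    # s[i][j] = sum over atoms of a_i * b_j (3x3 inner-product matrix)
    s = [[sum(p[i] * q[j] for p, q in zip(a, b)) for j in range(3)] for i in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = s
    # Symmetric, traceless 4x4 key matrix; its largest eigenvalue gives the optimum.
    k = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    # Characteristic polynomial: lam^4 + c2*lam^2 + c1*lam + c0
    c2 = -2.0 * sum(v * v for row in s for v in row)
    c1 = -8.0 * _det(s)
    c0 = _det(k)
    # Newton-Raphson from the upper bound g/2 converges to the largest root.
    lam = 0.5 * g
    for _ in range(max_iter):
        lam2 = lam * lam
        f = lam2 * lam2 + c2 * lam2 + c1 * lam + c0
        df = 4.0 * lam2 * lam + 2.0 * c2 * lam + c1
        if df == 0.0:
            break
        step = f / df
        lam -= step
        if abs(step) <= tol * abs(lam):
            break
    # Clamp at zero: rounding can make g - 2*lam slightly negative.
    return math.sqrt(max(0.0, (g - 2.0 * lam) / n))
```

The final line subtracts two nearly equal numbers when the structures are almost identical, which is one place where small non-zero results for identical conformations can come from.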

Cheers, Brian

proteneer commented 7 years ago

I've had stability issues with the single precision variant of the QCP algorithm before. I've actually implemented a GPU version of this back when I was in grad school for clustering purposes, which typically requires either all-against-one or all-against-all types of RMSD calculations.

In particular, I even recall writing a comment about the determinant calculation (via Laplace expansion), which was particularly prone to floating-point imprecision:

https://github.com/rmcgibbo/GPURMSD/blob/master/gpurmsd/kernel_rmsd.cu#L302

bp-kelley commented 7 years ago

That is very interesting. Our implementation didn't consider the issues of clustering, but I do recall that in practice, it was the small rmsds that had the issues. We thought about setting a cutoff for using a slower version for really close rmsds but never implemented it.


Brian Kelley

gedeck commented 7 years ago

We could use it for the BestRMSD code. We can always follow up with a more precise approach once the 'best' alignment is found.
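A rough sketch of that two-stage idea, assuming the candidate alignments are given as a list of atom-index mappings (for example, symmetry-equivalent orderings). The names `best_rmsd` and `plain_rmsd` are hypothetical and not RDKit functions; `plain_rmsd` is only a stand-in scorer that does no superposition, and in practice the fast scorer would be a QCP routine and the precise one a slower, more accurate method.

```python
import math


def plain_rmsd(a, b):
    """Stand-in scorer: RMSD of the coordinates as given, with no superposition."""
    sq = sum((p[k] - q[k]) ** 2 for p, q in zip(a, b) for k in range(3))
    return math.sqrt(sq / len(a))


def best_rmsd(ref, probe, mappings, fast_rmsd=plain_rmsd, precise_rmsd=plain_rmsd):
    """Score every candidate mapping with fast_rmsd, then rescore only the winner."""

    def reorder(mapping):
        # mapping[i] is the probe atom paired with reference atom i
        return [probe[i] for i in mapping]

    best = min(mappings, key=lambda m: fast_rmsd(ref, reorder(m)))
    return precise_rmsd(ref, reorder(best)), best
```

The expensive scorer is called once, however many mappings are tried, so its cost does not grow with the number of candidates.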

UnixJunkie commented 7 years ago

Here is some code that I have used in production: https://github.com/UnixJunkie/durandal_qcp/blob/master/src/qcprot.cc (look for rmsd_without_rotation_matrix). I think that's what most people want. Superposing molecules is another matter.

proteneer commented 7 years ago

@bp-kelley

Interestingly enough, RMSD actually satisfies the requirements of a metric (the only non-trivial part is the triangle inequality). For a rigorous mathematical proof, see:

http://scripts.iucr.org/cgi-bin/paper?S0108767302011637

So having an optimized variant would be quite nice!

Note that @ihaque has also written an insanely optimized CPU RMSD implementation somewhere. Maybe he can point us to the location?

ihaque commented 7 years ago

Yup, here you go: https://github.com/pandegroup/IRMSD

Dadiao-shuai commented 11 months ago

How can I set RDKit to use the Hungarian algorithm to calculate RMSD?