Luthaf closed this issue.
I just tested this on my own machine, and I'm getting these timings with your code:
0.796938419342041 1.370176076889038 22.46044635772705 29.632057189941406
Here, soap_turbo does appear to be slower than soap, but nowhere near as much as you report. It could be due to threading. What happens if you export OMP_NUM_THREADS=1? I'm not sure, but it may be that soap is able to use OpenMP while soap_turbo isn't. It could also be the Quippy interface slowing things down. Have you tried running this via quip directly?
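A minimal sketch of how one might test the threading hypothesis from Python. It assumes the OpenMP runtime reads OMP_NUM_THREADS when it is first initialized, so the variable is set before any OpenMP-linked extension (such as quippy) is imported. `compute_descriptors` is a hypothetical placeholder for whichever soap or soap_turbo call is being timed; it is not part of the quippy API.

```python
import os
import time

# Set this before importing any OpenMP-linked extension (e.g. quippy):
# the OpenMP runtime typically reads the variable once, at initialization.
os.environ["OMP_NUM_THREADS"] = "1"
# OpenMP-linked modules such as quippy should only be imported after this point.


def best_time(fn, repeats=3):
    """Return the best wall-clock time (in seconds) over `repeats` calls to fn()."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def compute_descriptors():
    # Hypothetical placeholder: replace with the real soap / soap_turbo call.
    return sum(i * i for i in range(100_000))


if __name__ == "__main__":
    print(f"{best_time(compute_descriptors):.4f} s")
```

Taking the best of several runs reduces noise from first-call overheads, which matters when comparing two descriptors whose timings differ by less than a factor of two.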
Also, since soap_turbo came up there have been some speedups applied to soap, but soap_turbo has remained the same (we're working on a major speedup, but it will require significantly revamping the algorithm; it should be ready in a few months).
All that said, most of the speedup in soap_turbo these days comes from compression (add "compression_mode=trivial"):
0.7820601463317871 1.2784626483917236 22.447352647781372 8.230390071868896
Surprisingly, the descriptors without derivatives didn't get much of a speedup, which may again point to slow soap_turbo execution via Quippy. Compression will be especially important when evaluating the kernels (the dot products become significantly cheaper).
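Since compression only changes the soap_turbo side, a quick ratio of the two runs shows where it helped. The timings are labeled by position ("timing 1" to "timing 4") because the benchmark script that printed them is not reproduced here.

```python
# Timings copied from the two runs above, in the order they were printed.
without_compression = [0.796938419342041, 1.370176076889038,
                       22.46044635772705, 29.632057189941406]
with_compression = [0.7820601463317871, 1.2784626483917236,
                    22.447352647781372, 8.230390071868896]

# A ratio above 1 means that timing got faster once compression_mode=trivial was added.
speedups = [before / after
            for before, after in zip(without_compression, with_compression)]

for i, s in enumerate(speedups, start=1):
    print(f"timing {i}: {s:.2f}x")
```

Only the fourth timing changes substantially (roughly 3.6x faster); the others stay within a few percent.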
Thanks for the explanation. By setting OMP_NUM_THREADS=1, I get similar results to yours, so it looks like most of the difference between QUIP's soap and soap_turbo is related to parallelism.
> Compression will be especially important when evaluating the kernels (the dot products become significantly cheaper).
Is that just because there are fewer dimensions when taking the dot product? Or are you using a different way of computing the kernels with compression?
> Also, since soap_turbo came up there have been some speedups applied to soap, but soap_turbo has remained the same (we're working on some major speedup, but it will require significantly revamping the algorithm; should be ready in a few months).
That sounds interesting, I'll look into these speedups when they come out!
It's just because of the reduced dimensions (typically a factor of 5 with trivial compression, but I added James Darby's recipes, which should allow for some extra compression too). We've thought of linearizing the kernel, which with compression is a feasible speedup strategy for zeta = 2 (and obviously zeta = 1) for typical sparse set sizes, but we haven't implemented it yet.
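A minimal sketch of both points, assuming the usual dot-product SOAP kernel k(p, q) = (p·q)^zeta on unit-normalized descriptor vectors and a GAP-style prediction E = sum_s alpha_s k(p, q_s) over a sparse set of M descriptors of dimension D. All names here are illustrative and are not the QUIP or turbogap API. The direct sum costs about M·D multiply-adds per environment, so shrinking D through compression cuts that cost proportionally. For zeta = 2 the sum can be linearized as sum_s alpha_s (p·q_s)^2 = p^T W p, with W = sum_s alpha_s q_s q_s^T precomputed once, so each prediction costs about D^2 instead of M·D. That only pays off when D is small compared to M, which is why compression makes it feasible.

```python
import random


def normalize(v):
    """Scale a vector to unit length, as SOAP descriptors usually are."""
    norm = sum(x * x for x in v) ** 0.5
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def predict_direct(p, sparse, alphas, zeta):
    """Kernel sum over the sparse set: O(M * D) work per environment."""
    return sum(a * dot(p, q) ** zeta for a, q in zip(alphas, sparse))


def build_w(sparse, alphas):
    """Precompute W = sum_s alpha_s q_s q_s^T (D x D), done once after training."""
    d = len(sparse[0])
    w = [[0.0] * d for _ in range(d)]
    for a, q in zip(alphas, sparse):
        for i in range(d):
            for j in range(d):
                w[i][j] += a * q[i] * q[j]
    return w


def predict_linearized(p, w):
    """zeta = 2 only: p^T W p, O(D^2) work per environment, independent of M."""
    return sum(p[i] * dot(w[i], p) for i in range(len(p)))


if __name__ == "__main__":
    rng = random.Random(0)
    D, M = 8, 50  # hypothetical (compressed) descriptor size and sparse-set size
    sparse = [normalize([rng.gauss(0, 1) for _ in range(D)]) for _ in range(M)]
    alphas = [rng.gauss(0, 1) for _ in range(M)]
    p = normalize([rng.gauss(0, 1) for _ in range(D)])

    direct = predict_direct(p, sparse, alphas, zeta=2)
    linear = predict_linearized(p, build_w(sparse, alphas))
    print(direct, linear)  # equal up to floating-point rounding
```

For zeta = 1 the same idea reduces to a single precomputed vector w = sum_s alpha_s q_s, so each prediction is one D-dimensional dot product.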
Note that the OpenMP parallelism of soap will lose its advantage once you move over to LAMMPS, which uses domain decomposition and calls GAP in serial, so you get all the benefits of soap_turbo without any downsides.
— Gábor
Gabor Csanyi, Professor of Molecular Modelling, Engineering Laboratory, University of Cambridge
(Sent by email in reply to Guillaume Fraux's comment of Wed, Jun 15, 2022 at 15:09.)
Hello!
I tried playing a bit with soap_turbo and comparing it to the base soap implementation in QUIP. It is my understanding that soap_turbo should be much faster than the QUIP version, but I find it to be 5x slower. Am I doing anything wrong here?
I'm using quippy to access both, and made sure to compile both turbogap and QUIP with -O3 -fopenmp. I'm running macOS 12.2.1 on an M1 CPU with 10 cores. Here is the setup code I use:
And the timings are as follows:
Do you see anything strange in the hyperparameters or the code I used here?
molecular_crystals.xyz.txt