soap_turbo is slower than QUIP soap

Luthaf commented 2 years ago

Hello!

I tried playing a bit with soap_turbo and comparing it to the base soap implementation in QUIP. It is my understanding that soap_turbo should be much faster than the QUIP version, but I find it to be 5x slower. Am I doing anything wrong here?

I'm using quippy to access both, and made sure to compile both turbogap and QUIP with -O3 -fopenmp. I'm running on macOS 12.2.1, on an M1 CPU with 10 cores.

Here is the setup code I use:

import quippy
import ase.io

frames = ase.io.read("molecular_crystals.xyz", ":")

all_species = set()
for frame in frames:
    all_species.update(frame.numbers)

SOAP_HYPERS = [
    "soap",
    "n_max=8",
    "l_max=7",

    "cutoff=4.5",
    "cutoff_transition_width=0.5",

    "atom_sigma=0.3",
    f"n_species={len(all_species)}",
    "species_Z={" + " ".join(map(str, all_species)) + "}",
]

soap_calculator = quippy.descriptors.Descriptor(" ".join(SOAP_HYPERS))

TURBO_SOAP_HYPERS = [
    "soap_turbo",
    "alpha_max={8 8 8 8}",
    "l_max=7",

    "rcut_hard=4.5",
    "rcut_soft=4.0",

    "atom_sigma_r={0.3 0.3 0.3 0.3}",
    "atom_sigma_t={0.3 0.3 0.3 0.3}",

    "atom_sigma_r_scaling={0.0 0.0 0.0 0.0}",
    "atom_sigma_t_scaling={0.0 0.0 0.0 0.0}",

    "amplitude_scaling={1 1 1 1}",
    "central_weight={1 1 1 1}",
    "add_species=F",

    f"n_species={len(all_species)}",
    "species_Z={" + " ".join(map(str, all_species)) + "}",
]

turbo_soap_calculators = []

for i in range(len(all_species)):
    turbo_soap_calculators.append(
        quippy.descriptors.Descriptor(" ".join(TURBO_SOAP_HYPERS) + f" central_index={i + 1}")
    )
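The turbo hyperparameters above hard-code four entries per per-species option (`{8 8 8 8}`), which only matches a dataset with exactly four species. Below is a minimal sketch of a hypothetical helper, `turbo_soap_hypers`, that builds those lists from `all_species`, so the string always agrees with `n_species`. It assumes the same quippy descriptor-string syntax used above. It also sorts the species, which is a design choice for reproducible ordering and not something soap_turbo requires.

```python
def turbo_soap_hypers(species, central_index, alpha_max=8, l_max=7,
                      rcut_hard=4.5, rcut_soft=4.0, atom_sigma=0.3):
    """Build a soap_turbo descriptor string (hypothetical helper).

    Every per-species option gets one entry per species, so the string
    stays consistent with n_species whatever the dataset contains.
    """
    species = sorted(species)  # deterministic species_Z order
    n = len(species)

    def per_species(value):
        # Per-species options are written as "{v v ... v}", one v per species
        return "{" + " ".join([str(value)] * n) + "}"

    hypers = [
        "soap_turbo",
        f"alpha_max={per_species(alpha_max)}",
        f"l_max={l_max}",
        f"rcut_hard={rcut_hard}",
        f"rcut_soft={rcut_soft}",
        f"atom_sigma_r={per_species(atom_sigma)}",
        f"atom_sigma_t={per_species(atom_sigma)}",
        f"atom_sigma_r_scaling={per_species(0.0)}",
        f"atom_sigma_t_scaling={per_species(0.0)}",
        f"amplitude_scaling={per_species(1)}",
        f"central_weight={per_species(1)}",
        "add_species=F",
        f"n_species={n}",
        "species_Z={" + " ".join(str(z) for z in species) + "}",
        # 1-based index into species_Z selecting the central species
        f"central_index={central_index}",
    ]
    return " ".join(hypers)


# Usage with quippy (not run here):
#   turbo_soap_calculators = [
#       quippy.descriptors.Descriptor(turbo_soap_hypers(all_species, i + 1))
#       for i in range(len(all_species))
#   ]
```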

And the timings are as follows:

%%timeit
soap = soap_calculator.calc(frames)

# 220 ms ± 1.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%%timeit
for calculator in turbo_soap_calculators:
    turbo_soap = calculator.calc(frames)

# 1.06 s ± 19.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%%timeit
soap = soap_calculator.calc(frames, grad=True)

# 3.04 s ± 89.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%%timeit
for calculator in turbo_soap_calculators:
    turbo_soap = calculator.calc(frames, grad=True)

# 16.5 s ± 289 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
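For reference, here is the slowdown implied by the mean values in the `%%timeit` output above. Note that the soap_turbo timing covers one `calc` call per central species, since each calculator handles a single `central_index`.

```python
# Mean wall-clock times (seconds) copied from the %%timeit runs above
timings = {
    "no_grad": {"soap": 0.220, "soap_turbo": 1.06},
    "grad": {"soap": 3.04, "soap_turbo": 16.5},
}

for case, t in timings.items():
    slowdown = t["soap_turbo"] / t["soap"]
    print(f"{case}: soap_turbo is {slowdown:.1f}x slower than soap")

# Prints:
# no_grad: soap_turbo is 4.8x slower than soap
# grad: soap_turbo is 5.4x slower than soap
```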

Do you see anything strange in the hyperparameters or the code I used here?

molecular_crystals.xyz.txt

mcaroba commented 2 years ago

I just tested this on my own machine, and I'm getting these timings with your code:

soap 0.80 s, soap_turbo 1.37 s, soap with gradients 22.46 s, soap_turbo with gradients 29.63 s

Here, soap_turbo does appear to be slower than soap, but nowhere near as much as you report. It could be due to threading: what happens if you export OMP_NUM_THREADS=1? I'm not sure, but it may be that soap is able to use OpenMP while soap_turbo isn't. It could also be the quippy interface slowing things down. Have you tried running this through quip directly?
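A minimal sketch of the same single-thread test from inside Python, as an alternative to `export OMP_NUM_THREADS=1` in the shell. It assumes that quippy's compiled library reads `OMP_NUM_THREADS` when its OpenMP runtime starts up, which is typical, so the variable is set before quippy is imported. The `best_of` timing helper is hypothetical.

```python
import os
import time

# Set this before quippy (and its OpenMP runtime) is imported; the runtime
# typically reads the variable only once, at start-up.
os.environ["OMP_NUM_THREADS"] = "1"


def best_of(fn, repeats=5):
    """Return the fastest wall-clock time (s) over `repeats` calls to fn()."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


# Usage with the objects from the setup code (not run here):
#   import quippy  # only after OMP_NUM_THREADS has been set
#   t_soap = best_of(lambda: soap_calculator.calc(frames))
#   t_turbo = best_of(lambda: [c.calc(frames) for c in turbo_soap_calculators])
```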

Also, since soap_turbo was introduced, some speedups have been applied to soap, while soap_turbo has remained the same. (We're working on a major speedup, but it requires significantly revamping the algorithm; it should be ready in a few months.)

All that said, most of the speedup in soap_turbo these days comes from compression (add "compression_mode=trivial"):

soap 0.78 s, soap_turbo 1.28 s, soap with gradients 22.45 s, soap_turbo with gradients 8.23 s

Surprisingly, the descriptors without derivatives barely got any faster, which may again point to slow execution of soap_turbo through quippy. Compression will be especially important when evaluating the kernels (the dot products become significantly cheaper).
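To quantify the effect of compression, here are the ratios between the two sets of timings above. This assumes they are listed in the same order as the original four measurements: soap, soap_turbo, soap with gradients, soap_turbo with gradients. Because `compression_mode` was only added to soap_turbo, the soap ratios give a rough check on run-to-run noise.

```python
# Timings (s) from the two runs above, assumed order as in `labels`
labels = ["soap", "soap_turbo", "soap (grad)", "soap_turbo (grad)"]
uncompressed = [0.796938419342041, 1.370176076889038,
                22.46044635772705, 29.632057189941406]
compressed = [0.7820601463317871, 1.2784626483917236,
              22.447352647781372, 8.230390071868896]

for label, before, after in zip(labels, uncompressed, compressed):
    print(f"{label}: {before / after:.2f}x faster with compression_mode=trivial")

# Prints:
# soap: 1.02x faster with compression_mode=trivial
# soap_turbo: 1.07x faster with compression_mode=trivial
# soap (grad): 1.00x faster with compression_mode=trivial
# soap_turbo (grad): 3.60x faster with compression_mode=trivial
```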

Luthaf commented 2 years ago

Thanks for the explanation. By setting OMP_NUM_THREADS=1, I get results similar to yours, so it looks like most of the difference between QUIP's soap and soap_turbo comes from parallelism.

> Compression will be especially important when evaluating the kernels (the dot products become significantly cheaper).

Is that just because there are fewer dimensions when taking the dot product? Or are you computing the kernels differently when compression is used?

> Also, since soap_turbo came up there have been some speedups applied to soap, but soap_turbo has remained the same (we're working on some major speedup, but it will require significantly revamping the algorithm; should be ready in a few months).

That sounds interesting, I'll look into these speedups when they come out!

mcaroba commented 2 years ago

It's just because of the reduced dimensions (typically a factor of 5 with trivial compression, but I added James Darby's recipes, which should allow for some extra compression too). We've thought of linearizing the kernel, which with compression becomes a feasible speedup strategy for zeta = 2 (and obviously zeta = 1) at typical sparse set sizes, but we haven't implemented it yet.
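A generic NumPy illustration of both points, using random stand-in data. The values of `d`, `n_sparse` and `alpha` are made up, and this is not soap_turbo's or QUIP's implementation. For a dot-product kernel k(p, q) = (p·q)^zeta, each kernel evaluation costs O(d), so a 5x smaller descriptor makes each dot product about 5x cheaper. One way to linearize: for zeta = 1 the weighted sum over sparse points collapses into a single vector w = Σ_s α_s q_s. For zeta = 2, Σ_s α_s (p·q_s)^2 = pᵀ M p with M = Σ_s α_s q_s q_sᵀ precomputed once. Prediction then costs O(d²) per environment instead of O(n_sparse·d), which only pays off when d is smaller than the number of sparse points. That is where compression helps.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 200          # descriptor dimension after compression (made-up value)
n_sparse = 1000  # number of sparse environments (made-up value)

# Stand-in normalised sparse descriptors, fitted weights and a new environment
q = rng.normal(size=(n_sparse, d))
q /= np.linalg.norm(q, axis=1, keepdims=True)
alpha = rng.normal(size=n_sparse)
p = rng.normal(size=d)
p /= np.linalg.norm(p)


def energy_direct(p, q, alpha, zeta):
    """Explicit sum over sparse points: O(n_sparse * d) per environment."""
    return alpha @ (q @ p) ** zeta


# zeta = 1: sum_s alpha_s (p . q_s) = p . w, with w precomputed, O(d)
w = alpha @ q
# zeta = 2: sum_s alpha_s (p . q_s)^2 = p^T M p, with M precomputed, O(d^2)
M = q.T @ (alpha[:, None] * q)

e1_direct, e1_linear = energy_direct(p, q, alpha, zeta=1), p @ w
e2_direct, e2_linear = energy_direct(p, q, alpha, zeta=2), p @ M @ p

print(np.isclose(e1_direct, e1_linear), np.isclose(e2_direct, e2_linear))
# Prints: True True
```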

gabor1 commented 2 years ago

Note that the OpenMP parallelism of soap loses its advantage once you move over to LAMMPS, which uses domain decomposition and calls GAP in serial, so there you get all the benefits of soap_turbo without any downsides.

— Gábor

Gabor Csanyi, Professor of Molecular Modelling, Engineering Laboratory, University of Cambridge

(Sent by email in reply to Guillaume Fraux's comment of Wed, Jun 15, 2022 at 15:09.)

(libAtoms/soap_turbo issue #2)