lab-cosmo / librascal

A scalable and versatile library to generate representations for atomic-scale learning
https://lab-cosmo.github.io/librascal/
GNU Lesser General Public License v2.1

Sparse computation of SOAP components #108

Closed max-veit closed 3 years ago

max-veit commented 5 years ago

The SOAP-like representations (spherical invariants and covariants) currently compute coefficients for all the n-n'-l combinations. We can save a large fraction of the computational cost if we only compute a sparse subset of those combinations (previously determined via something like FPS or CUR).

Let's see what changes we might have to make to the infrastructure to allow specifying a sparse subset to be computed. It might not be much, but it's good to start thinking about it now.
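As a rough illustration of how such a sparse subset might be determined beforehand, here is a minimal sketch of farthest point sampling (FPS) over feature columns in plain Python. The function name `fps_select_features` and the data layout (one row per atomic environment, one column per n-n'-l combination) are assumptions made for this example; they are not part of librascal.

```python
def fps_select_features(X, n_select, first=0):
    """Greedy farthest point sampling over the columns of X.

    X is a list of rows (one per environment); each column holds the values
    of one (n, n', l) component. Returns the indices of the selected columns.
    """
    n_cols = len(X[0])
    columns = [[row[j] for row in X] for j in range(n_cols)]

    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    selected = [first]
    # Squared distance from every column to its nearest selected column.
    min_d2 = [dist2(col, columns[first]) for col in columns]
    while len(selected) < min(n_select, n_cols):
        nxt = max(range(n_cols), key=lambda j: min_d2[j])
        if min_d2[nxt] == 0.0:
            break  # every remaining column duplicates a selected one
        selected.append(nxt)
        min_d2 = [min(d, dist2(col, columns[nxt]))
                  for d, col in zip(min_d2, columns)]
    return selected


# Hypothetical toy data: 2 environments, 4 feature columns.
X = [[0.0, 0.1, 5.0, 5.0],
     [0.0, 0.0, 5.0, 4.9]]
print(fps_select_features(X, 2))
```

The returned column indices would then have to be mapped back to (n, n', l) triplets, which is the list a sparse calculator would take as input.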

felixmusil commented 4 years ago

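I see 3 ways to implement this feature:

1. Make some kind of sparsifier that takes a Property or PropertyBlockSparse and either removes some data or makes a new object with less data.
2. Compute everything and save only the needed features.
3. Avoid computing the features that are not needed.

These would not be implemented at the same level, and the implementation effort differs considerably between them, so we should make a choice.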

max-veit commented 4 years ago

The whole idea of this issue is that we avoid computing (1) components that are zero and (2) components that we’ve decided are unimportant, e.g. via sparsification. The potential gain in efficiency is enormous: If we decide we only need ~10% of the SOAP components to get a good fit, then we can achieve a speedup of a factor of 10 in the summation part of the computation (along with a similar speedup in the cost of computing the kernel).

Therefore the third method you listed is really the only option, as it’s the only one that lets us benefit from both of the efficiency gains above.
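To make the third option concrete, here is a minimal sketch that contrasts computing every n-n'-l component with computing only a requested subset. It assumes a single species, real expansion coefficients stored as `c[n][l][m]`, and omits any normalization prefactor; the function names are hypothetical and do not reflect librascal's actual implementation.

```python
def power_spectrum_dense(c, n_max, l_max):
    """All components p[(n, n', l)] = sum_m c[n][l][m] * c[n'][l][m]."""
    return {
        (n1, n2, l): sum(a * b for a, b in zip(c[n1][l], c[n2][l]))
        for n1 in range(n_max)
        for n2 in range(n_max)
        for l in range(l_max + 1)
    }


def power_spectrum_sparse(c, selected):
    """Only the requested (n, n', l) components, in the order of `selected`."""
    return [
        sum(a * b for a, b in zip(c[n1][l], c[n2][l]))
        for (n1, n2, l) in selected
    ]


# Hypothetical coefficients: n_max = 3, l_max = 2, with 2l + 1 values per (n, l).
n_max, l_max = 3, 2
c = [[[float(n + l + m) for m in range(2 * l + 1)]
      for l in range(l_max + 1)]
     for n in range(n_max)]

selected = [(0, 0, 0), (1, 2, 1), (2, 2, 2)]
dense = power_spectrum_dense(c, n_max, l_max)
sparse = power_spectrum_sparse(c, selected)
print(len(dense), len(sparse))
```

The sparse version performs one inner sum per requested component, so the cost of this summation step scales with the number of selected components rather than with the full n-n'-l grid.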

felixmusil wrote on Wed, 18 Sep 2019 at 16:50:

I see 3 ways to implement this feature:

1. Make some kind of sparsifier that takes a Property or PropertyBlockSparse and either removes some data or makes a new object with less data.
2. Compute everything and save only the needed features.
3. Avoid computing the features that are not needed.

The implementations for these don't happen at the same level and the implementation effort is quite different, so we should make a choice.


Luthaf commented 4 years ago

Are different components independent from one another? If so, this could be implemented as SphericalInvariant::get_component(frame, center, n, n', l) or something equivalent.

ceriottm commented 4 years ago

I think that for an efficient implementation one needs to specify a list of (n, n', l) entries. The way I imagine it, this would be a SparseInvariant that holds the list of entries; once computed, the result is just a contiguous vector with one value per entry.


Luthaf commented 4 years ago

I think that for an efficient implementation one needs to specify a list of n, n', l

Why would batched computation be more efficient here?

max-veit commented 4 years ago

I'm not entirely sure that it is, though it would avoid the overhead of repeated (500 or more) function calls, and you could loop over the requested components directly rather than having to repeat the double-species loop each time. I guess we'll have to implement both versions and test.
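The two variants under discussion could be sketched as follows. This is only one possible reading of the proposals: the per-component function repeats the double-species loop on every call, while the batched function walks a list of requested entries once and returns a contiguous vector. The names `get_component` and `compute_batched`, the `coeffs[species][n][l][m]` layout, and the missing normalization are all assumptions made for illustration.

```python
def get_component(coeffs, n1, n2, l):
    """One (n, n', l) component per call, for every species pair."""
    species = sorted(coeffs)
    return {
        (a, b): sum(x * y for x, y in zip(coeffs[a][n1][l], coeffs[b][n2][l]))
        for a in species
        for b in species
    }


def compute_batched(coeffs, entries):
    """All requested (a, b, n, n', l) entries in one pass, as a flat list."""
    return [
        sum(x * y for x, y in zip(coeffs[a][n1][l], coeffs[b][n2][l]))
        for (a, b, n1, n2, l) in entries
    ]


def make_coeffs(offset, n_max=2, l_max=2):
    # Hypothetical expansion coefficients for one species.
    return [[[float(offset + n + l + m) for m in range(2 * l + 1)]
             for l in range(l_max + 1)]
            for n in range(n_max)]


coeffs = {"H": make_coeffs(0), "O": make_coeffs(10)}
entries = [("H", "H", 0, 1, 0), ("H", "O", 1, 1, 2), ("O", "O", 0, 0, 1)]

batched = compute_batched(coeffs, entries)
one_by_one = [get_component(coeffs, n1, n2, l)[(a, b)]
              for (a, b, n1, n2, l) in entries]
print(batched == one_by_one)
```

Both variants produce the same values; which one is faster in practice would still need to be measured, as noted above.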

max-veit commented 3 years ago

This is closed by #265