plumed / plumed2

Development version of plumed 2
https://www.plumed.org
GNU Lesser General Public License v3.0

Interface to metatensor to use arbitrary machine learning models as collective variables #1082

Closed Luthaf closed 6 days ago

Luthaf commented 1 month ago

Description

This PR adds an interface that uses metatensor atomistic models to execute arbitrary machine learning models, with the intended use case of using these models as collective variables in PLUMED.

The core idea of metatensor is to facilitate data sharing between different machine learning ecosystems by annotating data with metadata that describes exactly what is being passed around. This is done through the TensorMap class, which contains one or more TensorBlock; each block contains the actual values, together with Labels describing the rows and columns of those values.
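To make the layout described above concrete, here is a minimal sketch in plain Python. It is not the metatensor API: the dataclasses and the field names (`samples`, `properties`, `keys`) are illustrative assumptions, and only the structure mentioned in this description (a map of blocks, each holding values plus labels for rows and columns) is modelled.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Labels:
    """Metadata for one axis: dimension names plus one entry per row or column."""
    names: Tuple[str, ...]
    values: List[Tuple[int, ...]]


@dataclass
class TensorBlock:
    """Raw values, with Labels describing the rows (samples) and columns (properties)."""
    values: List[List[float]]
    samples: Labels      # hypothetical field name: one entry per row
    properties: Labels   # hypothetical field name: one entry per column

    def __post_init__(self):
        # The metadata must match the shape of the values it describes.
        if len(self.values) != len(self.samples.values):
            raise ValueError("one sample label is required per row")
        if any(len(row) != len(self.properties.values) for row in self.values):
            raise ValueError("one property label is required per column")


@dataclass
class TensorMap:
    """One or more blocks, each identified by an entry in `keys`."""
    keys: Labels
    blocks: List[TensorBlock]

    def __post_init__(self):
        if len(self.keys.values) != len(self.blocks):
            raise ValueError("one key is required per block")


# A collective variable with two components, computed for a single system.
block = TensorBlock(
    values=[[0.25, 1.5]],
    samples=Labels(names=("system",), values=[(0,)]),
    properties=Labels(names=("cv",), values=[(0,), (1,)]),
)
tensor = TensorMap(keys=Labels(names=("_",), values=[(0,)]), blocks=[block])
```

The point of the sketch is that a consumer can check what each row and column means from the labels alone, without relying on out-of-band conventions.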

On top of this data format, we defined an interface for exporting trained ML models, and then loading and executing such models from a variety of simulation engines. The initial objective was to share models that would compute the energy and forces of a system and use these as ML force fields, but the same API can also be used for arbitrary quantities, including collective variables. These models are based on PyTorch for three reasons:

There is some documentation on the workflow to load and execute a metatensor model here for reference: https://docs.metatensor.org/latest/atomistic/overview.html#data-flow-between-the-model-and-engine. The main bits of data a model needs from PLUMED are the types for all particles (for atoms, this can be the atomic number), the positions of all particles, the simulation cell/box and (potentially many) neighbor lists for given cutoffs.
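As an illustration of the last input, the sketch below builds a neighbor list for a given cutoff from positions and a box. It is a brute-force O(N²) version written only for this discussion, not the algorithm used by vesin; the function name and the returned `(i, j, shift)` tuples are assumptions. It also assumes the box stores the three cell vectors as rows and that an all-zero box means no periodic boundary conditions.

```python
import itertools
import math


def neighbor_list(positions, box, cutoff):
    """Return (i, j, shift) for every pair closer than `cutoff`.

    Only the 27 nearest periodic images are searched, so the result is
    complete only when `cutoff` does not exceed the smallest perpendicular
    width of the cell.
    """
    periodic = any(x != 0.0 for row in box for x in row)
    if periodic:
        shifts = list(itertools.product((-1, 0, 1), repeat=3))
    else:
        shifts = [(0, 0, 0)]

    pairs = []
    for i, j in itertools.product(range(len(positions)), repeat=2):
        for shift in shifts:
            if i == j and shift == (0, 0, 0):
                continue  # an atom is not its own neighbor
            # Image of atom j translated by shift[0]*a + shift[1]*b + shift[2]*c,
            # where a, b, c are the rows of `box`.
            image = [
                positions[j][k] + sum(shift[v] * box[v][k] for v in range(3))
                for k in range(3)
            ]
            if math.dist(positions[i], image) < cutoff:
                pairs.append((i, j, shift))
    return pairs


cubic = [[10.0, 0.0, 0.0], [0.0, 10.0, 0.0], [0.0, 0.0, 10.0]]
positions = [[0.5, 0.0, 0.0], [9.5, 0.0, 0.0]]
print(neighbor_list(positions, cubic, cutoff=2.0))
```

With the periodic box, the two atoms are neighbors across the cell boundary; with an all-zero box, the same positions are 9 units apart and no pair is returned. A model may request several such lists, one per cutoff.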

Difference with the existing PyTorch module

Both this module and the existing PyTorch module in PLUMED are based on PyTorch (for the same reasons!). The main difference, as far as I know, is that the PyTorch module is intended to act as a transformation on top of CVs computed by other PLUMED actions, while the metatensor module is intended to start from positions/atom types/... and compute a new CV from scratch. This enables using metatensor to compute some representation of the system (such as SOAP) and send it to PLUMED for further processing.

Unresolved questions/issues

The main question remaining for me is what to do regarding the neighbor list calculation. I currently vendor code from https://github.com/Luthaf/vesin in the metatensor module, and it works pretty well. However, the code is just copy-pasted from another repository: it might need to be updated in the future, and it does not 100% conform to the code style of the plumed repository. I can see a couple of solutions here, ordered from favorite to least favorite:

What do you think? Any other idea on what to do regarding "external" code in the PLUMED repository?


The other question I have concerns the unit cell storage as returned by this->getPbc().getBox():


Type of contribution

Copyright

Tests

GiovanniBussi commented 1 month ago

Thanks @Luthaf !!! A few comments below:

Then let's wait for the other checks that are running now. Thanks again!

GiovanniBussi commented 1 month ago

for non-periodic systems, what's returned by this function? A matrix of zeros, or the bounding box of the system? Does it vary from one simulation engine to another?

It is expected to be a matrix of zeros. We only use the box to apply PBCs. But it is certainly possible that some code incorrectly passes a box even when no PBCs are applied. In any case, within PLUMED it is correct to assume that having non-zero numbers here implies PBCs.

in all cases, the storage format seems to be using rows of the matrix to store the three box vectors, is this correct?

Yes (in C order). So box[2][0] is the first component of the third vector.
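The two answers above can be summarized in a small sketch. The helper names (`is_periodic`, `box_vector`, `flat_index`) and the example numbers are hypothetical and exist only for illustration; the conventions they encode (rows are box vectors, C order, any non-zero entry implies PBCs) are the ones stated in this comment.

```python
def is_periodic(box):
    """Within PLUMED, any non-zero entry in the box implies PBCs."""
    return any(x != 0.0 for row in box for x in row)


def box_vector(box, i):
    """Rows of the matrix are the box vectors, so vector i is row i."""
    return list(box[i])


def flat_index(row, col):
    """Offset of box[row][col] when the 3x3 matrix is stored in C (row-major) order."""
    return 3 * row + col


box = [
    [10.0, 0.0, 0.0],  # first box vector
    [0.0, 12.0, 0.0],  # second box vector
    [3.0, 0.0, 15.0],  # third box vector
]
flat = [x for row in box for x in row]

# box[2][0] is the first component of the third vector, in both layouts.
print(box[2][0], flat[flat_index(2, 0)])
print(is_periodic(box), is_periodic([[0.0] * 3 for _ in range(3)]))
```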

GiovanniBussi commented 6 days ago

@Luthaf sorry for not merging this yet. Could you please confirm that I can do it? Thanks!!!

Luthaf commented 6 days ago

If you are happy with the code, please do merge! I'll send further PRs to improve it/update dependencies as needed.

GiovanniBussi commented 6 days ago

Thanks for your contribution!

Luthaf commented 6 days ago

And thanks for your review!