Thanks @Luthaf !!! A few comments below:
configure file using autoconf 2.69? Installing it should be straightforward (see here). Then let's wait for the other checks that are running now. Thanks again!
For non-periodic systems, what is returned by this function? A matrix of zeros, or the bounding box of the system? Does it differ from one simulation engine to another?
It is expected to be a matrix of zeros. We only use the box to apply PBCs. It is certainly possible that some code incorrectly passes a box even when no PBCs are applied. In any case, within PLUMED it is correct to assume that having non-zero numbers here implies PBCs.
in all cases, the storage format seems to be using rows of the matrix to store the three box vectors, is this correct?
Yes (in C order). So box[2][0] is the first component of the third vector.
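To make the two answers above concrete, here is a minimal sketch of the layout. It assumes the box is a plain 3x3 array of doubles; the `Box` alias and the helpers `has_pbc` and `box_vector` are hypothetical names for illustration, not part of PLUMED's API.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the 3x3 box, stored in C (row-major) order:
// row i holds the three components of box vector i.
using Box = std::array<std::array<double, 3>, 3>;

// Following the convention stated above: an all-zero matrix means no PBCs,
// and any non-zero entry is taken to imply PBCs.
bool has_pbc(const Box& box) {
    for (const auto& row : box) {
        for (double value : row) {
            if (value != 0.0) {
                return true;
            }
        }
    }
    return false;
}

// Returns box vector i, i.e. row i of the matrix.
std::array<double, 3> box_vector(const Box& box, std::size_t i) {
    assert(i < 3);
    return box[i];
}
```

With a box whose third row is (3, 0, 15), `box[2][0]` and `box_vector(box, 2)[0]` both give 3, the first component of the third vector.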
@Luthaf sorry for not merging this yet. Could you please confirm that I can do it? Thanks!!!
If you are happy with the code, please do merge! I'll send further PRs to improve it/update dependencies as needed.
Thanks for your contribution!
And thanks for your review!
Description
This PR adds an interface using metatensor atomistic models to execute arbitrary machine learning models, with the intended use case of using these models as collective variables in PLUMED.
The core idea of metatensor is to facilitate data sharing between different machine learning ecosystems by annotating data with metadata which describes what exactly is being passed around. This is done through the TensorMap class, which contains one or more TensorBlock, each block containing the actual values, and Labels describing the rows and columns of the values.
On top of this data format, we defined an interface for exporting trained ML models, and then loading and executing such models from a variety of simulation engines. The initial objective was to share models that would compute the energy and forces of a system and use these as ML force fields, but the same API can also be used for arbitrary quantities, including collective variables. These models are based on PyTorch for three reasons:
There is some documentation on the workflow to load and execute a metatensor model here for reference: https://docs.metatensor.org/latest/atomistic/overview.html#data-flow-between-the-model-and-engine. The main bits of data a model needs from PLUMED are the types for all particles (for atoms, this can be the atomic number), the positions of all particles, the simulation cell/box and (potentially many) neighbor lists for given cutoffs.
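As a rough illustration of the inputs listed above, the sketch below groups them in one struct and builds a neighbor list by brute force. Everything here is an assumption for illustration: `SystemInputs` and `neighbor_pairs` are hypothetical names, not metatensor or PLUMED types, and the O(N^2) loop ignores periodic images, so it is only valid for non-periodic systems and is not the algorithm used by vesin.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Vec3 = std::array<double, 3>;

// Hypothetical container for the data a model needs from the engine.
struct SystemInputs {
    std::vector<int> types;       // one entry per particle, e.g. atomic number
    std::vector<Vec3> positions;  // one entry per particle
    std::array<Vec3, 3> cell{};   // rows are cell vectors; all zeros = no PBCs
};

// Brute-force half neighbor list: every pair (i, j) with i < j whose
// distance is strictly below the cutoff. Periodic images are NOT considered.
std::vector<std::pair<std::size_t, std::size_t>>
neighbor_pairs(const SystemInputs& system, double cutoff) {
    assert(system.types.size() == system.positions.size());
    std::vector<std::pair<std::size_t, std::size_t>> pairs;
    const double cutoff2 = cutoff * cutoff;
    const std::size_t n = system.positions.size();
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = i + 1; j < n; ++j) {
            double d2 = 0.0;
            for (std::size_t k = 0; k < 3; ++k) {
                const double d = system.positions[j][k] - system.positions[i][k];
                d2 += d * d;
            }
            if (d2 < cutoff2) {
                pairs.emplace_back(i, j);
            }
        }
    }
    return pairs;
}
```

A model that requests several cutoffs would get one such list per cutoff; a real implementation would also have to handle periodic images and scale better than quadratically.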
Difference with the existing PyTorch module
Both this module and the existing PyTorch module in PLUMED are based on PyTorch (for the same reasons!). The main difference, as far as I know, is that the PyTorch module is intended to act as a transformation on top of CVs computed by other PLUMED actions, while the metatensor module is intended to start with positions/atom types/... and compute a new CV from scratch. This enables using metatensor to compute some representation of the system (such as SOAP) and send it to PLUMED for further processing.
Unresolved questions/issues
The main question remaining for me is what to do regarding the neighbor list calculation. I currently vendor code from https://github.com/Luthaf/vesin in the metatensor module, and it works pretty well. However, the code is just copy-pasted from another repository and might need to be updated in the future; it also does not 100% conform to the code style of the plumed repository. I can see a couple of solutions here, ordered from favorite to least favorite:
codecheck or headers.sh. Is there a mechanism to mark some files/directories as "external" code that should be ignored by such linters?
configure. This would make the process of updating the code a lot easier (just update the archive), however I don't know if this will help with removing the code from astyle/codecheck/... unless these scripts respect .gitignore.
What do you think? Any other idea on what to do regarding "external" code in the PLUMED repository?
The other questions I have concern the unit cell storage as returned by this->getPbc().getBox();:
Type of contribution
Copyright
[ ] I agree to transfer the copyright of the code I have written to the PLUMED developers or to the author of the code I am modifying.
[x] the module I added or modified contains a COPYRIGHT file with the correct license information. Code should be released under an open source license. I also used the command cd src && ./header.sh mymodulename in order to make sure the headers of the module are correct.
Tests