lab-cosmo / metatensor

Self-describing sparse tensor data format for atomistic machine learning and beyond
https://docs.metatensor.org
BSD 3-Clause "New" or "Revised" License

Passing ensemble predictions to MD engines #650

Closed: frostedoyster closed this issue 4 days ago

frostedoyster commented 1 month ago

Given #648, I thought it was a good idea to open an issue about forwarding ensemble predictions to MD engines, which we will need relatively soon.

The design could be as simple as having a new optional property name (something like ensemble_member) for any registered output in metatensor.torch.atomistic, and then passing this to the metatensor interface to the MD engine, which can decide how to handle it. i-PI should be well-equipped for this. ASE is also very customizable, so we should be able to make the ensemble predictions available to the user relatively easily. I don't know LAMMPS very well, but we could perhaps implement something in our driver, or leave it alone and error out if ensemble predictions are received.

@Luthaf @ceriottm
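The optional-property idea in the comment above could look roughly like this on the engine side. This is a minimal sketch using plain Python. `OutputBlock` is a hypothetical toy stand-in, not metatensor's `TensorBlock`. The `ensemble_member` name is the one floated in the proposal. `engine_receive` is an illustrative helper: outputs without the ensemble dimension pass through unchanged, while an engine that cannot handle ensembles errors out, as suggested for LAMMPS.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical optional property dimension name from the proposal.
ENSEMBLE_DIM = "ensemble_member"


@dataclass
class OutputBlock:
    """Toy stand-in for one block of a model output (NOT metatensor's TensorBlock)."""

    property_names: Tuple[str, ...]    # e.g. ("energy",) or ("ensemble_member",)
    properties: List[Tuple[int, ...]]  # one label tuple per property column
    values: List[List[float]]          # n_samples rows x n_properties columns


def engine_receive(block: OutputBlock, supports_ensembles: bool) -> List[List[float]]:
    """Return the block values as seen by an MD engine.

    Outputs without the ensemble dimension pass through unchanged. Ensemble
    outputs are reordered so column i holds ensemble member i, or rejected
    if the engine cannot handle them.
    """
    if ENSEMBLE_DIM not in block.property_names:
        return block.values
    if not supports_ensembles:
        # Error out rather than silently using a single member.
        raise ValueError(f"this engine does not support '{ENSEMBLE_DIM}' outputs")
    dim = block.property_names.index(ENSEMBLE_DIM)
    # Column order that sorts properties by their ensemble member index.
    order = sorted(range(len(block.properties)), key=lambda j: block.properties[j][dim])
    return [[row[j] for j in order] for row in block.values]
```

For example, a block with properties `[(1,), (0,)]` and values `[[-1.2, -1.0]]` is returned as `[[-1.0, -1.2]]` by an ensemble-aware engine and raises `ValueError` in one that is not. The genericity frostedoyster mentions is visible here: the same check works for any output, not just energy.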

ceriottm commented 1 month ago

In i-PI this should be implemented at the level of the metatensor driver, but yes should be feasible. Davide and Matthias are best positioned to help.

Luthaf commented 1 month ago

> The design could be as simple as having a new optional property name (something like ensemble_member) for any registered property in metatensor.torch.atomistic, and then passing this to the metatensor interface to the MD engine, which can decide how to handle it.

I'm not sure I understand what you mean here. If we are changing the property names in the output, this will have to be a different TensorMap. I also don't think that changing behavior based on property names is desirable, since this makes the whole thing more complex to explain to users and engine developers alike.


My favorite solution here would be to add another output name altogether, with its own specification. For example, ensemble_energy, which is similar to energy except that it has multiple properties per sample. For maximal compatibility, we should also encourage any model that can produce ensemble_energy to also provide an energy output, which will be used by any engine that doesn't know or care about ensembles.
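The separate-output design can be sketched with a toy model that computes only the outputs an engine requests. `MEMBERS` and `run_model` are hypothetical and do not reflect metatensor's actual model interface. Using the ensemble mean as the plain `energy` output is an assumption made for illustration; the comment only says such a model should also provide `energy`.

```python
from statistics import fmean
from typing import Callable, Dict, List, Sequence

# Toy ensemble: each member maps a scalar "structure" x to an energy.
MEMBERS: List[Callable[[float], float]] = [
    lambda x: -x,
    lambda x: -x + 0.5,
    lambda x: -x - 0.5,
]


def run_model(x: float, requested: Sequence[str]) -> Dict[str, List[float]]:
    """Compute only the outputs requested by the engine.

    "ensemble_energy" has one value (property) per member; "energy" is a
    single value, taken here as the ensemble mean (an assumption).
    """
    members = [member(x) for member in MEMBERS]
    available = {"ensemble_energy": members, "energy": [fmean(members)]}
    unknown = set(requested) - set(available)
    if unknown:
        raise KeyError(f"unsupported outputs: {sorted(unknown)}")
    return {name: available[name] for name in requested}
```

An engine that knows nothing about ensembles asks for `["energy"]` only. An ensemble-aware engine such as a metatensor driver in i-PI could ask for `["energy", "ensemble_energy"]`. Neither engine has to inspect property names to decide what it received.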


In general, what's the use case for this? Do we actually need the full ensemble of predictions, or will this always be used to compute the mean and standard deviation for uncertainty quantification purposes?
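To make the alternative in this question concrete, here is a minimal sketch of the mean/standard-deviation reduction. It assumes predictions arrive as an n_samples x n_members table. `summarize` is a hypothetical helper, and choosing the sample (n - 1) standard deviation is an illustrative choice not fixed by the thread.

```python
from statistics import fmean, stdev
from typing import List, Tuple


def summarize(ensemble: List[List[float]]) -> List[Tuple[float, float]]:
    """Reduce an n_samples x n_members table to per-sample (mean, std).

    Uses the sample standard deviation (n - 1 normalization); at least two
    members are required.
    """
    return [(fmean(row), stdev(row)) for row in ensemble]
```

This reduction is lossy. Once only the mean and standard deviation are passed on, the engine can no longer do anything that needs the individual members.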

frostedoyster commented 1 month ago

@ceriottm can expand, but passing all the individual predictions is necessary. And yes, I meant an optional property name that may or may not be there. It's more complicated, but it would give us ensembles for free on any output that we might register later on. Registering a different output like energy_ensemble is also a possibility.

Luthaf commented 4 days ago

Should we keep this open to track implementation in the relevant engines?