Open bananenpampe opened 1 month ago
I am happy to work on this on my own once we have fixed the API, we need it urgently ^^
So this is something we will want to do, but doing it properly will take time. If you need something urgently, we can do a temporary branch with it just for you =).
We already have a mechanism to add extra data on a system from the engine and retrieve it from the model, through `System.add_data` and `System.get_data`. What we are missing is a way for the model to request specific data from the engine (which can be other things than ASE!)
I can see something like this working, with a clear standard definition for what different extra data means (like we are standardizing the model outputs)
    # model export
    model = ...
    capabilities = ModelCapabilities(
        # as usual
        extra_system_data=["charges", "..."],  # this would be added
    )

    # model definition
    def forward(systems, ...):
        for system in systems:
            charges = system.get_data("charges")  # error if the data was not requested
What speaks against having a bespoke ase calculator option implementation, in which the ase calculator writes to system data?
I would rather not put code in the main branch that will be removed or changed once we have a solution for this. But we can do this in a temporary branch so you can go on with the scientific project while we figure out the general solution!
I do not need any of the calculators for the scientific part. I would prefer to have it as part of the metatensor releases, because it should be easily pip installable for users. Happy to close the issue if there is no interest for an ase side implementation.
There is interest for a general mechanism that ASE will also use, so let's keep this open.
Some more details on how this could work: we would add an extra field in `ModelCapabilities` called something like `extra_input`, that would be a `Dict[str, ModelOutput]`, describing everything the model wants as extra input.
The engine would then get this field from the model and, following some specification (like the one we have for the outputs), store the corresponding data in the systems. Here, we could do the same as for outputs: have "standard" extra input data, and allow users to do whatever they want as long as they add a namespace somewhere (e.g. `my_model::custom_input`).
The model can then access the data in the system.
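To make the flow above concrete, here is a minimal sketch in plain Python. Everything in it is hypothetical: `ToyCapabilities` and `ToySystem` are stand-ins for `ModelCapabilities` and `System`, the set of standard names is invented for illustration, and a plain dict replaces `ModelOutput`. It only shows the three steps: the model declares its extra inputs, the engine stores exactly those in the system, and the model reads them back.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict

# Hypothetical "standard" extra inputs; a real list would come from a
# specification that does not exist yet.
STANDARD_EXTRA_INPUTS = {"charges", "magnetic_moments"}


def check_extra_input_name(name: str) -> None:
    """Accept standard names, or custom names written as '<namespace>::<name>'."""
    if name in STANDARD_EXTRA_INPUTS:
        return
    namespace, sep, custom = name.partition("::")
    if not (sep and namespace and custom):
        raise ValueError(
            f"'{name}' is not a standard extra input; custom inputs need a "
            "namespace, e.g. 'my_model::custom_input'"
        )


@dataclass
class ToyCapabilities:
    """Stand-in for ModelCapabilities with the proposed extra field."""

    # A plain dict stands in for ModelOutput here (assumption).
    extra_input: Dict[str, Dict[str, Any]] = field(default_factory=dict)

    def __post_init__(self) -> None:
        for name in self.extra_input:
            check_extra_input_name(name)


class ToySystem:
    """Stand-in for System, keeping only an add_data/get_data pair."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}

    def add_data(self, name: str, value: Any) -> None:
        self._data[name] = value

    def get_data(self, name: str) -> Any:
        if name not in self._data:
            raise KeyError(f"no data named '{name}'; was it requested by the model?")
        return self._data[name]


def engine_fill_system(
    system: ToySystem,
    capabilities: ToyCapabilities,
    providers: Dict[str, Callable[[], Any]],
) -> None:
    """Engine side: store only the data the model asked for."""
    for name in capabilities.extra_input:
        if name not in providers:
            raise ValueError(f"engine cannot provide '{name}'")
        system.add_data(name, providers[name]())


capabilities = ToyCapabilities(
    extra_input={"charges": {"unit": "e"}, "my_model::custom_input": {}}
)
system = ToySystem()
engine_fill_system(
    system,
    capabilities,
    providers={
        "charges": lambda: [0.0, 1.0],
        "my_model::custom_input": lambda: 42,
        "unused": lambda: None,  # available in the engine, but never requested
    },
)
charges = system.get_data("charges")
```

In this sketch, data the engine could provide but the model did not request never reaches the system, so reading it fails, and a custom name without a namespace is rejected when the capabilities are created.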
- Should we rename `ModelCapabilities` to something like `ModelSpecification`, since it will contain more than what the model can do?
- Should we rename `ModelOutput` to something like `Quantity`/`Property`/`Data`, since it is not only about outputs?
- `Dict[str, ModelOutput]` is currently used to describe a bunch of `TensorMap` (the model outputs). Here it would describe a bunch of `TensorBlock` in the system data. Should we use `TensorMap` for everything?
Many ASE calculators use additional information from the `ase.Atoms` object, such as total charge, magnetic moment and other arbitrary properties from the `.info` and `.arrays` dicts. There should be a standardized way to define which additional properties get read from the `ase.Atoms` object in `atomistic.ase_calculator.MetatensorCalculator` and then converted into a `TensorBlock` which gets passed with the systems object.
For simplicity, I would propose two generic options that access properties from the `.info` and `.arrays` dicts, plus predefined options for `get_initial_magnetic_moments()` and `get_initial_charges()`.
In principle this could be optional (so if the field/info does not exist, nothing gets parsed instead of raising an Exception). Should this be handled on the ASE calculator side, or on the specific model side?
Maybe we could provide an additional `parse_properties` kwarg in `ase_calculator.MetatensorCalculator` and then handle the extraction logic here: https://github.com/lab-cosmo/metatensor/blob/b34c0f3757b95cd85e04ce8cf468499e06a5a326/python/metatensor-torch/metatensor/torch/atomistic/ase_calculator.py#L211
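As a rough illustration of what such extraction logic might look like, here is a sketch under several assumptions. The `"info::<key>"` / `"arrays::<key>"` naming scheme, the `strict` flag, and the `extract_properties` helper are all invented for this example and are not part of metatensor. Only `.info`, `.arrays`, `get_initial_charges()` and `get_initial_magnetic_moments()` are actual `ase.Atoms` members; `FakeAtoms` mimics them so the snippet runs without ASE. Conversion to a `TensorBlock` is left out.

```python
from typing import Any, Dict, Iterable

# Predefined options mapped to the ase.Atoms getter that provides them.
PREDEFINED_GETTERS = {
    "initial_charges": "get_initial_charges",
    "initial_magnetic_moments": "get_initial_magnetic_moments",
}


def extract_properties(
    atoms: Any, parse_properties: Iterable[str], strict: bool = False
) -> Dict[str, Any]:
    """Collect requested properties from an ase.Atoms-like object.

    Each entry is "info::<key>", "arrays::<key>", or a key of
    PREDEFINED_GETTERS (hypothetical naming scheme).
    """
    extracted: Dict[str, Any] = {}
    for entry in parse_properties:
        if entry in PREDEFINED_GETTERS:
            extracted[entry] = getattr(atoms, PREDEFINED_GETTERS[entry])()
            continue

        source, sep, key = entry.partition("::")
        if not sep or source not in ("info", "arrays"):
            raise ValueError(f"unknown property specification '{entry}'")

        container = getattr(atoms, source)
        if key in container:
            extracted[entry] = container[key]
        elif strict:
            raise KeyError(f"'{key}' not found in atoms.{source}")
        # non-strict mode: missing optional data is skipped silently
    return extracted


class FakeAtoms:
    """Minimal stand-in exposing the ase.Atoms members used above."""

    def __init__(self) -> None:
        self.info = {"charge": -1}
        self.arrays = {"numbers": [8, 1, 1]}

    def get_initial_charges(self):
        return [0.0, 0.0, 0.0]

    def get_initial_magnetic_moments(self):
        return [0.0, 0.0, 0.0]


props = extract_properties(
    FakeAtoms(), ["info::charge", "arrays::forces", "initial_charges"]
)
```

Here `"arrays::forces"` is missing from the stand-in object and is skipped in the default mode; with `strict=True` the same call would raise instead. This keeps the "optional vs. error" decision on the calculator side, which is one of the two options raised above, not a settled choice.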