mir-group / nequip

NequIP is a code for building E(3)-equivariant interatomic potentials
https://www.nature.com/articles/s41467-022-29939-5
MIT License

🌟 [FEATURE] Masking out some labels (e.g. constrained atoms) #307

Open · Linux-cpp-lisp opened this issue 1 year ago

Linux-cpp-lisp commented 1 year ago

Beta implementation of label masking: https://github.com/mir-group/nequip/tree/masks/examples/mask_labels

See https://github.com/mir-group/nequip/discussions/240 for more discussion.
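The core idea of masking is that labels belonging to masked atoms (e.g. constrained ones) simply drop out of the loss. Below is a minimal NumPy sketch, assuming a per-atom boolean mask where True means "keep this atom". The function name masked_force_mse is hypothetical; it is not the API of the masks branch linked above.

```python
import numpy as np


def masked_force_mse(pred_forces, ref_forces, mask):
    """Mean squared force error over unmasked atoms only (hypothetical helper).

    pred_forces, ref_forces: arrays of shape (n_atoms, 3)
    mask: boolean array of shape (n_atoms,); True = atom contributes to the loss,
          False = atom is ignored (e.g. a constrained/frozen atom)
    """
    pred_forces = np.asarray(pred_forces, dtype=float)
    ref_forces = np.asarray(ref_forces, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():
        # No atom contributes: return 0 instead of taking the mean of an empty array
        return 0.0
    diff = pred_forces[mask] - ref_forces[mask]  # shape (n_kept, 3)
    return float(np.mean(diff ** 2))


# Atom 2 is constrained, so its large error does not enter the loss
pred = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [9.0, 9.0, 9.0]]
ref = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
mask = [True, True, False]
print(masked_force_mse(pred, ref, mask))  # 1/6 ≈ 0.1667
```

Masked atoms are still present in the structure, so they keep contributing to their neighbors' environments; only their own force labels are excluded.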

mhellstr commented 1 year ago

A related feature request would be to set custom weights per force component (e.g. as additional columns in an ASE .xyz dataset).

If I have structures where some atoms have very large forces, I do not really care how accurately the trained model reproduces those large forces, only that they come out "large". It would be great to be able to give them a much smaller weight in the loss function.

This would be useful, for example, when adding structures with short interatomic distances or structures where chemical bonds are breaking (i.e. far from equilibrium).
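Per-component weights enter the loss as a weighted mean of squared errors. A minimal NumPy sketch, assuming the weights are available as an extra per-atom (n_atoms, 3) array stored alongside the forces (for example as additional columns in an extended XYZ file). The function name is hypothetical, and this is not how NequIP itself computes its loss.

```python
import numpy as np


def weighted_force_mse(pred_forces, ref_forces, weights):
    """Weighted mean squared force error, one weight per force component.

    All arrays have shape (n_atoms, 3). The sum is normalized by the total
    weight, so a weight of 0 removes a component entirely and a small weight
    makes that component count much less than the others.
    """
    pred_forces = np.asarray(pred_forces, dtype=float)
    ref_forces = np.asarray(ref_forces, dtype=float)
    weights = np.asarray(weights, dtype=float)
    total_weight = weights.sum()
    if total_weight == 0.0:
        # Every component is switched off: nothing to fit
        return 0.0
    return float(np.sum(weights * (pred_forces - ref_forces) ** 2) / total_weight)


# Atom 1 is far from equilibrium: its components get weight 0.01
pred = [[1.0, 0.0, 0.0], [50.0, 0.0, 0.0]]
ref = [[0.0, 0.0, 0.0], [40.0, 0.0, 0.0]]
w = [[1.0, 1.0, 1.0], [0.01, 0.01, 0.01]]
print(weighted_force_mse(pred, ref, w))
```

Normalizing by the total weight (rather than by the number of components) keeps the loss comparable to a plain MSE when all weights are 1; other normalizations are equally possible.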

Linux-cpp-lisp commented 1 year ago

Hi @mhellstr,

That is something you would implement as a custom loss function; see https://github.com/mir-group/nequip-example-extension

You could either write a custom loss that depends directly on the force magnitude, or read a force_weights key (or whatever you choose to call it) from the data and use it to reweight the loss value. In the second case, you would just need to include force_weights in the dataset as a custom field with include_keys.
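For the first option, a loss that depends directly on the force magnitude, one possible weighting is a smooth per-atom factor that is about 1 for small reference forces and decays for large ones. A minimal NumPy sketch, assuming the Lorentzian-style weight w_i = 1 / (1 + (|F_ref,i| / f_scale)^2); both this weight and the parameter f_scale are hypothetical choices for illustration, not something NequIP provides.

```python
import numpy as np


def magnitude_weighted_force_mse(pred_forces, ref_forces, f_scale=5.0):
    """Force MSE in which atoms with large reference forces are down-weighted.

    Per-atom weight w_i = 1 / (1 + (|F_ref_i| / f_scale)**2): close to 1 for
    |F| << f_scale, decaying like (f_scale / |F|)**2 for |F| >> f_scale.
    f_scale is a hypothetical tuning parameter in the same units as the forces.
    """
    pred_forces = np.asarray(pred_forces, dtype=float)
    ref_forces = np.asarray(ref_forces, dtype=float)
    ref_norm = np.linalg.norm(ref_forces, axis=-1)               # (n_atoms,)
    weights = 1.0 / (1.0 + (ref_norm / f_scale) ** 2)            # (n_atoms,)
    sq_err = np.sum((pred_forces - ref_forces) ** 2, axis=-1)    # per-atom squared error
    # Divide by 3 * sum(weights) so equal weights reduce to the ordinary per-component MSE
    return float(np.sum(weights * sq_err) / (3.0 * np.sum(weights)))


# Atom 1 has |F_ref| = 100, so its error of 10 barely affects the loss
ref = [[0.0, 0.0, 0.0], [100.0, 0.0, 0.0]]
pred = [[1.0, 0.0, 0.0], [110.0, 0.0, 0.0]]
print(magnitude_weighted_force_mse(pred, ref))
```

A hard threshold (weight 1 below some force magnitude, a small constant above it) would be a simpler alternative; the smooth version avoids a sudden jump in the loss.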

Zausinator commented 2 months ago

Hi, super cool, I would love to use this feature in my work too! Is it also possible to set the masked atoms' atomic energy contributions to zero? Ideally I would keep the masked atoms in the structure so that the local atomic environments of the relevant atoms stay intact, but I do not want them to contribute to the total energy of the structure. I tried including "atomic_energy" in fields_to_mask, but that did not work, presumably because the training data only includes one scalar value for total_energy. Is there any way around this? Thanks!
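What is being asked for amounts to changing only the final sum over per-atom energies, while leaving the per-atom energies themselves computed with all atoms (masked or not) present in each neighborhood. A minimal NumPy sketch, assuming the model's per-atom energies are accessible before they are summed; the function name is hypothetical and not a NequIP API. Keep in mind that comparing this sum with a label is only consistent if the reference total_energy excludes the masked atoms as well; a total energy computed for the full structure still includes their contributions.

```python
import numpy as np


def masked_total_energy(atomic_energies, mask):
    """Sum per-atom energies over unmasked atoms only (hypothetical helper).

    atomic_energies: shape (n_atoms,), per-atom energy predictions
    mask: boolean, shape (n_atoms,); True = atom counts toward the total
    This only changes the final reduction; it does not remove masked atoms
    from anyone's neighbor list.
    """
    atomic_energies = np.asarray(atomic_energies, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    # Zero out masked contributions, then sum
    return float(np.sum(np.where(mask, atomic_energies, 0.0)))


print(masked_total_energy([-1.5, -2.0, -3.0], [True, True, False]))  # -3.5
```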