torchmd / torchmd-net

Training neural network potentials
MIT License

Revisiting standardize #277

Closed RaulPPelaez closed 9 months ago

RaulPPelaez commented 9 months ago

When enabled, the standardize functionality processes the whole Dataset and stores the mean and std of the energies (well, the "y" field) in it:

https://github.com/torchmd/torchmd-net/blob/6d8e3159cfb8bb971ecf7a2abd589735d79a7e53/torchmdnet/data.py#L172-L202

These are then used during prediction:

https://github.com/torchmd/torchmd-net/blob/6d8e3159cfb8bb971ecf7a2abd589735d79a7e53/torchmdnet/models/model.py#L358-L375
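
For readers who do not want to follow the links, here is a minimal sketch of the idea in plain Python. It is not the torchmd-net code: the function names are hypothetical, the population standard deviation is an assumption (the real code may use a different estimator), and I am assuming the raw model output is mapped back to energy units as `raw * std + mean`.

```python
from statistics import fmean, pstdev

def fit_standardizer(energies):
    """Return (mean, std) of the target values over the whole dataset."""
    return fmean(energies), pstdev(energies)

def destandardize(raw_output, mean, std):
    """Map a raw model output back to energy units: y = raw * std + mean."""
    return raw_output * std + mean

# Toy "y" field for three samples (made-up numbers).
energies = [-10.0, -12.0, -14.0]
mean, std = fit_standardizer(energies)

# What the network effectively has to learn: zero-mean, unit-std targets.
targets = [(e - mean) / std for e in energies]

# Applying the stored statistics at prediction time recovers the energies.
recovered = [destandardize(t, mean, std) for t in targets]
```

Note that the shift and the scale always come as a pair here, which is exactly the first issue in the list that follows.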

There are some issues we should consider with its current state:

  1. There is no way to get unit std without shifting energies.
  2. It has a potentially unwanted interaction with atomref.
  3. For unit std it might be better to do batch normalization (which does not require going over the whole dataset before training).
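
To illustrate the "no full pass before training" part of point 3, here is a sketch that accumulates mean and std batch by batch. To be clear, this is not batch normalization itself but a swapped-in technique (a streaming update of the statistics using the standard pairwise merge formula), and `RunningStats` is a hypothetical name, not something in torchmd-net.

```python
import math

class RunningStats:
    """Accumulate mean and population std of targets one batch at a time."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, batch):
        n = len(batch)
        if n == 0:
            return
        batch_mean = sum(batch) / n
        batch_m2 = sum((y - batch_mean) ** 2 for y in batch)
        delta = batch_mean - self.mean
        total = self.count + n
        # Merge the batch statistics into the running statistics.
        self.m2 += batch_m2 + delta ** 2 * self.count * n / total
        self.mean += delta * n / total
        self.count = total

    @property
    def std(self):
        return math.sqrt(self.m2 / self.count) if self.count else 0.0

stats = RunningStats()
for batch in ([-10.0, -12.0], [-14.0]):
    stats.update(batch)
```

After all batches have been seen, the result matches a full pass over the dataset; before that, the estimate changes during training, which is a trade-off to keep in mind.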

I am opening this issue to start a discussion on what to do about these.

guillemsimeon commented 9 months ago

I can say something here. Standardize was only used in (r)MD17 (and it didn't even make sense there). Since MD17 consists of training on just one system, people at some point used the mean and the standard deviation to make learning easier. Of course this does not apply when building general NNPs, and one can argue that it does not even make sense in the single-system case. To the best of my knowledge, it has never been used apart from (r)MD17. The way to go with NNPs, imo, is atomic reference energies (even if they are learnable), because this can scale to arbitrary systems. Since you are also currently having the discussion about Atomref, my point would be that having Atomref-like behavior inside the full model makes far more sense than having standardize. Also note that if no Atomref is provided in the dataset, one can build one by taking, for example, the mean energy per atom (or even per element, some people do that) over the whole dataset, with learnable rescaling and shifting parameters per element. In that case, even though it is defined in terms of the dataset, the predictions can be scaled to arbitrary systems.
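
A small sketch of the simplest variant mentioned above, a single mean energy per atom computed over the dataset. Function names and numbers are made up for illustration, "mean energy per atom" is taken here to mean total energy divided by total atom count (an assumption), and the per-element variant with learnable rescaling and shifting is not shown.

```python
def mean_energy_per_atom(energies, atom_counts):
    """Total energy of the dataset divided by its total number of atoms."""
    return sum(energies) / sum(atom_counts)

def reference_energy(n_atoms, e_atom):
    """Size-extensive offset for a system with n_atoms atoms."""
    return n_atoms * e_atom

# Toy dataset: three systems of different sizes (made-up numbers).
energies = [-30.5, -49.5, -20.0]
atom_counts = [3, 5, 2]

e_atom = mean_energy_per_atom(energies, atom_counts)

# What is left for the network to learn after removing the reference.
residuals = [e - reference_energy(n, e_atom) for e, n in zip(energies, atom_counts)]

# The same offset rule applies to a system size never seen in training.
offset_for_new_system = reference_energy(7, e_atom)
```

Unlike a single dataset-wide mean, the offset grows with the number of atoms, which is why it can be applied to arbitrary systems.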

PhilippThoelke commented 9 months ago

I agree that atom references are a lot more powerful than simply removing the mean of the total energy but I would argue that scaling to unit stddev can still be useful. Especially when not really thinking about the energy units (about which torchmd-net is agnostic) there should be a way to ensure proper scaling of the target values. Bad value ranges can really mess with training efficiency.

guillemsimeon commented 9 months ago

I cannot come up with any use of standardize apart from rMD17 and the abstract case of scaling the energies to unit std. In reality you will never have a std to refer to when doing inference; it is an artifact of the training set. The Allegro paper has a nice explanation of target normalization.


RaulPPelaez commented 9 months ago

Thanks for the wonderful insights guys.

It is becoming clear to me that we should:

  1. Build the current Atomref prior into the model (at the same conceptual level as standardize is now)
  2. Transform the current Atomref prior into something like "LearnablePerAtomEnergyOffset"
  3. Separate unit-std from energy displacement in standardize
  4. Allow customization of standardize energy displacement (use ref_energies, compute mean energy per element in the dataset...)
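
To make points 3 and 4 concrete, here is a rough sketch of what decoupling the two could look like. `transform_targets`, its arguments and the shift mode names are all hypothetical, not an existing or planned torchmd-net API, and computing the scale from the already shifted values is just one possible choice.

```python
from statistics import fmean, pstdev

def transform_targets(energies, atom_counts, unit_std=True, shift="none"):
    """Shift and/or scale energies; the two choices are independent.

    shift: "none", "mean" (dataset mean) or "per_atom"
    (n_atoms times the mean energy per atom).
    Returns (targets, scale, offsets).
    """
    if shift == "none":
        offsets = [0.0] * len(energies)
    elif shift == "mean":
        offsets = [fmean(energies)] * len(energies)
    elif shift == "per_atom":
        e_atom = sum(energies) / sum(atom_counts)
        offsets = [n * e_atom for n in atom_counts]
    else:
        raise ValueError(f"unknown shift mode: {shift!r}")
    shifted = [e - o for e, o in zip(energies, offsets)]
    # The scale is computed after the shift so the result has unit std.
    scale = pstdev(shifted) if unit_std else 1.0
    if scale == 0.0:
        scale = 1.0  # constant targets: leave them unscaled
    return [s / scale for s in shifted], scale, offsets

energies = [-30.5, -49.5, -20.0]  # made-up numbers
atom_counts = [3, 5, 2]

# Unit std without any energy displacement (point 3).
scaled_only, scale, _ = transform_targets(energies, atom_counts, unit_std=True, shift="none")

# Displacement without rescaling.
shifted_only, _, _ = transform_targets(energies, atom_counts, unit_std=False, shift="mean")
```

Other displacement modes (ref_energies, mean energy per element) would slot in as further `shift` options.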

guillemsimeon commented 9 months ago

For me, this is the most elegant and general way to proceed.

RaulPPelaez commented 9 months ago

The documentation also needs a new section, something like "Dataset standardisation". I will add it in the PR for this, and will defo need you to pour your knowledge into it.

guillemsimeon commented 9 months ago

I will be happy to help