Closed siddharthal closed 6 years ago
Backpropagation starts from the gradient of the loss function and propagates backwards as with any basic neural network model. The chain rule gives dC/dE_t * dE_i/dw (for each E_i in E_t = sum(E_i), where dE_t/dE_i = 1), which carries backpropagation into the individual networks. Gradients are then reduced (summed) over atoms that share the same atomic symbol. The individual contributions are a by-product of this process and are not fit to directly. These values have no physical analogue; they simply represent the partitioning of the energy that the training process converged to, and may or may not be consistent from one trained model to another. Let me know if this needs a better explanation.
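A minimal numerical sketch of this idea (hypothetical code, not the project's implementation): each element gets one shared parameter, per-atom energies are summed into E_t, and the gradient of the loss with respect to a shared parameter reduces over all atoms of that element.

```python
# Hypothetical sketch: one linear "network" per element, E_i = w[element] * x_i,
# with the weight shared across atoms of the same element. Only the total
# energy E_t = sum(E_i) is fit; the per-atom E_i are never targets themselves.

def total_energy(w, atoms):
    # atoms: list of (element, feature) pairs
    return sum(w[el] * x for el, x in atoms)

def grad_loss(w, atoms, target):
    # Loss C = (E_t - target)^2.
    # Chain rule: dC/dw[el] = dC/dE_t * sum_{atoms i of element el} dE_i/dw[el]
    # (dE_t/dE_i = 1 because E_t is a plain sum).
    E_t = total_energy(w, atoms)
    dC_dEt = 2.0 * (E_t - target)
    g = {el: 0.0 for el in w}
    for el, x in atoms:          # gradients reduce (sum) over atoms
        g[el] += dC_dEt * x      # sharing the same atomic symbol
    return g

# H2O: the two hydrogens share w["H"], the oxygen uses w["O"]
w = {"H": 0.5, "O": 1.0}
atoms = [("H", 1.0), ("H", 2.0), ("O", 4.0)]
g = grad_loss(w, atoms, target=0.0)
# g["H"] collects contributions from both hydrogen atoms:
# dC/dE_t * (1.0 + 2.0), while g["O"] gets dC/dE_t * 4.0
```

The same mechanism is what an autograd framework does automatically when the identical subnetwork is applied to several atoms: the sum node routes dC/dE_t unchanged to every branch, and weight sharing accumulates the branch gradients into one update.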
Thanks a lot. That clarifies.
The paper describes an example of H2O, where the total energy is the sum of the individual contributions from the two hydrogens and the oxygen. Could you clarify how the loss is propagated backwards when the same model is used for both hydrogens, given that only the total energy is available and the individual contributions are unknown while training the model?