lab-cosmo / metatrain

Training and evaluating machine learning models for atomistic systems.
https://lab-cosmo.github.io/metatrain/
BSD 3-Clause "New" or "Revised" License

PET & Alchemical-model produce different results with same random seed #211

Open HannaTuerk opened 1 month ago

HannaTuerk commented 1 month ago

Hi, @M-R-Schaefer and I trained metatensor-models for PET and Alchemical-models with the same random seed on different machines.

PET: it produced different models (similar performance, but the energy RMSE differs by around 0.3 eV). Rerunning a training on the same machine also produces different models. With my PET version (pulled 28.5.2024) and CUDA_DETERMINISTIC: True, the differences between trainings are around 10**-7 for some energy trainings. (I ran 2 trainings with the same random seed, only 2 epochs as a test, and they produced the same training.)
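To make "reproducible" concrete, it helps to separate bitwise-identical runs from runs that only agree to a tolerance (like the ~10**-7 differences mentioned above). Below is a minimal stdlib sketch, assuming the predicted energies (or per-epoch losses) of two runs have already been exported as plain lists of floats. compare_runs is a hypothetical helper, not part of metatrain, and the default tolerance is an arbitrary choice.

```python
import math


def compare_runs(values_a, values_b, abs_tol=1e-6):
    """Compare the values reported by two training runs.

    Hypothetical helper: values_a / values_b are lists of floats
    (e.g. energies on a fixed test set); how they are extracted from a
    run is left open. Returns (max absolute difference,
    agree within abs_tol, bitwise identical).
    """
    if len(values_a) != len(values_b):
        raise ValueError("both runs must report the same number of values")
    pairs = list(zip(values_a, values_b))
    # Largest absolute deviation between matching entries (0.0 if empty).
    max_diff = max((abs(a - b) for a, b in pairs), default=0.0)
    # Tolerance-based agreement, using an absolute tolerance only.
    agree = all(math.isclose(a, b, rel_tol=0.0, abs_tol=abs_tol) for a, b in pairs)
    # Strict reproducibility: every value is exactly equal.
    bitwise = all(a == b for a, b in pairs)
    return max_diff, agree, bitwise
```

Reporting both flags keeps a run that differs by ~1e-7 (agrees, but not bitwise) distinguishable from one that differs by ~0.3 eV (does not agree).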

Alchemical-models: it produces the same result on the same machine, but on different machines the output is not reproducible with the same random seed (the resulting models differ).
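One common reason a fixed seed is not enough across machines (not confirmed as the cause here): the seed fixes the random numbers, not the order in which floating-point results are accumulated. Because float addition is not associative, a parallel reduction that splits the work differently on different hardware can round differently. A minimal pure-Python illustration follows. chunked_sum is a hypothetical stand-in for a multi-thread/GPU reduction, not code from PET or the alchemical model.

```python
import math


def chunked_sum(values, n_workers):
    """Naively sum values the way a parallel reduction might:
    split into n_workers contiguous chunks, sum each chunk left to right,
    then sum the partial results left to right."""
    if not values:
        return 0.0
    size = -(-len(values) // n_workers)  # ceiling division
    partials = []
    for start in range(0, len(values), size):
        acc = 0.0
        for v in values[start:start + size]:
            acc += v
        partials.append(acc)
    total = 0.0
    for p in partials:
        total += p
    return total


values = [1e16, 1.0, -1e16, 1.0]  # identical inputs, e.g. from the same seed
print(chunked_sum(values, 1))  # 1.0  (one worker)
print(chunked_sum(values, 2))  # 0.0  (two workers round differently)
print(math.fsum(values))       # 2.0  (correctly rounded reference)
```

The explicit loops are deliberate: recent Python versions implement the built-in sum() for floats with compensated summation, which would hide the rounding effect.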

We also tried soap-bpnn and gap; for both, the trainings were reproducible across different runs on the same machine and on different machines (Moritz's laptop and mine).

frostedoyster commented 1 month ago

Thank you! We'll be working on it