Hi,
@M-R-Schaefer and I trained metatensor-models for PET and Alchemical-model with the same random seed on different machines.
PET: it produced different models (performance is similar, but the energy RMSE differs by around 0.3 eV). Rerunning a training on the same machine also produces different models.
With my PET version (pulled 28.05.2024) and CUDA_DETERMINISTIC: True, the differences between trainings drop to around 10**-7 for some energy trainings, and two trainings with the same random seed produced the same result (only 2 epochs, as a test).
Alchemical-model: it produces the same result on the same machine, but on different machines with the same random seed the output is not reproducible (the resulting models differ).
We also tried soap-bpnn and gap; for both, the trainings were reproducible across runs on the same machine and on different machines (Moritz's and my laptop).
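For reference, here is a minimal sketch of the kind of same-seed check we ran, written in plain PyTorch. The `reproducible_setup` helper and the tiny training loop are illustrative, not metatensor-models code; I am assuming the CUDA_DETERMINISTIC option maps to something like `torch.use_deterministic_algorithms`.

```python
import torch


def reproducible_setup(seed: int) -> None:
    # Seed all PyTorch RNGs (CPU and, if present, CUDA).
    torch.manual_seed(seed)
    # Ask for deterministic kernel implementations where available;
    # this is what a CUDA_DETERMINISTIC-style flag typically toggles.
    # warn_only=True keeps ops without a deterministic variant usable.
    torch.use_deterministic_algorithms(True, warn_only=True)
    torch.backends.cudnn.benchmark = False


def tiny_training_run(seed: int, steps: int = 5) -> torch.Tensor:
    # Stand-in for a short (2-epoch-style) training: a linear model
    # fitted with SGD on random data generated from the seeded RNG.
    reproducible_setup(seed)
    model = torch.nn.Linear(4, 1)
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    x = torch.randn(32, 4)
    y = torch.randn(32, 1)
    for _ in range(steps):
        opt.zero_grad()
        loss = torch.nn.functional.mse_loss(model(x), y)
        loss.backward()
        opt.step()
    # Flatten all parameters so two runs can be compared bitwise.
    return torch.cat([p.detach().flatten() for p in model.parameters()])


# On a single machine, two runs with the same seed should be
# bit-identical; across machines this can still diverge because of
# different hardware/library kernels, which is what we observed.
a = tiny_training_run(0)
b = tiny_training_run(0)
assert torch.equal(a, b)
```

This only demonstrates same-machine determinism; cross-machine differences can remain even with these flags, since floating-point kernels differ between GPUs, CUDA versions, and BLAS backends.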