Open spozdn opened 1 week ago
Just to add to this: I tried to "profile" the LAMMPS interface with print statements, and most of the time (~2.2 s per iteration) is spent at line 502 of pair_metatensor.cpp, which is the Torch backward call. According to @spozdn and @Luthaf, the cause is the complexity of the graph.
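For anyone who wants to repeat this kind of print-statement profiling on the Python side, here is a minimal sketch of a per-stage wall-clock timer. It is not the code used in pair_metatensor.cpp; `timed`, `report`, `fake_forward` and `fake_backward` are hypothetical names, and the two stand-in functions only sleep so that the example runs without any model.

```python
import time
from contextlib import contextmanager

# label -> list of elapsed wall-clock times in seconds
timings = {}


@contextmanager
def timed(label):
    """Record the wall-clock time spent inside the `with` block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings.setdefault(label, []).append(time.perf_counter() - start)


def report():
    """Print the mean time per call for every recorded label."""
    for label, samples in timings.items():
        mean = sum(samples) / len(samples)
        print(f"{label}: {mean:.6f} s/call over {len(samples)} calls")


# Hypothetical stand-ins for the real stages (model forward, autograd backward).
def fake_forward():
    time.sleep(0.001)


def fake_backward():
    time.sleep(0.002)


for _ in range(3):
    with timed("forward"):
        fake_forward()
    with timed("backward"):
        fake_backward()

report()
```

One caveat when timing GPU code this way: CUDA kernels are launched asynchronously, so the device has to be synchronised (for example with `torch.cuda.synchronize()` in Python) before reading the clock. Otherwise the time can end up attributed to whichever later call happens to wait for the device.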
@spozdn @DavideTisi There is another issue, discovered some time ago by @Luthaf, @frostedoyster and me, related to a non-vectorised CPU-GPU data transfer implementation in the models' backward pass. I'm talking about this PR: https://github.com/lab-cosmo/metatensor/pull/636. As far as I can see, it should already be available in the latest metatensor-torch release. @Luthaf do we have the fixed commit tag in the LAMMPS metatensor-torch dependency, so that https://github.com/lab-cosmo/metatensor/pull/636 is not available in LAMMPS?
So this overhead can come from multiple places, and only one of them is inside metatrain:
MetatensorAtomisticModel, which can be disabled with check_consistency=False
The timings @abmazitov gave me a while ago were around:
I am slowly looking into the NL (neighbor list) conversion step; if other people want to look into it, I'm happy to explain the code!
For Davide's results, something else could be happening here. I could try to add some code to print the number of nodes in the computational graph inside LAMMPS, but if this is the bottleneck the fix would have to come from changes in 1. or PET itself.
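As a rough illustration of what "printing the number of nodes in the computational graph" could look like, here is a minimal Python sketch (the LAMMPS interface itself is C++, so this is not the code that would go there). It assumes nodes expose a `next_functions` attribute holding `(parent, index)` pairs, which is how PyTorch autograd nodes (`tensor.grad_fn`) are laid out in Python. `count_graph_nodes` and `FakeNode` are hypothetical names; `FakeNode` is only a stand-in so the example runs without torch.

```python
def count_graph_nodes(root):
    """Count distinct nodes reachable from `root` by following `next_functions`.

    `next_functions` is expected to be an iterable of (parent, index) pairs,
    where parent may be None (as for inputs that do not require gradients).
    """
    seen = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if node is None or node in seen:
            continue
        seen.add(node)
        for parent, _ in getattr(node, "next_functions", ()):
            stack.append(parent)
    return len(seen)


class FakeNode:
    """Hypothetical stand-in for an autograd node, for demonstration only."""

    def __init__(self, *parents):
        self.next_functions = tuple((p, 0) for p in parents)


# Small diamond-shaped graph: root -> (a, b) -> leaf, plus one None parent.
leaf = FakeNode()
a = FakeNode(leaf)
b = FakeNode(leaf, None)
root = FakeNode(a, b)

print(count_graph_nodes(root))
```

With a real model one would pass something like `energy.grad_fn` (where `energy` is the output tensor) instead of `root`. Comparing the count between a standalone run and a run inside LAMMPS would show whether the graph itself differs.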
@Luthaf do we have the fixed commit tag in the LAMMPS metatensor-torch dependency
Yes, I recently updated this to pull the latest release of metatensor-torch in LAMMPS. If you believe this is the issue, you can try to build this commit: https://github.com/lab-cosmo/lammps/commit/60ff741ee7d60644c8bd9642952e71137c8b6b72
As investigated with @DavideTisi, for his system the intrinsic time of PET (energies and forces) is 7.8e-5 seconds/atom on a V100 GPU on IZAR. On the same node, the time in LAMMPS is about 3.1e-3 seconds/atom. So LAMMPS MD is about 40 (!!!) times slower.
@abmazitov last time we touched this, I got the impression that the current overhead of LAMMPS is about 1.5 times, 2 times tops, not 40, when using https://github.com/spozdn/pet/blob/neighbors_convert_cpp/src/neighbors_convert.cpp. @DavideTisi, though, told me that he was using the right versions of both PET and metatrain. So, @abmazitov, could you take a look at this?
Update: the number of atoms in the supercell is 960.
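To make the numbers above concrete, the following short sketch converts the quoted per-atom timings into a slowdown factor and per-step times for the 960-atom supercell. It only uses the values stated in this thread; the variable names are mine.

```python
# Values quoted in the thread
n_atoms = 960
pet_per_atom = 7.8e-5     # s/atom, PET alone (energies and forces), V100
lammps_per_atom = 3.1e-3  # s/atom, same model run through LAMMPS, same node

slowdown = lammps_per_atom / pet_per_atom
pet_step = pet_per_atom * n_atoms        # seconds per evaluation, PET alone
lammps_step = lammps_per_atom * n_atoms  # seconds per MD step in LAMMPS
overhead = lammps_step - pet_step        # time not explained by PET itself

print(f"slowdown: {slowdown:.1f}x")
print(f"PET alone: {pet_step:.3f} s/step")
print(f"LAMMPS: {lammps_step:.2f} s/step")
print(f"unexplained overhead: {overhead:.2f} s/step")
```

This gives a slowdown of roughly 40x and about 3 s per MD step in LAMMPS, of which less than 0.1 s is PET itself. That is of the same order as the ~2.2 s per iteration reported for the backward call earlier in the thread, although it is not stated whether both numbers come from the same run.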