zhangylch / REANN

MIT License

get_neigh puzzle #9

Closed: IZugec closed this issue 5 months ago

IZugec commented 6 months ago

Dear developers,

I have two questions regarding the REANN PES at inference time. A pre-trained REANN model expects the atomic coordinates, neighlist, shifts, and species (as per https://github.com/zhangylch/REANN/blob/main/reann/ASE/calculators/reann.py). To find the neighbor list and shifts, the Fortran routine get_neigh.f90 is employed, which also outputs atomic coordinates (if I understand correctly, it wraps all atoms into one cell in case some atoms leave the unit cell during MD?).

1) I have a pre-trained model, which I loaded in a script similar to ASE/ase_reann.py (everything needed to replicate is in python.zip). I calculated the distances from the first atom to all other atoms (via print statements added to calculators/reann.py) for the coordinates before and after get_neigh. If the initial coordinates are all within the unit cell (which they are in this example), I would not expect the distances to differ before and after get_neigh, but they do for some atoms. Could you please help me understand why this happens?

Output on my end for python: https://paste.ofcode.org/3fR6fYRG3KstbmPqhTBPna
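A quick way to check whether two coordinate arrays describe the same periodic structure is to compare minimum-image distances rather than plain Euclidean distances. The sketch below is a minimal NumPy illustration; min_image_distances is a hypothetical helper, the cell is assumed to hold lattice vectors as rows (ASE convention), and rounding fractional offsets yields the true minimum image only for cells that are not strongly skewed.

```python
import numpy as np

def min_image_distances(coords, cell, ref=0):
    """Minimum-image distances from atom `ref` to every atom.

    coords: (natoms, 3) Cartesian coordinates.
    cell:   (3, 3) lattice vectors as rows (ASE convention).
    Rounding the fractional differences gives the true minimum image
    only for cells that are not strongly skewed.
    """
    inv_cell = np.linalg.inv(cell)
    frac = (coords - coords[ref]) @ inv_cell   # Cartesian -> fractional offsets
    frac -= np.round(frac)                     # pick the nearest periodic image
    return np.linalg.norm(frac @ cell, axis=1)

# Toy example: atom 1 sits near the far face of a cubic cell of side 10.
cell = np.diag([10.0, 10.0, 10.0])
cart = np.array([[0.5, 0.5, 0.5], [9.5, 0.5, 0.5]])
coor = cart - np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]])  # atom 1 moved by one lattice vector

raw_cart = np.linalg.norm(cart - cart[0], axis=1)  # plain distances: 0 and 9
raw_coor = np.linalg.norm(coor - coor[0], axis=1)  # plain distances: 0 and 1
print(raw_cart, raw_coor)
print(min_image_distances(cart, cell), min_image_distances(coor, cell))  # both 0 and 1
```

Plain distances can differ between two arrays that are related by whole lattice translations, while minimum-image distances stay the same.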

2) I would like to use the get_neigh routine to obtain coords, neighlist, shifts, and the number of neighbors in Fortran, so I concatenated all three f90 files from ASE/fortran-neigh into one get_neigh.f90 and call these routines from main.f90 (everything needed to replicate is in fortran.zip). Even though I use what seem to me to be identical input parameters for the get_neigh subroutine, I get different results than when I use the same subroutines within ASE/calculators/reann.py: namely, a different number of neighbors, a different coor array, etc. I tried compiling with GCC/10.3.0 as well as newer versions such as GCC/12.3.0, and with imkl/2021 and imkl/2023. Do you have an idea why I would be getting different results? I imagine it is most likely due to a difference in compilation, so I was hoping you could share some thoughts or advice on how to compile properly or solve the problem.

Output on my end for fortran: https://paste.ofcode.org/tV8hEfjKenarDiEPVe4w3G

Thanks in advance, Ivan

zhangylch commented 6 months ago

I think the issue is that the neighbor atoms are not only the original atoms in the unit cell: they are all atoms within the cutoff, i.e. the original atoms in the unit cell plus the ghost atoms in its periodic images, as shown in the figure [figure not included].
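To illustrate that neighbors include periodic images (ghost atoms) within the cutoff, here is a brute-force sketch that loops over lattice translations. The function name brute_force_neighbors, the nmax parameter, and the Cartesian-shift convention are assumptions for illustration, not the get_neigh interface; the cell is assumed to hold lattice vectors as rows.

```python
import itertools
import numpy as np

def brute_force_neighbors(coords, cell, cutoff, nmax=1):
    """All (i, j, shift) with |coords[j] + shift - coords[i]| < cutoff.

    coords: (natoms, 3) Cartesian positions in the home cell.
    cell:   (3, 3) lattice vectors as rows.
    shift:  Cartesian lattice translation n @ cell, n in [-nmax, nmax]^3.
    nmax must be large enough that every image within the cutoff is reached.
    """
    pairs, shifts = [], []
    for n in itertools.product(range(-nmax, nmax + 1), repeat=3):
        shift = np.array(n) @ cell              # translation to this periodic image
        for i in range(len(coords)):
            for j in range(len(coords)):
                if i == j and not any(n):
                    continue                    # an atom is not its own neighbor
                if np.linalg.norm(coords[j] + shift - coords[i]) < cutoff:
                    pairs.append((i, j))        # j may be a ghost (image) atom
                    shifts.append(shift)
    return np.array(pairs), np.array(shifts)

cell = np.diag([10.0, 10.0, 10.0])
coords = np.array([[0.5, 0.5, 0.5], [9.5, 0.5, 0.5]])
pairs, shifts = brute_force_neighbors(coords, cell, cutoff=2.0)
print(pairs)   # the two atoms are neighbors only through a periodic image
print(shifts)
```

Here the two atoms are 9 apart inside the cell, but each sees the other's image at distance 1, so both pairs carry a nonzero shift.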

zhangylch commented 6 months ago

By the way, you can refer to the cell-linked algorithm; I believe it will solve your whole puzzle.

IZugec commented 6 months ago

Thank you for the reply! However, I don't think I was clear enough with my question. In the get_neigh subroutine there are two coordinate arrays: "cart", the input array holding the structure, and "coor", which is returned as output. Furthermore, if I understand correctly, "coor" is the array that is finally used (after transposing it) as input to the model itself (in self.pes()). Is it therefore correct to expect that the relative distances between atoms in "cart" and "coor" should be the same?

I do understand that all periodic images of the atoms are generated from "cart" in order to build the atomindex list, which tells you the neighbors of each atom, but I am specifically asking about the differences between the "cart" and "coor" arrays, given that coor is ultimately used in the forward pass during inference.

Regarding the cell-linked algorithm: could you please provide a link or the name of the script you mean? Do you mean get_neigh.py from inference? The "problem" with that is that I would like to run this routine in Fortran, since I want to interface this newest version of REANN with Fortran anyway (using libtorch through C++ and C).

zhangylch commented 6 months ago

In get_neigh, the other atoms are moved to the periodic image closest to the first atom; this is the difference between the coordinates before and after the call, as shown in the following figure [figure not included].

The subsequent neighlist and shifts are then computed from coor.
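The behavior described above can be illustrated with a minimal NumPy sketch: each atom is moved by whole lattice vectors to the image nearest the first atom, so raw distances from atom 0 change only for atoms whose nearest image lies outside the original cell. recenter_on_first_atom is a hypothetical helper, not the get_neigh code; it assumes lattice vectors as rows and a cell that is not strongly skewed.

```python
import numpy as np

def recenter_on_first_atom(cart, cell):
    """Move every atom to its periodic image closest to atom 0.

    cart: (natoms, 3) Cartesian coordinates; cell: (3, 3) lattice vectors as rows.
    Nearest image via rounding fractional offsets (assumes a non-skewed cell).
    """
    frac = (cart - cart[0]) @ np.linalg.inv(cell)  # fractional offsets from atom 0
    frac -= np.round(frac)                         # shift by whole lattice vectors
    return cart[0] + frac @ cell

cell = np.diag([10.0, 10.0, 10.0])
cart = np.array([[0.5, 0.5, 0.5], [9.5, 0.5, 0.5], [5.0, 5.0, 5.0]])
coor = recenter_on_first_atom(cart, cell)

before = np.linalg.norm(cart - cart[0], axis=1)
after = np.linalg.norm(coor - coor[0], axis=1)
print(before, after)  # only atom 1's distance changes (9.0 -> 1.0)
```

All atoms start inside the cell, yet atom 1 is moved to the neighboring image, which matches the observation that distances differ "for some atoms".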

The cell-linked algorithm is what get_neigh implements to search for the neighlist. You can find descriptions of it by searching online.
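For readers unfamiliar with it, the cell-linked (cell list) idea can be sketched as follows. Atoms are binned into cells at least one cutoff wide, and each atom is compared only with atoms in its own and the 26 adjacent cells, with periodic wrap-around generating the image shifts. This is a generic Python sketch under stated assumptions (orthorhombic box, box at least 3 times the cutoff along every axis). cell_list_neighbors is a hypothetical name, and the actual get_neigh Fortran implementation, including how it handles triclinic cells, may differ.

```python
import itertools
from collections import defaultdict
import numpy as np

def cell_list_neighbors(coords, box, cutoff):
    """Cell-linked list neighbor search in an orthorhombic periodic box.

    coords: (natoms, 3) Cartesian positions; box: (3,) box lengths.
    Requires box >= 3 * cutoff per dimension, so the 27 surrounding cells
    are distinct and each cell is at least one cutoff wide.
    Returns the wrapped positions and a list of (i, j, shift) with
    |wrapped[j] + shift - wrapped[i]| < cutoff.
    """
    coords = np.asarray(coords, dtype=float)
    box = np.asarray(box, dtype=float)
    ncell = np.floor(box / cutoff).astype(int)            # cells per dimension
    if np.any(ncell < 3):
        raise ValueError("box must be at least 3 * cutoff in every dimension")
    frac = np.mod(coords / box, 1.0)                       # wrap into the home box
    wrapped = frac * box
    cell_idx = np.floor(frac * ncell).astype(int) % ncell  # cell of each atom

    bins = defaultdict(list)                               # cell -> atoms in it
    for atom, c in enumerate(cell_idx):
        bins[tuple(c)].append(atom)

    neighbors = []
    for i, ci in enumerate(cell_idx):
        for d in itertools.product((-1, 0, 1), repeat=3):  # 27 surrounding cells
            raw = ci + np.array(d)
            cj = raw % ncell                               # periodic cell index
            image = (raw - cj) // ncell                    # -1, 0 or +1 per axis
            shift = image * box                            # Cartesian image offset
            for j in bins.get(tuple(cj), []):
                if i == j and not image.any():
                    continue                               # skip self in home cell
                if np.linalg.norm(wrapped[j] + shift - wrapped[i]) < cutoff:
                    neighbors.append((i, j, shift))
    return wrapped, neighbors

wrapped, nbrs = cell_list_neighbors([[0.5, 0.5, 0.5], [9.5, 0.5, 0.5]],
                                    box=[10.0, 10.0, 10.0], cutoff=2.0)
for i, j, shift in nbrs:
    print(i, j, shift)
```

Compared with looping over every atom pair and every lattice translation, the work per atom is bounded by the atoms in 27 cells, so the cost grows roughly linearly with system size at fixed density.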