merzlab / QUICK

QUICK: A GPU-enabled ab initio quantum chemistry software package
Mozilla Public License 2.0

Geometry optimization fails in multi-GPU version #281

Open Madu86 opened 1 year ago

Madu86 commented 1 year ago

Geometry optimization fails in the multi-GPU build of the latest version; the cause is not yet known. See the attached .zip file for the CUDA serial and multi-GPU output files of a test case. 1077.out.zip

akhilshajan commented 1 year ago

Hi @Madu86, I tried the calculations with quick.MPI and the problem still persists. You can find the outputs from all of the cuda, cuda.MPI, and MPI calculations here.

Madu86 commented 1 year ago

Hi @akhilshajan, can you please provide me with a smaller example that reproduces this issue? The one I have (attached above) is too big for debugging.

akhilshajan commented 1 year ago

Hi @Madu86, I have tried a few examples that take ~40 iterations with serial, MPI, and cuda.MPI, and they all work fine and give the same results. I was not able to find anything that would be helpful for debugging. I am still working on a few other molecules; if there is any discrepancy, I will update you.

Madu86 commented 1 year ago

@akhilshajan Any update on this?

akhilshajan commented 1 year ago

Hi @Madu86, I apologize for my delayed response. I have tried some molecules, and it appears that the issue we are seeing with MPI arises when the D3BJ keyword is used. I did not encounter any errors when running the calculations without this keyword. Attached is the input file I used to test the benzene molecule, where the SCF calculation failed.

akhilshajan commented 3 months ago

Hi @Madu86, I reran this test case with the MPI modifications we made for DL-Find, just to confirm whether MPI was the cause. I see some discrepancies in the results: the system took 53 iterations on a single CPU and 122 iterations with CUDA, and the multi-GPU run still fails. I have attached my results, including the old results you shared, as well as the Slurm output file for the multi-GPU CUDA calculation.

1077.out.zip
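
One way to narrow down the iteration-count discrepancy reported above (53 on CPU versus 122 on CUDA) might be to find the first optimization step at which the energies of two runs stop agreeing. The snippet below is only a rough sketch: `first_divergence` is a hypothetical helper, not part of QUICK, and it assumes the per-step energies have already been extracted from each output file into plain lists (the parsing step is deliberately left out because it depends on the output format). The numbers are made up for illustration.

```python
from typing import Optional, Sequence


def first_divergence(
    ref: Sequence[float], other: Sequence[float], tol: float = 1e-6
) -> Optional[int]:
    """Return the first step index where two energy traces disagree.

    A disagreement is either an energy difference larger than `tol` or one
    trace ending before the other. Returns None if the traces match fully.
    """
    for step, (e_ref, e_other) in enumerate(zip(ref, other)):
        if abs(e_ref - e_other) > tol:
            return step
    if len(ref) != len(other):
        # The shared steps agree, but one run stopped earlier.
        return min(len(ref), len(other))
    return None


# Illustrative (made-up) per-step energies, NOT taken from the attached outputs.
cpu_energies = [-230.10, -230.25, -230.31, -230.32]
gpu_energies = [-230.10, -230.25, -230.29, -230.30, -230.32]

print(first_divergence(cpu_energies, gpu_energies))
```

If the traces already differ at the first step, the SCF or gradient for the starting geometry is the more likely suspect; if they only drift apart later, the optimizer path itself would be worth a closer look.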