shankar1729 / jdftx

JDFTx: software for joint density functional theory
http://jdftx.org

Reducing memory requirement/parallelization for cluster molecules with only 1 k-point #126

Closed: shivang693 closed this issue 3 years ago

shivang693 commented 4 years ago

Hello,

I am trying to run geometry relaxation and solvation for a cluster molecule. However, when I run it on our computing cluster, the job often aborts with a segmentation fault because it requests too much memory (>200 GB, which seems far too much). Since it is a cluster molecule there is only 1 reduced k-point, and the website says that parallelization is done only over k-points. I was therefore wondering how one would go about parallelizing this to reduce the memory requirement.

Attached is the input file, the job submission script used, and the error file thus generated.

Best, Shivang

common.in.txt LinearPCM.in.txt solvation_veryslow.sh.txt stderr.3644901.txt

abhiShandy commented 4 years ago

What's your system size? How many electrons are there in your molecule? The required memory is a function of both the number of bands and the number of states (or k-points).
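As a very rough rule of thumb (a back-of-the-envelope estimate that assumes plane-wave storage dominates, which need not hold once the fluid grids enter): the wavefunctions alone take about nStates x nBands x nBasis x 16 bytes, where nBasis is roughly V (2 Ecut)^(3/2) / (6 pi^2) for a cell of volume V in bohr^3 at a wavefunction cutoff Ecut in hartrees. The real-space grids for densities, potentials and the fluid come on top of that and also grow with the cell volume.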

shivang693 commented 4 years ago

Thank you for the quick reply. The system has 13 atoms and a total of 88 electrons (44 x 2), and sits inside a cube of side 55 bohr. However, this is just a preliminary run; the system size will at least double, triple, or quadruple later, while the box size will remain more or less the same. So I wanted to know how I can parallelize the calculation using the mpirun command.

abhiShandy commented 4 years ago

For highest efficiency, the number of MPI processes should not exceed the number of reduced k-points (1 in your case), and each process should use as many threads as your node allows (set via jdftx's -c flag). So something like mpirun -np 1 jdftx -c 1 -i LinearPCM.in, with -c raised to the core count of your node, should be a good starting point.
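In a batch script, that hybrid layout would look roughly like the sketch below (assuming a SLURM scheduler and a 36-core node; the directives, core count and file names are placeholders to adapt to your cluster):

```
#!/bin/bash
#SBATCH --nodes=1            # one node is enough: only 1 reduced k-point
#SBATCH --ntasks=1           # one MPI process (matches the k-point count)
#SBATCH --cpus-per-task=36   # give all cores of the node to that process as threads
#SBATCH --time=24:00:00

# One process, many threads: -c sets the number of threads per jdftx process
mpirun -np 1 jdftx -c 36 -i LinearPCM.in -o LinearPCM.out
```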

shivang693 commented 4 years ago

Thank you, I will try that and see how it goes.

shivang693 commented 4 years ago

Hi, I tried running as you suggested, using mpirun -np 1 jdftx -c 36 -i monomer.in -o monomer.out. However, my example, which has only 44 Kohn-Sham states, takes over 2 days to complete, as can be seen from the monomer.out output file. The corresponding input files (monomer.in, monomer.xyz, monomer.lattice) are attached; I believe the monomer.ionpos file has been updated as per the output. Could you tell me why this takes so long to run? The same example takes just 2 hours in Quantum ESPRESSO.

Also, when I ran LinearPCM on the output generated from the above example, using mpirun -np 1 jdftx -c 36 -i LinearPCM.in -o LinearPCM.out, the solvation took well over 4 days. In fact, the job was terminated after 4 days because I was not allowed more time on the cluster. Why does this step take so much time? The input files LinearPCM.in and common.in and the truncated output file LinearPCM.out are also attached.

Am I missing something? Please let me know. The job submission file solvation.sh is also attached for reference.

common.in.txt LinearPCM.in.txt LinearPCM.out.txt monomer.ionpos.txt monomer.lattice.txt monomer.out.txt solvation.sh.txt

shankar1729 commented 4 years ago

Hi Shivang,

It seems that you most likely have a core-binding issue: all the threads are being forced to run on a single core rather than spreading out over all the allocated cores. Unfortunately, many MPI implementations do this by default instead of supporting what a hybrid MPI + threads run requires. Try using "mpirun --bind-to none" if you have OpenMPI. If you still have issues, try utilities like cpubind to control core binding.
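For example (a sketch assuming OpenMPI and a 36-core node; numactl is just one of several pinning utilities, and the core range is a placeholder for your node):

```
# OpenMPI: turn off core binding so the 36 jdftx threads can spread over the whole node
mpirun --bind-to none -np 1 jdftx -c 36 -i LinearPCM.in -o LinearPCM.out

# Alternative: pin the process explicitly to cores 0-35 with numactl
mpirun -np 1 numactl --physcpubind=0-35 jdftx -c 36 -i LinearPCM.in -o LinearPCM.out
```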

Just for reference, my 4-core laptop ran your calculation about 5 times faster than the run in the output file you posted, so it clearly must be a parallelization issue.

Also, in addition to all of the above, use Coulomb truncation to substantially cut down the required unit cell size. That can gain you a further ~3x reduction in memory and run time.
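In the input file that would look something like the sketch below (assuming the molecule sits at the box center and positions are in lattice coordinates; put the actual center of your molecule there):

```
coulomb-interaction Isolated           # treat the molecule as isolated, with no periodic images
coulomb-truncation-embed 0.5 0.5 0.5   # center of the truncation region, here the middle of the box
```

With the embedded truncation, the box then only needs to enclose the molecule plus some margin for the fluid, rather than being padded against periodic-image interactions, which is where the memory and run-time savings come from.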

Best, Shankar

shivang693 commented 4 years ago

Hi Shankar,

Thank you for letting me know. I was not aware of that. I will try what you suggested and see how it goes.