shankar1729 / jdftx

JDFTx: software for joint density functional theory
http://jdftx.org

jdftx_gpu CUDA Error: out of memory #338

Open zhbzhbzhbzhb opened 2 weeks ago

zhbzhbzhbzhb commented 2 weeks ago

Hello: My system has about 280 atoms. When I use jdftx_gpu (version 1.7.0) on an NVIDIA A100 with 40 GB of memory, I always get the error: CUDA Error: out of memory. Are there any keywords that can be adjusted to reduce memory usage, or do you have any other suggestions? This is my input:

dump-name $VAR
initial-state $VAR
elec-cutoff 15 60
elec-ex-corr gga-PBE
core-overlap-check None
kpoint-folding 1 1 1
elec-smearing Gauss 0.0018
fluid LinearPCM
pcm-variant CANDLE
fluid-solvent H2O
fluid-cation K+ 1.
fluid-anion Cl- 1.
van-der-waals D3
electronic-minimize nIterations 100 energyDiffThreshold 1e-08
target-mu -0.17125175477924692
spintype z-spin
ion-species GBRV/$ID_pbe.uspp
coulomb-interaction periodic
dump End State
dump End Forces
dump End Ecomponents

shankar1729 commented 2 weeks ago

This will come down to the volume of your unit cell as well. Post the log, or at least the snippet that contains the lines reporting the number of basis functions, nElectrons, nBands, etc.

zhbzhbzhbzhb commented 2 weeks ago

This is the part of the log containing the lines with the number of basis functions, nElectrons, and nBands:

---------- Setting up k-points, bands, fillings ----------
No reducable k-points.
Computing the number of bands and number of electrons
Calculating initial fillings.
nElectrons: 1018.000000   nBands: 1004   nStates: 2

----- Setting up reduced wavefunction bases (one per k-point) -----
average nbasis = 331301.000 , ideal nbasis = 331144.756

zhbzhbzhbzhb commented 2 weeks ago

I am currently trying to run the calculation in parallel on two cards. However, since our GPUs are not managed by a Slurm queue system, I am running export CUDA_VISIBLE_DEVICES=1,2 followed by mpirun -np 2 jdftx_gpu -i in -o out. But when I look at the out file written by jdftx, it appears that the computation is still running on a single GPU. I am not very familiar with GPU usage. How can I invoke two GPUs for the jdftx computation in this situation?

shankar1729 commented 2 weeks ago

Your wavefunctions need about 5 GB to store one copy per k-point/spin. The minimization/SCF algorithms need 5-6 copies, meaning around 30 GB per k-point/spin. Consequently, it should just about fit with one GPU per state (k-point/spin), but it will not fit on a single GPU.
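
For reference, plugging in the numbers from your log (average nbasis ≈ 331301 plane waves, nBands = 1004), and assuming double-precision complex coefficients at 16 bytes each, gives roughly that 5 GB per copy:

    python3 -c "print(331301 * 1004 * 16 / 1e9)"   # ~5.3 GB per wavefunction copy per k-point/spin

With 5-6 such copies per spin channel and nStates = 2, the total approaches 60 GB, which is why splitting the two states across two 40 GB cards should just about fit.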

You can only use one GPU per MPI process, so the two-GPU command you mentioned should work correctly. How many A100s does your node have? If it is exactly 2, then maybe the issue is that you specified 1,2 instead of 0,1 (nvidia device numbering is zero-based).
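
For example, assuming devices 0 and 1 are the free cards on your node (adjust the indices to whichever GPUs are actually idle):

    export CUDA_VISIBLE_DEVICES=0,1
    mpirun -np 2 jdftx_gpu -i in -o out

You can check with nvidia-smi while the job runs to confirm that both GPUs are in use.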

If memory is still tight after getting it onto two GPUs, I'd suggest trying to reduce the vacuum space in your cell. JDFTx's coulomb truncation methods get rid of the large spacing needed in other codes (even with techniques like dipole corrections elsewhere).
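
For instance, if your system is a slab with its vacuum along the third lattice direction, you could replace the coulomb-interaction periodic line with a truncated geometry (this is just a sketch; make sure the truncation direction matches where your vacuum actually is):

    coulomb-interaction Slab 001

That removes the interaction between periodic images along the slab normal, so much less vacuum is needed.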

zhbzhbzhbzhb commented 2 weeks ago

Our node has 4 A100s, so export CUDA_VISIBLE_DEVICES=1,2 should work. I will try to reduce the vacuum space for the calculation. Thanks for your help!