west-code-development / West

WEST code
https://west-code.org
GNU General Public License v3.0

Segmentation fault using gpu version for large system #7

Open ShiminZhang21 opened 3 months ago

ShiminZhang21 commented 3 months ago

Hi west team,

I have been having a problem running the GPU version of WEST's wstat.x on NERSC Perlmutter. Small systems run fine, but I can't complete a single calculation for large systems. No matter which parallelization settings I try, it crashes at some point with a segmentation fault.

I attached my test files for a 192-atom ZnO supercell with different parallelization settings, using npdep=2496. "Compile_west_gpu_v1.sh" and "Compile_west_gpu_v2.sh" are the two compilation scripts I tried. "ZnO_wstat2496/Nni_" are the parallel tests, with N = number of GPUs and ni = the -ni parallelization setting for wstat.x. "ZnO_wstat_2496/slurm.out.reports" summarizes the Slurm error messages: whenever a run does not hit a memory issue, the segmentation fault appears. "ZnO_wstat_2496/wstat.out.reports" records where each wstat.out ends: some runs stop right at the start, others at 70%.
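To make the N/ni test grid concrete, here is a minimal sketch of how a GPU count and an -ni value combine. It assumes one MPI rank per GPU and, to our understanding of Quantum ESPRESSO's image setup, that the rank count must be a multiple of ni. It also assumes the npdep target eigenvectors are split evenly across images; the working Davidson basis in wstat.x is typically larger than npdep, so the per-image share is only a rough lower bound on the load. `image_layout` is a hypothetical helper, not part of WEST.

```python
import math


def image_layout(n_gpus: int, ni: int, npdep: int) -> tuple[int, int]:
    """Return (GPUs per image, rough share of npdep eigenvectors per image).

    Hypothetical helper. Assumes one MPI rank per GPU and an even split of
    the npdep target eigenvectors across the ni images.
    """
    if ni <= 0 or n_gpus % ni != 0:
        raise ValueError(f"-ni {ni} does not evenly divide {n_gpus} MPI ranks")
    return n_gpus // ni, math.ceil(npdep / ni)


if __name__ == "__main__":
    npdep = 2496
    # Hypothetical (N GPUs, ni) pairs, in the spirit of an "N/ni" test grid
    for n_gpus, ni in [(16, 4), (32, 8), (64, 16), (64, 32)]:
        per_image, share = image_layout(n_gpus, ni, npdep)
        print(f"N={n_gpus:3d} ni={ni:3d}: {per_image} GPU(s)/image, "
              f"~{share} eigenvectors/image")
```

Fewer GPUs per image means each image holds a larger slice of the wavefunctions and perturbations, which is one reason memory limits and crash points can shift between tests.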

Besides the ZnO 4x4x3 supercell, I also tested other systems, such as the 161-atom VB- defect in hBN, and a similar issue appears.

Do you have any idea on solving this problem?

Seg_fault.zip

vyu16 commented 3 months ago

@ShiminZhang21 We believe this is a problem with the NVIDIA Fortran compiler, or more precisely with its optimizer. We have implemented a workaround that will be released in the next version of WEST. For now, you can recompile the code with reduced optimization (-O1). In our tests, the segmentation fault only occurs with -O2 or higher.

ShiminZhang21 commented 3 months ago

Thank you so much for the suggestion!

If I understand correctly, I should add -O1 to CUDA_F90FLAGS in QE's make.inc? Should I also add this option to LDFLAGS?

vyu16 commented 3 months ago

Just search for -fast then replace with -O1.
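The search-and-replace can be scripted so that only the standalone -fast flag is touched. This is a minimal sketch: the regex treats -fast as a separate flag (so a longer flag such as -fastsse is left alone), the sample line is hypothetical rather than copied from a real make.inc, and `lower_optimization` / `patch_make_inc` are illustrative helpers. A clean rebuild is still needed after editing the file.

```python
import re
from pathlib import Path

# Match "-fast" as its own flag: not preceded or followed by a word char or '-'
_FAST_FLAG = re.compile(r"(?<![\w-])-fast(?![\w-])")


def lower_optimization(make_inc: str, level: str = "-O1") -> str:
    """Replace every standalone -fast flag in the text with `level`."""
    return _FAST_FLAG.sub(level, make_inc)


def patch_make_inc(path: str = "make.inc", level: str = "-O1") -> int:
    """Rewrite the file in place (keeping a .bak copy); return replacements made."""
    p = Path(path)
    text = p.read_text()
    new_text, count = _FAST_FLAG.subn(level, text)
    p.with_suffix(p.suffix + ".bak").write_text(text)  # make.inc -> make.inc.bak
    p.write_text(new_text)
    return count


if __name__ == "__main__":
    # Hypothetical excerpt; a real make.inc will contain other flags
    sample = "F90FLAGS = -fast -mp\nCUDA_F90FLAGS = -fast -cuda\n"
    print(lower_optimization(sample))
```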

ShiminZhang21 commented 3 months ago

Thank you! I have tested the setting, and it can now run through the large system. It is significantly slower with the lower optimization level, but that is expected.

ShiminZhang21 commented 3 months ago

@vyu16
Hi Victor,

I can now run wstat.x for the dielectric response using the compilation settings you suggested. However, when I try to run the GW calculation with wfreq.x, I get this FFT-related error:

Failing in Thread:1 Accelerator Fatal Error: call to cuStreamSynchronize returned error 700: Illegal address during kernel execution File: /global/common/software/m4507/szhan213/WEST_GPU/qe-7.3-west6.0_compile2/West/FFT_kernel/fft_at_k.f90 Function: single_invfft_k:32 Line: 64

I use the same parallelization settings as for wstat.x.
Do you have any suggestions on what could help me figure out the problem?

Thanks, Shimin

vyu16 commented 3 months ago

Can you share the input and output files and the job script, please?

ShiminZhang21 commented 3 months ago

Sure, I attached the job script, input, and outputs. I just realized that my qp_bandrange is outside the range of bands computed in the scf calculation. That could be the problem, so I'm retesting it.
NV.zip

vyu16 commented 3 months ago

Yes that would have led to a crash. In the next release the code will catch such errors.
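Until such a check is built in, a pre-flight sanity check can be run on the input by hand. This is a minimal sketch, assuming 1-based band indices and that the upper end of qp_bandrange must not exceed nbnd from the pw.x run; `check_qp_bandrange` is a hypothetical helper, not the check WEST will ship, and the values are passed in directly rather than parsed from the input files.

```python
def check_qp_bandrange(qp_bandrange: tuple[int, int], nbnd: int) -> None:
    """Raise ValueError if qp_bandrange is not inside the nbnd computed bands.

    Hypothetical helper; assumes 1-based band indices.
    """
    first, last = qp_bandrange
    if not 1 <= first <= last:
        raise ValueError(f"invalid qp_bandrange {qp_bandrange}: need 1 <= first <= last")
    if last > nbnd:
        raise ValueError(
            f"qp_bandrange upper bound {last} exceeds nbnd = {nbnd}; "
            "increase nbnd in the pw.x run or shrink the range"
        )


if __name__ == "__main__":
    check_qp_bandrange((1, 10), nbnd=10)  # passes silently
    try:
        check_qp_bandrange((5, 20), nbnd=16)
    except ValueError as err:
        print(err)
```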