rolly-ng opened this issue 6 years ago
Hi, I have Parallel Studio 2017 Update 7, and I have successfully compiled ELPA 2017.11.001 and then QE v6.3 via the configure-xxx-hsw.sh script. QE v6.3 runs fine on a single node of the cluster, i.e. srun -p ABC -N 1 -n 176 pw.x < my.in > my.out. However, as soon as I run on 2 nodes, i.e. srun -p ABC -N 2 -n 352 pw.x < my.in > my.out, it aborts with the strange "Error in routine cdiaghg: problems computing cholesky" error. If I compile ELPA and QE with the configure-xxx-hsw-omp.sh script, a single node is again fine, but with 2 nodes I get a "PMPI_Group_incl: Invalid rank, error stack:" message in slurm-xxx.out. Could you please have a look at QE v6.3?

Moreover, a conventional build without XCONFIGURE runs fine across multiple nodes, i.e. ./configure CC=icc CXX=icpc F77=ifort F90=ifort MPIF90=mpiifort --enable-shared --enable-parallel --disable-openmp --with-scalapack=intel CFLAGS="-O3 -I -xCORE-AVX2" CXXFLAGS="-O3 -I -xCORE-AVX2" FCFLAGS="-O3 -I -xCORE-AVX2" F90FLAGS="-O3 -I -xCORE-AVX2" FFLAGS="-O3 -I -xCORE-AVX2".

Thanks, Rolly
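For completeness, the failing sequence is roughly the following (configure-xxx-hsw.sh abbreviates the XCONFIGURE wrapper scripts as above; ABC, my.in, and the directory layout are placeholders from my setup, not literal commands):

```sh
# Build ELPA 2017.11.001 with the XCONFIGURE Haswell wrapper, then QE v6.3
# against it (the -omp variants fail analogously at 2 nodes).
cd elpa-2017.11.001
./configure-xxx-hsw.sh && make -j && make install
cd ../qe-6.3
./configure-xxx-hsw.sh && make pw

# One node: runs fine.
srun -p ABC -N 1 -n 176 pw.x < my.in > my.out
# Two nodes: aborts in cdiaghg ("problems computing cholesky").
srun -p ABC -N 2 -n 352 pw.x < my.in > my.out
```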
Thank you for the report! At first glance, this looks like a problem that only occurs when ELPA is incorporated. I may step back from ELPA as the default in XCONFIGURE, or find a version that works again.

Hi Hans, I have done some further tests and found that -D__NON_BLOCKING_SCATTER in QE's make.inc causes the problem. I compiled ELPA as instructed, then removed this flag from QE's make.inc. v6.3 now runs across nodes, but I have to use pw.x -nk 2 to get decent parallel speed; otherwise 2 nodes run slower than 1 node on the AUSURF112 benchmark. I am not sure whether -nk 2 actually fixes the problem, though.
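For anyone hitting the same issue, the change amounts to roughly the following sketch (the exact DFLAGS contents depend on what configure generated for you, so treat it as illustrative rather than a verbatim diff):

```sh
# In QE's make.inc, drop -D__NON_BLOCKING_SCATTER from the DFLAGS line and
# leave the remaining defines untouched, e.g.
#   before: DFLAGS = ... -D__SCALAPACK -D__ELPA_2017 -D__NON_BLOCKING_SCATTER
#   after:  DFLAGS = ... -D__SCALAPACK -D__ELPA_2017
# then rebuild pw.x from a clean tree so the change takes effect:
make clean && make pw

# Run with two k-point pools: -nk 2 splits the 352 MPI ranks into two pools
# of 176, so each pool's dense diagonalization can stay within one node.
srun -p ABC -N 2 -n 352 pw.x -nk 2 < my.in > my.out
```

Thanks, Rolly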