paboyle / Grid

Data parallel C++ mathematical object library
GNU General Public License v2.0

Having trouble on Cori GPU nodes at NERSC #228

Open stevengottlieb opened 5 years ago

stevengottlieb commented 5 years ago

I am trying to use the feature/gpu branch to test Grid at NERSC on the Cori GPU nodes. I have been following directions from Patrick Steinbrecher from May, but needed to make some modifications. Patrick suggested trying this version of the code:

    git checkout remotes/origin/feature/gpu-port
    git checkout 60330e05a37fbc8ce710e1caf0bf40c13cb1430b

I was able to get that to compile with some warnings. I then wanted to move to the latest version and see if I could get that to compile, but I cannot get through the configuration.

    steven@cori04:~/cori/gpu/OpenMPI.2/Grid> git status
    HEAD detached at origin/feature/gpu-port
    Changes not staged for commit:
      (use "git add ..." to update what will be committed)
      (use "git checkout -- ..." to discard changes in working directory)

    modified:   configure.ac

According to Patrick's instructions, line 477 of configure.ac should be removed:

    sed -e '477d' configure.ac > tmp.ac
    mv tmp.ac configure.ac
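Line numbers in configure.ac can shift between commits, so it may be worth printing the line before deleting it. The following is a minimal sketch, assuming a POSIX shell and sed; `show_line` and `delete_line` are hypothetical helper names and are not part of Grid.

```shell
# Hypothetical helpers (not part of Grid): inspect, then delete, line N of a file.

# Print line $2 of file $1 without modifying the file.
show_line() {
    sed -n "${2}p" "$1"
}

# Delete line $2 of file $1 by writing to a temporary file and moving it back,
# the same two-step approach as the sed/mv commands above.
delete_line() {
    sed -e "${2}d" "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}
```

For example, `show_line configure.ac 477` prints the candidate line, and `delete_line configure.ac 477` removes it once it has been confirmed to be the intended one.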

Here are my modules:

    steven@cgpu12:~/cori/gpu/OpenMPI.2/Grid/build> module list
    Currently Loaded Modulefiles:
      1) esslurm   2) gcc/7.3.0   3) cuda/10.1.168   4) openmpi/4.0.1-ucx-1.6

I am using this configure command:

    ../configure --enable-precision=single --enable-simd=VGPU \
        --enable-comms=mpi-auto CXX=nvcc MPICXX=mpic++ \
        --enable-gen-simd-width=32 \
        --with-lime=$griddir/build/lime/install/ \
        --with-mpfr=/global/common/sw/cray/cnl7/haswell/mpfr/4.0.1/gcc/8.2.0/ddnjrzc/ \
        CXXFLAGS="-std=c++11 -arch=sm_70 -gencode=arch=compute_70,code=compute_70 -I$OPENMPI_DIR/include/ -L$OPENMPI_DIR/lib -lmpi"

I get this error:

    checking for library containing SHA256_Init... -lcrypto
    checking for openssl/sha.h... yes
    checking for library containing crc32... -lz
    checking for library containing move_pages... -lnuma
    checking for library containing H5Fopen... no
    configure: error: "SIMD option VGPU not supported by the GCC/Clang compiler"

I ran diff on configure.ac against the copy that corresponds to the version of gpu-port that Patrick recommended. Here is the result:

    steven@cori04:~/cori/gpu/OpenMPI.2/Grid> diff configure.ac ~/cori/gpu/OpenMPI/Grid
    300a301,303
    >     AC_DEFINE([GPU],[1],[GPU float4 vectors])
    >   SIMD_FLAGS='';;
    >   VGPU)
    358a362,364
    >     AC_DEFINE([GPU],[1],[GPU float4 vectors])
    >   SIMD_FLAGS='';;
    >   VGPU)
    471,473d476
    < echo MPI_CXXFLAGS $MPI_CXXFLAGS
    < echo MPI_CXXLDFLAGS $MPI_CXXLDFLAGS
    < echo MPI_CFLAGS $MPI_CFLAGS
    476a480
    > LIBS="echo $MPI_CXXLDFLAGS | sed -E 's/-L@<:@^ @:>@+//g' $LIBS";;
    616d619
    < AC_CONFIG_FILES(HMC/Makefile)
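The diff shows that the older configure.ac carries a `VGPU)` case label that the newer file lacks, which is consistent with the configure error. A quick way to check a given configure.ac for such a label might look like the sketch below. It assumes the SIMD options appear as shell case labels of the form `NAME)` at the start of a line, as they do in the diff; `has_simd_case` is a hypothetical helper name, not part of Grid.

```shell
# Hypothetical helper (not part of Grid): succeed if the file contains a
# case label for the given --enable-simd value, e.g. a line reading "  VGPU)".
has_simd_case() {
    # $1 = path to configure.ac, $2 = SIMD option name
    grep -q "^[[:space:]]*$2)" "$1"
}
```

For example, `has_simd_case configure.ac VGPU && echo supported` prints `supported` only when a matching label is found.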

I then copied configure from the branch recommended by Patrick to the current version of gpu-port and now I can configure.

It would be nice if this could be fixed so that the latest version of gpu-port can be configured on Cori without the fuss.

All suggestions/comments welcome! Steve

paboyle commented 5 years ago

Will take a look Steve. I've committed feature/gpu-port back into develop as of last night.

gfilaci commented 5 years ago

Hi Steve, maybe I know where this problem comes from: with the commit fa9cd50c5b3ba1fd74c22866f0073a4721130400, the file Grid/simd/Grid_gpu.h has been removed, and there is no choice between GPU and VGPU targets any more. Now there is only one target (GPU) that uses the vectorisation in Grid/simd/Grid_gpu_vec.h. So if you compile with --enable-simd=GPU (which is the same as the “old VGPU target”) the error during configuration should disappear. Gianluca
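For scripts that need to build both older and newer checkouts, the choice described above could be automated roughly as follows. This is only a sketch under an assumption: it treats the presence of Grid/simd/Grid_gpu.h as a sign that the tree predates the commit mentioned above and still expects VGPU. `grid_simd_target` is a hypothetical helper name, not part of Grid.

```shell
# Hypothetical helper (not part of Grid): choose the --enable-simd value
# for a Grid source tree.
# Assumption: trees that still contain Grid/simd/Grid_gpu.h predate the
# commit that merged the GPU and VGPU targets, so they use VGPU.
grid_simd_target() {
    # $1 = path to the top of the Grid source tree
    if [ -f "$1/Grid/simd/Grid_gpu.h" ]; then
        echo VGPU
    else
        echo GPU
    fi
}
```

From a build directory, the configure call would then use `--enable-simd="$(grid_simd_target ..)"`, with the remaining options unchanged.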

stevengottlieb commented 5 years ago

Hi Gianluca,

Thanks for this explanation. It is very helpful.

Sincerely, Steve



paboyle commented 4 years ago

Instructions for two GPU systems are online at: https://github.com/paboyle/Grid/wiki