Open: lcebaman opened this issue 1 year ago
Describe the issue:
When running on more than one GPU (four in this example), the following error is printed once for each additional GPU:

```
Cuda error invalid device ordinal /home/Grid/lattice/Lattice_base.h Line 149
Cuda error invalid device ordinal /home/Grid/lattice/Lattice_base.h Line 149
Cuda error invalid device ordinal /home/Grid/lattice/Lattice_base.h Line 149
```

Code example:
The benchmark is launched with one MPI rank per GPU:

```bash
mpirun -np 4 ./wrapper.sh Benchmark_ITT --mpi 1.1.1.4
```

where `wrapper.sh` pins each node-local rank to one GPU and one block of CPUs:

```bash
#!/bin/bash
# Node-local MPI rank, as set by Open MPI's mpirun
lrank=$OMPI_COMM_WORLD_LOCAL_RANK
export OMP_NUM_THREADS=1

# One GPU and one block of 20 logical CPUs per local rank
case ${lrank} in
  0) GPU=0 CPUBIND="0-19" ;;
  1) GPU=1 CPUBIND="20-39" ;;
  2) GPU=2 CPUBIND="40-59" ;;
  3) GPU=3 CPUBIND="60-79" ;;  # originally "50-79", which overlaps rank 2's 40-59
  *) echo "unexpected local rank: ${lrank}" >&2; exit 1 ;;
esac

# Expose only the selected GPU to the process and bind it to its CPUs
CMD="env CUDA_VISIBLE_DEVICES=${GPU} numactl --physcpubind=${CPUBIND}"
echo "$CMD $*"
$CMD "$@"
```

Target platform:
Intel (40 cores/node) + 4x A100

Configure options:
```bash
../configure --enable-comms=mpi \
  --enable-simd=GPU \
  --enable-accelerator=cuda \
  --prefix $prefix \
  CXX=nvcc \
  LDFLAGS=-L$prefix/lib/ \
  CXXFLAGS="-ccbin mpicxx -gencode arch=compute_80,code=sm_80 -I$prefix/include/ -std=c++14"
```
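The CPU ranges in `wrapper.sh` follow a fixed pattern: local rank r gets 20 consecutive logical CPUs starting at r × 20. The sketch below is a minimal example of that arithmetic. It assumes 80 logical CPUs per node, because the script binds up to CPU 79 even though the platform lists 40 cores, which presumes two hardware threads per core. `cpu_range` is a hypothetical helper and is not part of Grid or numactl.

```bash
#!/bin/bash
# Hypothetical helper: print the CPU range "first-last" for a local rank,
# given how many logical CPUs each rank should receive.
cpu_range() {
  local rank=$1 per_rank=$2
  local first=$(( rank * per_rank ))
  echo "${first}-$(( first + per_rank - 1 ))"
}

# Assumption: 80 logical CPUs shared by 4 local ranks -> 20 CPUs per rank.
for rank in 0 1 2 3; do
  echo "rank ${rank}: GPU=${rank} CPUBIND=$(cpu_range "$rank" 20)"
done
```

For ranks 0 to 3 this prints the ranges 0-19, 20-39, 40-59 and 60-79. Deriving the ranges from the rank, rather than typing them by hand, avoids overlaps such as 40-59 followed by 50-79.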
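When `CUDA_VISIBLE_DEVICES` lists a single GPU, the CUDA runtime in that process sees exactly one device, numbered 0. Any request for ordinal 1, 2 or 3 is therefore rejected as an "invalid device ordinal". One possible reading of the log is that something in the application also selects the device by local rank after the wrapper has already restricted visibility. The issue does not confirm this; it is an assumption. Under that assumption, ranks 1, 2 and 3 would fail and rank 0 would succeed, which matches one error per additional GPU. The sketch below only simulates this reasoning in plain shell and makes no CUDA calls. `visible_count` and `ordinal_ok` are hypothetical helpers.

```bash
#!/bin/bash
# Hypothetical helper: count the devices listed in a
# CUDA_VISIBLE_DEVICES-style string such as "2" or "0,1".
visible_count() {
  local ids
  IFS=',' read -r -a ids <<< "$1"
  echo "${#ids[@]}"
}

# Hypothetical helper: succeed if ordinal $2 is valid for visibility list $1.
# Visible devices are renumbered from 0, so valid ordinals are 0..count-1.
ordinal_ok() {
  (( $2 < $(visible_count "$1") ))
}

# Simulate the wrapper: rank r sees only GPU r. Assumption (not confirmed
# in the issue): the application then requests device ordinal r.
failures=0
for rank in 0 1 2 3; do
  if ordinal_ok "$rank" "$rank"; then
    echo "rank ${rank}: ordinal ${rank} ok"
  else
    echo "rank ${rank}: invalid device ordinal ${rank}"
    failures=$(( failures + 1 ))
  fi
done
echo "failures: ${failures}"
```

The simulation reports three failures, one each for ranks 1, 2 and 3, and rank 0 succeeds.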