paboyle / Grid

Data parallel C++ mathematical object library
GNU General Public License v2.0

Cuda error invalid device ordinal #443

Open lcebaman opened 1 year ago

Describe the issue:

When running on more than one GPU (four in the example here), the following error is printed once for each additional GPU:

Cuda error invalid device ordinal /home/Grid/lattice/Lattice_base.h Line 149

Cuda error invalid device ordinal /home/Grid/lattice/Lattice_base.h Line 149

Cuda error invalid device ordinal /home/Grid/lattice/Lattice_base.h Line 149

Code example:

mpirun -np 4 ./wrapper.sh Benchmark_ITT --mpi 1.1.1.4
$ cat wrapper.sh
#!/bin/bash
lrank=$OMPI_COMM_WORLD_LOCAL_RANK

export OMP_NUM_THREADS=1
case ${lrank} in
    [0])
        GPU=0
        CPUBIND="0-19"
        ;;
    [1])
        GPU=1
        CPUBIND="20-39"
        ;;
    [2])
        GPU=2
        CPUBIND="40-59"
        ;;
    [3])
        GPU=3
        CPUBIND="60-79"
        ;;
esac

CMD="env CUDA_VISIBLE_DEVICES=${GPU} numactl --physcpubind=${CPUBIND}"
echo "$CMD $@"

$CMD "$@"
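
A possible explanation, which this report itself does not confirm: when CUDA_VISIBLE_DEVICES lists a single GPU, the CUDA runtime renumbers it as device 0, so each process launched through wrapper.sh has exactly one valid device ordinal. If the application then selects its device by local rank (an assumption about the benchmark's behaviour, not something shown here), ranks 1 to 3 would request ordinals 1 to 3 and fail, which would match the three messages. The sketch below models only that arithmetic in plain bash; `visible_count` and `ordinal_is_valid` are hypothetical helper names, and no CUDA call is made.

```shell
#!/bin/bash
# Count the devices a process would see for a given CUDA_VISIBLE_DEVICES value.
# Assumes a plain comma-separated list of integer indices (no UUIDs).
visible_count() {
    local list="$1"
    if [ -z "$list" ]; then
        echo 0
        return
    fi
    local IFS=','
    local ids=($list)   # split the list on commas
    echo "${#ids[@]}"
}

# Succeeds only if the requested ordinal exists under that visibility mask.
# Visible devices are always renumbered 0 .. count-1.
ordinal_is_valid() {
    local ordinal="$1" list="$2"
    [ "$ordinal" -lt "$(visible_count "$list")" ]
}

# Model the wrapper: rank N gets CUDA_VISIBLE_DEVICES=N, and we assume
# (hypothetically) that the application then asks for ordinal N.
for rank in 0 1 2 3; do
    if ordinal_is_valid "$rank" "$rank"; then
        echo "rank $rank: ordinal $rank is valid with CUDA_VISIBLE_DEVICES=$rank"
    else
        echo "rank $rank: ordinal $rank is invalid with CUDA_VISIBLE_DEVICES=$rank"
    fi
done
```

Under these assumptions only rank 0 passes the check. If that is what is happening, either exposing all four GPUs to every rank or making every rank use ordinal 0 would be the two directions to test.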

Target platform:

Intel (40 cores/node) + 4xA100
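
For the CPU binding on this platform, the per-rank table in wrapper.sh can also be derived from the local rank rather than written out by hand. This is a minimal sketch, assuming 80 logical CPUs numbered 0 to 79 split evenly over the 4 GPUs (20 per rank; that count is inferred from the ranges in the script, not stated in the report). `binding_for_rank` and `CPUS_PER_GPU` are hypothetical names.

```shell
#!/bin/bash
# Assumption: 20 logical CPUs per GPU, inferred from the 0-19 / 20-39 pattern.
CPUS_PER_GPU=20

# Hypothetical helper: print "<gpu> <first_cpu>-<last_cpu>" for a local rank.
binding_for_rank() {
    local lrank="$1"
    local first=$(( lrank * CPUS_PER_GPU ))
    local last=$(( first + CPUS_PER_GPU - 1 ))
    echo "${lrank} ${first}-${last}"
}

# Show the command prefix each local rank would use.
for lrank in 0 1 2 3; do
    read -r gpu cpubind <<< "$(binding_for_rank "$lrank")"
    echo "rank ${lrank}: CUDA_VISIBLE_DEVICES=${gpu} numactl --physcpubind=${cpubind}"
done
```

Computing the ranges this way keeps them contiguous and non-overlapping by construction.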

Configure options:

../configure --enable-comms=mpi          \
             --enable-simd=GPU           \
             --enable-accelerator=cuda   \
             --prefix $prefix            \
             CXX=nvcc                    \
             LDFLAGS=-L$prefix/lib/      \
             CXXFLAGS="-ccbin mpicxx -gencode arch=compute_80,code=sm_80 -I$prefix/include/ -std=c++14"