I tested the GPU version by adapting the ex5f_cptr.f example source to import a linear system from my own application. The indices of the unknowns start from 1 rather than 0. The single-GPU version works fine, but the multi-GPU version fails to converge or even crashes. The CPU version has no problem. The multi-GPU version also works fine if I change the indexing so that it starts from 0. I believe there is a bug in the MPI communication of the GPU version when the indices start from 1. The MPI used in my testing is the OpenMPI 3.1 build provided in the NVIDIA HPC SDK package. Hypre is configured with:
--enable-shared --with-cuda '--with-gpu-arch=60 70 80' --with-cuda-home=/usr/local/cuda-11.8 --enable-cusparse --enable-cublas --enable-curand --enable-unified-memory
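
For anyone who wants to reproduce the 0-based workaround, the index shift can be sketched roughly as below. This is only an illustration in C, not code from ex5f_cptr.f or from hypre: the helper names `shift_to_zero_based` and `range_to_zero_based` are hypothetical, and `long long` is an assumed stand-in for whatever integer type the hypre build uses for global indices. The idea is to apply the shift to the per-rank row range and to the row/column index arrays before they are handed to the IJ interface (for example, the arguments of HYPRE_IJMatrixCreate and HYPRE_IJMatrixSetValues).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of hypre): shift n global indices in place
 * from the application's 1-based numbering to 0-based numbering. */
void shift_to_zero_based(long long *idx, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        assert(idx[i] >= 1);   /* input is expected to be 1-based */
        idx[i] -= 1;
    }
}

/* Hypothetical helper: convert the inclusive row range owned by one MPI rank
 * from 1-based to 0-based. The number of owned rows is unchanged. */
void range_to_zero_based(long long *ilower, long long *iupper)
{
    assert(*ilower >= 1 && *iupper >= *ilower);
    *ilower -= 1;
    *iupper -= 1;
}
```

With this shift, a rank that owns rows 101..200 in the application numbering would pass 100..199 instead. The shift has to be applied consistently on every rank and to the matrix, right-hand side, and solution vector indices alike.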