Closed — eliekozah-cornelisnetworks closed this issue 1 week ago
Right now OMPI does not support MPI_Reduce_local with GPU buffers: it ends up calling an MPI_Op that does not have access to the GPU buffers and therefore segfaults.
https://github.com/open-mpi/ompi/pull/12569 is needed to make this work.
Duplicate of https://github.com/open-mpi/ompi/issues/12045
Background information
What version of Open MPI are you using? (e.g., v4.1.6, v5.0.1, git branch name and hash, etc.)
Version 5.0.3 with CUDA enhancements.
Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)
Open MPI was installed from a source distribution tarball, customized with CUDA support for GPU capabilities.
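For context, a CUDA-enabled build from a tarball is typically configured along these lines (the CUDA install path below is an assumed example, not taken from this report):

```shell
# Assumed build sketch: enable CUDA support when configuring Open MPI.
# /usr/local/cuda is a placeholder for the actual CUDA toolkit path.
./configure --prefix=/opt/openmpi-5.0.3 --with-cuda=/usr/local/cuda
make -j all && make install
```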
If you are building/installing from a git clone, please copy-n-paste the output from
git submodule status
N/A
Please describe the system on which you are running
Operating System/Version: Red Hat Enterprise Linux release 8.6 (Ootpa)
Computer Hardware: Architecture: x86_64; CPU: AMD EPYC 7252 8-Core Processor, 16 CPUs online, each core at 3048.274 MHz; Memory: 127863 MB total, 115327 MB free.
Ethernet (eth0): Speed: 1000Mb/s
Details of the problem
I am encountering a segmentation fault when running the Reduce_local operation in the IMB-MPI1-GPU benchmark with Open MPI 5.0.3 built with CUDA support. The fault occurs whether or not GDRcopy is enabled, and in both single- and multi-GPU configurations.
Steps to Reproduce
GDB Output
Under GDB, the program crashes with the following backtrace, which points to the AVX-optimized operation for floating-point addition: