cparrott73 opened this issue 5 years ago.
Hi, Chris.
Is it possible that Open MPI was built on a system with a valid /usr/lib64/libcuda.so.1, and is then being used on another machine where the libcuda library isn't present?
The following lines of code are responsible for the messages you're seeing.
https://github.com/open-mpi/ompi/blob/master/opal/mca/common/cuda/common_cuda.c#L386
https://github.com/open-mpi/ompi/blob/master/opal/mca/common/cuda/common_cuda.c#L435
https://github.com/open-mpi/ompi/blob/master/opal/mca/common/cuda/help-mpi-common-cuda.txt#L164
I'm not sure it is a red herring, though. Do you say this because CUDA-aware MPI works as expected?
Hi Akshay,
Yes, that is correct. This Open MPI build was linked against CUDA on our build system and then installed to an NFS directory, where it is shared among various systems on our network. Some of those systems have GPUs and CUDA installed, while others do not. We do not see this warning on the CUDA-enabled systems, only on the non-CUDA ones.
.dylib warning from dlopen() for libcuda on Linux
Open MPI v3.1.3
Open MPI was compiled with PGI 19.1 compilers from a source tarball downloaded from open-mpi.org. Open MPI is configured with CUDA support via the --with-cuda flag to ./configure.
Please describe the system on which you are running
Details of the problem
We have a user who is unhappy that Open MPI prints a warning about being unable to dlopen() libcuda.dylib on a Linux system when the runtime tries to open the CUDA shared library. This can be seen in the following output:
I understand why this is happening. Open MPI has no notion of whether it should dlopen() a file ending in .dylib or .so on the current system, so it tries each in turn until one succeeds or it runs out of candidates. These messages come from the failed dlopen() calls. In essence, this is purely a cosmetic issue. However, the user considers it a 'red herring' that might confuse users on a Linux system.
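The probing behavior described above can be sketched as follows. This assumes a loader that appends each candidate suffix to a base name and stops at the first successful dlopen(). The helper name `try_dlopen_suffixes` and the suffix list are hypothetical, and this is not Open MPI's actual code in common_cuda.c.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not Open MPI's code): try base+suffix for each suffix
 * in order and return the first handle that loads, or NULL if none do.
 * *failures receives the number of dlopen() attempts that failed. */
void *try_dlopen_suffixes(const char *base, const char *const suffixes[],
                          size_t nsuffixes, int *failures)
{
    char path[1024];
    char report[4096] = "";   /* accumulated error text, one line per attempt */
    size_t used = 0;

    assert(base != NULL && suffixes != NULL && failures != NULL);
    *failures = 0;

    for (size_t i = 0; i < nsuffixes; i++) {
        snprintf(path, sizeof(path), "%s%s", base, suffixes[i]);
        void *handle = dlopen(path, RTLD_LAZY);
        if (handle != NULL) {
            return handle;    /* success: earlier failures are never shown */
        }
        /* dlerror()'s string may be overwritten by the next dl* call,
         * so copy it into the report right away. */
        const char *err = dlerror();
        if (used < sizeof(report)) {
            int n = snprintf(report + used, sizeof(report) - used, "  %s: %s\n",
                             path, err != NULL ? err : "unknown error");
            if (n > 0) {
                used += (size_t)n;
            }
        }
        (*failures)++;
    }

    /* Every candidate failed: print all collected errors, including the
     * ".dylib" attempt that can never succeed on Linux. */
    fprintf(stderr, "could not load %s; dlopen() errors:\n%s", base, report);
    return NULL;
}
```

In this sketch the .dylib failure only surfaces when no candidate loads. That matches the report in this thread that the warning appears on the non-CUDA nodes but not on the CUDA-enabled ones.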
Would you consider a small change that tests whether the file exists before trying to dlopen() it? That might eliminate this spurious warning. I would also understand if you chose not to, since this is mainly a cosmetic issue.
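The suggestion above can be sketched with a hypothetical helper, `dlopen_if_exists`, which uses POSIX access() to check for the file before calling dlopen(). The sketch assumes one caveat: the check is only meaningful for names containing a '/'. A bare name such as libcuda.so.1 is located through the dynamic linker's search path, so the helper passes it straight to dlopen(). This is an illustration, not a proposed patch to Open MPI.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: skip dlopen() (and therefore its error message) when an
 * explicit path does not exist. Sets *skipped to 1 when the existence check
 * short-circuits the call, 0 otherwise. */
void *dlopen_if_exists(const char *path, int *skipped)
{
    assert(path != NULL && skipped != NULL);
    *skipped = 0;

    /* Only a name containing '/' is opened as a literal path. A bare name is
     * resolved by the dynamic linker's search (LD_LIBRARY_PATH, the ld.so
     * cache and default directories on glibc), which access() cannot
     * reproduce, so bare names always go to dlopen(). */
    if (strchr(path, '/') != NULL && access(path, F_OK) != 0) {
        *skipped = 1;
        return NULL;
    }

    void *handle = dlopen(path, RTLD_LAZY);
    if (handle == NULL) {
        const char *err = dlerror();
        fprintf(stderr, "dlopen(%s) failed: %s\n", path,
                err != NULL ? err : "unknown error");
    }
    return handle;
}
```

The existence check and the later dlopen() are not atomic. For a probe whose only goal is to avoid a misleading message, that race is harmless, because dlopen() still reports any real failure.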
Thanks in advance.