Closed by teonnik 6 years ago
The issue is not related to OpenMPI; it was caused by an incorrect cluster configuration.
We are getting this same error message - can you please tell me how you determined what was wrong with the cluster configuration? We are also using LSF.
Background information
Version
OpenMPI 3.0.0 with CUDA support. I am not the system administrator, so I don't know exactly how OpenMPI was installed. I will let you know as soon as I find out.
System
18 IBM S822LC servers ("Minsky") each with
All nodes are connected to a single Mellanox InfiniBand EDR switch.
Details of the problem
I have C++14 code that uses CUDA Thrust (the CUDA part is C++11). When I tried to run it on multiple nodes, I received the following error:
I was trying to execute a test from a library I wrote. The code can be found here.
The cluster uses LSF. I ran with the following command:
Output from LSF: