open-mpi / ompi

Open MPI main development repository
https://www.open-mpi.org

Openmpi conflicting with wifi on macOS #6719

Open nbengdahl opened 5 years ago

nbengdahl commented 5 years ago

I'm seeing strange behavior on macOS 10.14.5 running Open MPI v4.0.0 (compiled from the source distribution tarball), and also with earlier versions such as v3.0.0. The error included below is produced when I try to run an MPI program while my system is connected to certain wifi networks, but not others. If I turn off wifi and run the exact same code, MPI works fine. The issue seems to be specific to macOS; it was first observed on macOS High Sierra and still persists. It has been reproduced by several other people running the same code on macOS.

MPI is being used here as part of ParFlow, a nonlinear simulation platform for hydrologic modeling, so I don't have a simple program I can include to demonstrate the error. Any thoughts on resolving this would be greatly appreciated, thanks!

[guest-wl-dhcp-154-11:12131] *** An error occurred in MPI_Comm_split
[guest-wl-dhcp-154-11:12131] *** reported by process [1866268673,0]
[guest-wl-dhcp-154-11:12131] *** on communicator MPI_COMM_WORLD
[guest-wl-dhcp-154-11:12131] *** MPI_ERR_ARG: invalid argument of some other kind
[guest-wl-dhcp-154-11:12131] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[guest-wl-dhcp-154-11:12131] ***    and potentially your MPI job)

jsquyres commented 4 years ago

I'm sorry we missed this issue for so long.

We just released Open MPI v4.0.2 -- can you see if this is still an issue with the new version?

nbengdahl commented 4 years ago

Apologies for the delay. I just tried v4.0.2 and the same error occurs. If it helps, the error seems to happen only on enterprise WiFi networks. Please let me know what I can provide to help fix this bug. Thanks!