pmodels / mpich

Official MPICH Repository
http://www.mpich.org

It would be helpful to have more distinction on port range spec env vars in docs. #4371

Open herter4171 opened 4 years ago

herter4171 commented 4 years ago

Hello,

I've just spent a good chunk of time trying to figure out why my processes in a container context were using random, incorrect ports to message back to rank zero in spite of me specifying MPIEXEC_PORT_RANGE=2123:2127. Running this sample code yields the following.

dev@4f89d81f07ae:~/run$ MPIEXEC_PORT_RANGE=2123:2127 mpiexec -n 4 -f ~/.ssh/mf.txt $PWD/my_bcast 2>&1 out.txt
Process 0 broadcasting data 100
Process 1 received data 100 from root process
Fatal error in MPI_Send: Unknown error class, error stack:
MPI_Send(174)..............: MPI_Send(buf=0x7ffd1078baa4, count=1, MPI_INT, dest=2, tag=0, MPI_COMM_WORLD) failed
MPID_nem_tcp_connpoll(1845): Communication error with rank 2: Connection refused

What I've ultimately found is that if I, instead, use MPICH_PORT_RANGE=2123:2127, the output is what it should be.

dev@4f89d81f07ae:~/run$ MPICH_PORT_RANGE=2123:2127 mpiexec -n 4 -f ~/.ssh/mf.txt $PWD/my_bcast 2>&1 out.txt
Process 0 broadcasting data 100
Process 1 received data 100 from root process
Process 2 received data 100 from root process
Process 3 received data 100 from root process

I know next to nothing about MPI, but given that MPICH_PORT_RANGE succeeds where MPIEXEC_PORT_RANGE fails in this case, it seems incorrect for the docs to say these two environment variables accomplish the same thing.

Feel free to close this whenever. I just wanted to offer some feedback so that, hopefully, nobody else loses time on this distinction.

hzhou commented 3 years ago

Just realized that MPICH_PORT_RANGE controls "MPICH"'s behavior rather than MPIEXEC's behavior. I know this is a bit confusing. There are two different sets of connections. First, the process manager needs to connect to the processes it controls; this connection is used to coordinate process management. Then, once the processes are launched, each MPI process connects to the others independently for MPI communication. While this sounds redundant, the separation allows a different focus: the process manager interface layer is focused on reliability, while the MPICH library is more focused on performance.
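Given the two connection sets described above, one workaround in a firewalled or container setup is to constrain both layers explicitly. This is a hedged sketch based on the variables named in this thread, reusing the reporter's port range; the command line at the end mirrors the report and is commented out:

```shell
# MPICH_PORT_RANGE   -> ports the MPICH library uses for rank-to-rank (MPI) traffic
# MPIEXEC_PORT_RANGE -> ports hydra/mpiexec uses for its process-management traffic
# Setting both keeps every connection inside the open range.
export MPICH_PORT_RANGE=2123:2127
export MPIEXEC_PORT_RANGE=2123:2127
echo "library: $MPICH_PORT_RANGE, launcher: $MPIEXEC_PORT_RANGE"
# mpiexec -n 4 -f ~/.ssh/mf.txt $PWD/my_bcast   # then launch as usual
```

Whether the launcher range matters depends on the firewall rules between the process manager and the compute hosts; the rank-to-rank range is what failed in the report above.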

The documentation definitely needs updating/correction.

hzhou commented 2 years ago

It may be cleaner to let hydra accept an option -portrange low:high and set the appropriate netmod port-range variables itself. With libfabric, each provider has its own variables; it would be a mess trying to explain all of these to the user.
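Until something like that exists, the idea can be approximated with a thin wrapper. This is a hypothetical sketch: the -portrange option and the launch_with_portrange helper are illustrative, not an existing hydra feature, and only the two variables discussed in this thread are fanned out (not the per-provider libfabric ones):

```shell
# Hypothetical wrapper: take one port range and fan it out to the
# per-layer variables before handing off to the real mpiexec.
launch_with_portrange() {
  range="$1"; shift
  MPIEXEC_PORT_RANGE="$range"   # hydra's process-management connections
  MPICH_PORT_RANGE="$range"     # MPICH library's rank-to-rank connections
  export MPIEXEC_PORT_RANGE MPICH_PORT_RANGE
  echo "launching with port range $range"
  # mpiexec "$@"                # hand off to the real launcher
}

launch_with_portrange 2123:2127 -n 4 ./my_bcast
```

A real hydra option would additionally need to translate the range into whatever variables the active netmod or libfabric provider understands, which is exactly the complexity noted above.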