Open 0xymoro opened 5 years ago
Hi, quick issue with mpiexec. Without it, the program runs fine on 1 GPU (I am running Horovod inside a Docker container), but it hangs whenever it is launched through mpiexec.
I ran strace on it, and it hangs after the following sequence of "Creating pad" messages. Any hints would be appreciated!
write(1, "Creating pad 1_1_6_6\n", 21) = 21
poll([{fd=5, events=POLLIN}, {fd=4, events=POLLIN}, {fd=7, events=POLLIN}, {fd=23, events=POLLIN}, {fd=30, events=POLLIN}, {fd=28, events=POLLIN}, {fd=0, events=POLLIN}, {fd=32, events=POLLIN}, {fd=24, events=POLLIN}], 9, -1) = 1 ([{fd=24, revents=POLLIN}])
read(24, "Creating pad 1_1_4_4\n", 4096) = 21
poll([{fd=5, events=POLLIN}, {fd=4, events=POLLIN}, {fd=7, events=POLLIN}, {fd=23, events=POLLIN}, {fd=30, events=POLLIN}, {fd=28, events=POLLIN}, {fd=0, events=POLLIN}, {fd=32, events=POLLIN}, {fd=24, events=POLLIN}], 9, 0) = 0 (Timeout)
write(1, "Creating pad 1_1_4_4\n", 21) = 21
poll([{fd=5, events=POLLIN}, {fd=4, events=POLLIN}, {fd=7, events=POLLIN}, {fd=23, events=POLLIN}, {fd=30, events=POLLIN}, {fd=28, events=POLLIN}, {fd=0, events=POLLIN}, {fd=32, events=POLLIN}, {fd=24, events=POLLIN}], 9, -1
Have you solved this issue?