I installed Open MPI 4.0.7 and 4.1.5 on RHEL 9. When I run the OSU benchmarks, I get the following error messages. Can these messages be safely ignored? If so, how can I suppress them?
[node1804:235828] common_ucx.c:404 waiting for 1 disconnect requests
[node1804:235828] common_ucx.h:153 ucp_disconnect_nb failed: 1, Operation in progress
[node1804:235828] common_ucx.c:440 disconnecting from rank 7
[1696387831.419127] [node1804:235828:0] flush.c:56 UCX ERROR req 0x24b9300: error during flush: Connection reset by remote peer
[1696387831.419133] [node1804:235828:0] flush.c:56 UCX ERROR req 0x24b9300: error during flush: Connection reset by remote peer
[node1804:235828] common_ucx.c:444 Error: ucp_disconnect_nb(7) failed: Connection reset by remote peer
[node1804:235828] common_ucx.c:404 waiting for 0 disconnect requests
OSU benchmark performance is also worse on RHEL 9 than on our current RHEL 7 systems. Open MPI 4.0.7 on RHEL 7 does not use UCX, while Open MPI on RHEL 9 does. I am not sure whether the OSU benchmarks are a good way to measure UCX performance. If they are, is there a way to improve Open MPI performance on RHEL 9?
Below is the openmpi/4.0.7 performance on RHEL9:
Below is the openmpi/4.1.5 performance on RHEL9: