Closed: lfmeadow closed this issue 3 years ago.
Can you please try the following:
1. Run the ib_read_bw benchmark on mlx5_0 (both sides) with a 512k message size and report the result. It should look something like this:
   Server: taskset -c 0-23 ib_read_bw -s $((512*1024)) -D 5 -d mlx5_0
   Client: taskset -c 0-23 ib_read_bw -s $((512*1024)) -D 5 -d mlx5_0 ed-dlgpu-168c
2. Run IMB PingPong with the UCX_LOG_LEVEL=info environment variable and post the output.
3. Run IMB PingPong with UCX_TLS=rc UCX_RNDV_THRESH=1k UCX_MAX_RNDV_RAILS=1 UCX_RNDV_SCHEME=get_zcopy to see if it improves the result.
4. Try binding the MPI rank to one specific core (0 instead of 0-23, 24 instead of 24-47) to see if that improves things.
5. Test the osu_bw benchmark, since it uses a window of 64 outstanding send operations.
Example invocations for items 2-5 are sketched after this list.
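For concreteness, here is one possible set of command lines for items 2-5; the hostnames, core numbers, and benchmark paths are placeholders, not taken from the original report:

# (2) IMB PingPong with info-level UCX logging
mpirun -np 2 --host node1,node2 -x UCX_LOG_LEVEL=info ./IMB-MPI1 PingPong

# (3) single-rail RC with get_zcopy rendezvous and a low rendezvous threshold
mpirun -np 2 --host node1,node2 -x UCX_TLS=rc -x UCX_RNDV_THRESH=1k -x UCX_MAX_RNDV_RAILS=1 -x UCX_RNDV_SCHEME=get_zcopy ./IMB-MPI1 PingPong

# (4) pin each rank to a single core instead of a core range
mpirun -np 2 --host node1,node2 --bind-to none taskset -c 0 ./IMB-MPI1 PingPong

# (5) osu_bw keeps up to 64 sends outstanding, so it is less sensitive to per-message latency
mpirun -np 2 --host node1,node2 ./osu_bw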
Will do. However, now I'm thinking it may be a firmware issue. Card 0 has a PSID DEL0000000010 and the firmware tools won't let me update the firmware. So our system guy is talking to Dell. I'll keep you informed. Thanks for the quick response.
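For anyone hitting the same thing, one way to check the running firmware version and PSID on each card is sketched below, assuming the Mellanox firmware tools (MFT) are installed; the /dev/mst device path is an example and depends on the system:

ibv_devinfo -d mlx5_0 | grep fw_ver            # running firmware version for the device
sudo mst start                                 # load the MFT access driver
sudo flint -d /dev/mst/mt4123_pciconf0 query   # reports FW version, product version, and PSID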
Upgrading to FW 20.31.1014 made the problem go away. Sorry for the noise.
Describe the bug
We have two 2-socket Dell servers, each with two AMD EPYC 7402 (24-core) processors and two ConnectX-6 cards. Each HCA sits on a separate PCI bus attached to one of the two processors, and each HCA is connected to the same IB switch. I ran an MPI PingPong (from the Intel IMB benchmarks) in all 4 combinations of mlx5_0:1 and mlx5_1:1 on the two servers, selecting the device with UCX_NET_DEVICES and binding the MPI rank to the corresponding socket, e.g.:
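(The original example command was not preserved in this report; a representative invocation, with illustrative hostnames, core range, and benchmark path, might look like:)

mpirun -np 2 --host node1,node2 --map-by ppr:1:node --bind-to none -x UCX_NET_DEVICES=mlx5_0:1 taskset -c 0-23 ./IMB-MPI1 PingPong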
Here are two bandwidth tables, one for 256KiB and one for 512KiB:
Since the cards are all connected to the same switch and the MPI ranks are bound to the closest socket, I would expect all the bandwidths to be about the same. It seems like there are two problems:
Perhaps this is some configuration problem with card 0.
Steps to Reproduce
UCX configure flags (see the build sketch below): --prefix=/home/larry/sycl-with-cuda/ucx_install --with-cuda=/usr/local/cuda --enable-mt
UCX_NET_DEVICES set as shown in the description
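(A rough build sequence with those flags; whether ./autogen.sh or contrib/configure-release was used is an assumption:)

./autogen.sh
./configure --prefix=/home/larry/sycl-with-cuda/ucx_install --with-cuda=/usr/local/cuda --enable-mt
make -j
make install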
Setup and versions
ibstat or ibv_devinfo -vv command output
Additional information (depending on the issue)
OpenMPI version: 4.1.1
Output of ucx_info -d to show transports and devices recognized by UCX: ucx_info.txt
Configure result - config.log: config.log
Log file - configure UCX with "--enable-logging" - and run with "UCX_LOG_LEVEL=data": on request