wangshuaizs closed this issue 5 years ago
The UCX_IB_SL=<service level> environment variable does it.
For more information, you can run the ucx_info tool, which shows how UCX is configured.
E.g. ucx_info -cf shows the UCX configuration:
...
#
# IB Service Level / RoCEv2 Ethernet Priority.
#
#
# syntax: unsigned integer
# inherits: UCX_RC_SL, UCX_IB_SL
#
UCX_RC_VERBS_SL=0
...
@dmitrygladkov Thank you for your reply.
I tried adding UCX_IB_SL=<service level> to my command. The whole command is as follows:
mpirun -np 2 -H 10.10.10.4:1,10.10.10.6:1 -bind-to none -map-by slot -mca btl self --mca pml ucx -x LD_LIBRARY_PATH -x PATH -x HOROVOD_MPI_THREADS_DISABLE=1 -x UCX_NET_DEVICES=mlx4_0:2 -x UCX_IB_SL=2 <my application>
but no RDMA traffic was found in the specified switch queue.
I tested with ib_write_bw 10.10.10.6 -d mlx4_0 -i 2 -x 2 -S 2 --report_gbits -D 10 to verify that the mapping is correct. (Yes, I found RDMA traffic in switch queue 2 with this command.)
Does UCX_IB_SL=<service level> apply to RoCEv1, or only to InfiniBand and RoCEv2?
@wangshuaizs could you pls check whether it was applied by running:
UCX_IB_SL=2 ucx_info -fc
It should show UCX_RC_VERBS_SL=2:
#
# IB Service Level / RoCEv2 Ethernet Priority.
#
#
# syntax: unsigned integer
# inherits: UCX_RC_SL, UCX_IB_SL
#
UCX_RC_VERBS_SL=2
thank you!
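The check above can also be scripted. This is a minimal sketch, assuming ucx_info is on PATH and prints NAME=value lines in the format shown above; the helper name ucx_cfg_value is hypothetical and not part of UCX.

```shell
# Hypothetical helper (not part of UCX): read `ucx_info -fc`-style text on
# stdin and print the value assigned to the variable named in $1.
ucx_cfg_value() {
    grep "^$1=" | head -n 1 | cut -d= -f2-
}

# Only query UCX when ucx_info is actually installed.
if command -v ucx_info >/dev/null 2>&1; then
    sl=$(UCX_IB_SL=2 ucx_info -fc | ucx_cfg_value UCX_RC_VERBS_SL)
    echo "UCX_RC_VERBS_SL=$sl"
fi
```

If the printed value does not match the service level that was requested, the variable is probably not reaching UCX (for example, it was not exported to the remote ranks with -x).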
@wangshuaizs, please also set UCX_IB_GID_INDEX=2 (this selects the proper GID index, as you do with -x 2 in your ib_write_bw command)
@brminich @dmitrygladkov thank you. After adding UCX_IB_GID_INDEX, it worked! So I will close this issue. Thank you again!
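For reference, the working invocation in this thread combines both variables. The sketch below only assembles and prints the command line; build_mpirun_cmd is a hypothetical helper, and ./my_application is a placeholder for the real program, which the thread leaves unspecified. The hosts, device, and the value 2 for both the service level and the GID index are taken from the commands above and will likely differ on other setups.

```shell
# Hypothetical helper: print the mpirun command line for a given
# service level ($1), GID index ($2) and application ($3).
build_mpirun_cmd() {
    echo "mpirun -np 2 -H 10.10.10.4:1,10.10.10.6:1" \
         "-bind-to none -map-by slot -mca btl self --mca pml ucx" \
         "-x LD_LIBRARY_PATH -x PATH -x HOROVOD_MPI_THREADS_DISABLE=1" \
         "-x UCX_NET_DEVICES=mlx4_0:2" \
         "-x UCX_IB_SL=$1 -x UCX_IB_GID_INDEX=$2 $3"
}

# Print the command; paste it into a terminal to launch it.
build_mpirun_cmd 2 2 ./my_application
```

Keeping the service level and GID index as parameters makes it easy to match whatever values were verified with ib_write_bw.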
@brminich @dmitrygladkov Is this worth adding to https://www.open-mpi.org/faq/?
@jsquyres makes sense to me. It looks like https://www.open-mpi.org/faq/?category=openfabrics#ompi-over-roce-ucx-pml (q. 46) is the right place for this information.
@yosefe wdyt? should I update OMPI FAQ (RoCE over UCX PML - q.46) to mention this fact?
Background information
What version of Open MPI are you using? (e.g., v1.10.3, v2.1.0, git branch name and hash, etc.)
v4.0.0
Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)
from a source/distribution tarball
Question:
Hi, I installed Open MPI v4.0.0 with UCX v1.6.0. I have a Mellanox ConnectX-3 NIC, which can run TCP and RoCE v1. I want to run an MPI application on two nodes over RoCE v1. I tried to find out how to run RoCE v1 with a specified service level, but only found solutions based on openib. So I want to ask: how do I specify the service level with UCX when running RoCE v1?