open-mpi / ompi

Open MPI main development repository
https://www.open-mpi.org

How to specify service level with UCX when running RoCE v1 #6794

Closed wangshuaizs closed 5 years ago

wangshuaizs commented 5 years ago


Background information

What version of Open MPI are you using? (e.g., v1.10.3, v2.1.0, git branch name and hash, etc.)

v4.0.0

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

from a source/distribution tarball

Question:

Hi, I installed Open MPI v4.0.0 with UCX v1.6.0. I have a Mellanox ConnectX-3 NIC, which supports TCP and RoCE v1. I want to run an MPI application on two nodes over RoCE v1. I tried to find out how to run RoCE v1 with a specific service level, but I only found solutions based on the openib BTL. So how do I specify the service level with UCX when running RoCE v1?

dmitrygx commented 5 years ago

Setting the UCX_IB_SL=<service level> environment variable does this. For more information, you can run the ucx_info tool, which shows how UCX can be configured.

E.g., ucx_info -cf shows your UCX configuration:

...
#
# IB Service Level / RoCEv2 Ethernet Priority.
#
#
# syntax:    unsigned integer
# inherits:  UCX_RC_SL, UCX_IB_SL
#
UCX_RC_VERBS_SL=0

...
wangshuaizs commented 5 years ago

@dmitrygladkov Thank you for your reply.

I tried adding UCX_IB_SL=<service level> to my command. The whole command is:

mpirun -np 2 -H 10.10.10.4:1,10.10.10.6:1 -bind-to none -map-by slot -mca btl self --mca pml ucx -x LD_LIBRARY_PATH -x PATH -x HOROVOD_MPI_THREADS_DISABLE=1 -x UCX_NET_DEVICES=mlx4_0:2 -x UCX_IB_SL=2 <my application>

However, no RDMA traffic showed up in the specified switch queue.

I tested with ib_write_bw 10.10.10.6 -d mlx4_0 -i 2 -x 2 -S 2 --report_gbits -D 10 to verify that the mapping is correct (and yes, with this command I do see RDMA traffic in switch queue 2).

I wonder whether UCX_IB_SL=<service level> applies to RoCE v1 at all, or only to InfiniBand and RoCE v2?

dmitrygx commented 5 years ago

@wangshuaizs could you please check whether it was applied by running:

UCX_IB_SL=2 ucx_info -fc

The output should show UCX_RC_VERBS_SL=2:

#
# IB Service Level / RoCEv2 Ethernet Priority.
#
#
# syntax:    unsigned integer
# inherits:  UCX_RC_SL, UCX_IB_SL
#
UCX_RC_VERBS_SL=2

thank you!
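A minimal sketch of that check as a small Bash helper that reads `ucx_info -c` output from stdin and compares the UCX_RC_VERBS_SL line shown above against the expected value. The function name `check_rc_verbs_sl` is hypothetical, and it inspects only the UCX_RC_VERBS_SL entry, ignoring everything else in the output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `ucx_info -c` output on stdin and checks that
# UCX_RC_VERBS_SL (which inherits from UCX_RC_SL / UCX_IB_SL) has the expected value.
check_rc_verbs_sl() {
  local expected="$1"
  local line
  # Keep only the effective setting line, e.g. "UCX_RC_VERBS_SL=2";
  # comment lines starting with "#" are ignored.
  line=$(grep -E '^UCX_RC_VERBS_SL=' | tail -n 1)
  if [ "$line" = "UCX_RC_VERBS_SL=${expected}" ]; then
    echo "ok: ${line}"
    return 0
  fi
  echo "mismatch: got '${line:-<missing>}', expected UCX_RC_VERBS_SL=${expected}" >&2
  return 1
}

# Usage (requires UCX's ucx_info on PATH):
#   UCX_IB_SL=2 ucx_info -c | check_rc_verbs_sl 2
```

Reading from stdin keeps the helper independent of how ucx_info is invoked, so the same check works with or without the decorated `-f` output.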

brminich commented 5 years ago

@wangshuaizs, please also set UCX_IB_GID_INDEX=2, which selects the proper GID index, just as -x 2 does in your ib_write_bw command.

wangshuaizs commented 5 years ago

@brminich @dmitrygladkov Thank you. After adding UCX_IB_GID_INDEX, it worked! So I will close this issue. Thank you again!
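For reference, the launch that worked can be sketched as the original mpirun command plus -x UCX_IB_GID_INDEX=2. Each UCX variable mirrors a flag of the ib_write_bw test that produced traffic in queue 2: -d mlx4_0 -i 2 (device and port) corresponds to UCX_NET_DEVICES=mlx4_0:2, -x 2 (GID index) to UCX_IB_GID_INDEX=2, and -S 2 (service level) to UCX_IB_SL=2. The shell variables, the ./my_app placeholder and the RUN switch are hypothetical; by default the script only prints the command.

```shell
#!/usr/bin/env bash
# Sketch of the working launch: the original command plus UCX_IB_GID_INDEX.
# Mapping from the ib_write_bw test used to verify the switch queue:
#   -d mlx4_0 -i 2  ->  UCX_NET_DEVICES=mlx4_0:2
#   -x 2            ->  UCX_IB_GID_INDEX=2
#   -S 2            ->  UCX_IB_SL=2
# SL, GID_INDEX, DEVICE, HOSTS, APP and RUN are hypothetical knobs.
SL=${SL:-2}
GID_INDEX=${GID_INDEX:-2}
DEVICE=${DEVICE:-mlx4_0:2}
HOSTS=${HOSTS:-10.10.10.4:1,10.10.10.6:1}
APP=${APP:-./my_app}   # placeholder for <my application>

cmd=(mpirun -np 2 -H "$HOSTS" -bind-to none -map-by slot
     -mca btl self --mca pml ucx
     -x LD_LIBRARY_PATH -x PATH -x HOROVOD_MPI_THREADS_DISABLE=1
     -x "UCX_NET_DEVICES=$DEVICE"
     -x "UCX_IB_SL=$SL"
     -x "UCX_IB_GID_INDEX=$GID_INDEX"
     "$APP")

if [ "${RUN:-0}" = 1 ]; then
  # Actually launch the job.
  exec "${cmd[@]}"
else
  # Default: only print the command so it can be reviewed first.
  printf '%s\n' "${cmd[*]}"
fi
```

Keeping the arguments in a Bash array preserves each -x value as a single word, so the printed command and the executed one stay identical.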

jsquyres commented 5 years ago

@brminich @dmitrygladkov Is this worth adding to https://www.open-mpi.org/faq/?

dmitrygx commented 5 years ago

@jsquyres Makes sense to me. It looks like Q. 46 in https://www.open-mpi.org/faq/?category=openfabrics#ompi-over-roce-ucx-pml is the right place for this information.

dmitrygx commented 5 years ago

@yosefe What do you think? Should I update the OMPI FAQ (RoCE over UCX PML, Q. 46) to mention this?