Closed Muzizhenzhen closed 2 months ago
We have modified the code to set `attr.ah_attr.is_global = 1` by default, to avoid connection-establishment failures in some network environments. You should also pay attention to the "infiniband" subsection of `config/smart_config.json`, including "name", "port", and the recently added "gid_idx".
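For illustration, that subsection might look like the sketch below. Only the key names "name", "port", and "gid_idx" come from the reply above; the values shown (device name, port number, GID index) are placeholders that depend on your own environment:

```json
{
  "infiniband": {
    "name": "mlx5_0",
    "port": 1,
    "gid_idx": 3
  }
}
```

You can usually find valid device names with `ibv_devices` and pick a GID index that matches your RoCE version and subnet.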
Thanks for your reply, I just saw it.
I have one more question. I tried OFED 5.3 today with a CX-5 100G NIC, using the "read" operation. With 96 QPs, I did not see any significant change in IOPS when varying the OWR (outstanding work request) count.
Apart from that, I tried a modification based on the rdma-bench code (ATC'16), measured separately on CX-5 and CX-6 100G NICs; changing only the OWR count (polling the CQ once every n WQEs) produced no obvious change in IOPS either.
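For context, "polling the CQ once every n WQEs" usually refers to selective signaling: only every n-th work request sets `IBV_SEND_SIGNALED`, and the sender reaps one completion per batch. A minimal sketch of that pattern, assuming pre-built `sge`, `qp`, `cq`, `raddr`, `rkey`, and `size` (all names here are illustrative, not taken from rdma-bench or SMART):

```c
// Hypothetical sketch: post n RDMA READs, signal only the last one,
// then poll a single CQE that covers the whole batch.
for (int i = 0; i < n; i++) {
    struct ibv_send_wr wr = {0}, *bad_wr = NULL;
    wr.wr_id = i;
    wr.opcode = IBV_WR_RDMA_READ;
    wr.sg_list = &sge;                  // pre-built scatter/gather entry
    wr.num_sge = 1;
    wr.send_flags = (i == n - 1) ? IBV_SEND_SIGNALED : 0;
    wr.wr.rdma.remote_addr = raddr + (uint64_t)i * size;
    wr.wr.rdma.rkey = rkey;
    ibv_post_send(qp, &wr, &bad_wr);
}
struct ibv_wc wc;
while (ibv_poll_cq(cq, 1, &wc) == 0)
    ;                                    // one completion retires the batch
```

Note that the send queue depth must be at least n for this to work, since unsignaled WRs are only reclaimed when a later signaled WR completes.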
I used two machines in total to run test/test_rdma. Each machine had a 32-core AMD EPYC 7452 at 2.35GHz, 128GB ECC memory, and a 100G CX-5 RDMA NIC, connected via a 100Gbps Dell Z9264F-ON switch, running Ubuntu 20.04 (Linux kernel 5.4). On the server side I ran `sudo LD_PRELOAD=libmlx5.so ./test/test_rdma`, and on the client side `sudo LD_PRELOAD=libmlx5.so ./test/test_rdma [nr_thread] [outstanding_work_request_per_thread]`. Only nr_thread and outstanding_work_request_per_thread were changed; everything else was left as-is.
Is there any configuration I missed? Or does only the 200G CX-6 NIC show a significant throughput decrease? Have you tried testing on a 100G CX-6 NIC?
Please guide me, thank you!
If you run test_rdma manually, all optimizations of SMART, including work request throttling, are enabled by default. You can check smart_config.json.
In addition, as the test platform has only 32 cores, performance impacts may not be significant.
I changed the host name to the one I use and tried to run test/test_rdma. No other parameters were modified, but I hit an ibv_modify_qp "Invalid argument" error and located the code:

```c
// INIT -> RTR
flags = IBV_QP_STATE | IBV_QP_PATH_MTU;
memset(&attr, 0, sizeof(attr));
attr.qp_state = IBV_QPS_RTR;
attr.path_mtu = IBV_MTU_512;
if (qp->qp->qp_type == IBV_QPT_RC) {
    flags |= IBV_QP_AV | IBV_QP_DEST_QPN | IBV_QP_RQ_PSN |
             IBV_QP_MAX_DEST_RD_ATOMIC | IBV_QP_MIN_RNR_TIMER;
    attr.ah_attr.is_global = 0;
    attr.ah_attr.dlid = lid;
    attr.ah_attr.port_num = ibport;
    attr.dest_qp_num = qp_num;
    attr.rq_psn = psn; // match remote sq_psn
    attr.max_dest_rd_atomic = 16;
    attr.min_rnr_timer = 12;
}
if (ibv_modify_qp(qp->qp, &attr, flags)) {
    printf("The third ibv_modify_qp\n");
    SDS_PERROR("ibv_modify_qp");
    return -1;
}
```

Error message: `smart/smart/resource_manager.cpp:623: enable_queue_pair(): ibv_modify_qp: Invalid argument`
How should I modify it?
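For reference, a minimal sketch of what the address handle typically needs when `is_global = 1` is required (e.g. on RoCE, where the GRH must carry the destination GID). The variable names `remote_gid`, `gid_idx`, `lid`, and `ibport` are assumptions for illustration, not the exact names in SMART:

```c
// Hypothetical sketch: fill the GRH fields so that is_global = 1 has a
// valid destination GID (needed on RoCE and routed IB fabrics).
attr.ah_attr.is_global = 1;
attr.ah_attr.grh.dgid = remote_gid;     // remote GID, exchanged out of band
attr.ah_attr.grh.sgid_index = gid_idx;  // local GID index ("gid_idx" in smart_config.json)
attr.ah_attr.grh.hop_limit = 1;         // larger for routed networks
attr.ah_attr.grh.flow_label = 0;
attr.ah_attr.grh.traffic_class = 0;
attr.ah_attr.dlid = lid;                // used on InfiniBand; not meaningful on RoCE v2
attr.ah_attr.port_num = ibport;
```

In my experience, "Invalid argument" at this transition often means `is_global = 0` on a RoCE link, an invalid GID index, or a mismatched `dlid`; checking the local GID table (e.g. via `show_gids` on Mellanox systems, if available) can help pick a working `gid_idx`.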