linux-rdma / perftest

Infiniband Verbs Performance Tests

"Failed to disconnect RDMA CM connection" seen on client while running ib_write_bw #121

Closed smitkothari94 closed 1 year ago

smitkothari94 commented 3 years ago

Hi all,

With the latest perftest, the following errors are seen on the client side while running ib_write_bw:

Failed to disconnect RDMA CM connection.
ERRNO: Connection reset by peer.
Failed to disconnect RDMA CM nodes.

No errors were observed in dmesg.

Current behavior:

Server:

# ib_write_bw -n 10 -R -s 2048 -D 10 -p 11000

****
* Waiting for client to connect... *
****
---------------------------------------------------------------------------------------
RDMA_Write BW Test
Dual-port : OFF          Device : cxgb4_0
Number of qps : 1        Transport type : IW
Connection type : RC     Using SRQ : OFF
PCIe relax order: ON
ibv_wr* API : OFF
CQ Moderation : 100
Mtu : 1024[B]
Link type : Ethernet
GID index : 0
Max inline data : 0[B]
rdma_cm QPs : ON
Data ex. method : rdma_cm
---------------------------------------------------------------------------------------
Waiting for client rdma_cm QP to connect
Please run the same command with the IB/RoCE interface IP
---------------------------------------------------------------------------------------
local address: LID 0000 QPN 0x040a PSN 0x95f56e
GID: 00:07:67:60:01:112:00:00:00:00:00:00:00:00:00:00
remote address: LID 0000 QPN 0x040a PSN 0xc476a6
GID: 00:07:67:62:204:144:00:00:00:00:00:00:00:00:00:00
---------------------------------------------------------------------------------------

#bytes   #iterations   BW peak[MB/sec]   BW average[MB/sec]   MsgRate[Mpps]
2048     6143300       0.00              1999.78              1.023887
---------------------------------------------------------------------------------------

Client:

ib_write_bw -n 10 -R -s 2048 -D 10 -p 11000 102.1.1.245

---------------------------------------------------------------------------------------
RDMA_Write BW Test
Dual-port : OFF          Device : cxgb4_0
Number of qps : 1        Transport type : IW
Connection type : RC     Using SRQ : OFF
PCIe relax order: ON
ibv_wr* API : OFF
TX depth : 128
CQ Moderation : 100
Mtu : 1024[B]
Link type : Ethernet
GID index : 0
Max inline data : 0[B]
rdma_cm QPs : ON
Data ex. method : rdma_cm
---------------------------------------------------------------------------------------
local address: LID 0000 QPN 0x040a PSN 0xc476a6
GID: 00:07:67:62:204:144:00:00:00:00:00:00:00:00:00:00
remote address: LID 0000 QPN 0x040a PSN 0x95f56e
GID: 00:07:67:60:01:112:00:00:00:00:00:00:00:00:00:00
---------------------------------------------------------------------------------------

#bytes   #iterations   BW peak[MB/sec]   BW average[MB/sec]   MsgRate[Mpps]
Conflicting CPU frequency values detected: 2636.683000 != 2179.229000. CPU Frequency is not max.
2048     6143300       0.00              1999.78              1.023887
---------------------------------------------------------------------------------------
Failed to disconnect RDMA CM connection.
ERRNO: Connection reset by peer.
Failed to disconnect RDMA CM nodes.
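As a quick sanity check on the results row (the test itself completes; only the teardown fails), the reported message rate can be recomputed from the average bandwidth and the message size. This is a minimal sketch, not part of perftest: the helper name `msg_rate_mpps` is hypothetical, and it assumes MB means 2^20 bytes and Mpps means 10^6 messages per second, which is consistent with the figures in the output above.

```python
def msg_rate_mpps(bw_mb_per_sec: float, msg_size_bytes: int) -> float:
    """Convert bandwidth to millions of messages per second.

    Assumption: 1 MB = 2**20 bytes and 1 Mpps = 10**6 messages/sec.
    """
    bytes_per_sec = bw_mb_per_sec * 2**20
    return bytes_per_sec / msg_size_bytes / 1e6


# Values taken from the results row above: 2048-byte messages, 1999.78 MB/sec average.
rate = msg_rate_mpps(1999.78, 2048)
print(f"{rate:.6f}")
```

Under those assumptions the result agrees with the MsgRate[Mpps] column to six decimal places, so the bandwidth and message-rate figures are internally consistent.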

Expected behavior: No error messages should appear on the client.

Observation:
1) The issue is seen intermittently.
2) The issue was not hit with perftest-4.2-0.8.
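Because the failure is intermittent, it can help to capture the client output from repeated runs and count how many of them contain the error. The following is a minimal sketch under that assumption; the helper names `has_disconnect_failure` and `count_failures` are hypothetical, and the marker strings are copied from the client output in this report.

```python
from typing import Iterable

# Error lines exactly as they appear in the client output of this report.
FAILURE_MARKERS = (
    "Failed to disconnect RDMA CM connection.",
    "Failed to disconnect RDMA CM nodes.",
)


def has_disconnect_failure(client_output: str) -> bool:
    """Return True if the captured output contains any disconnect error line."""
    return any(marker in client_output for marker in FAILURE_MARKERS)


def count_failures(outputs: Iterable[str]) -> int:
    """Count how many captured runs contain the disconnect error."""
    return sum(1 for out in outputs if has_disconnect_failure(out))


# Two illustrative captures: one clean run and one failing run.
clean_run = "2048 6143300 0.00 1999.78 1.023887\n"
failing_run = clean_run + (
    "Failed to disconnect RDMA CM connection.\n"
    "ERRNO: Connection reset by peer.\n"
    "Failed to disconnect RDMA CM nodes.\n"
)
print(count_failures([clean_run, failing_run, clean_run]))
```

Reporting the failure count alongside the total number of runs gives a rough hit rate that can be compared between perftest versions.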

Context:
OS: RHEL 7.9 (3.10.0-1160.el7.x86_64)
perftest version: tot (7504ce48)

Thanks,
Smit Kothari

talatb commented 3 years ago

This is a known issue with the Chelsio driver; see the release notes: https://service.chelsio.com/beta/drivers/ChelsioUwire-3.14.0.2/Release%20Notes.txt