linux-rdma / perftest

Infiniband Verbs Performance Tests

Test "ib_write_bw" is failing with "Completion with error at client" at client and "ethernet_read_keys: Couldn't read remote address" at server #262

Closed: manomugdha closed this issue 1 month ago

manomugdha commented 1 month ago

Hi, I am running this test between two different physical Linux boxes, both running Ubuntu 22.04.

Server side command:

./ib_write_bw

Client side command:

./ib_write_bw 5.5.5.1

Console output at server side:

libibverbs: Warning: couldn't load driver 'libmana-rdmav34.so': libmana-rdmav34.so: cannot open shared object file: No such file or directory
libibverbs: Warning: couldn't load driver 'liberdma-rdmav34.so': liberdma-rdmav34.so: cannot open shared object file: No such file or directory

************************************
* Waiting for client to connect... *
************************************
---------------------------------------------------------------------------------------
                    RDMA_Write BW Test
 Dual-port       : OFF          Device         : rocep4s0f0
 Number of qps   : 1            Transport type : IB
 Connection type : RC           Using SRQ      : OFF
 PCIe relax order: ON
 ibv_wr* API     : OFF
 CQ Moderation   : 1
 Mtu             : 4096[B]
 Link type       : Ethernet
 GID index       : 1
 Max inline data : 0[B]
 rdma_cm QPs     : OFF
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
 local address: LID 0x01 QPN 0x0057 PSN 0xfdc71d RKey 0xc525123c VAddr 0x0070150eac1000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:05:05:05:01
 remote address: LID 0x01 QPN 0x004b PSN 0xb6f464 RKey 0x985a2e8e VAddr 0x007b647f96b000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:05:05:05:02
---------------------------------------------------------------------------------------
 #bytes     #iterations    BW peak[MiB/sec]    BW average[MiB/sec]   MsgRate[Mpps]
ethernet_read_keys: Couldn't read remote address
 Unable to read to socket/rdma_cm
 Failed to exchange data between server and clients

Console output at client side:

libibverbs: Warning: couldn't load driver 'libmana-rdmav34.so': libmana-rdmav34.so: cannot open shared object file: No such file or directory
libibverbs: Warning: couldn't load driver 'liberdma-rdmav34.so': liberdma-rdmav34.so: cannot open shared object file: No such file or directory
establish_connection
---------------------------------------------------------------------------------------
                    RDMA_Write BW Test
 Dual-port       : OFF          Device         : rocep2s0f0
 Number of qps   : 1            Transport type : IB
 Connection type : RC           Using SRQ      : OFF
 PCIe relax order: ON
 ibv_wr* API     : OFF
 TX depth        : 128
 CQ Moderation   : 1
 Mtu             : 4096[B]
 Link type       : Ethernet
 GID index       : 1
 Max inline data : 0[B]
 rdma_cm QPs     : OFF
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
 local address: LID 0x01 QPN 0x004b PSN 0xb6f464 RKey 0x985a2e8e VAddr 0x007b647f96b000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:05:05:05:02
 remote address: LID 0x01 QPN 0x0057 PSN 0xfdc71d RKey 0xc525123c VAddr 0x0070150eac1000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:05:05:05:01
---------------------------------------------------------------------------------------
 #bytes     #iterations    BW peak[MiB/sec]    BW average[MiB/sec]   MsgRate[Mpps]
 Completion with error at client
 Failed status 12: wr_id 0 syndrom 0x10008
scnt=128, ccnt=0
 Failed to complete run_iter_bw function successfully

Device info at server:

ibv_devinfo:
libibverbs: Warning: couldn't load driver 'libmana-rdmav34.so': libmana-rdmav34.so: cannot open shared object file: No such file or directory
libibverbs: Warning: couldn't load driver 'liberdma-rdmav34.so': liberdma-rdmav34.so: cannot open shared object file: No such file or directory
hca_id: rocep4s0f0
        transport:                      InfiniBand (0)
        fw_ver:                         1.60
        node_guid:                      b696:91ff:feb4:95c8
        sys_image_guid:                 b696:91ff:feb4:95c8
        vendor_id:                      0x8086
        vendor_part_id:                 5522
        hw_ver:                         0x2
        phys_port_cnt:                  1
                port:   1
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                4096 (5)
                        active_mtu:             4096 (5)
                        sm_lid:                 0
                        port_lid:               1
                        port_lmc:               0x00
                        link_layer:             Ethernet

Device info at client:

ibv_devinfo:
libibverbs: Warning: couldn't load driver 'libmana-rdmav34.so': libmana-rdmav34.so: cannot open shared object file: No such file or directory
libibverbs: Warning: couldn't load driver 'liberdma-rdmav34.so': liberdma-rdmav34.so: cannot open shared object file: No such file or directory
hca_id: rocep2s0f0
        transport:                      InfiniBand (0)
        fw_ver:                         1.60
        node_guid:                      b696:91ff:feb4:9660
        sys_image_guid:                 b696:91ff:feb4:9660
        vendor_id:                      0x8086
        vendor_part_id:                 5522
        hw_ver:                         0x2
        phys_port_cnt:                  1
                port:   1
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                4096 (5)
                        active_mtu:             4096 (5)
                        sm_lid:                 0
                        port_lid:               1
                        port_lmc:               0x00
                        link_layer:             Ethernet

I am assuming that the following warnings are not fatal:

libibverbs: Warning: couldn't load driver 'libmana-rdmav34.so': libmana-rdmav34.so: cannot open shared object file: No such file or directory
libibverbs: Warning: couldn't load driver 'liberdma-rdmav34.so': liberdma-rdmav34.so: cannot open shared object file: No such file or directory

Please note that:

  1. This test runs successfully in loopback mode.
  2. The ib_write_lat test runs successfully.
  3. I am using an Intel NIC.

Can you please explain why this BW test does not run between two different physical Linux boxes?

manomugdha commented 1 month ago

After installing the following libraries, I can run the test successfully:

Install RDMA user space tools and libraries required for Ubuntu:
apt-get install -f libtool ibutils ibverbs-utils \
    rdmacm-utils infiniband-diags perftest librdmacm-dev \
    libibverbs-dev numactl libnuma-dev libnl-3-200 \
    libnl-route-3-200 libnl-route-3-dev libnl-utils

https://support.hpe.com/hpesc/public/docDisplay?docId=a00071081en_us&docLocale=en_US&page=GUID-617F4C95-AA58-43F7-B524-78C6535747AC.html