openucx / ucx

Unified Communication X (mailing list - https://elist.ornl.gov/mailman/listinfo/ucx-group)
http://www.openucx.org
Other
1.17k stars 428 forks source link

[Question] is ibv_create_srq mandatory for ucx? #9736

Closed Honggang-LI closed 5 days ago

Honggang-LI commented 8 months ago

We have RoCE hardware which does not support SRQ, in other words it does not support ibv_create_srq.

Is ibv_create_srq mandatory for ucx? It seems ucx is really unhappy and almost all ucx tests are failed.

thanks

yosefe commented 8 months ago

Currently SRQ support required for RC transport but not for UD transport. Can you try forcing UCX_TLS=ud?

Honggang-LI commented 8 months ago
[root@m201 ~]#  UCX_LOG_LEVEL=TRACE UCX_TLS=ud UCX_NET_DEVICES=mlx5_1:1 timeout 5m ucx_perftest -t tag_bw
[1710151836.487419] [m201:4904 :0]           debug.c:1146 UCX  DEBUG using signal stack 0x7fb3e588a000 size 141824
[1710151836.500448] [m201:4904 :0]            init.c:118  UCX  DEBUG /lib64/libucs.so.0 loaded at 0x7fb3e5a18000
[1710151836.500487] [m201:4904 :0]            init.c:120  UCX  DEBUG cmd line: ucx_perftest -t tag_bw 
[1710151836.500501] [m201:4904 :0]          module.c:72   UCX  DEBUG ucs library path: /lib64/libucs.so.0
[1710151836.500507] [m201:4904 :0]          module.c:280  UCX  DEBUG loading modules for ucs
[1710151836.500544] [m201:4904 :0]          module.c:280  UCX  DEBUG loading modules for ucx_perftest
[1710151836.500574] [m201:4904 :0]        perftest.c:921  UCX  WARN  CPU affinity is not set (bound to 8 cpus). Performance may be impacted.
Waiting for connection...
Accepted connection from 10.2.0.51:56880
+----------------------------------------------------------------------------------------------------------+
| API:          protocol layer                                                                             |
| Test:         tag match bandwidth                                                                        |
| Data layout:  (automatic)                                                                                |
| Send memory:  host                                                                                       |
| Recv memory:  host                                                                                       |
| Message size: 8                                                                                          |
+----------------------------------------------------------------------------------------------------------+
[1710151844.422088] [m201:4904 :0]         libperf.c:213  UCX  DEBUG set send allocator by send mem type host
[1710151844.422093] [m201:4904 :0]         libperf.c:217  UCX  DEBUG set recv allocator by recv mem type host
[1710151844.422104] [m201:4904 :0]            time.c:22   UCX  DEBUG arch clock frequency: 3500000000.00 Hz
[1710151844.422156] [m201:4904 :0]     ucp_context.c:1969 UCX  INFO  Version 1.14.1 (loaded from /lib64/libucp.so.0)
[1710151844.422165] [m201:4904 :0]     ucp_context.c:1739 UCX  DEBUG estimated number of endpoints is 1
[1710151844.422168] [m201:4904 :0]     ucp_context.c:1746 UCX  DEBUG estimated number of endpoints per node is 1
[1710151844.422173] [m201:4904 :0]     ucp_context.c:1753 UCX  DEBUG estimated bcopy bandwidth is 6081740800.000000
[1710151844.422180] [m201:4904 :0]     ucp_context.c:1813 UCX  DEBUG allocation method[0] is md 'sysv'
[1710151844.422184] [m201:4904 :0]     ucp_context.c:1813 UCX  DEBUG allocation method[1] is md 'posix'
[1710151844.422190] [m201:4904 :0]     ucp_context.c:1825 UCX  DEBUG allocation method[2] is 'huge'
[1710151844.422193] [m201:4904 :0]     ucp_context.c:1825 UCX  DEBUG allocation method[3] is 'thp'
[1710151844.422196] [m201:4904 :0]     ucp_context.c:1813 UCX  DEBUG allocation method[4] is md '*'
[1710151844.422199] [m201:4904 :0]     ucp_context.c:1825 UCX  DEBUG allocation method[5] is 'mmap'
[1710151844.422201] [m201:4904 :0]     ucp_context.c:1825 UCX  DEBUG allocation method[6] is 'heap'
[1710151844.422223] [m201:4904 :0]          module.c:280  UCX  DEBUG loading modules for uct
[1710151844.423446] [m201:4904 :0]          module.c:280  UCX  DEBUG loading modules for uct_ib
[1710151844.423626] [m201:4904 :0]     ucp_context.c:1445 UCX  DEBUG closing md self because it has no selected transport resources
[1710151844.423999] [m201:4904 :0]       uct_iface.c:905  UCX  DEBUG enp8s0: PF sysfs path is '/sys/devices/pci0000:00/0000:00:1c.4/0000:08:00.0'
[1710151844.424008] [m201:4904 :0]            topo.c:176  UCX  DEBUG added sys_dev 0 for bus id 08:00.0
[1710151844.424024] [m201:4904 :0]            topo.c:359  UCX  DEBUG enp8s0: bdf_name 0000:08:00.0 sys_dev 0
[1710151844.424873] [m201:4904 :0]       uct_iface.c:905  UCX  DEBUG mlx5_roce_p2: PF sysfs path is '/sys/devices/pci0000:00/0000:00:02.2/0000:05:00.1'
[1710151844.424881] [m201:4904 :0]            topo.c:176  UCX  DEBUG added sys_dev 1 for bus id 05:00.1
[1710151844.424897] [m201:4904 :0]            topo.c:359  UCX  DEBUG mlx5_roce_p2: bdf_name 0000:05:00.1 sys_dev 1
[1710151844.425204] [m201:4904 :0]       uct_iface.c:905  UCX  DEBUG enp9s0: PF sysfs path is '/sys/devices/pci0000:00/0000:00:1c.5/0000:09:00.0'
[1710151844.425211] [m201:4904 :0]            topo.c:176  UCX  DEBUG added sys_dev 2 for bus id 09:00.0
[1710151844.425215] [m201:4904 :0]            topo.c:359  UCX  DEBUG enp9s0: bdf_name 0000:09:00.0 sys_dev 2
[1710151844.425509] [m201:4904 :0]       uct_iface.c:901  UCX  DEBUG lo: sysfs path undetected
[1710151844.425515] [m201:4904 :0]            topo.c:363  UCX  DEBUG lo: system device unknown
[1710151844.425548] [m201:4904 :0]     ucp_context.c:1445 UCX  DEBUG closing md tcp because it has no selected transport resources
[1710151844.425575] [m201:4904 :0]     ucp_context.c:1445 UCX  DEBUG closing md sysv because it has no selected transport resources
[1710151844.425637] [m201:4904 :0]     ucp_context.c:1445 UCX  DEBUG closing md posix because it has no selected transport resources
[1710151844.440124] [m201:4904 :0]       uct_iface.c:905  UCX  DEBUG mlx5_0: PF sysfs path is '/sys/devices/pci0000:00/0000:00:02.2/0000:05:00.0'
[1710151844.440135] [m201:4904 :0]            topo.c:176  UCX  DEBUG added sys_dev 3 for bus id 05:00.0
[1710151844.440152] [m201:4904 :0]            topo.c:359  UCX  DEBUG mlx5_0: bdf_name 0000:05:00.0 sys_dev 3
[1710151844.440175] [m201:4904 :0]       ib_device.c:495  UCX  DEBUG mlx5_0: vendor_id 0x15b3 device_id 4117
[1710151844.440518] [m201:4904 :0]    ib_mlx5dv_md.c:916  UCX  DEBUG mlx5dv_devx_general_cmd(QUERY_HCA_CAP) failed: Remote I/O error
[1710151844.440524] [m201:4904 :0]    ib_mlx5dv_md.c:1138 UCX  DEBUG mlx5_0: crossing_vhca_mkey is not supported
[1710151844.440528] [m201:4904 :0]    ib_mlx5dv_md.c:728  UCX  DEBUG mlx5_0: disable ODP because it's not supported for DEVX QP
[1710151844.440737] [m201:4904 :0]           async.c:231  UCX  DEBUG added async handler 0x55c3ad548d90 [id=5 ref 1] ???() to hash
[1710151844.440807] [m201:4904 :0]           async.c:493  UCX  DEBUG listening to async event fd 5 events 0x1 mode thread_spinlock
[1710151844.440813] [m201:4904 :0]       ib_device.c:593  UCX  DEBUG initialized device 'mlx5_0' (InfiniBand channel adapter) with 1 ports
[1710151844.440824] [m201:4904 :0]           ib_md.c:1788 UCX  DEBUG mlx5_0: cuda GPUDirect RDMA is disabled
[1710151844.440831] [m201:4904 :0]           ib_md.c:1788 UCX  DEBUG mlx5_0: rocm GPUDirect RDMA is disabled
[1710151844.440842] [m201:4904 :0]           ib_md.c:1815 UCX  DEBUG mlx5_0: dmabuf is supported
[1710151844.440848] [m201:4904 :0]           mpool.c:136  UCX  DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144
[1710151844.440880] [m201:4904 :0]           async.c:231  UCX  DEBUG added async handler 0x55c3ad548170 [id=9 ref 1] ???() to hash
[1710151844.440889] [m201:4904 :0]           async.c:493  UCX  DEBUG listening to async event fd 9 events 0x1 mode thread_spinlock
[1710151844.441020] [m201:4904 :0]          module.c:280  UCX  DEBUG loading modules for ucm
[1710151844.441027] [m201:4904 :0]           ib_md.c:1500 UCX  DEBUG mlx5_0: using registration cache
[1710151844.441031] [m201:4904 :0]           mpool.c:136  UCX  DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40
[1710151844.441317] [m201:4904 :0]    ib_mlx5dv_md.c:1219 UCX  DEBUG mlx5_0: opened DEVX md
[1710151844.441500] [m201:4904 :0]    ib_mlx5dv_md.c:884  UCX  DEBUG created indirect rkey 0x17fe00 for remote flush
[1710151844.441507] [m201:4904 :0]           ib_md.c:1727 UCX  DEBUG mlx5_0: md open by 'uct_ib_mlx5_devx_md_ops' is successful
[1710151844.441536] [m201:4904 :0]       ib_device.c:1061 UCX  DEBUG no compatible IB ports found for flags 0xc4
[1710151844.441542] [m201:4904 :0]          uct_md.c:129  UCX  DEBUG failed to query dc_mlx5 resources: No such device
[1710151844.442563] [m201:4904 :0]       ib_device.c:1061 UCX  DEBUG no compatible IB ports found for flags 0x0
[1710151844.442570] [m201:4904 :0]          uct_md.c:129  UCX  DEBUG failed to query rc_verbs resources: No such device
[1710151844.442576] [m201:4904 :0]       ib_device.c:1061 UCX  DEBUG no compatible IB ports found for flags 0x4
[1710151844.442592] [m201:4904 :0]          uct_md.c:129  UCX  DEBUG failed to query rc_mlx5 resources: No such device
[1710151844.442598] [m201:4904 :0]       ib_device.c:1061 UCX  DEBUG no compatible IB ports found for flags 0x0
[1710151844.442602] [m201:4904 :0]          uct_md.c:129  UCX  DEBUG failed to query ud_verbs resources: No such device
[1710151844.442608] [m201:4904 :0]       ib_device.c:1061 UCX  DEBUG no compatible IB ports found for flags 0x4
[1710151844.442613] [m201:4904 :0]          uct_md.c:129  UCX  DEBUG failed to query ud_mlx5 resources: No such device
[1710151844.442619] [m201:4904 :0]     ucp_context.c:1003 UCX  DEBUG No tl resources found for md mlx5_0
[1710151844.442624] [m201:4904 :0]     ucp_context.c:1445 UCX  DEBUG closing md mlx5_0 because it has no selected transport resources
[1710151844.443026] [m201:4904 :0]           mpool.c:194  UCX  DEBUG mpool devx dbrec destroyed
[1710151844.443040] [m201:4904 :0]           async.c:156  UCX  DEBUG removed async handler 0x55c3ad548170 [id=9 ref 1] ???() from hash
[1710151844.443046] [m201:4904 :0]           async.c:546  UCX  DEBUG removing async handler 0x55c3ad548170 [id=9 ref 1] ???()
[1710151844.443054] [m201:4904 :0]           async.c:171  UCX  DEBUG release async handler 0x55c3ad548170 [id=9 ref 0] ???()
[1710151844.443064] [m201:4904 :0]         pgtable.c:618  UCX  DEBUG purge empty page table
[1710151844.443069] [m201:4904 :0]           mpool.c:194  UCX  DEBUG mpool rcache_mp destroyed
[1710151844.443075] [m201:4904 :0]       ib_device.c:614  UCX  DEBUG destroying ib device mlx5_0
[1710151844.443081] [m201:4904 :0]           async.c:156  UCX  DEBUG removed async handler 0x55c3ad548d90 [id=5 ref 1] ???() from hash
[1710151844.443086] [m201:4904 :0]           async.c:546  UCX  DEBUG removing async handler 0x55c3ad548d90 [id=5 ref 1] ???()
[1710151844.443134] [m201:4904 :0]           async.c:171  UCX  DEBUG release async handler 0x55c3ad548d90 [id=5 ref 0] ???()
[1710151844.445716] [m201:4904 :0]       uct_iface.c:905  UCX  DEBUG mlx5_1: PF sysfs path is '/sys/devices/pci0000:00/0000:00:02.2/0000:05:00.1'
[1710151844.445727] [m201:4904 :0]            topo.c:359  UCX  DEBUG mlx5_1: bdf_name 0000:05:00.1 sys_dev 1
[1710151844.445764] [m201:4904 :0]       ib_device.c:495  UCX  DEBUG mlx5_1: vendor_id 0x15b3 device_id 4117
[1710151844.446131] [m201:4904 :0]    ib_mlx5dv_md.c:916  UCX  DEBUG mlx5dv_devx_general_cmd(QUERY_HCA_CAP) failed: Remote I/O error
[1710151844.446137] [m201:4904 :0]    ib_mlx5dv_md.c:1138 UCX  DEBUG mlx5_1: crossing_vhca_mkey is not supported
[1710151844.446143] [m201:4904 :0]    ib_mlx5dv_md.c:728  UCX  DEBUG mlx5_1: disable ODP because it's not supported for DEVX QP
[1710151844.446374] [m201:4904 :0]           async.c:231  UCX  DEBUG added async handler 0x55c3ad54b170 [id=5 ref 1] ???() to hash
[1710151844.446433] [m201:4904 :0]           async.c:493  UCX  DEBUG listening to async event fd 5 events 0x1 mode thread_spinlock
[1710151844.446439] [m201:4904 :0]       ib_device.c:593  UCX  DEBUG initialized device 'mlx5_1' (InfiniBand channel adapter) with 1 ports
[1710151844.446451] [m201:4904 :0]           ib_md.c:1788 UCX  DEBUG mlx5_1: cuda GPUDirect RDMA is disabled
[1710151844.446460] [m201:4904 :0]           ib_md.c:1788 UCX  DEBUG mlx5_1: rocm GPUDirect RDMA is disabled
[1710151844.446471] [m201:4904 :0]           ib_md.c:1815 UCX  DEBUG mlx5_1: dmabuf is supported
[1710151844.446478] [m201:4904 :0]           mpool.c:136  UCX  DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144
[1710151844.446491] [m201:4904 :0]           async.c:231  UCX  DEBUG added async handler 0x55c3ad54b830 [id=9 ref 1] ???() to hash
[1710151844.446500] [m201:4904 :0]           async.c:493  UCX  DEBUG listening to async event fd 9 events 0x1 mode thread_spinlock
[1710151844.446598] [m201:4904 :0]           ib_md.c:1500 UCX  DEBUG mlx5_1: using registration cache
[1710151844.446605] [m201:4904 :0]           mpool.c:136  UCX  DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40
[1710151844.446872] [m201:4904 :0]    ib_mlx5dv_md.c:1219 UCX  DEBUG mlx5_1: opened DEVX md
[1710151844.447046] [m201:4904 :0]    ib_mlx5dv_md.c:884  UCX  DEBUG created indirect rkey 0x1fee00 for remote flush
[1710151844.447051] [m201:4904 :0]           ib_md.c:1727 UCX  DEBUG mlx5_1: md open by 'uct_ib_mlx5_devx_md_ops' is successful
[1710151844.447076] [m201:4904 :0]       ib_device.c:1061 UCX  DEBUG no compatible IB ports found for flags 0xc4
[1710151844.447081] [m201:4904 :0]          uct_md.c:129  UCX  DEBUG failed to query dc_mlx5 resources: No such device
[1710151844.448070] [m201:4904 :0]     ucp_context.c:1445 UCX  DEBUG closing md cma because it has no selected transport resources
[1710151844.448081] [m201:4904 :0]     ucp_context.c:1478 UCX  DEBUG register host memory on: mlx5_1
[1710151844.448097] [m201:4904 :0]     ucp_context.c:1478 UCX  DEBUG register cuda memory on: 
[1710151844.448103] [m201:4904 :0]     ucp_context.c:1478 UCX  DEBUG register cuda-managed memory on: 
[1710151844.448108] [m201:4904 :0]     ucp_context.c:1478 UCX  DEBUG register rocm memory on: 
[1710151844.448113] [m201:4904 :0]     ucp_context.c:1478 UCX  DEBUG register rocm-managed memory on: 
[1710151844.448126] [m201:4904 :0]           mpool.c:136  UCX  DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144
[1710151844.448212] [m201:4904 :0]     ucp_context.c:2038 UCX  DEBUG created ucp context 0x55c3ad541000 0x55c3ad541000 [1 mds 2 tls] features 0x1 tl bitmap 0x3 0x0
[1710151844.448234] [m201:4904 :0]         uct_mem.c:293  UCX  DEBUG   could not allocate memory with any of the provided methods
[1710151844.448240] [m201:4904 :0]         uct_mem.c:293  UCX  DEBUG   could not allocate memory with any of the provided methods
[1710151844.448282] [m201:4904 :0]         uct_mem.c:293  UCX  DEBUG   could not allocate memory with any of the provided methods
[1710151844.448300] [m201:4904 :0]         uct_mem.c:293  UCX  DEBUG   could not allocate memory with any of the provided methods
[1710151844.448307] [m201:4904 :0]         uct_mem.c:293  UCX  DEBUG   could not allocate memory with any of the provided methods
[1710151844.448354] [m201:4904 :0]         uct_mem.c:293  UCX  DEBUG   could not allocate memory with any of the provided methods
[1710151844.448361] [m201:4904 :0]         uct_mem.c:293  UCX  DEBUG   could not allocate memory with any of the provided methods
[1710151844.448366] [m201:4904 :0]         uct_mem.c:293  UCX  DEBUG   could not allocate memory with any of the provided methods
[1710151844.448383] [m201:4904 :0]         uct_mem.c:293  UCX  DEBUG   could not allocate memory with any of the provided methods
[1710151844.448390] [m201:4904 :0]         uct_mem.c:293  UCX  DEBUG   could not allocate memory with any of the provided methods
[1710151844.448615] [m201:4904 :0]        ib_iface.c:869  UCX  DEBUG using pkey[0] 0xffff on mlx5_1:1/RoCE
[1710151844.448696] [m201:4904 :0]       ib_device.c:924  UCX  DEBUG mlx5_1:1 using gid_index 3
[1710151844.449351] [m201:4904 :0]        ib_iface.c:1397 UCX  DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 44 data_sz 1064
[1710151844.449686] [m201:4904 :0]        ib_iface.c:987  UCX  DEBUG iface=0x55c3ad59d5f0: created UD QP 0x922 on mlx5_1:1 TX wr:341 sge:6 inl:124 resp:0 RX wr:4096 sge:1 resp:0
[1710151844.450006] [m201:4904 :0]           mpool.c:136  UCX  DEBUG mpool ud_recv_skb: align 64, maxelems 4294967295, elemsize 1116
[1710151844.450110] [m201:4904 :0]           mpool.c:281  UCX  DEBUG mpool ud_recv_skb: allocated chunk 0x7fb3e4b09018 of 151528 bytes with 131 elements
[1710151844.450120] [m201:4904 :0]           mpool.c:136  UCX  DEBUG mpool ud_tx_skb: align 64, maxelems 4294967295, elemsize 1136
[1710151844.450146] [m201:4904 :0]        ud_iface.c:380  UCX  DEBUG iface 0x55c3ad59d5f0: adding gid fe80::bace:f6ff:fe72:4be9 to hash on device mlx5_1 port 1 index 0)
[1710151844.450168] [m201:4904 :0]        ud_iface.c:380  UCX  DEBUG iface 0x55c3ad59d5f0: adding gid fe80::bace:f6ff:fe72:4be9 to hash on device mlx5_1 port 1 index 1)
[1710151844.450189] [m201:4904 :0]        ud_iface.c:380  UCX  DEBUG iface 0x55c3ad59d5f0: adding gid ::ffff:10.2.0.201 to hash on device mlx5_1 port 1 index 2)
[1710151844.450209] [m201:4904 :0]        ud_iface.c:380  UCX  DEBUG iface 0x55c3ad59d5f0: adding gid ::ffff:10.2.0.201 to hash on device mlx5_1 port 1 index 3)
[1710151844.450241] [m201:4904 :0]        ud_iface.c:380  UCX  DEBUG iface 0x55c3ad59d5f0: adding gid fe80::6531:5c5f:9f1c:8cf0 to hash on device mlx5_1 port 1 index 4)
[1710151844.450264] [m201:4904 :0]        ud_iface.c:380  UCX  DEBUG iface 0x55c3ad59d5f0: adding gid fe80::6531:5c5f:9f1c:8cf0 to hash on device mlx5_1 port 1 index 5)
[1710151844.452833] [m201:4904 :0]     timer_wheel.c:40   UCX  DEBUG high res timer created log=24 resolution=4793.490286 usec wanted: 2500.000000 usec
[1710151844.452847] [m201:4904 :0]           async.c:231  UCX  DEBUG added async handler 0x55c3ad54efd0 [id=11 ref 1] ???() to hash
[1710151844.452870] [m201:4904 :0]           async.c:493  UCX  DEBUG listening to async event fd 11 events 0x5 mode thread_spinlock
[1710151844.452888] [m201:4904 :0]      ucp_worker.c:1328 UCX  DEBUG created interface[0]=0x55c3ad59d5f0 using ud_verbs/mlx5_1:1 on worker 0x55c3ad5a2480
[1710151844.455426] [m201:4904 :0]        ib_iface.c:869  UCX  DEBUG using pkey[0] 0xffff on mlx5_1:1/RoCE
[1710151844.455512] [m201:4904 :0]       ib_device.c:924  UCX  DEBUG mlx5_1:1 using gid_index 3
[1710151844.455540] [m201:4904 :0]       ib_device.c:1248 UCX  DEBUG max IB CQE size is 128
[1710151844.456222] [m201:4904 :0]        ib_iface.c:1397 UCX  DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 44 data_sz 1064
[1710151844.456523] [m201:4904 :0]        ib_iface.c:987  UCX  DEBUG iface=0x55c3ad5c0030: created UD QP 0x924 on mlx5_1:1 TX wr:256 sge:2 inl:60 resp:0 RX wr:4096 sge:1 resp:0
[1710151844.456532] [m201:4904 :0]         ib_mlx5.c:706  UCX  DEBUG tx wq 32768 bytes [bb=64, nwqe=512] mmio_mode bf_post
[1710151844.456813] [m201:4904 :0]           mpool.c:136  UCX  DEBUG mpool ud_recv_skb: align 64, maxelems 4294967295, elemsize 1116
[1710151844.456919] [m201:4904 :0]           mpool.c:281  UCX  DEBUG mpool ud_recv_skb: allocated chunk 0x7fb3e4a62018 of 151528 bytes with 131 elements
[1710151844.456930] [m201:4904 :0]           mpool.c:136  UCX  DEBUG mpool ud_tx_skb: align 64, maxelems 4294967295, elemsize 1136
[1710151844.456954] [m201:4904 :0]        ud_iface.c:380  UCX  DEBUG iface 0x55c3ad5c0030: adding gid fe80::bace:f6ff:fe72:4be9 to hash on device mlx5_1 port 1 index 0)
[1710151844.456976] [m201:4904 :0]        ud_iface.c:380  UCX  DEBUG iface 0x55c3ad5c0030: adding gid fe80::bace:f6ff:fe72:4be9 to hash on device mlx5_1 port 1 index 1)
[1710151844.456997] [m201:4904 :0]        ud_iface.c:380  UCX  DEBUG iface 0x55c3ad5c0030: adding gid ::ffff:10.2.0.201 to hash on device mlx5_1 port 1 index 2)
[1710151844.457013] [m201:4904 :0]        ud_iface.c:380  UCX  DEBUG iface 0x55c3ad5c0030: adding gid ::ffff:10.2.0.201 to hash on device mlx5_1 port 1 index 3)
[1710151844.457029] [m201:4904 :0]        ud_iface.c:380  UCX  DEBUG iface 0x55c3ad5c0030: adding gid fe80::6531:5c5f:9f1c:8cf0 to hash on device mlx5_1 port 1 index 4)
[1710151844.457043] [m201:4904 :0]        ud_iface.c:380  UCX  DEBUG iface 0x55c3ad5c0030: adding gid fe80::6531:5c5f:9f1c:8cf0 to hash on device mlx5_1 port 1 index 5)
[1710151844.459392] [m201:4904 :0]         ib_mlx5.c:1028 UCX  DEBUG SL=0 (AR support - no) was selected on mlx5_1:1, SLs with AR support = { <none> }, SLs without AR support = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 }
[1710151844.459409] [m201:4904 :0]     timer_wheel.c:40   UCX  DEBUG high res timer created log=24 resolution=4793.490286 usec wanted: 2500.000000 usec
[1710151844.459421] [m201:4904 :0]           async.c:231  UCX  DEBUG added async handler 0x55c3ad598d20 [id=12 ref 1] ???() to hash
[1710151844.459429] [m201:4904 :0]           async.c:493  UCX  DEBUG listening to async event fd 12 events 0x5 mode thread_spinlock
[1710151844.459440] [m201:4904 :0]      ucp_worker.c:1328 UCX  DEBUG created interface[1]=0x55c3ad5c0030 using ud_mlx5/mlx5_1:1 on worker 0x55c3ad5a2480
[1710151844.459445] [m201:4904 :0]      ucp_worker.c:1131 UCX  DEBUG selected scalable tl bitmap: 0x3 0x0 (2 tls)
[1710151844.459644] [m201:4904 :0]           async.c:231  UCX  DEBUG added async handler 0x55c3ad5f0fa0 [id=13 ref 1] ???() to hash
[1710151844.459655] [m201:4904 :0]           async.c:493  UCX  DEBUG listening to async event fd 13 events 0x1 mode thread_spinlock
[1710151844.459659] [m201:4904 :0]       rdmacm_cm.c:974  UCX  DEBUG created rdmacm_cm 0x55c3ad5f08f0 with event_channel 0x55c3ad5515d0 (fd=13)
[1710151844.459669] [m201:4904 :0]      tcp_sockcm.c:223  UCX  DEBUG created tcp_sockcm 0x55c3ad60f7f0
[1710151844.459675] [m201:4904 :0]           mpool.c:136  UCX  DEBUG mpool ucp_requests: align 64, maxelems 4294967295, elemsize 280
[1710151844.459679] [m201:4904 :0]           mpool.c:136  UCX  DEBUG mpool ucp_rkeys: align 64, maxelems 4294967295, elemsize 104
[1710151844.459682] [m201:4904 :0]           mpool.c:136  UCX  DEBUG mpool ucp_reg_bufs: align 64, maxelems 4294967295, elemsize 8216
[1710151844.459687] [m201:4904 :0]           mpool.c:136  UCX  DEBUG mpool ucp_am_bufs: align 64, maxelems 4294967295, elemsize 153
[1710151844.459690] [m201:4904 :0]           mpool.c:136  UCX  DEBUG mpool ucp_am_bufs: align 64, maxelems 4294967295, elemsize 1105
[1710151844.459694] [m201:4904 :0]       mpool_set.c:129  UCX  DEBUG mpool_set:ucp_am_bufs, sizes map 0x80000040, largest size 1016, mpools num 2
[1710151844.459730] [m201:4904 :0]          parser.c:1998 UCX  INFO  UCX_* env variables: UCX_TLS=ud UCX_NET_DEVICES=mlx5_1:1 UCX_LOG_LEVEL=TRACE
[1710151844.459868] [m201:4904 :0]          ucp_ep.c:362  UCX  DEBUG created ep 0x7fb3e4a42000 to <no debug data> from api call
[1710151844.459908] [m201:4904 :0]      ucp_worker.c:1783 UCX  INFO    ep_cfg[0]: tag(ud_verbs/mlx5_1:1)
[1710151844.459927] [m201:4904 :0]          wireup.c:1192 UCX  DEBUG   ep 0x7fb3e4a42000: am_lane 0 wireup_msg_lane 0 cm_lane <none> keepalive_lane <none> reachable_mds 0x1
[1710151844.459934] [m201:4904 :0]          wireup.c:1215 UCX  DEBUG   ep 0x7fb3e4a42000: lane[0]:  0:ud_verbs/mlx5_1:1.0 md[0]     -> addr[0].md[0]/ib/sysdev[255] am am_bw#0 wireup
[1710151844.459942] [m201:4904 :0]        ib_iface.c:755  UCX  DEBUG   iface 0x55c3ad59d5f0: ah_attr dlid=49152 sl=0 port=1 src_path_bits=0 dgid=::ffff:10.2.0.51 sgid_index=3 traffic_class=106
[1710151844.459966] [m201:4904 :0]        ib_iface.c:755  UCX  DEBUG   iface 0x55c3ad59d5f0: ah_attr dlid=49152 sl=0 port=1 src_path_bits=0 dgid=::ffff:10.2.0.51 sgid_index=3 traffic_class=106
[1710151844.459974] [m201:4904 :0]           ud_ep.c:406  UCX  DEBUG   created ep ep=0x55c3ad550e10 iface=0x55c3ad59d5f0 id=0
[1710151844.459980] [m201:4904 :0]           ud_ep.c:510  UCX  DEBUG   mlx5_1:1/RoCE lid 0 qpn 0x922 epid 0 ep 0x55c3ad550e10 connected to IFACE ::ffff:10.2.0.51mtu 1024 pkey 0xffff  qpn 0x14
[1710151844.459985] [m201:4904 :0]        ib_iface.c:755  UCX  DEBUG   iface 0x55c3ad59d5f0: ah_attr dlid=49152 sl=0 port=1 src_path_bits=0 dgid=::ffff:10.2.0.51 sgid_index=3 traffic_class=106
[1710151844.459990] [m201:4904 :0]        ib_iface.c:755  UCX  DEBUG   iface 0x55c3ad59d5f0: ah_attr dlid=49152 sl=0 port=1 src_path_bits=0 dgid=::ffff:10.2.0.51 sgid_index=3 traffic_class=106
[1710151844.460265] [m201:4904 :0]           mpool.c:281  UCX  DEBUG   mpool ud_tx_skb: allocated chunk 0x7fb3e4800018 of 2097128 bytes with 1820 elements
[1710151844.460322] [m201:4904 :0]           async.c:231  UCX  DEBUG   added async handler 0x55c3ad598f70 [id=1000007 ref 1] ???() to hash
[1710151844.461249] [m201:4904 :a]        ib_iface.c:755  UCX  DEBUG iface 0x55c3ad59d5f0: ah_attr dlid=49152 sl=0 port=1 src_path_bits=0 dgid=::ffff:10.2.0.51 sgid_index=3 traffic_class=106
[1710151844.461280] [m201:4904 :a]           ud_ep.c:821  UCX  DEBUG simultaneous CREQ ep=0x55c3ad550e10(iface=0x55c3ad59d5f0 conn_sn=0 ep_id=0, dest_ep_id=0 rx_psn=1)
[1710151844.461973] [m201:4904 :a]           mpool.c:281  UCX  DEBUG mpool ud_recv_skb: allocated chunk 0x7fb3df800018 of 6291432 bytes with 5461 elements
[1710151844.498307] [m201:4904 :a]        ib_iface.c:755  UCX  DEBUG iface 0x55c3ad59d5f0: ah_attr dlid=49152 sl=0 port=1 src_path_bits=0 dgid=::ffff:10.2.0.51 sgid_index=3 traffic_class=106
[1710151844.502378] [m201:4904 :0]           mpool.c:281  UCX  DEBUG mpool ucp_requests: allocated chunk 0x55c3ad634214 of 41044 bytes with 128 elements
[1710151844.502395] [m201:4904 :0]           flush.c:377  UCX  DEBUG flush_worker ep 0x7fb3e4a42000
[1710151844.502404] [m201:4904 :0]           ud_ep.c:93   UCX  DEBUG ep: 0x55c3ad550e10 ca drop@cwnd = 3 in flight: 1
[1710151844.502408] [m201:4904 :0]           ud_ep.c:1427 UCX  DEBUG ep(0x55c3ad550e10): resending rt_psn 2 rt_max_psn 2 acked_psn 1 max_psn 3 ack_req 1
[1710151844.502412] [m201:4904 :0]           ud_ep.c:1433 UCX  DEBUG ep(0x55c3ad550e10): resending completed
[1710151844.537652] [m201:4904 :0]        ib_iface.c:755  UCX  DEBUG iface 0x55c3ad59d5f0: ah_attr dlid=49152 sl=0 port=1 src_path_bits=0 dgid=::ffff:10.2.0.51 sgid_index=3 traffic_class=106
[1710151844.607868] [m201:4904 :0]           ud_ep.c:93   UCX  DEBUG ep: 0x55c3ad550e10 ca drop@cwnd = 2 in flight: 1
[1710151844.607873] [m201:4904 :0]           ud_ep.c:1427 UCX  DEBUG ep(0x55c3ad550e10): resending rt_psn 2 rt_max_psn 2 acked_psn 1 max_psn 3 ack_req 1
[1710151844.607875] [m201:4904 :0]           ud_ep.c:1433 UCX  DEBUG ep(0x55c3ad550e10): resending completed
[1710151844.616596] [m201:4904 :0]        ib_iface.c:755  UCX  DEBUG iface 0x55c3ad59d5f0: ah_attr dlid=49152 sl=0 port=1 src_path_bits=0 dgid=::ffff:10.2.0.51 sgid_index=3 traffic_class=106
[1710151844.766065] [m201:4904 :0]           ud_ep.c:93   UCX  DEBUG ep: 0x55c3ad550e10 ca drop@cwnd = 2 in flight: 1
[1710151844.766071] [m201:4904 :0]           ud_ep.c:1427 UCX  DEBUG ep(0x55c3ad550e10): resending rt_psn 2 rt_max_psn 2 acked_psn 1 max_psn 3 ack_req 1
[1710151844.766074] [m201:4904 :0]           ud_ep.c:1433 UCX  DEBUG ep(0x55c3ad550e10): resending completed
[1710151844.774512] [m201:4904 :0]        ib_iface.c:755  UCX  DEBUG iface 0x55c3ad59d5f0: ah_attr dlid=49152 sl=0 port=1 src_path_bits=0 dgid=::ffff:10.2.0.51 sgid_index=3 traffic_class=106
[1710151845.082456] [m201:4904 :0]           ud_ep.c:93   UCX  DEBUG ep: 0x55c3ad550e10 ca drop@cwnd = 2 in flight: 1
[1710151845.082462] [m201:4904 :0]           ud_ep.c:1427 UCX  DEBUG ep(0x55c3ad550e10): resending rt_psn 2 rt_max_psn 2 acked_psn 1 max_psn 3 ack_req 1
[1710151845.082464] [m201:4904 :0]           ud_ep.c:1433 UCX  DEBUG ep(0x55c3ad550e10): resending completed
[1710151845.090349] [m201:4904 :0]        ib_iface.c:755  UCX  DEBUG iface 0x55c3ad59d5f0: ah_attr dlid=49152 sl=0 port=1 src_path_bits=0 dgid=::ffff:10.2.0.51 sgid_index=3 traffic_class=106
[1710151845.720036] [m201:4904 :0]           ud_ep.c:93   UCX  DEBUG ep: 0x55c3ad550e10 ca drop@cwnd = 2 in flight: 1
[1710151845.720042] [m201:4904 :0]           ud_ep.c:1427 UCX  DEBUG ep(0x55c3ad550e10): resending rt_psn 2 rt_max_psn 2 acked_psn 1 max_psn 3 ack_req 1
[1710151845.720045] [m201:4904 :0]           ud_ep.c:1433 UCX  DEBUG ep(0x55c3ad550e10): resending completed
[1710151845.726979] [m201:4904 :0]        ib_iface.c:755  UCX  DEBUG iface 0x55c3ad59d5f0: ah_attr dlid=49152 sl=0 port=1 src_path_bits=0 dgid=::ffff:10.2.0.51 sgid_index=3 traffic_class=106
[1710151846.999981] [m201:4904 :0]           ud_ep.c:93   UCX  DEBUG ep: 0x55c3ad550e10 ca drop@cwnd = 2 in flight: 1
[1710151846.999988] [m201:4904 :0]           ud_ep.c:1427 UCX  DEBUG ep(0x55c3ad550e10): resending rt_psn 2 rt_max_psn 2 acked_psn 1 max_psn 3 ack_req 1
[1710151847.000003] [m201:4904 :0]           ud_ep.c:1433 UCX  DEBUG ep(0x55c3ad550e10): resending completed
[1710151847.005112] [m201:4904 :0]        ib_iface.c:755  UCX  DEBUG iface 0x55c3ad59d5f0: ah_attr dlid=49152 sl=0 port=1 src_path_bits=0 dgid=::ffff:10.2.0.51 sgid_index=3 traffic_class=106
[1710151849.559874] [m201:4904 :0]           ud_ep.c:93   UCX  DEBUG ep: 0x55c3ad550e10 ca drop@cwnd = 2 in flight: 1
[1710151849.559882] [m201:4904 :0]           ud_ep.c:1427 UCX  DEBUG ep(0x55c3ad550e10): resending rt_psn 2 rt_max_psn 2 acked_psn 1 max_psn 3 ack_req 1
[1710151849.559897] [m201:4904 :0]           ud_ep.c:1433 UCX  DEBUG ep(0x55c3ad550e10): resending completed
[1710151849.561416] [m201:4904 :0]        ib_iface.c:755  UCX  DEBUG iface 0x55c3ad59d5f0: ah_attr dlid=49152 sl=0 port=1 src_path_bits=0 dgid=::ffff:10.2.0.51 sgid_index=3 traffic_class=106
[1710151854.463944] [m201:4904 :0]           ud_ep.c:93   UCX  DEBUG ep: 0x55c3ad550e10 ca drop@cwnd = 2 in flight: 1
[1710151854.463951] [m201:4904 :0]           ud_ep.c:1427 UCX  DEBUG ep(0x55c3ad550e10): resending rt_psn 2 rt_max_psn 2 acked_psn 1 max_psn 3 ack_req 1
[1710151854.463954] [m201:4904 :0]           ud_ep.c:1433 UCX  DEBUG ep(0x55c3ad550e10): resending completed
[1710151854.609858] [m201:4904 :0]        ib_iface.c:755  UCX  DEBUG iface 0x55c3ad59d5f0: ah_attr dlid=49152 sl=0 port=1 src_path_bits=0 dgid=::ffff:10.2.0.51 sgid_index=3 traffic_class=106
[1710151859.368026] [m201:4904 :0]           ud_ep.c:93   UCX  DEBUG ep: 0x55c3ad550e10 ca drop@cwnd = 2 in flight: 1
[1710151859.368033] [m201:4904 :0]           ud_ep.c:1427 UCX  DEBUG ep(0x55c3ad550e10): resending rt_psn 2 rt_max_psn 2 acked_psn 1 max_psn 3 ack_req 1
[1710151859.368048] [m201:4904 :0]           ud_ep.c:1433 UCX  DEBUG ep(0x55c3ad550e10): resending completed
[1710151859.658278] [m201:4904 :0]        ib_iface.c:755  UCX  DEBUG iface 0x55c3ad59d5f0: ah_attr dlid=49152 sl=0 port=1 src_path_bits=0 dgid=::ffff:10.2.0.51 sgid_index=3 traffic_class=106
[1710151864.272083] [m201:4904 :0]           ud_ep.c:93   UCX  DEBUG ep: 0x55c3ad550e10 ca drop@cwnd = 2 in flight: 1
[1710151864.272090] [m201:4904 :0]           ud_ep.c:1427 UCX  DEBUG ep(0x55c3ad550e10): resending rt_psn 2 rt_max_psn 2 acked_psn 1 max_psn 3 ack_req 1
[1710151864.272106] [m201:4904 :0]           ud_ep.c:1433 UCX  DEBUG ep(0x55c3ad550e10): resending completed
[1710151864.706729] [m201:4904 :0]        ib_iface.c:755  UCX  DEBUG iface 0x55c3ad59d5f0: ah_attr dlid=49152 sl=0 port=1 src_path_bits=0 dgid=::ffff:10.2.0.51 sgid_index=3 traffic_class=106
[1710151869.176178] [m201:4904 :0]           ud_ep.c:93   UCX  DEBUG ep: 0x55c3ad550e10 ca drop@cwnd = 2 in flight: 1
[1710151869.176186] [m201:4904 :0]           ud_ep.c:1427 UCX  DEBUG ep(0x55c3ad550e10): resending rt_psn 2 rt_max_psn 2 acked_psn 1 max_psn 3 ack_req 1
[1710151869.176189] [m201:4904 :0]           ud_ep.c:1433 UCX  DEBUG ep(0x55c3ad550e10): resending completed
[1710151869.755174] [m201:4904 :0]        ib_iface.c:755  UCX  DEBUG iface 0x55c3ad59d5f0: ah_attr dlid=49152 sl=0 port=1 src_path_bits=0 dgid=::ffff:10.2.0.51 sgid_index=3 traffic_class=106
[1710151874.080282] [m201:4904 :0]           ud_ep.c:93   UCX  DEBUG ep: 0x55c3ad550e10 ca drop@cwnd = 2 in flight: 1
[1710151874.080289] [m201:4904 :0]           ud_ep.c:1427 UCX  DEBUG ep(0x55c3ad550e10): resending rt_psn 2 rt_max_psn 2 acked_psn 1 max_psn 3 ack_req 1
[1710151874.080292] [m201:4904 :0]           ud_ep.c:1433 UCX  DEBUG ep(0x55c3ad550e10): resending completed
[1710151874.803599] [m201:4904 :0]        ib_iface.c:755  UCX  DEBUG iface 0x55c3ad59d5f0: ah_attr dlid=49152 sl=0 port=1 src_path_bits=0 dgid=::ffff:10.2.0.51 sgid_index=3 traffic_class=106
[1710151878.984366] [m201:4904 :0]           ud_ep.c:93   UCX  DEBUG ep: 0x55c3ad550e10 ca drop@cwnd = 2 in flight: 1
[1710151878.984376] [m201:4904 :0]           ud_ep.c:375  UCX  DEBUG ep 0x55c3ad550e10: timeout of 34.52 sec, config::peer_timeout - 30.00 sec
[1710151878.984392] [m201:4904 :0]           ud_ep.c:1427 UCX  DEBUG ep(0x55c3ad550e10): resending rt_psn 2 rt_max_psn 2 acked_psn 1 max_psn 3 ack_req 1
[1710151878.984395] [m201:4904 :0]           ud_ep.c:1433 UCX  DEBUG ep(0x55c3ad550e10): resending completed
[1710151878.984401] [m201:4904 :0]      ucp_worker.c:530  UCX  DEBUG worker 0x55c3ad5a2480: error handler called for UCT EP 0x55c3ad550e10: Endpoint timeout
[1710151878.984406] [m201:4904 :0]          ucp_ep.c:1410 UCX  DEBUG ep 0x7fb3e4a42000: set_ep_failed status Endpoint timeout on lane[0]=0x55c3ad550e10
[1710151878.984410] [m201:4904 :0]          ucp_ep.c:1373 UCX  DEBUG ep 0x7fb3e4a42000: discarding lanes
[1710151878.984414] [m201:4904 :0]          ucp_ep.c:1381 UCX  DEBUG ep 0x7fb3e4a42000: discard uct_ep[0]=0x55c3ad550e10
[1710151878.984420] [m201:4904 :0]          ucp_ep.c:1449 UCX  DIAG  ep 0x7fb3e4a42000: error 'Endpoint timeout' on ud_verbs/mlx5_1:1 will not be handled since no error callback is installed
[m201:4904 :0:4904]       ud_ep.c:278  Fatal: UD endpoint 0x55c3ad550e10 to <no debug data>: unhandled timeout error
==== backtrace (tid:   4904) ====
 0  /lib64/libucs.so.0(ucs_handle_error+0x2e4) [0x7fb3e5a40d74]
 1  /lib64/libucs.so.0(ucs_fatal_error_message+0xbc) [0x7fb3e5a40f2c]
 2  /lib64/libucs.so.0(ucs_fatal_error_format+0x111) [0x7fb3e5a41051]
 3  /lib64/ucx/libuct_ib.so.0(+0x57440) [0x7fb3e55cd440]
 4  /lib64/libucs.so.0(+0x18917) [0x7fb3e5a30917]
 5  /lib64/libucp.so.0(ucp_worker_progress+0x2a) [0x7fb3e5af5f6a]
 6  /lib64/libucp.so.0(ucp_worker_flush+0x48) [0x7fb3e5b13d08]
 7  ucx_perftest(+0xde16) [0x55c3acad9e16]
 8  ucx_perftest(+0x8bce) [0x55c3acad4bce]
 9  ucx_perftest(+0x5aa4) [0x55c3acad1aa4]
10  /lib64/libc.so.6(+0x3feb0) [0x7fb3e563feb0]
11  /lib64/libc.so.6(__libc_start_main+0x80) [0x7fb3e563ff60]
12  ucx_perftest(+0x63b5) [0x55c3acad23b5]
=================================
timeout: the monitored command dumped core
Aborted
[root@m201 ~]# 
[root@m051 ~]#  UCX_LOG_LEVEL=TRACE UCX_TLS=ud UCX_NET_DEVICES=irdma1:1 timeout 5m ucx_perftest -t tag_bw 10.2.0.201
[1710151844.415295] [m051:6032 :0]           debug.c:1146 UCX  DEBUG using signal stack 0x7f59f82ae000 size 141824
[1710151844.427592] [m051:6032 :0]            init.c:118  UCX  DEBUG /lib64/libucs.so.0 loaded at 0x7f59f843c000
[1710151844.427654] [m051:6032 :0]            init.c:120  UCX  DEBUG cmd line: ucx_perftest -t tag_bw 10.2.0.201 
[1710151844.427675] [m051:6032 :0]          module.c:72   UCX  DEBUG ucs library path: /lib64/libucs.so.0
[1710151844.427684] [m051:6032 :0]          module.c:280  UCX  DEBUG loading modules for ucs
[1710151844.427717] [m051:6032 :0]          module.c:280  UCX  DEBUG loading modules for ucx_perftest
[1710151844.427773] [m051:6032 :0]        perftest.c:921  UCX  WARN  CPU affinity is not set (bound to 40 cpus). Performance may be impacted.
+--------------+--------------+------------------------------+---------------------+-----------------------+
|              |              |       overhead (usec)        |   bandwidth (MB/s)  |  message rate (msg/s) |
+--------------+--------------+----------+---------+---------+----------+----------+-----------+-----------+
|    Stage     | # iterations | 50.0%ile | average | overall |  average |  overall |  average  |  overall  |
+--------------+--------------+----------+---------+---------+----------+----------+-----------+-----------+
[1710151844.428402] [m051:6032 :0]         libperf.c:213  UCX  DEBUG set send allocator by send mem type host
[1710151844.428422] [m051:6032 :0]         libperf.c:217  UCX  DEBUG set recv allocator by recv mem type host
[1710151844.428434] [m051:6032 :0]            time.c:22   UCX  DEBUG arch clock frequency: 1700000000.00 Hz
[1710151844.428526] [m051:6032 :0]     ucp_context.c:1969 UCX  INFO  Version 1.14.1 (loaded from /lib64/libucp.so.0)
[1710151844.428539] [m051:6032 :0]     ucp_context.c:1739 UCX  DEBUG estimated number of endpoints is 1
[1710151844.428545] [m051:6032 :0]     ucp_context.c:1746 UCX  DEBUG estimated number of endpoints per node is 1
[1710151844.428554] [m051:6032 :0]     ucp_context.c:1753 UCX  DEBUG estimated bcopy bandwidth is 6081740800.000000
[1710151844.428567] [m051:6032 :0]     ucp_context.c:1813 UCX  DEBUG allocation method[0] is md 'sysv'
[1710151844.428574] [m051:6032 :0]     ucp_context.c:1813 UCX  DEBUG allocation method[1] is md 'posix'
[1710151844.428583] [m051:6032 :0]     ucp_context.c:1825 UCX  DEBUG allocation method[2] is 'huge'
[1710151844.428589] [m051:6032 :0]     ucp_context.c:1825 UCX  DEBUG allocation method[3] is 'thp'
[1710151844.428595] [m051:6032 :0]     ucp_context.c:1813 UCX  DEBUG allocation method[4] is md '*'
[1710151844.428601] [m051:6032 :0]     ucp_context.c:1825 UCX  DEBUG allocation method[5] is 'mmap'
[1710151844.428607] [m051:6032 :0]     ucp_context.c:1825 UCX  DEBUG allocation method[6] is 'heap'
[1710151844.428618] [m051:6032 :0]          module.c:280  UCX  DEBUG loading modules for uct
[1710151844.430623] [m051:6032 :0]          module.c:280  UCX  DEBUG loading modules for uct_ib
[1710151844.432401] [m051:6032 :0]     ucp_context.c:1445 UCX  DEBUG closing md self because it has no selected transport resources
[1710151844.432902] [m051:6032 :0]       uct_iface.c:905  UCX  DEBUG e810_roce_p2: PF sysfs path is '/sys/devices/pci0000:00/0000:00:02.0/0000:06:00.1'
[1710151844.432933] [m051:6032 :0]            topo.c:176  UCX  DEBUG added sys_dev 0 for bus id 06:00.1
[1710151844.432940] [m051:6032 :0]            topo.c:359  UCX  DEBUG e810_roce_p2: bdf_name 0000:06:00.1 sys_dev 0
[1710151844.433307] [m051:6032 :0]       uct_iface.c:905  UCX  DEBUG enp4s0f0: PF sysfs path is '/sys/devices/pci0000:00/0000:00:01.1/0000:04:00.0'
[1710151844.433321] [m051:6032 :0]            topo.c:176  UCX  DEBUG added sys_dev 1 for bus id 04:00.0
[1710151844.433327] [m051:6032 :0]            topo.c:359  UCX  DEBUG enp4s0f0: bdf_name 0000:04:00.0 sys_dev 1
[1710151844.433709] [m051:6032 :0]       uct_iface.c:905  UCX  DEBUG enp4s0f1: PF sysfs path is '/sys/devices/pci0000:00/0000:00:01.1/0000:04:00.1'
[1710151844.433723] [m051:6032 :0]            topo.c:176  UCX  DEBUG added sys_dev 2 for bus id 04:00.1
[1710151844.433745] [m051:6032 :0]            topo.c:359  UCX  DEBUG enp4s0f1: bdf_name 0000:04:00.1 sys_dev 2
[1710151844.434097] [m051:6032 :0]       uct_iface.c:901  UCX  DEBUG lo: sysfs path undetected
[1710151844.434107] [m051:6032 :0]            topo.c:363  UCX  DEBUG lo: system device unknown
[1710151844.434751] [m051:6032 :0]     ucp_context.c:1445 UCX  DEBUG closing md tcp because it has no selected transport resources
[1710151844.434815] [m051:6032 :0]     ucp_context.c:1445 UCX  DEBUG closing md sysv because it has no selected transport resources
[1710151844.434925] [m051:6032 :0]     ucp_context.c:1445 UCX  DEBUG closing md posix because it has no selected transport resources
[1710151844.435234] [m051:6032 :0]    ib_mlx5dv_md.c:1255 UCX  DEBUG irdma0: DEVX is not supported
[1710151844.435245] [m051:6032 :0]           ib_md.c:1734 UCX  DEBUG irdma0: md open by 'uct_ib_mlx5_devx_md_ops' failed, trying next
[1710151844.435251] [m051:6032 :0]           ib_md.c:1734 UCX  DEBUG irdma0: md open by 'uct_ib_mlx5_md_ops' failed, trying next
[1710151844.435744] [m051:6032 :0]       uct_iface.c:905  UCX  DEBUG irdma0: PF sysfs path is '/sys/devices/pci0000:00/0000:00:02.0/0000:06:00.0'
[1710151844.435773] [m051:6032 :0]            topo.c:176  UCX  DEBUG added sys_dev 3 for bus id 06:00.0
[1710151844.435780] [m051:6032 :0]            topo.c:359  UCX  DEBUG irdma0: bdf_name 0000:06:00.0 sys_dev 3
[1710151844.435816] [m051:6032 :0]       ib_device.c:495  UCX  DEBUG irdma0: vendor_id 0x8086 device_id 5522
[1710151844.435935] [m051:6032 :0]           async.c:231  UCX  DEBUG added async handler 0x5617c450c4d0 [id=5 ref 1] ???() to hash
[1710151844.436067] [m051:6032 :0]           async.c:493  UCX  DEBUG listening to async event fd 5 events 0x1 mode thread_spinlock
[1710151844.436078] [m051:6032 :0]       ib_device.c:593  UCX  DEBUG initialized device 'irdma0' (InfiniBand channel adapter) with 1 ports
[1710151844.436093] [m051:6032 :0]           ib_md.c:1788 UCX  DEBUG irdma0: cuda GPUDirect RDMA is disabled
[1710151844.436106] [m051:6032 :0]           ib_md.c:1788 UCX  DEBUG irdma0: rocm GPUDirect RDMA is disabled
[1710151844.436125] [m051:6032 :0]           ib_md.c:1815 UCX  DEBUG irdma0: dmabuf is supported
[1710151844.436136] [m051:6032 :0]           mpool.c:136  UCX  DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144
[1710151844.436172] [m051:6032 :0]           async.c:231  UCX  DEBUG added async handler 0x5617c4514e40 [id=9 ref 1] ???() to hash
[1710151844.436186] [m051:6032 :0]           async.c:493  UCX  DEBUG listening to async event fd 9 events 0x1 mode thread_spinlock
[1710151844.436408] [m051:6032 :0]          module.c:280  UCX  DEBUG loading modules for ucm
[1710151844.436421] [m051:6032 :0]           ib_md.c:1500 UCX  DEBUG irdma0: using registration cache
[1710151844.436740] [m051:6032 :0]           ib_md.c:1727 UCX  DEBUG irdma0: md open by 'uct_ib_verbs_md_ops' is successful
[1710151844.436792] [m051:6032 :0]          uct_md.c:129  UCX  DEBUG failed to query dc_mlx5 resources: No such device
[1710151844.437200] [m051:6032 :0]       ib_device.c:1061 UCX  DEBUG no compatible IB ports found for flags 0x0
[1710151844.437211] [m051:6032 :0]          uct_md.c:129  UCX  DEBUG failed to query rc_verbs resources: No such device
[1710151844.437217] [m051:6032 :0]          uct_md.c:129  UCX  DEBUG failed to query rc_mlx5 resources: No such device
[1710151844.437238] [m051:6032 :0]       ib_device.c:1061 UCX  DEBUG no compatible IB ports found for flags 0x0
[1710151844.437244] [m051:6032 :0]          uct_md.c:129  UCX  DEBUG failed to query ud_verbs resources: No such device
[1710151844.437251] [m051:6032 :0]          uct_md.c:129  UCX  DEBUG failed to query ud_mlx5 resources: No such device
[1710151844.437257] [m051:6032 :0]     ucp_context.c:1003 UCX  DEBUG No tl resources found for md irdma0
[1710151844.437263] [m051:6032 :0]     ucp_context.c:1445 UCX  DEBUG closing md irdma0 because it has no selected transport resources
[1710151844.437287] [m051:6032 :0]           async.c:156  UCX  DEBUG removed async handler 0x5617c4514e40 [id=9 ref 1] ???() from hash
[1710151844.437295] [m051:6032 :0]           async.c:546  UCX  DEBUG removing async handler 0x5617c4514e40 [id=9 ref 1] ???()
[1710151844.437305] [m051:6032 :0]           async.c:171  UCX  DEBUG release async handler 0x5617c4514e40 [id=9 ref 0] ???()
[1710151844.437320] [m051:6032 :0]         pgtable.c:618  UCX  DEBUG purge empty page table
[1710151844.437327] [m051:6032 :0]           mpool.c:194  UCX  DEBUG mpool rcache_mp destroyed
[1710151844.437335] [m051:6032 :0]       ib_device.c:614  UCX  DEBUG destroying ib device irdma0
[1710151844.437342] [m051:6032 :0]           async.c:156  UCX  DEBUG removed async handler 0x5617c450c4d0 [id=5 ref 1] ???() from hash
[1710151844.437349] [m051:6032 :0]           async.c:546  UCX  DEBUG removing async handler 0x5617c450c4d0 [id=5 ref 1] ???()
[1710151844.437466] [m051:6032 :0]           async.c:171  UCX  DEBUG release async handler 0x5617c450c4d0 [id=5 ref 0] ???()
[1710151844.437648] [m051:6032 :0]    ib_mlx5dv_md.c:1255 UCX  DEBUG irdma1: DEVX is not supported
[1710151844.437659] [m051:6032 :0]           ib_md.c:1734 UCX  DEBUG irdma1: md open by 'uct_ib_mlx5_devx_md_ops' failed, trying next
[1710151844.437665] [m051:6032 :0]           ib_md.c:1734 UCX  DEBUG irdma1: md open by 'uct_ib_mlx5_md_ops' failed, trying next
[1710151844.438119] [m051:6032 :0]       uct_iface.c:905  UCX  DEBUG irdma1: PF sysfs path is '/sys/devices/pci0000:00/0000:00:02.0/0000:06:00.1'
[1710151844.438147] [m051:6032 :0]            topo.c:359  UCX  DEBUG irdma1: bdf_name 0000:06:00.1 sys_dev 0
[1710151844.438183] [m051:6032 :0]       ib_device.c:495  UCX  DEBUG irdma1: vendor_id 0x8086 device_id 5522
[1710151844.438286] [m051:6032 :0]           async.c:231  UCX  DEBUG added async handler 0x5617c45150d0 [id=5 ref 1] ???() to hash
[1710151844.438399] [m051:6032 :0]           async.c:493  UCX  DEBUG listening to async event fd 5 events 0x1 mode thread_spinlock
[1710151844.438410] [m051:6032 :0]       ib_device.c:593  UCX  DEBUG initialized device 'irdma1' (InfiniBand channel adapter) with 1 ports
[1710151844.438424] [m051:6032 :0]           ib_md.c:1788 UCX  DEBUG irdma1: cuda GPUDirect RDMA is disabled
[1710151844.438436] [m051:6032 :0]           ib_md.c:1788 UCX  DEBUG irdma1: rocm GPUDirect RDMA is disabled
[1710151844.438453] [m051:6032 :0]           ib_md.c:1815 UCX  DEBUG irdma1: dmabuf is supported
[1710151844.438463] [m051:6032 :0]           mpool.c:136  UCX  DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144
[1710151844.438483] [m051:6032 :0]           async.c:231  UCX  DEBUG added async handler 0x5617c4510050 [id=9 ref 1] ???() to hash
[1710151844.438496] [m051:6032 :0]           async.c:493  UCX  DEBUG listening to async event fd 9 events 0x1 mode thread_spinlock
[1710151844.438641] [m051:6032 :0]           ib_md.c:1500 UCX  DEBUG irdma1: using registration cache
[1710151844.439004] [m051:6032 :0]           ib_md.c:1727 UCX  DEBUG irdma1: md open by 'uct_ib_verbs_md_ops' is successful
[1710151844.439058] [m051:6032 :0]          uct_md.c:129  UCX  DEBUG failed to query dc_mlx5 resources: No such device
[1710151844.439371] [m051:6032 :0]          uct_md.c:129  UCX  DEBUG failed to query rc_mlx5 resources: No such device
[1710151844.439412] [m051:6032 :0]          uct_md.c:129  UCX  DEBUG failed to query ud_mlx5 resources: No such device
[1710151844.439489] [m051:6032 :0]     ucp_context.c:1445 UCX  DEBUG closing md cma because it has no selected transport resources
[1710151844.439503] [m051:6032 :0]     ucp_context.c:1478 UCX  DEBUG register host memory on: irdma1
[1710151844.439510] [m051:6032 :0]     ucp_context.c:1478 UCX  DEBUG register cuda memory on: 
[1710151844.439516] [m051:6032 :0]     ucp_context.c:1478 UCX  DEBUG register cuda-managed memory on: 
[1710151844.439521] [m051:6032 :0]     ucp_context.c:1478 UCX  DEBUG register rocm memory on: 
[1710151844.439527] [m051:6032 :0]     ucp_context.c:1478 UCX  DEBUG register rocm-managed memory on: 
[1710151844.439550] [m051:6032 :0]           mpool.c:136  UCX  DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144
[1710151844.439705] [m051:6032 :0]     ucp_context.c:2038 UCX  DEBUG created ucp context 0x5617c44ff010 0x5617c44ff010 [1 mds 1 tls] features 0x1 tl bitmap 0x1 0x0
[1710151844.439722] [m051:6032 :0]         uct_mem.c:293  UCX  DEBUG   could not allocate memory with any of the provided methods
[1710151844.439729] [m051:6032 :0]         uct_mem.c:293  UCX  DEBUG   could not allocate memory with any of the provided methods
[1710151844.439786] [m051:6032 :0]         uct_mem.c:293  UCX  DEBUG   could not allocate memory with any of the provided methods
[1710151844.439810] [m051:6032 :0]         uct_mem.c:293  UCX  DEBUG   could not allocate memory with any of the provided methods
[1710151844.439820] [m051:6032 :0]         uct_mem.c:293  UCX  DEBUG   could not allocate memory with any of the provided methods
[1710151844.439926] [m051:6032 :0]         uct_mem.c:293  UCX  DEBUG   could not allocate memory with any of the provided methods
[1710151844.439936] [m051:6032 :0]         uct_mem.c:293  UCX  DEBUG   could not allocate memory with any of the provided methods
[1710151844.439943] [m051:6032 :0]         uct_mem.c:293  UCX  DEBUG   could not allocate memory with any of the provided methods
[1710151844.439967] [m051:6032 :0]         uct_mem.c:293  UCX  DEBUG   could not allocate memory with any of the provided methods
[1710151844.439977] [m051:6032 :0]         uct_mem.c:293  UCX  DEBUG   could not allocate memory with any of the provided methods
[1710151844.440442] [m051:6032 :0]        ib_iface.c:869  UCX  DEBUG using pkey[0] 0xffff on irdma1:1/RoCE
[1710151844.440643] [m051:6032 :0]       ib_device.c:924  UCX  DEBUG irdma1:1 using gid_index 1
[1710151844.440710] [m051:6032 :0]       ib_device.c:1295 UCX  DEBUG RoCE LAG level on irdma1:1 (e810_roce_p2) is 1
[1710151844.441241] [m051:6032 :0]        ib_iface.c:1397 UCX  DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 44 data_sz 1064
[1710151844.441816] [m051:6032 :0]        ib_iface.c:987  UCX  DEBUG iface=0x5617c4512bf0: created UD QP 0x14 on irdma1:1 TX wr:447 sge:6 inl:64 resp:0 RX wr:8191 sge:1 resp:0
[1710151844.441946] [m051:6032 :0]           mpool.c:136  UCX  DEBUG mpool ud_recv_skb: align 64, maxelems 4294967295, elemsize 1116
[1710151844.442092] [m051:6032 :0]           mpool.c:281  UCX  DEBUG mpool ud_recv_skb: allocated chunk 0x7f59f74b0018 of 151528 bytes with 131 elements
[1710151844.442108] [m051:6032 :0]           mpool.c:136  UCX  DEBUG mpool ud_tx_skb: align 64, maxelems 4294967295, elemsize 1136
[1710151844.442150] [m051:6032 :0]        ud_iface.c:380  UCX  DEBUG iface 0x5617c4512bf0: adding gid fe80::b696:91ff:feb3:5ca9 to hash on device irdma1 port 1 index 0)
[1710151844.442184] [m051:6032 :0]        ud_iface.c:380  UCX  DEBUG iface 0x5617c4512bf0: adding gid ::ffff:10.2.0.51 to hash on device irdma1 port 1 index 1)
[1710151844.442214] [m051:6032 :0]        ud_iface.c:380  UCX  DEBUG iface 0x5617c4512bf0: adding gid fe80::478c:3987:7ddf:ac03 to hash on device irdma1 port 1 index 2)
[1710151844.442817] [m051:6032 :0]     timer_wheel.c:40   UCX  DEBUG high res timer created log=23 resolution=4934.475294 usec wanted: 2500.000000 usec
[1710151844.442837] [m051:6032 :0]           async.c:231  UCX  DEBUG added async handler 0x5617c45152c0 [id=11 ref 1] ???() to hash
[1710151844.442867] [m051:6032 :0]           async.c:493  UCX  DEBUG listening to async event fd 11 events 0x5 mode thread_spinlock
[1710151844.443009] [m051:6032 :0]       ib_device.c:1295 UCX  DEBUG RoCE LAG level on irdma1:1 (e810_roce_p2) is 1
[1710151844.443020] [m051:6032 :0]      ucp_worker.c:1328 UCX  DEBUG created interface[0]=0x5617c4512bf0 using ud_verbs/irdma1:1 on worker 0x5617c45312f0
[1710151844.443028] [m051:6032 :0]      ucp_worker.c:1131 UCX  DEBUG selected scalable tl bitmap: 0x1 0x0 (1 tls)
[1710151844.443077] [m051:6032 :0]       ib_device.c:1295 UCX  DEBUG RoCE LAG level on irdma1:1 (e810_roce_p2) is 1
[1710151844.443121] [m051:6032 :0]       ib_device.c:1295 UCX  DEBUG RoCE LAG level on irdma1:1 (e810_roce_p2) is 1
[1710151844.443157] [m051:6032 :0]       ib_device.c:1295 UCX  DEBUG RoCE LAG level on irdma1:1 (e810_roce_p2) is 1
[1710151844.443194] [m051:6032 :0]       ib_device.c:1295 UCX  DEBUG RoCE LAG level on irdma1:1 (e810_roce_p2) is 1
[1710151844.443231] [m051:6032 :0]       ib_device.c:1295 UCX  DEBUG RoCE LAG level on irdma1:1 (e810_roce_p2) is 1
[1710151844.443267] [m051:6032 :0]       ib_device.c:1295 UCX  DEBUG RoCE LAG level on irdma1:1 (e810_roce_p2) is 1
[1710151844.443304] [m051:6032 :0]       ib_device.c:1295 UCX  DEBUG RoCE LAG level on irdma1:1 (e810_roce_p2) is 1
[1710151844.443340] [m051:6032 :0]       ib_device.c:1295 UCX  DEBUG RoCE LAG level on irdma1:1 (e810_roce_p2) is 1
[1710151844.443387] [m051:6032 :0]       ib_device.c:1295 UCX  DEBUG RoCE LAG level on irdma1:1 (e810_roce_p2) is 1
[1710151844.443425] [m051:6032 :0]       ib_device.c:1295 UCX  DEBUG RoCE LAG level on irdma1:1 (e810_roce_p2) is 1
[1710151844.443462] [m051:6032 :0]       ib_device.c:1295 UCX  DEBUG RoCE LAG level on irdma1:1 (e810_roce_p2) is 1
[1710151844.443499] [m051:6032 :0]       ib_device.c:1295 UCX  DEBUG RoCE LAG level on irdma1:1 (e810_roce_p2) is 1
[1710151844.443536] [m051:6032 :0]       ib_device.c:1295 UCX  DEBUG RoCE LAG level on irdma1:1 (e810_roce_p2) is 1
[1710151844.443824] [m051:6032 :0]           async.c:231  UCX  DEBUG added async handler 0x5617c4504120 [id=12 ref 1] ???() to hash
[1710151844.443844] [m051:6032 :0]           async.c:493  UCX  DEBUG listening to async event fd 12 events 0x1 mode thread_spinlock
[1710151844.443852] [m051:6032 :0]       rdmacm_cm.c:974  UCX  DEBUG created rdmacm_cm 0x5617c4513500 with event_channel 0x5617c45154b0 (fd=12)
[1710151844.443871] [m051:6032 :0]      tcp_sockcm.c:223  UCX  DEBUG created tcp_sockcm 0x5617c44fbf60
[1710151844.443881] [m051:6032 :0]           mpool.c:136  UCX  DEBUG mpool ucp_requests: align 64, maxelems 4294967295, elemsize 280
[1710151844.443888] [m051:6032 :0]           mpool.c:136  UCX  DEBUG mpool ucp_rkeys: align 64, maxelems 4294967295, elemsize 104
[1710151844.443895] [m051:6032 :0]           mpool.c:136  UCX  DEBUG mpool ucp_reg_bufs: align 64, maxelems 4294967295, elemsize 8216
[1710151844.443902] [m051:6032 :0]           mpool.c:136  UCX  DEBUG mpool ucp_am_bufs: align 64, maxelems 4294967295, elemsize 153
[1710151844.443909] [m051:6032 :0]           mpool.c:136  UCX  DEBUG mpool ucp_am_bufs: align 64, maxelems 4294967295, elemsize 1105
[1710151844.443916] [m051:6032 :0]       mpool_set.c:129  UCX  DEBUG mpool_set:ucp_am_bufs, sizes map 0x80000040, largest size 1016, mpools num 2
[1710151844.443984] [m051:6032 :0]          parser.c:1998 UCX  INFO  UCX_* env variables: UCX_TLS=ud UCX_NET_DEVICES=irdma1:1 UCX_LOG_LEVEL=TRACE
[1710151844.466424] [m051:6032 :0]          ucp_ep.c:362  UCX  DEBUG created ep 0x7f59f7490000 to <no debug data> from api call
[1710151844.466554] [m051:6032 :0]       ib_device.c:1295 UCX  DEBUG   RoCE LAG level on irdma1:1 (e810_roce_p2) is 1
[1710151844.466569] [m051:6032 :0]      ucp_worker.c:1783 UCX  INFO    ep_cfg[0]: tag(ud_verbs/irdma1:1)
[1710151844.466580] [m051:6032 :0]          wireup.c:1192 UCX  DEBUG   ep 0x7f59f7490000: am_lane 0 wireup_msg_lane 0 cm_lane <none> keepalive_lane <none> reachable_mds 0x1
[1710151844.466590] [m051:6032 :0]          wireup.c:1215 UCX  DEBUG   ep 0x7f59f7490000: lane[0]:  0:ud_verbs/irdma1:1.0 md[0]     -> addr[0].md[0]/ib/sysdev[255] am am_bw#0 wireup
[1710151844.466603] [m051:6032 :0]        ib_iface.c:755  UCX  DEBUG   iface 0x5617c4512bf0: ah_attr dlid=49152 sl=0 port=1 src_path_bits=0 dgid=::ffff:10.2.0.201 sgid_index=1 traffic_class=106
[1710151844.466756] [m051:6032 :0]        ib_iface.c:755  UCX  DEBUG   iface 0x5617c4512bf0: ah_attr dlid=49152 sl=0 port=1 src_path_bits=0 dgid=::ffff:10.2.0.201 sgid_index=1 traffic_class=106
[1710151844.466771] [m051:6032 :0]           ud_ep.c:406  UCX  DEBUG   created ep ep=0x5617c4514aa0 iface=0x5617c4512bf0 id=0
[1710151844.466781] [m051:6032 :0]           ud_ep.c:510  UCX  DEBUG   irdma1:1/RoCE lid 1 qpn 0x14 epid 0 ep 0x5617c4514aa0 connected to IFACE ::ffff:10.2.0.201mtu 1024 pkey 0xffff  qpn 0x922
[1710151844.466790] [m051:6032 :0]        ib_iface.c:755  UCX  DEBUG   iface 0x5617c4512bf0: ah_attr dlid=49152 sl=0 port=1 src_path_bits=0 dgid=::ffff:10.2.0.201 sgid_index=1 traffic_class=106
[1710151844.466800] [m051:6032 :0]        ib_iface.c:755  UCX  DEBUG   iface 0x5617c4512bf0: ah_attr dlid=49152 sl=0 port=1 src_path_bits=0 dgid=::ffff:10.2.0.201 sgid_index=1 traffic_class=106
[1710151844.467430] [m051:6032 :0]           mpool.c:281  UCX  DEBUG   mpool ud_tx_skb: allocated chunk 0x7f59f7200018 of 2097128 bytes with 1820 elements
[1710151844.467553] [m051:6032 :0]       ib_device.c:1295 UCX  DEBUG   RoCE LAG level on irdma1:1 (e810_roce_p2) is 1
[1710151844.467594] [m051:6032 :0]       ib_device.c:1295 UCX  DEBUG   RoCE LAG level on irdma1:1 (e810_roce_p2) is 1
[1710151844.467632] [m051:6032 :0]       ib_device.c:1295 UCX  DEBUG   RoCE LAG level on irdma1:1 (e810_roce_p2) is 1
[1710151844.467668] [m051:6032 :0]       ib_device.c:1295 UCX  DEBUG   RoCE LAG level on irdma1:1 (e810_roce_p2) is 1
[1710151844.467705] [m051:6032 :0]       ib_device.c:1295 UCX  DEBUG   RoCE LAG level on irdma1:1 (e810_roce_p2) is 1
[1710151844.467748] [m051:6032 :0]       ib_device.c:1295 UCX  DEBUG   RoCE LAG level on irdma1:1 (e810_roce_p2) is 1
[1710151844.467785] [m051:6032 :0]       ib_device.c:1295 UCX  DEBUG   RoCE LAG level on irdma1:1 (e810_roce_p2) is 1
[1710151844.467821] [m051:6032 :0]       ib_device.c:1295 UCX  DEBUG   RoCE LAG level on irdma1:1 (e810_roce_p2) is 1
[1710151844.467858] [m051:6032 :0]       ib_device.c:1295 UCX  DEBUG   RoCE LAG level on irdma1:1 (e810_roce_p2) is 1
[1710151844.467894] [m051:6032 :0]       ib_device.c:1295 UCX  DEBUG   RoCE LAG level on irdma1:1 (e810_roce_p2) is 1
[1710151844.467930] [m051:6032 :0]       ib_device.c:1295 UCX  DEBUG   RoCE LAG level on irdma1:1 (e810_roce_p2) is 1
[1710151844.467966] [m051:6032 :0]       ib_device.c:1295 UCX  DEBUG   RoCE LAG level on irdma1:1 (e810_roce_p2) is 1
[1710151844.468003] [m051:6032 :0]       ib_device.c:1295 UCX  DEBUG   RoCE LAG level on irdma1:1 (e810_roce_p2) is 1
[1710151844.468024] [m051:6032 :0]           async.c:231  UCX  DEBUG   added async handler 0x5617c44fc810 [id=1000006 ref 1] ???() to hash
[1710151844.469623] [m051:6032 :a]           mpool.c:281  UCX  DEBUG mpool ud_recv_skb: allocated chunk 0x7f59f6a00018 of 6291432 bytes with 5461 elements
[1710151844.469805] [m051:6032 :0]           mpool.c:281  UCX  DEBUG mpool ucp_requests: allocated chunk 0x5617c4564054 of 41044 bytes with 128 elements
[1710151844.469827] [m051:6032 :0]           flush.c:377  UCX  DEBUG flush_worker ep 0x7f59f7490000
[1710151844.504398] [m051:6032 :0]           ud_ep.c:93   UCX  DEBUG ep: 0x5617c4514aa0 ca drop@cwnd = 2 in flight: 1
[1710151844.504418] [m051:6032 :0]           ud_ep.c:1427 UCX  DEBUG ep(0x5617c4514aa0): resending rt_psn 1 rt_max_psn 1 acked_psn 0 max_psn 2 ack_req 1
[1710151844.504437] [m051:6032 :0]           ud_ep.c:1433 UCX  DEBUG ep(0x5617c4514aa0): resending completed
[1710151844.543878] [m051:6032 :0]           ud_ep.c:93   UCX  DEBUG ep: 0x5617c4514aa0 ca drop@cwnd = 2 in flight: 1
[1710151844.543887] [m051:6032 :0]           ud_ep.c:1427 UCX  DEBUG ep(0x5617c4514aa0): resending rt_psn 1 rt_max_psn 1 acked_psn 0 max_psn 2 ack_req 1
[1710151844.543906] [m051:6032 :0]           ud_ep.c:1433 UCX  DEBUG ep(0x5617c4514aa0): resending completed
[1710151844.622835] [m051:6032 :0]           ud_ep.c:93   UCX  DEBUG ep: 0x5617c4514aa0 ca drop@cwnd = 2 in flight: 1
[1710151844.622844] [m051:6032 :0]           ud_ep.c:1427 UCX  DEBUG ep(0x5617c4514aa0): resending rt_psn 1 rt_max_psn 1 acked_psn 0 max_psn 2 ack_req 1
[1710151844.622849] [m051:6032 :0]           ud_ep.c:1433 UCX  DEBUG ep(0x5617c4514aa0): resending completed
[1710151844.780752] [m051:6032 :0]           ud_ep.c:93   UCX  DEBUG ep: 0x5617c4514aa0 ca drop@cwnd = 2 in flight: 1
[1710151844.780760] [m051:6032 :0]           ud_ep.c:1427 UCX  DEBUG ep(0x5617c4514aa0): resending rt_psn 1 rt_max_psn 1 acked_psn 0 max_psn 2 ack_req 1
[1710151844.780765] [m051:6032 :0]           ud_ep.c:1433 UCX  DEBUG ep(0x5617c4514aa0): resending completed
[1710151845.096589] [m051:6032 :0]           ud_ep.c:93   UCX  DEBUG ep: 0x5617c4514aa0 ca drop@cwnd = 2 in flight: 1
[1710151845.096598] [m051:6032 :0]           ud_ep.c:1427 UCX  DEBUG ep(0x5617c4514aa0): resending rt_psn 1 rt_max_psn 1 acked_psn 0 max_psn 2 ack_req 1
[1710151845.096603] [m051:6032 :0]           ud_ep.c:1433 UCX  DEBUG ep(0x5617c4514aa0): resending completed
[1710151845.733204] [m051:6032 :0]           ud_ep.c:93   UCX  DEBUG ep: 0x5617c4514aa0 ca drop@cwnd = 2 in flight: 1
[1710151845.733223] [m051:6032 :0]           ud_ep.c:1427 UCX  DEBUG ep(0x5617c4514aa0): resending rt_psn 1 rt_max_psn 1 acked_psn 0 max_psn 2 ack_req 1
[1710151845.733242] [m051:6032 :0]           ud_ep.c:1433 UCX  DEBUG ep(0x5617c4514aa0): resending completed
[1710151847.011345] [m051:6032 :0]           ud_ep.c:93   UCX  DEBUG ep: 0x5617c4514aa0 ca drop@cwnd = 2 in flight: 1
[1710151847.011355] [m051:6032 :0]           ud_ep.c:1427 UCX  DEBUG ep(0x5617c4514aa0): resending rt_psn 1 rt_max_psn 1 acked_psn 0 max_psn 2 ack_req 1
[1710151847.011378] [m051:6032 :0]           ud_ep.c:1433 UCX  DEBUG ep(0x5617c4514aa0): resending completed
[1710151849.567643] [m051:6032 :0]           ud_ep.c:93   UCX  DEBUG ep: 0x5617c4514aa0 ca drop@cwnd = 2 in flight: 1
[1710151849.567659] [m051:6032 :0]           ud_ep.c:1427 UCX  DEBUG ep(0x5617c4514aa0): resending rt_psn 1 rt_max_psn 1 acked_psn 0 max_psn 2 ack_req 1
[1710151849.567679] [m051:6032 :0]           ud_ep.c:1433 UCX  DEBUG ep(0x5617c4514aa0): resending completed
[1710151854.616084] [m051:6032 :0]           ud_ep.c:93   UCX  DEBUG ep: 0x5617c4514aa0 ca drop@cwnd = 2 in flight: 1
[1710151854.616115] [m051:6032 :0]           ud_ep.c:1427 UCX  DEBUG ep(0x5617c4514aa0): resending rt_psn 1 rt_max_psn 1 acked_psn 0 max_psn 2 ack_req 1
[1710151854.616121] [m051:6032 :0]           ud_ep.c:1433 UCX  DEBUG ep(0x5617c4514aa0): resending completed
[1710151859.664507] [m051:6032 :0]           ud_ep.c:93   UCX  DEBUG ep: 0x5617c4514aa0 ca drop@cwnd = 2 in flight: 1
[1710151859.664535] [m051:6032 :0]           ud_ep.c:1427 UCX  DEBUG ep(0x5617c4514aa0): resending rt_psn 1 rt_max_psn 1 acked_psn 0 max_psn 2 ack_req 1
[1710151859.664541] [m051:6032 :0]           ud_ep.c:1433 UCX  DEBUG ep(0x5617c4514aa0): resending completed
[1710151864.712957] [m051:6032 :0]           ud_ep.c:93   UCX  DEBUG ep: 0x5617c4514aa0 ca drop@cwnd = 2 in flight: 1
[1710151864.712973] [m051:6032 :0]           ud_ep.c:1427 UCX  DEBUG ep(0x5617c4514aa0): resending rt_psn 1 rt_max_psn 1 acked_psn 0 max_psn 2 ack_req 1
[1710151864.712992] [m051:6032 :0]           ud_ep.c:1433 UCX  DEBUG ep(0x5617c4514aa0): resending completed
[1710151869.761395] [m051:6032 :0]           ud_ep.c:93   UCX  DEBUG ep: 0x5617c4514aa0 ca drop@cwnd = 2 in flight: 1
[1710151869.761418] [m051:6032 :0]           ud_ep.c:1427 UCX  DEBUG ep(0x5617c4514aa0): resending rt_psn 1 rt_max_psn 1 acked_psn 0 max_psn 2 ack_req 1
[1710151869.761437] [m051:6032 :0]           ud_ep.c:1433 UCX  DEBUG ep(0x5617c4514aa0): resending completed
[1710151874.809816] [m051:6032 :0]           ud_ep.c:93   UCX  DEBUG ep: 0x5617c4514aa0 ca drop@cwnd = 2 in flight: 1
[1710151874.809847] [m051:6032 :0]           ud_ep.c:375  UCX  DEBUG ep 0x5617c4514aa0: timeout of 30.34 sec, config::peer_timeout - 30.00 sec
[1710151874.809858] [m051:6032 :0]           ud_ep.c:1427 UCX  DEBUG ep(0x5617c4514aa0): resending rt_psn 1 rt_max_psn 1 acked_psn 0 max_psn 2 ack_req 1
[1710151874.809863] [m051:6032 :0]           ud_ep.c:1433 UCX  DEBUG ep(0x5617c4514aa0): resending completed
[1710151874.809875] [m051:6032 :0]      ucp_worker.c:530  UCX  DEBUG worker 0x5617c45312f0: error handler called for UCT EP 0x5617c4514aa0: Endpoint timeout
[1710151874.809886] [m051:6032 :0]          ucp_ep.c:1410 UCX  DEBUG ep 0x7f59f7490000: set_ep_failed status Endpoint timeout on lane[0]=0x5617c4514aa0
[1710151874.809894] [m051:6032 :0]          ucp_ep.c:1373 UCX  DEBUG ep 0x7f59f7490000: discarding lanes
[1710151874.809914] [m051:6032 :0]          ucp_ep.c:1381 UCX  DEBUG ep 0x7f59f7490000: discard uct_ep[0]=0x5617c4514aa0
[1710151874.809928] [m051:6032 :0]          ucp_ep.c:1449 UCX  DIAG  ep 0x7f59f7490000: error 'Endpoint timeout' on ud_verbs/irdma1:1 will not be handled since no error callback is installed
[m051:6032 :0:6032]       ud_ep.c:278  Fatal: UD endpoint 0x5617c4514aa0 to <no debug data>: unhandled timeout error
==== backtrace (tid:   6032) ====
 0  /lib64/libucs.so.0(ucs_handle_error+0x2e4) [0x7f59f8464d74]
 1  /lib64/libucs.so.0(ucs_fatal_error_message+0xbc) [0x7f59f8464f2c]
 2  /lib64/libucs.so.0(ucs_fatal_error_format+0x111) [0x7f59f8465051]
 3  /lib64/ucx/libuct_ib.so.0(+0x57440) [0x7f59f7fcd440]
 4  /lib64/libucs.so.0(+0x18917) [0x7f59f8454917]
 5  /lib64/libucp.so.0(ucp_worker_progress+0x2a) [0x7f59f8519f6a]
 6  /lib64/libucp.so.0(ucp_worker_flush+0x48) [0x7f59f8537d08]
 7  ucx_perftest(+0xde16) [0x5617c24e5e16]
 8  ucx_perftest(+0x8bce) [0x5617c24e0bce]
 9  ucx_perftest(+0x5aa4) [0x5617c24ddaa4]
10  /lib64/libc.so.6(+0x3feb0) [0x7f59f803feb0]
11  /lib64/libc.so.6(__libc_start_main+0x80) [0x7f59f803ff60]
12  ucx_perftest(+0x63b5) [0x5617c24de3b5]
=================================
timeout: the monitored command dumped core
Aborted
[root@m051 ~]# 
yosefe commented 8 months ago

it seems like there is a connectivity issue on the cluster, does basic ib_send_bw test work in this environment?

Honggang-LI commented 8 months ago
[root@m201 ~]# ib_send_bw -c UD -d mlx5_1  -F -a -b

************************************
* Waiting for client to connect... *
************************************
 Max msg size in UD is MTU 1024
 Changing to this MTU
---------------------------------------------------------------------------------------
                    Send Bidirectional BW Test
 Dual-port       : OFF      Device         : mlx5_1
 Number of qps   : 1        Transport type : IB
 Connection type : UD       Using SRQ      : OFF
 PCIe relax order: ON
 ibv_wr* API     : ON
 TX depth        : 128
 RX depth        : 1000
 CQ Moderation   : 100
 Mtu             : 1024[B]
 Link type       : Ethernet
 GID index       : 3
 Max inline data : 0[B]
 rdma_cm QPs     : OFF
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
 local address: LID 0000 QPN 0x0928 PSN 0x136bae
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:02:00:201
 remote address: LID 0x01 QPN 0x0018 PSN 0xd8117e
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:02:00:51
---------------------------------------------------------------------------------------
 #bytes     #iterations    BW peak[MB/sec]    BW average[MB/sec]   MsgRate[Mpps]
 2          1000             23.25              20.70          10.851089
 4          1000             45.07              43.06          11.286887
 8          1000             88.78              85.00          11.141716
 16         1000             183.49             172.32         11.293055
 32         1000             360.85             340.14         11.145629
 64         1000             729.59             693.38         11.360355
 128        1000             1458.41            1334.46        10.931875
 256        1000             2862.08            2721.39        11.146794
 512        1000             4929.65            4299.06        8.804469
 1024       1000             7904.25            7357.28        7.533854
---------------------------------------------------------------------------------------
[root@m201 ~]# 
[root@m051 ~]# ib_send_bw -c UD -d irdma1 -F -a -b 10.2.0.201 
 Max msg size in UD is MTU 1024
 Changing to this MTU
---------------------------------------------------------------------------------------
                    Send Bidirectional BW Test
 Dual-port       : OFF      Device         : irdma1
 Number of qps   : 1        Transport type : IB
 Connection type : UD       Using SRQ      : OFF
 PCIe relax order: ON
 ibv_wr* API     : OFF
 TX depth        : 128
 RX depth        : 1000
 CQ Moderation   : 100
 Mtu             : 1024[B]
 Link type       : Ethernet
 GID index       : 1
 Max inline data : 0[B]
 rdma_cm QPs     : OFF
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
 local address: LID 0x01 QPN 0x0018 PSN 0xd8117e
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:02:00:51
 remote address: LID 0000 QPN 0x0928 PSN 0x136bae
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:02:00:201
---------------------------------------------------------------------------------------
 #bytes     #iterations    BW peak[MB/sec]    BW average[MB/sec]   MsgRate[Mpps]
 2          1000             23.25              20.70          10.851089
 4          1000             45.07              43.06          11.286887
 8          1000             88.78              85.00          11.141716
 16         1000             183.49             172.32         11.293055
 32         1000             360.85             340.14         11.145629
 64         1000             729.59             693.38         11.360355
 128        1000             1458.41            1334.46        10.931875
 256        1000             2862.08            2721.39        11.146794
 512        1000             4929.65            4299.06        8.804469
 1024       1000             7904.25            7357.28        7.533854
---------------------------------------------------------------------------------------
[root@m051 ~]# 
changchengx commented 8 months ago

irdmaxx maybe the HCA from Intel

Honggang-LI commented 8 months ago

irdmaxx maybe the HCA from Intel

yes, irdmaxx is e810 from intel.

redsox38 commented 3 months ago

Was there ever a resolution to this? I have this exact same issue with ucx/roce on a Broadcom BCM57414. UCX fails with that same timeout, but ib_send_bw works fine.

yosefe commented 3 months ago

AFAIK, UCX/roce was not tested on a Broadcom HW. I'd suggest checking with Broadcom about adding such support.

hjelmn commented 2 months ago

@yosefe We are seeing the same issue with another system using irdma. ib_send_bw works with UD but UCX just hits the timeout. I can probably spend some time in the next month debugging.

hjelmn commented 2 months ago

On our system the ibv_wc.sl is not set correctly so it is assuming the grh is IPv6. Forcing it to treat it like IPv4 works. I am still investigating why the service level is not set correctly but the irdma driver is likely to blame. I am looking at how to work around this in the code but it looks like setting UCX_ETH_DGID_CHECK=n is a good temporary solution (at least in my case). @Honggang-LI Can you give that a try? If it works for you as well I think Intel's driver is incorrect. I think setting sl is required... Looking into that.

hjelmn commented 2 months ago

10209 fixes iRDMA for me with UD.

hjelmn commented 2 months ago

Looking at the iRDMA provider it never sets the work completion service level. That is why UD does not working with iRDMA. My PR is likely the correct fix.

Honggang-LI commented 5 days ago

Following test based on commit 9029d9ca5c8e743f956b5c16e3f0e892c94ec48a .

Thanks for the fix.

[root@m161 ~]$ UCX_TLS=ud UCX_NET_DEVICES=mlx5_1:1 timeout 5m /tmp/ucx/bin/ucx_perftest -t tag_bw
[1732849777.425595] [m161:11617:0]        perftest.c:800  UCX  WARN  CPU affinity is not set (bound to 8 cpus). Performance may be impacted.
Waiting for connection...
Accepted connection from 10.2.0.51:56186
+----------------------------------------------------------------------------------------------------------+
| API:          protocol layer                                                                             |
| Test:         tag match bandwidth                                                                        |
| Data layout:  (automatic)                                                                                |
| Send memory:  host                                                                                       |
| Recv memory:  host                                                                                       |
| Message size: 8                                                                                          |
| Window size:  32                                                                                         |
+----------------------------------------------------------------------------------------------------------+
[root@m161 ~]$ 
[root@m051 ~]$ UCX_TLS=ud UCX_NET_DEVICES=irdma1:1 timeout 5m /tmp/ucx/bin/ucx_perftest -t tag_bw 10.2.0.161
[1732849788.577876] [m051:137956:0]        perftest.c:800  UCX  WARN  CPU affinity is not set (bound to 24 cpus). Performance may be impacted.
+--------------+--------------+------------------------------+---------------------+-----------------------+
|              |              |       overhead (usec)        |   bandwidth (MB/s)  |  message rate (msg/s) |
+--------------+--------------+----------+---------+---------+----------+----------+-----------+-----------+
|    Stage     | # iterations | 50.0%ile | average | overall |  average |  overall |  average  |  overall  |
+--------------+--------------+----------+---------+---------+----------+----------+-----------+-----------+
Final:               1000000      0.228     0.703     0.703       10.85      10.85     1421816     1421816