openucx / ucx

Unified Communication X (mailing list - https://elist.ornl.gov/mailman/listinfo/ucx-group)
http://www.openucx.org

OMB Collectives Fail with UCX+ROCm #6207

Open nhanford opened 3 years ago

nhanford commented 3 years ago

OpenMPI and UCX Issue for UCX Github

Describe the bug

When following the UCX+ROCm tutorial here, intra-node collectives, such as osu_allreduce -d rocm, fail with the following message:

/opt/ucx-1.10.0/lib/ucx/libuct_rocm_gdr.so.0: undefined symbol: gdr_copy_to_bar

and inter-node collectives fail with the following message:

select.c:450  UCX  ERROR no active messages transport to <no debug data>: posix/memory - Destination is unreachable, sysv/memory - Destination is unreachable, rocm_copy/rocm_cpy - no am bcopy, rocm_ipc/rocm_ipc - no am bcopy, rocm_gdr/rocm_gdr - no am bcopy, cma/memory - no am bcopy
pml_ucx.c:385  Error: ucp_ep_create(proc=8) failed: Destination is unreachable
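
The inter-node error packs one rejection reason per transport into a single comma-separated line, which is hard to scan. As a minimal sketch, the line can be split into one `transport/device - reason` entry per row. The function name `split_ucx_reasons` is made up for illustration, and the parsing assumes the exact message layout shown above rather than a documented UCX format. The sample message in the usage line is shortened.

```shell
#!/bin/sh
# Split a UCX "no active messages transport" error into one line per transport.
# Assumed layout (taken from the log above, not from UCX documentation):
#   "<prefix> no active messages transport to <peer>: t1/dev - reason, t2/dev - reason, ..."
split_ucx_reasons() {
    # 1. Strip everything up to and including "<peer>: ".
    # 2. Break the remaining list on ", " and print each entry on its own line.
    printf '%s\n' "$1" \
        | sed -e 's/^.*no active messages transport to [^:]*: //' \
        | awk -F', ' '{ for (i = 1; i <= NF; i++) print $i }'
}

# Shortened sample message; prints two lines:
#   posix/memory - Destination is unreachable
#   cma/memory - no am bcopy
split_ucx_reasons 'select.c:450  UCX  ERROR no active messages transport to <no debug data>: posix/memory - Destination is unreachable, cma/memory - no am bcopy'
```

Applied to the full message above, this gives six rows: posix and sysv report `Destination is unreachable`, and rocm_copy, rocm_ipc, rocm_gdr and cma report `no am bcopy`. Neither tcp nor any of the mlx5_0 transports from the listing further down appears among the candidates.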

Interestingly, the OSU point-to-point (pt2pt) benchmarks work with the D D arguments, i.e. with device buffers on both the send and the receive side. Furthermore, I have confirmed that the check_large_bar sanity check from the tutorial passes on this system.

We would be really interested in getting this working in order to add OMPI+UCX to our MPIs on this system.

Thanks!

Steps to Reproduce

Setup and versions

amdgpu               5901604  332 
amd_iommu_v2           18821  1 amdgpu
amd_sched              33495  1 amdgpu
amdttm                 96760  1 amdgpu
amdkcl                 30354  2 amdgpu,amdttm
drm_kms_helper        186609  3 ast,amdgpu,amdkcl
drm                   460301  276 ast,ttm,drm_kms_helper,amd_sched,amdgpu,amdkcl,amdttm
i2c_algo_bit           13413  3 ast,igb,amdgpu

Additional information (depending on the issue)

#
# Memory domain: posix
#     Component: posix
#             allocate: unlimited
#           remote key: 24 bytes
#           rkey_ptr is supported
#
#      Transport: posix
#         Device: memory
#  System device: <unknown>
#
#      capabilities:
#            bandwidth: 0.00/ppn + 12179.00 MB/sec
#              latency: 80 nsec
#             overhead: 10 nsec
#            put_short: <= 4294967295
#            put_bcopy: unlimited
#            get_bcopy: unlimited
#             am_short: <= 100
#             am_bcopy: <= 8256
#               domain: cpu
#           atomic_add: 32, 64 bit
#           atomic_and: 32, 64 bit
#            atomic_or: 32, 64 bit
#           atomic_xor: 32, 64 bit
#          atomic_fadd: 32, 64 bit
#          atomic_fand: 32, 64 bit
#           atomic_for: 32, 64 bit
#          atomic_fxor: 32, 64 bit
#          atomic_swap: 32, 64 bit
#         atomic_cswap: 32, 64 bit
#           connection: to iface
#      device priority: 0
#     device num paths: 1
#              max eps: inf
#       device address: 8 bytes
#        iface address: 8 bytes
#       error handling: none
#
#
# Memory domain: sysv
#     Component: sysv
#             allocate: unlimited
#           remote key: 12 bytes
#           rkey_ptr is supported
#
#      Transport: sysv
#         Device: memory
#  System device: <unknown>
#
#      capabilities:
#            bandwidth: 0.00/ppn + 12179.00 MB/sec
#              latency: 80 nsec
#             overhead: 10 nsec
#            put_short: <= 4294967295
#            put_bcopy: unlimited
#            get_bcopy: unlimited
#             am_short: <= 100
#             am_bcopy: <= 8256
#               domain: cpu
#           atomic_add: 32, 64 bit
#           atomic_and: 32, 64 bit
#            atomic_or: 32, 64 bit
#           atomic_xor: 32, 64 bit
#          atomic_fadd: 32, 64 bit
#          atomic_fand: 32, 64 bit
#           atomic_for: 32, 64 bit
#          atomic_fxor: 32, 64 bit
#          atomic_swap: 32, 64 bit
#         atomic_cswap: 32, 64 bit
#           connection: to iface
#      device priority: 0
#     device num paths: 1
#              max eps: inf
#       device address: 8 bytes
#        iface address: 8 bytes
#       error handling: none
#
#
# Memory domain: self
#     Component: self
#             register: unlimited, cost: 0 nsec
#           remote key: 0 bytes
#
#      Transport: self
#         Device: memory
#  System device: <unknown>
#
#      capabilities:
#            bandwidth: 0.00/ppn + 6911.00 MB/sec
#              latency: 0 nsec
#             overhead: 10 nsec
#            put_short: <= 4294967295
#            put_bcopy: unlimited
#            get_bcopy: unlimited
#             am_short: <= 8K
#             am_bcopy: <= 8K
#               domain: cpu
#           atomic_add: 32, 64 bit
#           atomic_and: 32, 64 bit
#            atomic_or: 32, 64 bit
#           atomic_xor: 32, 64 bit
#          atomic_fadd: 32, 64 bit
#          atomic_fand: 32, 64 bit
#           atomic_for: 32, 64 bit
#          atomic_fxor: 32, 64 bit
#          atomic_swap: 32, 64 bit
#         atomic_cswap: 32, 64 bit
#           connection: to iface
#      device priority: 0
#     device num paths: 1
#              max eps: inf
#       device address: 0 bytes
#        iface address: 8 bytes
#       error handling: none
#
#
# Memory domain: tcp
#     Component: tcp
#             register: unlimited, cost: 0 nsec
#           remote key: 0 bytes
#
#      Transport: tcp
#         Device: enp3s0f0
#  System device: <unknown>
#
#      capabilities:
#            bandwidth: 113.16/ppn + 0.00 MB/sec
#              latency: 5776 nsec
#             overhead: 50000 nsec
#            put_zcopy: <= 18446744073709551590, up to 6 iov
#  put_opt_zcopy_align: <= 1
#        put_align_mtu: <= 0
#             am_short: <= 8K
#             am_bcopy: <= 8K
#             am_zcopy: <= 64K, up to 6 iov
#   am_opt_zcopy_align: <= 1
#         am_align_mtu: <= 0
#            am header: <= 8037
#           connection: to ep, to iface
#      device priority: 1
#     device num paths: 1
#              max eps: 256
#       device address: 16 bytes
#        iface address: 2 bytes
#           ep address: 10 bytes
#       error handling: peer failure
#
#      Transport: tcp
#         Device: hsi0
#  System device: <unknown>
#
#      capabilities:
#            bandwidth: 11.90/ppn + 0.00 MB/sec
#              latency: 10960 nsec
#             overhead: 50000 nsec
#            put_zcopy: <= 18446744073709551590, up to 6 iov
#  put_opt_zcopy_align: <= 1
#        put_align_mtu: <= 0
#             am_short: <= 8K
#             am_bcopy: <= 8K
#             am_zcopy: <= 64K, up to 6 iov
#   am_opt_zcopy_align: <= 1
#         am_align_mtu: <= 0
#            am header: <= 8037
#           connection: to ep, to iface
#      device priority: 0
#     device num paths: 1
#              max eps: 256
#       device address: 16 bytes
#        iface address: 2 bytes
#           ep address: 10 bytes
#       error handling: peer failure
#
#
# Connection manager: tcp
#      max_conn_priv: 2032 bytes
#
# Memory domain: sockcm
#     Component: sockcm
#           supports client-server connection establishment via sockaddr
#   < no supported devices found >
#
# Memory domain: mlx5_0
#     Component: ib
#             register: unlimited, cost: 180 nsec
#           remote key: 8 bytes
#           local memory handle is required for zcopy
#
#      Transport: rc_verbs
#         Device: mlx5_0:1
#  System device: 0000:33:00.0 (0)
#
#      capabilities:
#            bandwidth: 13923.72/ppn + 0.00 MB/sec
#              latency: 600 + 1.000 * N nsec
#             overhead: 75 nsec
#            put_short: <= 124
#            put_bcopy: <= 8256
#            put_zcopy: <= 1G, up to 8 iov
#  put_opt_zcopy_align: <= 512
#        put_align_mtu: <= 4K
#            get_bcopy: <= 8256
#            get_zcopy: 65..1G, up to 8 iov
#  get_opt_zcopy_align: <= 512
#        get_align_mtu: <= 4K
#             am_short: <= 123
#             am_bcopy: <= 8255
#             am_zcopy: <= 8255, up to 7 iov
#   am_opt_zcopy_align: <= 512
#         am_align_mtu: <= 4K
#            am header: <= 127
#               domain: device
#           atomic_add: 64 bit
#          atomic_fadd: 64 bit
#         atomic_cswap: 64 bit
#           connection: to ep
#      device priority: 50
#     device num paths: 1
#              max eps: 256
#       device address: 3 bytes
#           ep address: 17 bytes
#       error handling: peer failure
#
#
#      Transport: rc_mlx5
#         Device: mlx5_0:1
#  System device: 0000:33:00.0 (0)
#
#      capabilities:
#            bandwidth: 13923.72/ppn + 0.00 MB/sec
#              latency: 600 + 1.000 * N nsec
#             overhead: 40 nsec
#            put_short: <= 2K
#            put_bcopy: <= 8256
#            put_zcopy: <= 1G, up to 14 iov
#  put_opt_zcopy_align: <= 512
#        put_align_mtu: <= 4K
#            get_bcopy: <= 8256
#            get_zcopy: 65..1G, up to 14 iov
#  get_opt_zcopy_align: <= 512
#        get_align_mtu: <= 4K
#             am_short: <= 2046
#             am_bcopy: <= 8254
#             am_zcopy: <= 8254, up to 3 iov
#   am_opt_zcopy_align: <= 512
#         am_align_mtu: <= 4K
#            am header: <= 186
#               domain: device
#           atomic_add: 32, 64 bit
#           atomic_and: 32, 64 bit
#            atomic_or: 32, 64 bit
#           atomic_xor: 32, 64 bit
#          atomic_fadd: 32, 64 bit
#          atomic_fand: 32, 64 bit
#           atomic_for: 32, 64 bit
#          atomic_fxor: 32, 64 bit
#          atomic_swap: 32, 64 bit
#         atomic_cswap: 32, 64 bit
#           connection: to ep
#      device priority: 50
#     device num paths: 1
#              max eps: 256
#       device address: 3 bytes
#           ep address: 7 bytes
#       error handling: buffer (zcopy), remote access, peer failure
#
#
#      Transport: dc_mlx5
#         Device: mlx5_0:1
#  System device: 0000:33:00.0 (0)
#
#      capabilities:
#            bandwidth: 13923.72/ppn + 0.00 MB/sec
#              latency: 660 nsec
#             overhead: 40 nsec
#            put_short: <= 2K
#            put_bcopy: <= 8256
#            put_zcopy: <= 1G, up to 11 iov
#  put_opt_zcopy_align: <= 512
#        put_align_mtu: <= 4K
#            get_bcopy: <= 8256
#            get_zcopy: 65..1G, up to 11 iov
#  get_opt_zcopy_align: <= 512
#        get_align_mtu: <= 4K
#             am_short: <= 2046
#             am_bcopy: <= 8254
#             am_zcopy: <= 8254, up to 3 iov
#   am_opt_zcopy_align: <= 512
#         am_align_mtu: <= 4K
#            am header: <= 138
#               domain: device
#           atomic_add: 32, 64 bit
#           atomic_and: 32, 64 bit
#            atomic_or: 32, 64 bit
#           atomic_xor: 32, 64 bit
#          atomic_fadd: 32, 64 bit
#          atomic_fand: 32, 64 bit
#           atomic_for: 32, 64 bit
#          atomic_fxor: 32, 64 bit
#          atomic_swap: 32, 64 bit
#         atomic_cswap: 32, 64 bit
#           connection: to iface
#      device priority: 50
#     device num paths: 1
#              max eps: inf
#       device address: 3 bytes
#        iface address: 5 bytes
#       error handling: buffer (zcopy), remote access, peer failure
#
#
#      Transport: ud_verbs
#         Device: mlx5_0:1
#  System device: 0000:33:00.0 (0)
#
#      capabilities:
#            bandwidth: 13923.72/ppn + 0.00 MB/sec
#              latency: 630 nsec
#             overhead: 105 nsec
#             am_short: <= 116
#             am_bcopy: <= 4088
#             am_zcopy: <= 4088, up to 7 iov
#   am_opt_zcopy_align: <= 512
#         am_align_mtu: <= 4K
#            am header: <= 3952
#           connection: to ep, to iface
#      device priority: 50
#     device num paths: 1
#              max eps: inf
#       device address: 3 bytes
#        iface address: 3 bytes
#           ep address: 6 bytes
#       error handling: peer failure
#
#
#      Transport: ud_mlx5
#         Device: mlx5_0:1
#  System device: 0000:33:00.0 (0)
#
#      capabilities:
#            bandwidth: 13923.72/ppn + 0.00 MB/sec
#              latency: 630 nsec
#             overhead: 80 nsec
#             am_short: <= 180
#             am_bcopy: <= 4088
#             am_zcopy: <= 4088, up to 3 iov
#   am_opt_zcopy_align: <= 512
#         am_align_mtu: <= 4K
#            am header: <= 132
#           connection: to ep, to iface
#      device priority: 50
#     device num paths: 1
#              max eps: inf
#       device address: 3 bytes
#        iface address: 3 bytes
#           ep address: 6 bytes
#       error handling: peer failure
#
#
#      Transport: cm
#         Device: mlx5_0:1
#  System device: 0000:33:00.0 (0)
#
#      capabilities:
#            bandwidth: 13923.72/ppn + 0.00 MB/sec
#              latency: 600 nsec
#             overhead: 1200 nsec
#             am_bcopy: <= 214
#           connection: to iface
#      device priority: 50
#     device num paths: 1
#              max eps: inf
#       device address: 3 bytes
#        iface address: 4 bytes
#       error handling: none
#
#
# Memory domain: rdmacm
#     Component: rdmacm
#           supports client-server connection establishment via sockaddr
#   < no supported devices found >
#
# Connection manager: rdmacm
#      max_conn_priv: 54 bytes
#
# Memory domain: rocm_cpy
#     Component: rocm_cpy
#             register: unlimited, cost: 0 nsec
#           remote key: 16 bytes
#
#      Transport: rocm_copy
#         Device: rocm_cpy
#  System device: <unknown>
#
#      capabilities:
#            bandwidth: 0.00/ppn + 6911.00 MB/sec
#              latency: 10000 nsec
#             overhead: 0 nsec
#            put_short: <= 4294967295
#            put_zcopy: unlimited, up to 1 iov
#  put_opt_zcopy_align: <= 1
#        put_align_mtu: <= 1
#            get_short: <= 4294967295
#            get_zcopy: unlimited, up to 1 iov
#  get_opt_zcopy_align: <= 1
#        get_align_mtu: <= 1
#           connection: to iface
#      device priority: 0
#     device num paths: 1
#              max eps: inf
#       device address: 0 bytes
#        iface address: 8 bytes
#       error handling: none
#
#
# Memory domain: rocm_ipc
#     Component: rocm_ipc
#             register: unlimited, cost: 9 nsec
#           remote key: 56 bytes
#
#      Transport: rocm_ipc
#         Device: rocm_ipc
#  System device: <unknown>
#
#      capabilities:
#            bandwidth: 0.00/ppn + 10240.00 MB/sec
#              latency: 80 nsec
#             overhead: 400 nsec
#            put_zcopy: unlimited, up to 1 iov
#  put_opt_zcopy_align: <= 4
#        put_align_mtu: <= 4
#            get_zcopy: unlimited, up to 1 iov
#  get_opt_zcopy_align: <= 4
#        get_align_mtu: <= 4
#           connection: to iface
#      device priority: 0
#     device num paths: 1
#              max eps: inf
#       device address: 8 bytes
#        iface address: 4 bytes
#       error handling: none
#
#
# Memory domain: rocm_gdr
#     Component: rocm_gdr
#             register: unlimited, cost: 0 nsec
#           remote key: 4 bytes
#
#      Transport: rocm_gdr
#         Device: rocm_gdr
#  System device: <unknown>
#
#      capabilities:
#            bandwidth: 6911.00/ppn + 0.00 MB/sec
#              latency: 1000 nsec
#             overhead: 0 nsec
#            put_short: <= 4294967295
#            get_short: <= 4294967295
#           connection: to iface
#      device priority: 0
#     device num paths: 1
#              max eps: inf
#       device address: 0 bytes
#        iface address: 8 bytes
#       error handling: none
#
#
# Memory domain: cma
#     Component: cma
#             register: unlimited, cost: 9 nsec
#
#      Transport: cma
#         Device: memory
#  System device: <unknown>
#
#      capabilities:
#            bandwidth: 0.00/ppn + 11145.00 MB/sec
#              latency: 80 nsec
#             overhead: 400 nsec
#            put_zcopy: unlimited, up to 16 iov
#  put_opt_zcopy_align: <= 1
#        put_align_mtu: <= 1
#            get_zcopy: unlimited, up to 16 iov
#  get_opt_zcopy_align: <= 1
#        get_align_mtu: <= 1
#           connection: to iface
#      device priority: 0
#     device num paths: 1
#              max eps: inf
#       device address: 8 bytes
#        iface address: 4 bytes
#       error handling: none
#
corona152
corona152
corona152
corona152
Warning: OMB could not identify the local rank of the process.
         This can lead to multiple processes using the same GPU.
         Please use the get_local_rank script in the OMB repo for this.
Warning: OMB could not identify the local rank of the process.
         This can lead to multiple processes using the same GPU.
         Please use the get_local_rank script in the OMB repo for this.
Warning: OMB could not identify the local rank of the process.
         This can lead to multiple processes using the same GPU.
         Please use the get_local_rank script in the OMB repo for this.
Warning: OMB could not identify the local rank of the process.
         This can lead to multiple processes using the same GPU.
         Please use the get_local_rank script in the OMB repo for this.
--------------------------------------------------------------------------
WARNING: There was an error initializing an OpenFabrics device.

  Local host:   corona152
  Local device: mlx5_0
--------------------------------------------------------------------------
--------------------------------------------------------------------------
WARNING: There was an error initializing an OpenFabrics device.

  Local host:   corona152
  Local device: mlx5_0
--------------------------------------------------------------------------
--------------------------------------------------------------------------
WARNING: There was an error initializing an OpenFabrics device.

  Local host:   corona152
  Local device: mlx5_0
--------------------------------------------------------------------------
--------------------------------------------------------------------------
WARNING: There was an error initializing an OpenFabrics device.

  Local host:   corona152
  Local device: mlx5_0
--------------------------------------------------------------------------

...

[1611610759.881867] [corona152:22952:0]  ucp_request.inl:165  UCX  REQ   completing send request 0x7fffffffa920 (0x7fffffffaa20) ------- Success
[1611610759.881870] [corona152:22952:0]       tag_send.c:253  UCX  REQ   send_nbx buffer (nil) count 0 tag fffff00000000000 to <no debug data>
[1611610759.881872] [corona152:22952:0]       tag_send.c:84   UCX  REQ   select tag request(0x7fffffffa920) progress algorithm datatype=0x8 buffer=(nil) length=0 mem_type:host max_short=92 rndv_thresh=262144 zcopy_thresh=262144 zcopy_enabled=0
[1611610759.881875] [corona152:22952:0]          mm_ep.c:280  UCX  DATA  TX: AM_SHORT am_id 2 len 8 EGR_O tag fffff00000000000
[1611610759.881881] [corona152:22952:0]          mm_ep.c:117  UCX  TRACE sent wakeup from socket 29 to 0xa90198
[1611610759.881882] [corona152:22952:0]  ucp_request.inl:165  UCX  REQ   completing send request 0x7fffffffa920 (0x7fffffffaa20) ------- Success
[1611610759.881884] [corona152:22952:0]       tag_send.c:253  UCX  REQ   send_nbx buffer (nil) count 0 tag fffff00000000000 to <no debug data>
[1611610759.881887] [corona152:22952:0]       tag_send.c:84   UCX  REQ   select tag request(0x7fffffffa920) progress algorithm datatype=0x8 buffer=(nil) length=0 mem_type:host max_short=92 rndv_thresh=262144 zcopy_thresh=262144 zcopy_enabled=0
[1611610759.881889] [corona152:22952:0]          mm_ep.c:280  UCX  DATA  TX: AM_SHORT am_id 2 len 8 EGR_O tag fffff00000000000
[1611610759.881895] [corona152:22952:0]          mm_ep.c:117  UCX  TRACE sent wakeup from socket 29 to 0xa902c8
[1611610759.881897] [corona152:22952:0]  ucp_request.inl:165  UCX  REQ   completing send request 0x7fffffffa920 (0x7fffffffaa20) ------- Success
[1611610759.881949] [corona152:22952:0]       tag_recv.c:218  UCX  REQ   allocated request 0xaedbc0
[1611610759.881951] [corona152:22952:0]       tag_recv.c:40   UCX  REQ   req 0xaedbc0: recv_nbx buffer 0x2aabcb200000 dt 0x8 count 4 tag fffff40000100000/ffffffffffffffff
[1611610759.881953] [corona152:22952:0]       tag_recv.c:128  UCX  REQ   recv_nbx returning expected request 0xaedbc0 (0xaedcc0)
[1611610759.881955] [corona152:22952:0]       tag_send.c:253  UCX  REQ   send_nbx buffer 0xadfcd0 count 4 tag fffff40000000000 to <no debug data>
[1611610759.881957] [corona152:22952:0]       tag_send.c:84   UCX  REQ   select tag request(0x7fffffffa6a0) progress algorithm datatype=0x8 buffer=0xadfcd0 length=4 mem_type:host max_short=92 rndv_thresh=262144 zcopy_thresh=262144 zcopy_enabled=0
[1611610759.881960] [corona152:22952:0]          mm_ep.c:280  UCX  DATA  TX: AM_SHORT am_id 2 len 12 EGR_O tag fffff40000000000
[1611610759.881961] [corona152:22952:0]  ucp_request.inl:165  UCX  REQ   completing send request 0x7fffffffa6a0 (0x7fffffffa7a0) ------- Success
[1611610759.881963] [corona152:22952:0]       mm_iface.c:232  UCX  DATA  RX: AM_SHORT am_id 2 len 12 EGR_O tag fffff40000100000
[1611610759.881965] [corona152:22952:0]    tag_match.inl:119  UCX  DATA  checking req 0xaedbc0 tag fffff40000100000/ffffffffffffffff with tag fffff40000100000
[1611610759.881967] [corona152:22952:0]    tag_match.inl:121  UCX  REQ   matched received tag fffff40000100000 to req 0xaedbc0
[1611610759.881969] [corona152:22952:0]      eager_rcv.c:25   UCX  REQ   found req 0xaedbc0
[1611610759.881971] [corona152:22952:0]  ucp_request.inl:547  UCX  REQ   req 0xaedbc0: unpack recv_data req_len 4 data_len 4 offset 0 last: yes
${HOME}/opt/osumb/bin/corona/openmpi/collective/osu_allreduce: symbol lookup error: ${HOME}/corona/opt/ucx-1.10.0/lib/ucx/libuct_rocm_gdr.so.0: undefined symbol: gdr_copy_to_bar
[1611610759.881861] [corona152:22953:0]    tag_match.inl:119  UCX  DATA  checking req 0x7fffffffa920 tag fffff00000000000/ffffffffffffffff with tag fffff00000000000
${HOME}/opt/osumb/bin/corona/openmpi/collective/osu_allreduce: symbol lookup error: ${HOME}/corona/opt/ucx-1.10.0/lib/ucx/libuct_rocm_gdr.so.0: undefined symbol: gdr_copy_to_bar
[1611610759.881881] [corona152:22954:0]    tag_match.inl:119  UCX  DATA  checking req 0x7fffffffa920 tag fffff00000000000/ffffffffffffffff with tag fffff00000000000
${HOME}/opt/osumb/bin/corona/openmpi/collective/osu_allreduce: symbol lookup error: ${HOME}/corona/opt/ucx-1.10.0/lib/ucx/libuct_rocm_gdr.so.0: undefined symbol: gdr_copy_to_bar
[1611610759.881892] [corona152:22955:0]       mm_iface.c:232  UCX  DATA  RX: AM_SHORT am_id 2 len 8 EGR_O tag fffff00000000000
${HOME}/opt/osumb/bin/corona/openmpi/collective/osu_allreduce: symbol lookup error: ${HOME}/corona/opt/ucx-1.10.0/lib/ucx/libuct_rocm_gdr.so.0: undefined symbol: gdr_copy_to_bar
[1611610759.881985] [corona152:22952:0]         ucp_mm.c:122  UCX  TRACE registered address 0x2aabcb200000 length 4 on md[4] memh[0]=0xadf
[1611610759.881896] [corona152:22955:0]    tag_match.
srun: error: corona152: tasks 0-3: Exited with exit code 127

nhanford commented 3 years ago

This appears to be related to issue #4489. A temporary workaround for ROCm is to add --without-gdrcopy to the UCX configure line, so that the rocm_gdr transport is not built in the first place.
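
A sketch of what such a configure line might look like. `--with-rocm` and `--without-gdrcopy` are UCX configure options; the install prefix and ROCm location are placeholder assumptions, not values taken from this report, and the helper name `ucx_configure_cmd` is hypothetical. The helper only prints the command so it can be reviewed before being run from the UCX source tree.

```shell
#!/bin/sh
# Placeholder locations (assumptions); override them via the environment.
UCX_PREFIX="${UCX_PREFIX:-/path/to/ucx-install}"
ROCM_PATH="${ROCM_PATH:-/opt/rocm}"

# Print a UCX configure command with gdrcopy support disabled,
# so that the rocm_gdr transport module is not built.
ucx_configure_cmd() {
    printf './configure --prefix=%s --with-rocm=%s --without-gdrcopy\n' \
        "$UCX_PREFIX" "$ROCM_PATH"
}

ucx_configure_cmd
```

If the option takes effect, the rebuilt UCX should no longer list a rocm_gdr memory domain among its transports, and the `libuct_rocm_gdr.so.0` symbol lookup should no longer be attempted.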