spdk / spdk

Storage Performance Development Kit
https://spdk.io/
Other
3k stars 1.18k forks source link

[nvmf_shutdown_tc3] nvmf_tgt hit by SIGABRT while being killed during active IO (Assertion `wc[i].opcode == IBV_WC_RDMA_READ' failed.) #3214

Open mikeBashStuff opened 8 months ago

mikeBashStuff commented 8 months ago

Sighting report

https://ci.spdk.io/results/autotest-nightly/builds/3078/archive/nvmf-cvl-phy-autotest_1289/index.html

Point of failure:

00:16:53.840  nvmf_tgt: rdma.c:4701: nvmf_rdma_poller_poll: Assertion `wc[i].opcode == IBV_WC_RDMA_READ' failed.
00:16:54.413  [2023-12-16 02:17:30.940995] nvme_rdma.c: 615:nvme_rdma_validate_cm_event: *ERROR*: Expected RDMA_CM_EVENT_DISCONNECTED but received RDMA_CM_EVENT_TIMEWAIT_EXIT (15) from CM event channel (status = 0)
00:16:54.413  [2023-12-16 02:17:30.941600] bdev_nvme.c:1579:bdev_nvme_disconnected_qpair_cb: *NOTICE*: qpair 0x200019257340 was disconnected and freed. reset controller.
...
00:16:54.417  [2023-12-16 02:17:30.959623] nvme_ctrlr.c:1028:nvme_ctrlr_fail: *ERROR*: [nqn.2016-06.io.spdk:cnode6] in failed state.
00:16:54.417  [2023-12-16 02:17:30.969215] bdev_nvme.c:2823:bdev_nvme_failover_ctrlr_unsafe: *NOTICE*: Unable to perform failover, already in progress.
00:16:54.417  [2023-12-16 02:17:30.979251] bdev_nvme.c:2823:bdev_nvme_failover_ctrlr_unsafe: *NOTICE*: Unable to perform failover, already in progress.
...
00:16:55.459  [2023-12-16 02:17:31.955182] nvme_rdma.c:1161:nvme_rdma_connect_established: *ERROR*: RDMA connect error -99
00:16:55.459  [2023-12-16 02:17:31.955202] nvme_rdma.c:2721:nvme_rdma_qpair_process_completions: *ERROR*: Failed to connect rqpair=0x2000192ed100
...
00:16:56.400  [2023-12-16 02:17:32.971473] nvme_rdma.c:1161:nvme_rdma_connect_established: *ERROR*: RDMA connect error -99
00:16:56.401  [2023-12-16 02:17:32.971493] nvme_rdma.c:2721:nvme_rdma_qpair_process_completions: *ERROR*: Failed to connect rqpair=0x2000192c61c0
...
00:16:58.038  /var/jenkins/workspace/nvmf-cvl-phy-autotest/spdk/test/common/autotest_common.sh: line 940: 3950024 Aborted                 (core dumped) "${NVMF_APP[@]}" "$@"
00:16:58.038    02:17:34    -- common/autotest_common.sh@940 -- # trap - ERR
00:16:58.038    02:17:34    -- common/autotest_common.sh@940 -- # print_backtrace
00:16:58.038  [2023-12-16 02:17:34.634021] nvme_ctrlr.c:1638:nvme_ctrlr_disconnect: *NOTICE*: [nqn.2016-06.io.spdk:cnode10] resetting controller
00:16:58.038  [2023-12-16 02:17:34.634053] bdev_nvme.c:2027:_bdev_nvme_reset_ctrlr_complete: *ERROR*: Resetting controller failed.
00:16:58.038  [2023-12-16 02:17:34.634070] bdev_nvme.c:2027:_bdev_nvme_reset_ctrlr_complete: *ERROR*: Resetting controller failed.
00:16:58.038  [2023-12-16 02:17:34.634108] nvme_ctrlr.c:4000:nvme_ctrlr_process_init: *ERROR*: [nqn.2016-06.io.spdk:cnode2] Ctrlr is in error state
00:16:58.038  [2023-12-16 02:17:34.634126] nvme_ctrlr.c:1712:spdk_nvme_ctrlr_reconnect_poll_async: *ERROR*: [nqn.2016-06.io.spdk:cnode2] controller reinitialization failed
00:16:58.038  [2023-12-16 02:17:34.634140] nvme_ctrlr.c:1017:nvme_ctrlr_fail: *NOTICE*: [nqn.2016-06.io.spdk:cnode2] already in failed state
00:16:58.038    02:17:34    -- common/autotest_common.sh@1116 -- # [[ ehxBET =~ e ]]
00:16:58.038  [2023-12-16 02:17:34.634157] nvme_ctrlr.c:4000:nvme_ctrlr_process_init: *ERROR*: [nqn.2016-06.io.spdk:cnode1] Ctrlr is in error state
00:16:58.038  [2023-12-16 02:17:34.634187] nvme_ctrlr.c:1712:spdk_nvme_ctrlr_reconnect_poll_async: *ERROR*: [nqn.2016-06.io.spdk:cnode1] controller reinitialization failed
00:16:58.038  [2023-12-16 02:17:34.634200] nvme_ctrlr.c:1017:nvme_ctrlr_fail: *NOTICE*: [nqn.2016-06.io.spdk:cnode1] already in failed state
00:16:58.038  [2023-12-16 02:17:34.634251] nvme_ctrlr.c:1638:nvme_ctrlr_disconnect: *NOTICE*: [nqn.2016-06.io.spdk:cnode5] resetting controller
00:16:58.038  [2023-12-16 02:17:34.634289] bdev_nvme.c:2027:_bdev_nvme_reset_ctrlr_complete: *ERROR*: Resetting controller failed.
00:16:58.038  [2023-12-16 02:17:34.634305] bdev_nvme.c:2027:_bdev_nvme_reset_ctrlr_complete: *ERROR*: Resetting controller failed.
00:16:58.038    02:17:34    -- common/autotest_common.sh@1118 -- # args=('3950024' 'nvmf_shutdown_tc3' 'nvmf_shutdown_tc3' '--transport=rdma')
00:16:58.038    02:17:34    -- common/autotest_common.sh@1118 -- # local args
00:16:58.038    02:17:34    -- common/autotest_common.sh@1120 -- # xtrace_disable
00:16:58.038    02:17:34    -- common/autotest_common.sh@10 -- # set +x
00:16:58.038  ========== Backtrace start: ==========
00:16:58.038  
00:16:58.039  in /var/jenkins/workspace/nvmf-cvl-phy-autotest/spdk/test/common/autotest_common.sh:940 -> killprocess(["3950024"])
00:16:58.039       ...
00:16:58.039     935                kill $1
00:16:58.039     936            fi
00:16:58.039     937    
00:16:58.039     938            # wait for the process regardless if its the dummy sudo one
00:16:58.039     939            # or the actual app - it should terminate anyway
00:16:58.039  => 940            wait $1
00:16:58.039     941        else
00:16:58.039     942            # the process is not there anymore
00:16:58.039     943            echo "Process with pid $1 is not found"
00:16:58.039     944        fi
00:16:58.039     945    }
00:16:58.039       ...
00:16:58.039  in /var/jenkins/workspace/nvmf-cvl-phy-autotest/spdk/test/nvmf/target/shutdown.sh:134 -> nvmf_shutdown_tc3([])
00:16:58.039       ...
00:16:58.039     129        trap 'process_shm --id $NVMF_APP_SHM_ID; kill -9 $perfpid || true; nvmftestfini; exit 1' SIGINT SIGTERM EXIT
00:16:58.039     130    
00:16:58.039     131        waitforio /var/tmp/bdevperf.sock Nvme1n1
00:16:58.039     132    
00:16:58.039     133        # Kill the target half way through
00:16:58.039  => 134        killprocess $nvmfpid
00:16:58.039     135        nvmfpid=
00:16:58.039     136    
00:16:58.039     137        # Verify bdevperf exits successfully
00:16:58.039     138        sleep 1
00:16:58.039     139        # TODO: Right now the NVMe-oF initiator will not correctly detect broken connections
00:16:58.039       ...
00:16:58.039  in /var/jenkins/workspace/nvmf-cvl-phy-autotest/spdk/test/common/autotest_common.sh:1088 -> run_test(["nvmf_shutdown_tc3"],["nvmf_shutdown_tc3"])
00:16:58.039       ...
00:16:58.039     1083       timing_enter $test_name
00:16:58.039     1084       echo "************************************"
00:16:58.039     1085       echo "START TEST $test_name"
00:16:58.039     1086       echo "************************************"
00:16:58.039     1087       xtrace_restore
00:16:58.039     1088       time "$@"
00:16:58.039     1089       xtrace_disable
00:16:58.039     1090       echo "************************************"
00:16:58.039     1091       echo "END TEST $test_name"
00:16:58.039     1092       echo "************************************"
00:16:58.039     1093       timing_exit $test_name
00:16:58.039       ...
00:16:58.299  in /var/jenkins/workspace/nvmf-cvl-phy-autotest/spdk/test/nvmf/target/shutdown.sh:148 -> main(["--transport=rdma"])
00:16:58.299       ...
00:16:58.299     143        stoptarget
00:16:58.299     144    }
00:16:58.299     145    
00:16:58.299     146    run_test "nvmf_shutdown_tc1" nvmf_shutdown_tc1
00:16:58.299     147    run_test "nvmf_shutdown_tc2" nvmf_shutdown_tc2
00:16:58.299  => 148    run_test "nvmf_shutdown_tc3" nvmf_shutdown_tc3
00:16:58.299     149    
00:16:58.299     150    trap - SIGINT SIGTERM EXIT
00:16:58.299       ...
00:16:58.299  
00:16:58.299  ========== Backtrace end ==========

bt:

00:17:08.320  Core was generated by `/var/jenkins/workspace/nvmf-cvl-phy-autotest/spdk/build/bin/nvmf_tgt -i 0 -e 0x'.
00:17:08.320  Program terminated with signal SIGABRT, Aborted.
00:17:08.320  #0  0x00007fe6b9019884 in __pthread_kill_implementation () from /usr/lib64/libc.so.6
00:17:08.320  [Current thread is 1 (Thread 0x7fe6b5ea76c0 (LWP 3950030))]
00:17:08.320  
00:17:08.320  Thread 6 (Thread 0x7fe6b66a86c0 (LWP 3950029)):
00:17:08.320  #0  0x00007fe6b908c18d in write () from /usr/lib64/libc.so.6
00:17:08.320  No symbol table info available.
00:17:08.320  #1  0x00007fe6b88a32e3 in rdma_disconnect (id=0x195c050) at /usr/src/debug/rdma-core-44.0-1.fc38.x86_64/librdmacm/cma.c:2044
00:17:08.320          cmd = {cmd = 10, in = 4, out = 0, id = 10}
00:17:08.320          id_priv = 0x195c050
00:17:08.320          ret = <optimized out>
00:17:08.320  #2  rdma_disconnect (id=0x195c050) at /usr/src/debug/rdma-core-44.0-1.fc38.x86_64/librdmacm/cma.c:2030
00:17:08.320          cmd = <optimized out>
00:17:08.320          id_priv = <optimized out>
00:17:08.320          ret = <optimized out>
00:17:08.320  #3  0x00007fe6b9fbb3ed in spdk_rdma_qp_disconnect (spdk_rdma_qp=0x7fe6a800fa50) at rdma_verbs.c:105
00:17:08.320          rc = 0
00:17:08.320          __PRETTY_FUNCTION__ = "spdk_rdma_qp_disconnect"
00:17:08.320          __func__ = "spdk_rdma_qp_disconnect"
00:17:08.320  #4  0x00007fe6ba55f3a2 in nvmf_rdma_close_qpair (qpair=0x195c300, cb_fn=0x7fe6ba4879be <_nvmf_transport_qpair_fini_complete>, cb_arg=0x7fe6a8094fc0) at rdma.c:4393
00:17:08.320          rqpair = 0x195c300
00:17:08.320  #5  0x00007fe6ba4c4066 in nvmf_transport_qpair_fini (qpair=0x195c300, cb_fn=0x7fe6ba4879be <_nvmf_transport_qpair_fini_complete>, cb_arg=0x7fe6a8094fc0) at transport.c:746
00:17:08.320  No locals.
00:17:08.320  #6  0x00007fe6ba489b21 in _nvmf_qpair_destroy (ctx=0x7fe6a8094fc0, status=0) at nvmf.c:1361
00:17:08.320          qpair_ctx = 0x7fe6a8094fc0
00:17:08.320          qpair = 0x195c300
00:17:08.320          ctrlr = 0x7fe6a825f050
00:17:08.320          sgroup = 0x7fe6a8000fb0
00:17:08.320          sid = 32742
00:17:08.320          __PRETTY_FUNCTION__ = "_nvmf_qpair_destroy"
00:17:08.320  #7  0x00007fe6ba40ebac in nvmf_qpair_request_cleanup (qpair=0x195c300) at ctrlr.c:4340
00:17:08.320          __PRETTY_FUNCTION__ = "nvmf_qpair_request_cleanup"
00:17:08.320  #8  0x00007fe6ba40f01e in spdk_nvmf_request_free (req=0x20001961f050) at ctrlr.c:4355
00:17:08.320          qpair = 0x195c300
00:17:08.320          __func__ = "spdk_nvmf_request_free"
00:17:08.320  #9  0x00007fe6ba4097c8 in nvmf_qpair_free_aer (qpair=0x195c300) at ctrlr.c:3884
00:17:08.320          ctrlr = 0x7fe6a825f050
00:17:08.320          i = 3
00:17:08.320          __PRETTY_FUNCTION__ = "nvmf_qpair_free_aer"
00:17:08.320  #10 0x00007fe6ba48a66c in spdk_nvmf_qpair_disconnect (qpair=0x195c300, cb_fn=0x0, ctx=0x0) at nvmf.c:1437
00:17:08.320          group = 0x7fe6a8000c00
00:17:08.320          qpair_ctx = 0x7fe6a8094fc0
00:17:08.320          __func__ = "spdk_nvmf_qpair_disconnect"
00:17:08.320          __PRETTY_FUNCTION__ = "spdk_nvmf_qpair_disconnect"
00:17:08.320  #11 0x00007fe6ba48d476 in nvmf_poll_group_remove_subsystem_msg (ctx=0x7fe6a8264df0) at nvmf.c:1735
00:17:08.320          qpair = 0x195c300
00:17:08.320          qpair_tmp = 0x1bcc320
00:17:08.320          subsystem = 0x1b9caf0
00:17:08.320          group = 0x7fe6a8000c00
00:17:08.320          qpair_ctx = 0x7fe6a8264df0
00:17:08.320          qpairs_found = true
00:17:08.320          rc = 0
00:17:08.320  #12 0x00007fe6ba48dad8 in nvmf_poll_group_remove_subsystem (group=0x7fe6a8000c00, subsystem=0x1b9caf0, cb_fn=0x7fe6ba45444a <subsystem_state_change_continue>, cb_arg=0x1d30e10) at nvmf.c:1784
00:17:08.320          sgroup = 0x7fe6a8000fb0
00:17:08.320          ctx = 0x7fe6a8264df0
00:17:08.320          i = 32
00:17:08.320          __func__ = "nvmf_poll_group_remove_subsystem"
00:17:08.320  #13 0x00007fe6ba4545d8 in subsystem_state_change_on_pg (i=0x1d30e10) at subsystem.c:659
00:17:08.320          ctx = 0x195d7d0
00:17:08.320          ch = 0x7fe6a8000ba0
00:17:08.320          group = 0x7fe6a8000c00
00:17:08.320          __PRETTY_FUNCTION__ = "subsystem_state_change_on_pg"
00:17:08.320  #14 0x00007fe6b9b510c6 in _call_channel (ctx=0x1d30e10) at thread.c:2552
00:17:08.320          i = 0x1d30e10
00:17:08.320          ch = 0x7fe6a8000ba0
00:17:08.320  #15 0x00007fe6b9b405e0 in msg_queue_run_batch (thread=0x194c280, max_msgs=8) at thread.c:848
00:17:08.320          msg = 0x2000040ef2c0
00:17:08.320          count = 1
00:17:08.320          i = 0
00:17:08.320          messages = {0x2000040ef2c0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}
00:17:08.320          notify = 1
00:17:08.320          rc = 0
00:17:08.320          __func__ = "msg_queue_run_batch"
00:17:08.320          __PRETTY_FUNCTION__ = "msg_queue_run_batch"
00:17:08.320  #16 0x00007fe6b9b43b73 in thread_poll (thread=0x194c280, max_msgs=0, now=1091428452571845) at thread.c:1070
00:17:08.320          msg_count = 254117
00:17:08.320          poller = 0x3e0a5fd36600f
00:17:08.320          tmp = 0x19492c0
00:17:08.320          critical_msg = 0x0
00:17:08.320          rc = 0
00:17:08.320  #17 0x00007fe6b9b44a0d in spdk_thread_poll (thread=0x194c280, max_msgs=0, now=1091428452571845) at thread.c:1163
00:17:08.320          orig_thread = 0x0
00:17:08.320          rc = 0
00:17:08.320  #18 0x00007fe6b9f7d2d7 in _reactor_run (reactor=0x19492c0) at reactor.c:914
00:17:08.320          thread = 0x194c280
00:17:08.320          lw_thread = 0x194c5c8
00:17:08.320          tmp = 0x0
00:17:08.320          now = 1091428452571845
00:17:08.320          rc = 0
00:17:08.320  #19 0x00007fe6b9f7d94b in reactor_run (arg=0x19492c0) at reactor.c:952
00:17:08.320          reactor = 0x19492c0
00:17:08.320          thread = 0x7fe6b908c1a1 <write+97>
00:17:08.320          lw_thread = 0x0
00:17:08.320          tmp = 0x0
00:17:08.320          thread_name = "reactor_3\000\000\000\000\000\000\000\203ij\266\346\177\000\000\001\000\000\000\000\000\000"
00:17:08.320          last_sched = 0
00:17:08.320          __func__ = "reactor_run"
00:17:08.320  #20 0x00007fe6b984ccc9 in eal_thread_loop (arg=0x3) at ../lib/eal/common/eal_common_thread.c:212
00:17:08.320          f = 0x7fe6b9f7d6cc <reactor_run>
00:17:08.320          fct_arg = 0x19492c0
00:17:08.320          lcore_id = 3
00:17:08.320          cpuset = "3", '\000' <repeats 254 times>
00:17:08.320          ret = 0
00:17:08.320  #21 0x00007fe6b986d979 in eal_worker_thread_loop (arg=0x3) at ../lib/eal/linux/eal.c:916
00:17:08.320  No locals.
00:17:08.320  #22 0x00007fe6b9017947 in start_thread () from /usr/lib64/libc.so.6
00:17:08.320  No symbol table info available.
00:17:08.320  #23 0x00007fe6b909d860 in clone3 () from /usr/lib64/libc.so.6
00:17:08.320  No symbol table info available.
00:17:08.320  
00:17:08.320  Thread 5 (Thread 0x7fe6b7eab6c0 (LWP 3950026)):
00:17:08.320  #0  0x00007fe6b909dc62 in epoll_wait () from /usr/lib64/libc.so.6
00:17:08.320  No symbol table info available.
00:17:08.320  #1  0x00007fe6b9874325 in eal_intr_handle_interrupts (pfd=7, totalfds=1) at ../lib/eal/linux/eal_interrupts.c:1077
00:17:08.320          events = {{events = 0, data = {ptr = 0x0, fd = 0, u32 = 0, u64 = 0}}}
00:17:08.320          nfds = 0
00:17:08.320  #2  0x00007fe6b987455a in eal_intr_thread_main (arg=0x0) at ../lib/eal/linux/eal_interrupts.c:1163
00:17:08.320          pipe_event = {events = 3, data = {ptr = 0x5, fd = 5, u32 = 5, u64 = 5}}
00:17:08.320          src = 0x0
00:17:08.320          numfds = 1
00:17:08.320          pfd = 7
00:17:08.320          __func__ = "eal_intr_thread_main"
00:17:08.320  #3  0x00007fe6b984d07f in ctrl_thread_start (arg=0x18e1ac0) at ../lib/eal/common/eal_common_thread.c:285
00:17:08.320          params = 0x18e1ac0
00:17:08.320          start_arg = 0x0
00:17:08.320          start_routine = 0x7fe6b987438a <eal_intr_thread_main>
00:17:08.320  #4  0x00007fe6b9017947 in start_thread () from /usr/lib64/libc.so.6
00:17:08.320  No symbol table info available.
00:17:08.320  #5  0x00007fe6b909d860 in clone3 () from /usr/lib64/libc.so.6
00:17:08.320  No symbol table info available.
00:17:08.320  
00:17:08.320  Thread 4 (Thread 0x7fe6b76aa6c0 (LWP 3950027)):
00:17:08.320  #0  0x00007fe6b909f77b in recvmsg () from /usr/lib64/libc.so.6
00:17:08.320  No symbol table info available.
00:17:08.320  #1  0x00007fe6b986245c in read_msg (fd=10, m=0x7fe6b76a8a40, s=0x7fe6b76a89d0) at ../lib/eal/common/eal_common_proc.c:284
00:17:08.320          msglen = 0
00:17:08.320          iov = {iov_base = 0x7fe6b76a8a40, iov_len = 332}
00:17:08.320          msgh = {msg_name = 0x7fe6b76a89d0, msg_namelen = 110, msg_iov = 0x7fe6b76a8990, msg_iovlen = 1, msg_control = 0x7fe6b76a8920, msg_controllen = 48, msg_flags = 0}
00:17:08.320          control = '\000' <repeats 47 times>
00:17:08.320          cmsg = 0x0
00:17:08.320          buflen = 332
00:17:08.320  #2  0x00007fe6b9862940 in mp_handle (arg=0x0) at ../lib/eal/common/eal_common_proc.c:410
00:17:08.320          ret = -1182103872
00:17:08.320          msg = {type = 0, msg = {name = '\000' <repeats 63 times>, len_param = 0, num_fds = 0, param = '\000' <repeats 80 times>, "\001\000\000\000 \213j\267\346\177\000\000\345\303\204\271\346\177\000\000\000\000\000\000\000\000\000\000\b\273\212\271\346\177", '\000' <repeats 14 times>, "\001\000\000\000\f\000\000\000\000\000\000\000\377\377\377\377\f\000\000\000`\213j\267\346\177\000\000i\304\204\271\346\177\000\000`\213j\267\346\177\000\000\200V\206\271\346\177\000\000`\213j\267\346\177\000\000\r\303\204\271\377\377\377\377\240\020\212\271"..., fds = {32742, 0, 0, 26090176, 0, 0, 0, 26090176}}}
00:17:08.320          sa = {sun_family = 0, sun_path = '\000' <repeats 107 times>}
00:17:08.320          fd = 10
00:17:08.320  #3  0x00007fe6b984d07f in ctrl_thread_start (arg=0x18e1ac0) at ../lib/eal/common/eal_common_thread.c:285
00:17:08.320          params = 0x18e1ac0
00:17:08.320          start_arg = 0x0
00:17:08.320          start_routine = 0x7fe6b9862911 <mp_handle>
00:17:08.320  #4  0x00007fe6b9017947 in start_thread () from /usr/lib64/libc.so.6
00:17:08.320  No symbol table info available.
00:17:08.320  #5  0x00007fe6b909d860 in clone3 () from /usr/lib64/libc.so.6
00:17:08.320  No symbol table info available.
00:17:08.320  
00:17:08.320  Thread 3 (Thread 0x7fe6b6ea96c0 (LWP 3950028)):
00:17:08.320  #0  0x00007fe6ba5653bc in nvmf_rdma_poll_group_poll (group=0x7fe6b000ee70) at rdma.c:4836
00:17:08.320          rtransport = 0x19643a0
00:17:08.320          rgroup = 0x7fe6b000ee70
00:17:08.320          rpoller = 0x7fe6b000eee0
00:17:08.320          tmp = 0x0
00:17:08.320          count = 0
00:17:08.320          rc = 0
00:17:08.320  #1  0x00007fe6ba4c3bef in nvmf_transport_poll_group_poll (group=0x7fe6b000ee70) at transport.c:724
00:17:08.320  No locals.
00:17:08.320  #2  0x00007fe6ba47cbdf in nvmf_poll_group_poll (ctx=0x7fe6b0000c00) at nvmf.c:160
00:17:08.320          group = 0x7fe6b0000c00
00:17:08.320          rc = 0
00:17:08.320          count = 0
00:17:08.320          tgroup = 0x7fe6b000ee70
00:17:08.320  #3  0x00007fe6b9b41f94 in thread_execute_poller (thread=0x194bef0, poller=0x7fe6b0000cc0) at thread.c:953
00:17:08.320          rc = 0
00:17:08.320          __PRETTY_FUNCTION__ = "thread_execute_poller"
00:17:08.320          __func__ = "thread_execute_poller"
00:17:08.321  #4  0x00007fe6b9b43dbc in thread_poll (thread=0x194bef0, max_msgs=0, now=1091428454151345) at thread.c:1079
00:17:08.321          poller_rc = 0
00:17:08.321          msg_count = 0
00:17:08.321          poller = 0x7fe6b0000cc0
00:17:08.321          tmp = 0x0
00:17:08.321          critical_msg = 0x0
00:17:08.321          rc = 0
00:17:08.321  #5  0x00007fe6b9b44a0d in spdk_thread_poll (thread=0x194bef0, max_msgs=0, now=1091428454151345) at thread.c:1163
00:17:08.321          orig_thread = 0x0
00:17:08.321          rc = 0
00:17:08.321  #6  0x00007fe6b9f7d2d7 in _reactor_run (reactor=0x1949000) at reactor.c:914
00:17:08.321          thread = 0x194bef0
00:17:08.321          lw_thread = 0x194c238
00:17:08.321          tmp = 0x0
00:17:08.321          now = 1091428454151345
00:17:08.321          rc = 0
00:17:08.321  #7  0x00007fe6b9f7d94b in reactor_run (arg=0x1949000) at reactor.c:952
00:17:08.321          reactor = 0x1949000
00:17:08.321          thread = 0x7fe6b908c1a1 <write+97>
00:17:08.321          lw_thread = 0x0
00:17:08.321          tmp = 0x0
00:17:08.321          thread_name = "reactor_2\000\000\000\000\000\000\000\203y\352\266\346\177\000\000\001\000\000\000\000\000\000"
00:17:08.321          last_sched = 0
00:17:08.321          __func__ = "reactor_run"
00:17:08.321  #8  0x00007fe6b984ccc9 in eal_thread_loop (arg=0x2) at ../lib/eal/common/eal_common_thread.c:212
00:17:08.321          f = 0x7fe6b9f7d6cc <reactor_run>
00:17:08.321          fct_arg = 0x1949000
00:17:08.321          lcore_id = 2
00:17:08.321          cpuset = "2", '\000' <repeats 254 times>
00:17:08.321          ret = 0
00:17:08.321  #9  0x00007fe6b986d979 in eal_worker_thread_loop (arg=0x2) at ../lib/eal/linux/eal.c:916
00:17:08.321  No locals.
00:17:08.321  #10 0x00007fe6b9017947 in start_thread () from /usr/lib64/libc.so.6
00:17:08.321  No symbol table info available.
00:17:08.321  #11 0x00007fe6b909d860 in clone3 () from /usr/lib64/libc.so.6
00:17:08.321  No symbol table info available.
00:17:08.321  
00:17:08.321  Thread 2 (Thread 0x7fe6b7eafa00 (LWP 3950024)):
00:17:08.321  #0  nvmf_rdma_poller_poll (rtransport=0x19643a0, rpoller=0x1972c70) at rdma.c:4603
00:17:08.321          wc = {{wr_id = 34359738369, status = IBV_WC_LOC_LEN_ERR, opcode = IBV_WC_SEND, vendor_err = 0, byte_len = 0, {imm_data = 1, invalidated_rkey = 1}, qp_num = 0, src_qp = 65536, wc_flags = 1, pkey_index = 1, slid = 0, sl = 99 'c', dlid_path_bits = 0 '\000'}, {wr_id = 4294967304, status = 99, opcode = IBV_WC_RDMA_WRITE, vendor_err = 131072, byte_len = 100, {imm_data = 1, invalidated_rkey = 1}, qp_num = 3799, src_qp = 3799, wc_flags = 0, pkey_index = 181, slid = 0, sl = 0 '\000', dlid_path_bits = 0 '\000'}, {wr_id = 35184372089014, status = 65536, opcode = IBV_WC_RECV, vendor_err = 118488704, byte_len = 8192, {imm_data = 857723200, invalidated_rkey = 857723200}, qp_num = 32767, src_qp = 0, wc_flags = 0, pkey_index = 8, slid = 0, sl = 0 '\000', dlid_path_bits = 0 '\000'}, {wr_id = 140734051111904, status = 426770048, opcode = 8192, vendor_err = 118488704, byte_len = 8192, {imm_data = 857723200, invalidated_rkey = 857723200}, qp_num = 32767, src_qp = 8, wc_flags = 8, pkey_index = 8, slid = 0, sl = 0 '\000', dlid_path_bits = 32 ' '}, {wr_id = 140734051111232, status = IBV_WC_SUCCESS, opcode = IBV_WC_SEND, vendor_err = 118488704, byte_len = 8192, {imm_data = 857722200, invalidated_rkey = 857722200}, qp_num = 32767, src_qp = 857722208, wc_flags = 32767, pkey_index = 52584, slid = 13087, sl = 255 '\377', dlid_path_bits = 127 '\177'}, {wr_id = 35184490577536, status = 100, opcode = 32767, vendor_err = 101, byte_len = 8192, {imm_data = 65536, invalidated_rkey = 65536}, qp_num = 32767, src_qp = 330279424, wc_flags = 8192, pkey_index = 53728, slid = 13087, sl = 255 '\377', dlid_path_bits = 127 '\177'}, {wr_id = 0, status = IBV_WC_LOC_ACCESS_ERR, opcode = IBV_WC_SEND, vendor_err = 857724064, byte_len = 32767, {imm_data = 66056704, invalidated_rkey = 66056704}, qp_num = 8192, src_qp = 330279424, wc_flags = 8192, pkey_index = 53728, slid = 13087, sl = 255 '\377', dlid_path_bits = 127 '\177'}, {wr_id = 34359738376, status = IBV_WC_LOC_ACCESS_ERR, opcode = 8, vendor_err = 8, byte_len = 0, {imm_data = 1, invalidated_rkey = 1}, qp_num = 1, src_qp = 1, wc_flags = 0, pkey_index = 1, slid = 0, sl = 8 '\b', dlid_path_bits = 0 '\000'}, {wr_id = 390842023937, status = IBV_WC_LOC_ACCESS_ERR, opcode = IBV_WC_RDMA_WRITE, vendor_err = 91, byte_len = 1, {imm_data = 131072, invalidated_rkey = 131072}, qp_num = 92, src_qp = 1, wc_flags = 8192, pkey_index = 53728, slid = 13087, sl = 255 '\377', dlid_path_bits = 127 '\177'}, {wr_id = 35184702368640, status = 857723360, opcode = 32767, vendor_err = 118488704, byte_len = 8192, {imm_data = 857722440, invalidated_rkey = 857722440}, qp_num = 32767, src_qp = 857722448, wc_flags = 32767, pkey_index = 52824, slid = 13087, sl = 91 '[', dlid_path_bits = 0 '\000'}, {wr_id = 4294967388, status = IBV_WC_SUCCESS, opcode = 32767, vendor_err = 118488704, byte_len = 8192, {imm_data = 8, invalidated_rkey = 8}, qp_num = 8, src_qp = 8, wc_flags = 0, pkey_index = 1, slid = 0, sl = 1 '\001', dlid_path_bits = 0 '\000'}, {wr_id = 1, status = IBV_WC_LOC_LEN_ERR, opcode = 8, vendor_err = 426770048, byte_len = 8192, {imm_data = 857723488, invalidated_rkey = 857723488}, qp_num = 32767, src_qp = 426770432, wc_flags = 8192, pkey_index = 53856, slid = 13087, sl = 255 '\377', dlid_path_bits = 127 '\177'}, {wr_id = 35184798317800, status = 99, opcode = IBV_WC_RECV, vendor_err = 100, byte_len = 8192, {imm_data = 65536, invalidated_rkey = 65536}, qp_num = 8192, src_qp = 330279424, wc_flags = 8192, pkey_index = 54064, slid = 13087, sl = 255 '\377', dlid_path_bits = 127 '\177'}, {wr_id = 0, status = 330279424, opcode = 8192, vendor_err = 857723696, byte_len = 32767, {imm_data = 0, invalidated_rkey = 0}, qp_num = 0, src_qp = 330279424, wc_flags = 8192, pkey_index = 54064, slid = 13087, sl = 255 '\377', dlid_path_bits = 127 '\177'}, {wr_id = 0, status = 330279424, opcode = 8192, vendor_err = 857723696, byte_len = 32767, {imm_data = 0, invalidated_rkey = 0}, qp_num = 0, src_qp = 330279424, wc_flags = 8192, pkey_index = 53064, slid = 13087, sl = 255 '\377', dlid_path_bits = 127 '\177'}, {wr_id = 140734051110736, status = 4248097783, opcode = 254117, vendor_err = 857722864, byte_len = 32767, {imm_data = 3128204519, invalidated_rkey = 3128204519}, qp_num = 32742, src_qp = 33840, wc_flags = 0, pkey_index = 92, slid = 0, sl = 230 '\346', dlid_path_bits = 127 '\177'}, {wr_id = 140733193388124, status = IBV_WC_SUCCESS, opcode = 32742, vendor_err = 426770048, byte_len = 8192, {imm_data = 857723872, invalidated_rkey = 857723872}, qp_num = 32767, src_qp = 0, wc_flags = 0, pkey_index = 65152, slid = 6511, sl = 0 '\000', dlid_path_bits = 32 ' '}, {wr_id = 140734051111904, status = IBV_WC_SUCCESS, opcode = IBV_WC_SEND, vendor_err = 65537, byte_len = 1, {imm_data = 26518512, invalidated_rkey = 26518512}, qp_num = 0, src_qp = 857723008, wc_flags = 32767, pkey_index = 54195, slid = 47537, sl = 230 '\346', dlid_path_bits = 127 '\177'}, {wr_id = 140734051111072, status = 3115925480, opcode = 32742, vendor_err = 857722976, byte_len = 0, {imm_data = 26524544, invalidated_rkey = 26524544}, qp_num = 0, src_qp = 857723392, wc_flags = 32767, pkey_index = 50574, slid = 47540, sl = 230 '\346', dlid_path_bits = 127 '\177'}, {wr_id = 140628852248676, status = IBV_WC_LOC_QP_OP_ERR, opcode = IBV_WC_SEND, vendor_err = 26524096, byte_len = 0, {imm_data = 0, invalidated_rkey = 0}, qp_num = 0, src_qp = 66056704, wc_flags = 8192, pkey_index = 54432, slid = 13087, sl = 255 '\377', dlid_path_bits = 127 '\177'}, {wr_id = 0, status = 66056704, opcode = 8192, vendor_err = 857724064, byte_len = 32767, {imm_data = 0, invalidated_rkey = 0}, qp_num = 0, src_qp = 857723120, wc_flags = 32767, pkey_index = 45056, slid = 46278, sl = 230 '\346', dlid_path_bits = 127 '\177'}, {wr_id = 140734051111520, status = 3115075602, opcode = 32742, vendor_err = 26956320, byte_len = 0, {imm_data = 1, invalidated_rkey = 1}, qp_num = 0, src_qp = 426770048, wc_flags = 8192, pkey_index = 54240, slid = 13087, sl = 255 '\377', dlid_path_bits = 127 '\177'}, {wr_id = 0, status = 426770048, opcode = 8192, vendor_err = 857723872, byte_len = 32767, {imm_data = 0, invalidated_rkey = 0}, qp_num = 0, src_qp = 426770048, wc_flags = 8192, pkey_index = 53240, slid = 13087, sl = 255 '\377', dlid_path_bits = 127 '\177'}, {wr_id = 140734051110912, status = 857722888, opcode = 32767, vendor_err = 3033043014, byte_len = 32742, {imm_data = 3033159496, invalidated_rkey = 3033159496}, qp_num = 32742, src_qp = 3033043044, wc_flags = 32742, pkey_index = 48, slid = 0, sl = 48 '0', dlid_path_bits = 0 '\000'}, {wr_id = 140734051111544, status = 857723280, opcode = 32767, vendor_err = 0, byte_len = 0, {imm_data = 2, invalidated_rkey = 2}, qp_num = 0, src_qp = 857669632, wc_flags = 195, pkey_index = 47581, slid = 64821, sl = 165 '\245', dlid_path_bits = 224 '\340'}, {wr_id = 35184438145536, status = 857724064, opcode = 32767, vendor_err = 0, byte_len = 0, {imm_data = 66056704, invalidated_rkey = 66056704}, qp_num = 8192, src_qp = 857724064, wc_flags = 32767, pkey_index = 0, slid = 0, sl = 0 '\000', dlid_path_bits = 0 '\000'}, {wr_id = 35184438145536, status = 857723032, opcode = 32767, vendor_err = 857723040, byte_len = 32767, {imm_data = 857723048, invalidated_rkey = 857723048}, qp_num = 32767, src_qp = 3032958320, wc_flags = 32742, pkey_index = 48, slid = 0, sl = 48 '0', dlid_path_bits = 0 '\000'}, {wr_id = 140734051111696, status = 857723424, opcode = 32767, vendor_err = 3125101728, byte_len = 32742, {imm_data = 3467379712, invalidated_rkey = 3467379712}, qp_num = 1632321556, src_qp = 28934560, wc_flags = 0, pkey_index = 60776, slid = 65535, sl = 255 '\377', dlid_path_bits = 255 '\377'}, {wr_id = 11, status = IBV_WC_MW_BIND_ERR, opcode = IBV_WC_SEND, vendor_err = 28959728, byte_len = 0, {imm_data = 3115561119, invalidated_rkey = 3115561119}, qp_num = 32742, src_qp = 857723504, wc_flags = 32767, pkey_index = 20704, slid = 47540, sl = 230 '\346', dlid_path_bits = 127 '\177'}, {wr_id = 140734051111616, status = 3115647304, opcode = 32742, vendor_err = 0, byte_len = 0, {imm_data = 26956224, invalidated_rkey = 26956224}, qp_num = 0, src_qp = 28959728, wc_flags = 0, pkey_index = 51616, slid = 404, sl = 0 '\000', dlid_path_bits = 0 '\000'}, {wr_id = 28917152, status = 28959728, opcode = IBV_WC_SEND, vendor_err = 28936960, byte_len = 0, {imm_data = 3137933312, invalidated_rkey = 3137933312}, qp_num = 32742, src_qp = 857723648, wc_flags = 32767, pkey_index = 1024, slid = 52908, sl = 20 '\024', dlid_path_bits = 64 '@'}, {wr_id = 140734051111680, status = 4294962536, opcode = 4294967295, vendor_err = 11, byte_len = 0, {imm_data = 28936848, invalidated_rkey = 28936848}, qp_num = 0, src_qp = 3137933312, wc_flags = 32742, pkey_index = 26536, slid = 64, sl = 0 '\000', dlid_path_bits = 0 '\000'}}
00:17:08.321          rdma_wr = 0x20001967b7a0
00:17:08.321          rdma_req = 0x614b4014ceac0400
00:17:08.321          rdma_recv = 0x800000001
00:17:08.321          rqpair = 0x7fff331fd330
00:17:08.321          tmp_rqpair = 0x800000001
00:17:08.321          reaped = 0
00:17:08.321          i = 0
00:17:08.321          count = 0
00:17:08.321          rc = 0
00:17:08.321          error = false
00:17:08.321          poll_tsc = 1091428454142936
00:17:08.321          __func__ = "nvmf_rdma_poller_poll"
00:17:08.321          __PRETTY_FUNCTION__ = "nvmf_rdma_poller_poll"
00:17:08.321  #1  0x00007fe6ba565441 in nvmf_rdma_poll_group_poll (group=0x1972c00) at rdma.c:4837
00:17:08.321          rtransport = 0x19643a0
00:17:08.321          rgroup = 0x1972c00
00:17:08.321          rpoller = 0x1972c70
00:17:08.321          tmp = 0x1972fb0
00:17:08.321          count = 0
00:17:08.321          rc = 0
00:17:08.321  #2  0x00007fe6ba4c3bef in nvmf_transport_poll_group_poll (group=0x1972c00) at transport.c:724
00:17:08.321  No locals.
00:17:08.321  #3  0x00007fe6ba47cbdf in nvmf_poll_group_poll (ctx=0x194cd90) at nvmf.c:160
00:17:08.321          group = 0x194cd90
00:17:08.321          rc = 0
00:17:08.321          count = 0
00:17:08.321          tgroup = 0x1972c00
00:17:08.321  #4  0x00007fe6b9b41f94 in thread_execute_poller (thread=0x194c9a0, poller=0x194ce50) at thread.c:953
00:17:08.321          rc = 0
00:17:08.321          __PRETTY_FUNCTION__ = "thread_execute_poller"
00:17:08.321          __func__ = "thread_execute_poller"
00:17:08.321  #5  0x00007fe6b9b43dbc in thread_poll (thread=0x194c9a0, max_msgs=0, now=1091428454150001) at thread.c:1079
00:17:08.321          poller_rc = 0
00:17:08.321          msg_count = 0
00:17:08.321          poller = 0x194ce50
00:17:08.321          tmp = 0x0
00:17:08.321          critical_msg = 0x0
00:17:08.321          rc = 0
00:17:08.321  #6  0x00007fe6b9b44a0d in spdk_thread_poll (thread=0x194c9a0, max_msgs=0, now=1091428454150001) at thread.c:1163
00:17:08.321          orig_thread = 0x0
00:17:08.321          rc = 0
00:17:08.321  #7  0x00007fe6b9f7d2d7 in _reactor_run (reactor=0x1948d40) at reactor.c:914
00:17:08.321          thread = 0x194c9a0
00:17:08.321          lw_thread = 0x194cce8
00:17:08.321          tmp = 0x0
00:17:08.321          now = 1091428454150001
00:17:08.321          rc = 0
00:17:08.321  #8  0x00007fe6b9f7d94b in reactor_run (arg=0x1948d40) at reactor.c:952
00:17:08.321          reactor = 0x1948d40
00:17:08.321          thread = 0x7fe6b98309ef <rte_get_next_lcore+43>
00:17:08.321          lw_thread = 0x7fb9b4558a
00:17:08.321          tmp = 0x7fff331fd5b0
00:17:08.322          thread_name = "reactor_1\000\000\000\200\000\000\000\340\325\0373\377\177\000\000\311\225z\272\001\000\000"
00:17:08.322          last_sched = 0
00:17:08.322          __func__ = "reactor_run"
00:17:08.322  #9  0x00007fe6b9f7e4a8 in spdk_reactors_start () at reactor.c:1068
00:17:08.322          reactor = 0x1948d40
00:17:08.322          i = 4294967295
00:17:08.322          current_core = 1
00:17:08.322          rc = 0
00:17:08.322          __func__ = "spdk_reactors_start"
00:17:08.322          __PRETTY_FUNCTION__ = "spdk_reactors_start"
00:17:08.322  #10 0x00007fe6b9f725c7 in spdk_app_start (opts_user=0x7fff331fd930, start_fn=0x402371 <nvmf_tgt_started>, arg1=0x0) at app.c:839
00:17:08.322          rc = 0
00:17:08.322          tty = 0x0
00:17:08.322          tmp_cpumask = {str = '\000' <repeats 256 times>, cpus = "\002", '\000' <repeats 126 times>}
00:17:08.322          g_env_was_setup = false
00:17:08.322          opts_local = {name = 0x40501d "nvmf", json_config_file = 0x0, json_config_ignore_errors = false, reserved17 = "\000\000\000\000\000\000", rpc_addr = 0x7fe6b9f8a224 "/var/tmp/spdk.sock", reactor_mask = 0x7fff33200077 "0x1E", tpoint_group_mask = 0x7fff3320006d "0xFFFF", shm_id = 0, reserved52 = "\000\000\000", shutdown_cb = 0x0, enable_coredump = true, reserved65 = "\000\000", mem_channel = -1, main_core = -1, mem_size = -1, no_pci = false, hugepage_single_segments = false, unlink_hugepage = false, no_huge = false, reserved84 = "\000\000\000", hugedir = 0x0, print_level = SPDK_LOG_INFO, reserved100 = "\000\000\000", num_pci_addr = 0, pci_blocked = 0x0, pci_allowed = 0x0, iova_mode = 0x0, delay_subsystem_init = false, reserved137 = "\000\000\000\000\000\000", num_entries = 32768, env_context = 0x0, log = 0x0, base_virtaddr = 35184372088832, opts_size = 236, disable_signal_handlers = false, interrupt_mode = false, reserved186 = "\000\000\000\000\000", msg_mempool_size = 262143, rpc_allowlist = 0x0, vf_token = 0x0, lcore_map = 0x0, rpc_log_level = SPDK_LOG_DISABLED, rpc_log_file = 0x0}
00:17:08.322          opts = 0x7fff331fd670
00:17:08.322          i = 128
00:17:08.322          __func__ = "spdk_app_start"
00:17:08.322  #11 0x00000000004024e0 in main (argc=7, argv=0x7fff331fdb48) at nvmf_main.c:47
00:17:08.322          rc = 1
00:17:08.322          opts = {name = 0x40501d "nvmf", json_config_file = 0x0, json_config_ignore_errors = false, reserved17 = "\000\000\000\000\000\000", rpc_addr = 0x7fe6b9f8a224 "/var/tmp/spdk.sock", reactor_mask = 0x7fff33200077 "0x1E", tpoint_group_mask = 0x7fff3320006d "0xFFFF", shm_id = 0, reserved52 = "\000\000\000", shutdown_cb = 0x0, enable_coredump = true, reserved65 = "\000\000", mem_channel = -1, main_core = -1, mem_size = -1, no_pci = false, hugepage_single_segments = false, unlink_hugepage = false, no_huge = false, reserved84 = "\000\000\000", hugedir = 0x0, print_level = SPDK_LOG_INFO, reserved100 = "\000\000\000", num_pci_addr = 0, pci_blocked = 0x0, pci_allowed = 0x0, iova_mode = 0x0, delay_subsystem_init = false, reserved137 = "\000\000\000\000\000\000", num_entries = 32768, env_context = 0x0, log = 0x0, base_virtaddr = 35184372088832, opts_size = 236, disable_signal_handlers = false, interrupt_mode = false, reserved186 = "\000\000\000\000\000", msg_mempool_size = 0, rpc_allowlist = 0x0, vf_token = 0x0, lcore_map = 0x0, rpc_log_level = SPDK_LOG_DISABLED, rpc_log_file = 0x0}
00:17:08.322  
00:17:08.322  Thread 1 (Thread 0x7fe6b5ea76c0 (LWP 3950030)):
00:17:08.322  #0  0x00007fe6b9019884 in __pthread_kill_implementation () from /usr/lib64/libc.so.6
00:17:08.322  No symbol table info available.
00:17:08.322  #1  0x00007fe6b8fc8afe in raise () from /usr/lib64/libc.so.6
00:17:08.322  No symbol table info available.
00:17:08.322  #2  0x00007fe6b8fb187f in abort () from /usr/lib64/libc.so.6
00:17:08.322  No symbol table info available.
00:17:08.322  #3  0x00007fe6b8fb179b in __assert_fail_base.cold () from /usr/lib64/libc.so.6
00:17:08.322  No symbol table info available.
00:17:08.322  #4  0x00007fe6b8fc1187 in __assert_fail () from /usr/lib64/libc.so.6
00:17:08.322  No symbol table info available.
00:17:08.322  #5  0x00007fe6ba563b41 in nvmf_rdma_poller_poll (rtransport=0x19643a0, rpoller=0x7fe6ac00eee0) at rdma.c:4701
00:17:08.322          wc = {{wr_id = 35184795728088, status = IBV_WC_WR_FLUSH_ERR, opcode = IBV_WC_RECV, vendor_err = 65537, byte_len = 0, {imm_data = 330279424, invalidated_rkey = 330279424}, qp_num = 3838, src_qp = 3838, wc_flags = 0, pkey_index = 0, slid = 0, sl = 0 '\000', dlid_path_bits = 0 '\000'}, {wr_id = 35184795728192, status = IBV_WC_WR_FLUSH_ERR, opcode = IBV_WC_RECV, vendor_err = 65537, byte_len = 0, {imm_data = 3052032120, invalidated_rkey = 3052032120}, qp_num = 3838, src_qp = 3838, wc_flags = 0, pkey_index = 21584, slid = 46570, sl = 230 '\346', dlid_path_bits = 127 '\177'}, {wr_id = 35184795728296, status = IBV_WC_WR_FLUSH_ERR, opcode = IBV_WC_RECV, vendor_err = 65537, byte_len = 0, {imm_data = 3052033104, invalidated_rkey = 3052033104}, qp_num = 3838, src_qp = 3838, wc_flags = 0, pkey_index = 8, slid = 0, sl = 0 '\000', dlid_path_bits = 0 '\000'}, {wr_id = 35184791914289, status = IBV_WC_SUCCESS, opcode = IBV_WC_RDMA_WRITE, vendor_err = 65537, byte_len = 65536, {imm_data = 3052032672, invalidated_rkey = 3052032672}, qp_num = 3838, src_qp = 3838, wc_flags = 0, pkey_index = 8, slid = 0, sl = 230 '\346', dlid_path_bits = 127 '\177'}, {wr_id = 35184791914290, status = IBV_WC_SUCCESS, opcode = IBV_WC_SEND, vendor_err = 65537, byte_len = 16, {imm_data = 3096185946, invalidated_rkey = 3096185946}, qp_num = 3838, src_qp = 3838, wc_flags = 0, pkey_index = 32, slid = 0, sl = 0 '\000', dlid_path_bits = 0 '\000'}, {wr_id = 0, status = IBV_WC_GENERAL_ERR, opcode = IBV_WC_RECV, vendor_err = 65537, byte_len = 0, {imm_data = 2888296448, invalidated_rkey = 2888296448}, qp_num = 3838, src_qp = 3838, wc_flags = 0, pkey_index = 21408, slid = 46570, sl = 230 '\346', dlid_path_bits = 127 '\177'}, {wr_id = 35184791912897, status = IBV_WC_WR_FLUSH_ERR, opcode = IBV_WC_RDMA_WRITE, vendor_err = 65537, byte_len = 65536, {imm_data = 5241984, invalidated_rkey = 5241984}, qp_num = 3838, src_qp = 3838, wc_flags = 0, pkey_index = 90, slid = 0, sl = 230 '\346', dlid_path_bits = 127 '\177'}, {wr_id = 35184791912898, status = IBV_WC_WR_FLUSH_ERR, opcode = IBV_WC_SEND, vendor_err = 65537, byte_len = 16, {imm_data = 1, invalidated_rkey = 1}, qp_num = 3838, src_qp = 3838, wc_flags = 0, pkey_index = 1, slid = 0, sl = 8 '\b', dlid_path_bits = 0 '\000'}, {wr_id = 35184791911505, status = IBV_WC_WR_FLUSH_ERR, opcode = IBV_WC_RDMA_WRITE, vendor_err = 65537, byte_len = 65536, {imm_data = 131072, invalidated_rkey = 131072}, qp_num = 3838, src_qp = 3838, wc_flags = 0, pkey_index = 65152, slid = 6511, sl = 0 '\000', dlid_path_bits = 32 ' '}, {wr_id = 35184791911506, status = IBV_WC_WR_FLUSH_ERR, opcode = IBV_WC_SEND, vendor_err = 65537, byte_len = 16, {imm_data = 3052032504, invalidated_rkey = 3052032504}, qp_num = 3838, src_qp = 3838, wc_flags = 0, pkey_index = 21000, slid = 46570, sl = 91 '[', dlid_path_bits = 0 '\000'}, {wr_id = 35184791910113, status = IBV_WC_WR_FLUSH_ERR, opcode = IBV_WC_RDMA_WRITE, vendor_err = 65537, byte_len = 65536, {imm_data = 8, invalidated_rkey = 8}, qp_num = 3838, src_qp = 3838, wc_flags = 0, pkey_index = 1, slid = 0, sl = 1 '\001', dlid_path_bits = 0 '\000'}, {wr_id = 35184791910114, status = IBV_WC_WR_FLUSH_ERR, opcode = IBV_WC_SEND, vendor_err = 65537, byte_len = 16, {imm_data = 3052033552, invalidated_rkey = 3052033552}, qp_num = 3838, src_qp = 3838, wc_flags = 0, pkey_index = 22032, slid = 46570, sl = 230 '\346', dlid_path_bits = 127 '\177'}, {wr_id = 35184791896193, status = IBV_WC_WR_FLUSH_ERR, opcode = IBV_WC_RDMA_WRITE, vendor_err = 65537, byte_len = 65536, {imm_data = 426770180, invalidated_rkey = 426770180}, qp_num = 3838, src_qp = 3838, wc_flags = 0, pkey_index = 6, slid = 0, sl = 0 '\000', dlid_path_bits = 0 '\000'}, {wr_id = 35184791896194, status = IBV_WC_WR_FLUSH_ERR, opcode = IBV_WC_SEND, vendor_err = 65537, byte_len = 16, {imm_data = 120585860, invalidated_rkey = 120585860}, qp_num = 3838, src_qp = 3838, wc_flags = 0, pkey_index = 59392, slid = 44071, sl = 230 '\346', dlid_path_bits = 127 '\177'}, {wr_id = 35184791897585, status = IBV_WC_WR_FLUSH_ERR, opcode = IBV_WC_RDMA_WRITE, vendor_err = 65537, byte_len = 65536, {imm_data = 3096232567, invalidated_rkey = 3096232567}, qp_num = 3838, src_qp = 3838, wc_flags = 0, pkey_index = 0, slid = 0, sl = 0 '\000', dlid_path_bits = 0 '\000'}, {wr_id = 35184791897586, status = IBV_WC_WR_FLUSH_ERR, opcode = IBV_WC_SEND, vendor_err = 65537, byte_len = 16, {imm_data = 3096042868, invalidated_rkey = 3096042868}, qp_num = 3838, src_qp = 3838, wc_flags = 0, pkey_index = 92, slid = 0, sl = 230 '\346', dlid_path_bits = 127 '\177'}, {wr_id = 35184791898977, status = IBV_WC_WR_FLUSH_ERR, opcode = IBV_WC_RDMA_WRITE, vendor_err = 65537, byte_len = 65536, {imm_data = 3052033936, invalidated_rkey = 3052033936}, qp_num = 3838, src_qp = 3838, wc_flags = 0, pkey_index = 65024, slid = 1839, sl = 0 '\000', dlid_path_bits = 32 ' '}, {wr_id = 35184791898978, status = IBV_WC_WR_FLUSH_ERR, opcode = IBV_WC_SEND, vendor_err = 65537, byte_len = 16, {imm_data = 1, invalidated_rkey = 1}, qp_num = 3838, src_qp = 3838, wc_flags = 0, pkey_index = 62144, slid = 1038, sl = 0 '\000', dlid_path_bits = 32 ' '}, {wr_id = 35184791900369, status = IBV_WC_WR_FLUSH_ERR, opcode = IBV_WC_RDMA_WRITE, vendor_err = 65537, byte_len = 65536, {imm_data = 2888130096, invalidated_rkey = 2888130096}, qp_num = 3838, src_qp = 3838, wc_flags = 0, pkey_index = 48880, slid = 404, sl = 0 '\000', dlid_path_bits = 0 '\000'}, {wr_id = 35184791900370, status = IBV_WC_WR_FLUSH_ERR, opcode = IBV_WC_SEND, vendor_err = 65537, byte_len = 16, {imm_data = 0, invalidated_rkey = 0}, qp_num = 3838, src_qp = 3838, wc_flags = 0, pkey_index = 22608, slid = 46570, sl = 230 '\346', dlid_path_bits = 127 '\177'}, {wr_id = 35184791901762, status = IBV_WC_WR_FLUSH_ERR, opcode = IBV_WC_SEND, vendor_err = 65537, byte_len = 16, {imm_data = 0, invalidated_rkey = 0}, qp_num = 3838, src_qp = 3838, wc_flags = 0, pkey_index = 45056, slid = 46278, sl = 230 '\346', dlid_path_bits = 127 '\177'}, {wr_id = 35184791903154, status = IBV_WC_WR_FLUSH_ERR, opcode = IBV_WC_SEND, vendor_err = 65537, byte_len = 16, {imm_data = 1, invalidated_rkey = 1}, qp_num = 3838, src_qp = 3838, wc_flags = 0, pkey_index = 22416, slid = 46570, sl = 230 '\346', dlid_path_bits = 127 '\177'}, {wr_id = 35184791904546, status = IBV_WC_WR_FLUSH_ERR, opcode = IBV_WC_SEND, vendor_err = 65537, byte_len = 16, {imm_data = 0, invalidated_rkey = 0}, qp_num = 3838, src_qp = 3838, wc_flags = 0, pkey_index = 21416, slid = 46570, sl = 230 '\346', dlid_path_bits = 127 '\177'}, {wr_id = 35184791943521, status = IBV_WC_WR_FLUSH_ERR, opcode = IBV_WC_RDMA_READ, vendor_err = 65537, byte_len = 65536, {imm_data = 3036407856, invalidated_rkey = 3036407856}, qp_num = 3838, src_qp = 3838, wc_flags = 0, pkey_index = 48, slid = 0, sl = 48 '0', dlid_path_bits = 0 '\000'}, {wr_id = 35184791905938, status = IBV_WC_WR_FLUSH_ERR, opcode = IBV_WC_SEND, vendor_err = 65537, byte_len = 16, {imm_data = 2, invalidated_rkey = 2}, qp_num = 3838, src_qp = 3838, wc_flags = 0, pkey_index = 14431, slid = 64827, sl = 165 '\245', dlid_path_bits = 224 '\340'}, {wr_id = 35184791907330, status = IBV_WC_WR_FLUSH_ERR, opcode = IBV_WC_SEND, vendor_err = 65537, byte_len = 16, {imm_data = 5241984, invalidated_rkey = 5241984}, qp_num = 3838, src_qp = 3838, wc_flags = 0, pkey_index = 0, slid = 0, sl = 0 '\000', dlid_path_bits = 0 '\000'}, {wr_id = 35184791908722, status = IBV_WC_WR_FLUSH_ERR, opcode = IBV_WC_SEND, vendor_err = 65537, byte_len = 16, {imm_data = 3052033112, invalidated_rkey = 3052033112}, qp_num = 3838, src_qp = 3838, wc_flags = 0, pkey_index = 48, slid = 0, sl = 48 '0', dlid_path_bits = 0 '\000'}, {wr_id = 35184791868354, status = IBV_WC_WR_FLUSH_ERR, opcode = IBV_WC_SEND, vendor_err = 65537, byte_len = 16, {imm_data = 3467379712, invalidated_rkey = 3467379712}, qp_num = 3838, src_qp = 3838, wc_flags = 0, pkey_index = 60776, slid = 65535, sl = 255 '\377', dlid_path_bits = 255 '\377'}, {wr_id = 35184791869746, status = IBV_WC_WR_FLUSH_ERR, opcode = IBV_WC_SEND, vendor_err = 65537, byte_len = 16, {imm_data = 3115561119, invalidated_rkey = 3115561119}, qp_num = 3838, src_qp = 3838, wc_flags = 0, pkey_index = 20704, slid = 47540, sl = 230 '\346', dlid_path_bits = 127 '\177'}, {wr_id = 35184791871138, status = IBV_WC_WR_FLUSH_ERR, opcode = IBV_WC_SEND, vendor_err = 65537, byte_len = 16, {imm_data = 2886015632, invalidated_rkey = 2886015632}, qp_num = 3838, src_qp = 3838, wc_flags = 0, pkey_index = 50704, slid = 404, sl = 0 '\000', dlid_path_bits = 0 '\000'}, {wr_id = 35184791872530, status = IBV_WC_WR_FLUSH_ERR, opcode = IBV_WC_SEND, vendor_err = 65537, byte_len = 16, {imm_data = 0, invalidated_rkey = 0}, qp_num = 3838, src_qp = 3838, wc_flags = 0, pkey_index = 1024, slid = 52908, sl = 20 '\024', dlid_path_bits = 64 '@'}, {wr_id = 35184791873922, status = IBV_WC_WR_FLUSH_ERR, opcode = IBV_WC_SEND, vendor_err = 65537, byte_len = 16, {imm_data = 2886290608, invalidated_rkey = 2886290608}, qp_num = 3838, src_qp = 3838, wc_flags = 0, pkey_index = 53536, slid = 13087, sl = 255 '\377', dlid_path_bits = 127 '\177'}}
00:17:08.322          rdma_wr = 0x200019060731
00:17:08.322          rdma_req = 0x200019060410
00:17:08.322          rdma_recv = 0x200019403958
00:17:08.322          rqpair = 0x1d22240
00:17:08.322          tmp_rqpair = 0x0
00:17:08.322          reaped = 32
00:17:08.322          i = 3
00:17:08.322          count = 0
00:17:08.322          rc = 91
00:17:08.322          error = true
00:17:08.322          poll_tsc = 1091428453740192
00:17:08.322          __func__ = "nvmf_rdma_poller_poll"
00:17:08.322          __PRETTY_FUNCTION__ = "nvmf_rdma_poller_poll"
00:17:08.322  #6  0x00007fe6ba565441 in nvmf_rdma_poll_group_poll (group=0x7fe6ac00ee70) at rdma.c:4837
00:17:08.322          rtransport = 0x19643a0
00:17:08.322          rgroup = 0x7fe6ac00ee70
00:17:08.322          rpoller = 0x7fe6ac00eee0
00:17:08.322          tmp = 0x7fe6ac051260
00:17:08.322          count = 0
00:17:08.322          rc = -1
00:17:08.323  #7  0x00007fe6ba4c3bef in nvmf_transport_poll_group_poll (group=0x7fe6ac00ee70) at transport.c:724
00:17:08.323  No locals.
00:17:08.323  #8  0x00007fe6ba47cbdf in nvmf_poll_group_poll (ctx=0x7fe6ac000c00) at nvmf.c:160
00:17:08.323          group = 0x7fe6ac000c00
00:17:08.323          rc = -1
00:17:08.323          count = 0
00:17:08.323          tgroup = 0x7fe6ac00ee70
00:17:08.323  #9  0x00007fe6b9b41f94 in thread_execute_poller (thread=0x194c610, poller=0x7fe6ac000cc0) at thread.c:953
00:17:08.323          rc = 0
00:17:08.323          __PRETTY_FUNCTION__ = "thread_execute_poller"
00:17:08.323          __func__ = "thread_execute_poller"
00:17:08.323  #10 0x00007fe6b9b43dbc in thread_poll (thread=0x194c610, max_msgs=0, now=1091428453725840) at thread.c:1079
00:17:08.323          poller_rc = 0
00:17:08.323          msg_count = 0
00:17:08.323          poller = 0x7fe6ac000cc0
00:17:08.323          tmp = 0x0
00:17:08.323          critical_msg = 0x0
00:17:08.323          rc = 0
00:17:08.323  #11 0x00007fe6b9b44a0d in spdk_thread_poll (thread=0x194c610, max_msgs=0, now=1091428453725840) at thread.c:1163
00:17:08.323          orig_thread = 0x0
00:17:08.323          rc = 0
00:17:08.323  #12 0x00007fe6b9f7d2d7 in _reactor_run (reactor=0x1949580) at reactor.c:914
00:17:08.323          thread = 0x194c610
00:17:08.323          lw_thread = 0x194c958
00:17:08.323          tmp = 0x0
00:17:08.323          now = 1091428453725840
00:17:08.323          rc = 1
00:17:08.323  #13 0x00007fe6b9f7d94b in reactor_run (arg=0x1949580) at reactor.c:952
00:17:08.323          reactor = 0x1949580
00:17:08.323          thread = 0x7fe6b908c1a1 <write+97>
00:17:08.323          lw_thread = 0x0
00:17:08.323          tmp = 0x0
00:17:08.323          thread_name = "reactor_4\000\000\000\000\000\000\000\203Y\352\265\346\177\000\000\001\000\000\000\000\000\000"
00:17:08.323          last_sched = 0
00:17:08.323          __func__ = "reactor_run"
00:17:08.323  #14 0x00007fe6b984ccc9 in eal_thread_loop (arg=0x4) at ../lib/eal/common/eal_common_thread.c:212
00:17:08.323          f = 0x7fe6b9f7d6cc <reactor_run>
00:17:08.323          fct_arg = 0x1949580
00:17:08.323          lcore_id = 4
00:17:08.323          cpuset = "4", '\000' <repeats 254 times>
00:17:08.323          ret = 0
00:17:08.323  #15 0x00007fe6b986d979 in eal_worker_thread_loop (arg=0x4) at ../lib/eal/linux/eal.c:916
00:17:08.323  No locals.
00:17:08.323  #16 0x00007fe6b9017947 in start_thread () from /usr/lib64/libc.so.6
00:17:08.323  No symbol table info available.
00:17:08.323  #17 0x00007fe6b909d860 in clone3 () from /usr/lib64/libc.so.6
00:17:08.323  No symbol table info available.

Potentially related (or the same, though trace feels a bit more verbose) to https://github.com/spdk/spdk/issues/3212.

tomzawadzki commented 1 month ago

There might be another occurrence on nightly https://ci.spdk.io/results/autotest-nightly/builds/3888/archive/nvmf-cvl-phy-autotest_3450/build.log