openucx / ucx

Unified Communication X (mailing list - https://elist.ornl.gov/mailman/listinfo/ucx-group)
http://www.openucx.org

posix/test_uct_mm.alloc/0 fails on PPC #4616

Open dmitrygx opened 4 years ago

dmitrygx commented 4 years ago
17:19:03 [----------] 1 test from posix/test_uct_mm
17:19:03 [ RUN      ] posix/test_uct_mm.alloc/0 <posix/memory,dir=.>
17:19:03 [     INFO ] deadbeef11111
17:19:03 [     INFO ] deadbeef11111
17:19:03 [     INFO ] deadbeef22222
17:19:03 [r-vmb-ppc-jenkins:1094 :0:1094] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
17:19:03 
17:19:03 /scrap/jenkins/workspace/hpc-ucx-pr-3/label/r-vmb-ppc-jenkins/worker/3/contrib/../src/uct/sm/mm/base/mm_iface.c: [ uct_mm_iface_t_cleanup() ]
17:19:03       ...
17:19:03       626 static UCS_CLASS_CLEANUP_FUNC(uct_mm_iface_t)
17:19:03       627 {
17:19:03       628     uct_base_iface_progress_disable(&self->super.super.super,
17:19:03 ==>   629                                     UCT_PROGRESS_SEND | UCT_PROGRESS_RECV);
17:19:03       630 
17:19:03       631     /* return all the descriptors that are now 'assigned' to the FIFO,
17:19:03       632      * to their mpool */
17:19:03 
17:19:03 ==== backtrace (tid:   1094) ====
17:19:03  0 0x0000000000058998 ucs_debug_print_backtrace()  /scrap/jenkins/workspace/hpc-ucx-pr-3/label/r-vmb-ppc-jenkins/worker/3/contrib/../src/ucs/debug/debug.c:625
17:19:03  1 0x0000000000017928 uct_mm_iface_t_cleanup()  /scrap/jenkins/workspace/hpc-ucx-pr-3/label/r-vmb-ppc-jenkins/worker/3/contrib/../src/uct/sm/mm/base/mm_iface.c:629
17:19:03  2 0x00000000000711c8 ucs_class_call_cleanup_chain()  /scrap/jenkins/workspace/hpc-ucx-pr-3/label/r-vmb-ppc-jenkins/worker/3/contrib/../src/ucs/type/class.c:52
17:19:03  3 0x0000000000017428 uct_mm_iface_t_delete()  /scrap/jenkins/workspace/hpc-ucx-pr-3/label/r-vmb-ppc-jenkins/worker/3/contrib/../src/uct/sm/mm/base/mm_iface.c:647
17:19:03  4 0x00000000000132ec uct_iface_close()  /scrap/jenkins/workspace/hpc-ucx-pr-3/label/r-vmb-ppc-jenkins/worker/3/contrib/../src/uct/base/uct_iface.c:196
17:19:03  5 0x0000000010304660 ucs::handle<uct_iface*, void*>::release()  /scrap/jenkins/workspace/hpc-ucx-pr-3/label/r-vmb-ppc-jenkins/worker/3/contrib/../test/gtest/common/test_helpers.h:639
17:19:03  6 0x00000000101cb80c ucs::test_base::TearDownProxy()  /scrap/jenkins/workspace/hpc-ucx-pr-3/label/r-vmb-ppc-jenkins/worker/3/contrib/../test/gtest/common/test.cc:251
17:19:03  7 0x00000000101e9d14 uct_test::TearDown()  /scrap/jenkins/workspace/hpc-ucx-pr-3/label/r-vmb-ppc-jenkins/worker/3/contrib/../test/gtest/uct/uct_test.h:96
17:19:03  8 0x00000000101aa53c testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>()  /scrap/jenkins/workspace/hpc-ucx-pr-3/label/r-vmb-ppc-jenkins/worker/3/contrib/../test/gtest/common/gtest-all.cc:3562
17:19:03  9 0x0000000010199664 testing::Test::Run()  /scrap/jenkins/workspace/hpc-ucx-pr-3/label/r-vmb-ppc-jenkins/worker/3/contrib/../test/gtest/common/gtest-all.cc:3643
17:19:03 10 0x000000001019982c testing::TestInfo::Run()  /scrap/jenkins/workspace/hpc-ucx-pr-3/label/r-vmb-ppc-jenkins/worker/3/contrib/../test/gtest/common/gtest-all.cc:3812
17:19:03 11 0x0000000010199a78 testing::TestCase::Run()  /scrap/jenkins/workspace/hpc-ucx-pr-3/label/r-vmb-ppc-jenkins/worker/3/contrib/../test/gtest/common/gtest-all.cc:3930
17:19:03 12 0x000000001019fa78 testing::internal::UnitTestImpl::RunAllTests()  /scrap/jenkins/workspace/hpc-ucx-pr-3/label/r-vmb-ppc-jenkins/worker/3/contrib/../test/gtest/common/gtest-all.cc:5808
17:19:03 13 0x000000001019fec4 testing::internal::UnitTestImpl::RunAllTests()  /scrap/jenkins/workspace/hpc-ucx-pr-3/label/r-vmb-ppc-jenkins/worker/3/contrib/../test/gtest/common/gtest-all.cc:5725
17:19:03 14 0x0000000010124b18 main()  /scrap/jenkins/workspace/hpc-ucx-pr-3/label/r-vmb-ppc-jenkins/worker/3/contrib/../test/gtest/common/gtest.h:20059
17:19:03 15 0x0000000000024980 generic_start_main.isra.0()  libc-start.c:0
17:19:03 16 0x0000000000024b74 __libc_start_main()  ???:0

http://hpc-master.lab.mtl.com:8080/job/hpc-ucx-pr/label=r-vmb-ppc-jenkins,worker=3/13887/console

Perhaps this is a configuration issue.

yosefe commented 4 years ago

It looks like the first descriptor in the mpool was overwritten with zeros. So far I could not reproduce it with a standalone test.
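One way to check this hypothesis while debugging is to scan the descriptors that should currently be owned by a pool and report the first one whose header has been zeroed. The sketch below is only an illustration under stated assumptions: the `demo_*` names, the one-pointer header, and the payload size are hypothetical and are not the UCX data structures.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical layout, for illustration only: every descriptor is preceded
 * by a one-pointer header that records the owning pool. */
typedef struct demo_hdr {
    void *owner;
} demo_hdr_t;

enum { DEMO_PAYLOAD_SIZE = 48 };

typedef struct demo_slot {
    demo_hdr_t    hdr;
    unsigned char payload[DEMO_PAYLOAD_SIZE];
} demo_slot_t;

/* Mark every slot as handed out by 'owner'. */
static void demo_init(demo_slot_t *slots, size_t count, void *owner)
{
    size_t i;

    assert(slots != NULL);
    for (i = 0; i < count; ++i) {
        slots[i].hdr.owner = owner;
        memset(slots[i].payload, 0xab, sizeof(slots[i].payload));
    }
}

/* Return the index of the first slot whose owner pointer is NULL,
 * or -1 if every header is intact. */
static int demo_find_zeroed_header(const demo_slot_t *slots, size_t count)
{
    size_t i;

    assert(slots != NULL);
    for (i = 0; i < count; ++i) {
        if (slots[i].hdr.owner == NULL) {
            return (int)i;
        }
    }
    return -1;
}
```

Running such a scan just before the interface is closed would distinguish "only the first header is zeroed" from "a larger region was overwritten", which narrows down where the stray write comes from.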

hoopoepg commented 4 years ago

Failed to reproduce: 10 iterations with gtest_repeat=1000 and gtest_filter=posix/test_uct*, about 6 hours of running.

evgeny-leksikov commented 4 years ago

Happened again on worker_1@hpc-test-node-ppc:

[2020-04-03T09:31:04.624Z] [----------] 1 test from posix/test_uct_mm
[2020-04-03T09:31:04.624Z] [ RUN      ] posix/test_uct_mm.alloc/0 <posix/memory,dir=.>
[2020-04-03T09:31:04.624Z] [     INFO ] deadbeef11111
[2020-04-03T09:31:04.624Z] [     INFO ] deadbeef11111
[2020-04-03T09:31:04.624Z] [     INFO ] deadbeef22222
[2020-04-03T09:31:04.879Z] [r-vmb-ppc-jenkins:30805:0:30805] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
[2020-04-03T09:31:05.455Z] 
[2020-04-03T09:31:05.455Z] /scrap/jenkins/workspace/ucx-2/contrib/../src/uct/sm/mm/base/mm_iface.c: [ uct_mm_iface_t_cleanup() ]
[2020-04-03T09:31:05.455Z]       ...
[2020-04-03T09:31:05.455Z]       679 static UCS_CLASS_CLEANUP_FUNC(uct_mm_iface_t)
[2020-04-03T09:31:05.455Z]       680 {
[2020-04-03T09:31:05.455Z]       681     uct_base_iface_progress_disable(&self->super.super.super,
[2020-04-03T09:31:05.455Z] ==>   682                                     UCT_PROGRESS_SEND | UCT_PROGRESS_RECV);
[2020-04-03T09:31:05.455Z]       683 
[2020-04-03T09:31:05.455Z]       684     /* return all the descriptors that are now 'assigned' to the FIFO,
[2020-04-03T09:31:05.455Z]       685      * to their mpool */
[2020-04-03T09:31:05.455Z] 
[2020-04-03T09:31:05.719Z] ==== backtrace (tid:  30805) ====
[2020-04-03T09:31:05.719Z]  0 0x00000000000595f8 ucs_debug_print_backtrace()  /scrap/jenkins/workspace/ucx-2/contrib/../src/ucs/debug/debug.c:656
[2020-04-03T09:31:05.719Z]  1 0x0000000000019788 uct_mm_iface_t_cleanup()  /scrap/jenkins/workspace/ucx-2/contrib/../src/uct/sm/mm/base/mm_iface.c:682
[2020-04-03T09:31:05.719Z]  2 0x0000000000072648 ucs_class_call_cleanup_chain()  /scrap/jenkins/workspace/ucx-2/contrib/../src/ucs/type/class.c:56
[2020-04-03T09:31:05.719Z]  3 0x0000000000019148 uct_mm_iface_t_delete()  /scrap/jenkins/workspace/ucx-2/contrib/../src/uct/sm/mm/base/mm_iface.c:700
[2020-04-03T09:31:05.719Z]  4 0x000000000001490c uct_iface_close()  /scrap/jenkins/workspace/ucx-2/contrib/../src/uct/base/uct_iface.c:196
[2020-04-03T09:31:05.719Z]  5 0x0000000010317240 ucs::handle<uct_iface*, void*>::release()  /scrap/jenkins/workspace/ucx-2/contrib/../test/gtest/common/test_helpers.h:645
[2020-04-03T09:31:05.719Z]  6 0x00000000101dc1cc ucs::test_base::TearDownProxy()  /scrap/jenkins/workspace/ucx-2/contrib/../test/gtest/common/test.cc:261
[2020-04-03T09:31:05.719Z]  7 0x00000000101fae94 uct_test::TearDown()  /scrap/jenkins/workspace/ucx-2/contrib/../test/gtest/uct/uct_test.h:96
[2020-04-03T09:31:05.719Z]  8 0x00000000101ba2bc testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>()  /scrap/jenkins/workspace/ucx-2/contrib/../test/gtest/common/gtest-all.cc:3562
[2020-04-03T09:31:05.719Z]  9 0x00000000101a93e4 testing::Test::Run()  /scrap/jenkins/workspace/ucx-2/contrib/../test/gtest/common/gtest-all.cc:3643
[2020-04-03T09:31:05.719Z] 10 0x00000000101a95ac testing::TestInfo::Run()  /scrap/jenkins/workspace/ucx-2/contrib/../test/gtest/common/gtest-all.cc:3812
[2020-04-03T09:31:05.719Z] 11 0x00000000101a97f8 testing::TestCase::Run()  /scrap/jenkins/workspace/ucx-2/contrib/../test/gtest/common/gtest-all.cc:3930
[2020-04-03T09:31:05.719Z] 12 0x00000000101af7f8 testing::internal::UnitTestImpl::RunAllTests()  /scrap/jenkins/workspace/ucx-2/contrib/../test/gtest/common/gtest-all.cc:5808
[2020-04-03T09:31:05.719Z] 13 0x00000000101afc44 testing::internal::UnitTestImpl::RunAllTests()  /scrap/jenkins/workspace/ucx-2/contrib/../test/gtest/common/gtest-all.cc:5725
[2020-04-03T09:31:05.719Z] 14 0x00000000101345a8 main()  /scrap/jenkins/workspace/ucx-2/contrib/../test/gtest/common/gtest.h:20059
[2020-04-03T09:31:05.719Z] 15 0x0000000000024980 generic_start_main.isra.0()  libc-start.c:0
[2020-04-03T09:31:05.719Z] 16 0x0000000000024b74 __libc_start_main()  ???:0
[2020-04-03T09:31:05.719Z] =================================
dmitrygx commented 4 years ago
[2020-05-18T20:17:54.512Z] [----------] 1 test from posix/test_uct_mm
[2020-05-18T20:17:54.512Z] [ RUN      ] posix/test_uct_mm.alloc/0 <posix/memory,dir=.>
[2020-05-18T20:17:54.512Z] [     INFO ] deadbeef11111
[2020-05-18T20:17:54.512Z] [     INFO ] deadbeef11111
[2020-05-18T20:17:54.512Z] [     INFO ] deadbeef22222
[2020-05-18T20:17:54.767Z] [r-vmb-ppc-jenkins:8680 :0:8680] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
[2020-05-18T20:17:55.022Z] 
[2020-05-18T20:17:55.022Z] /scrap/jenkins/workspace/ucx-8/contrib/../src/uct/sm/mm/base/mm_iface.c: [ uct_mm_iface_t_cleanup() ]
[2020-05-18T20:17:55.022Z]       ...
[2020-05-18T20:17:55.022Z]       680 static UCS_CLASS_CLEANUP_FUNC(uct_mm_iface_t)
[2020-05-18T20:17:55.022Z]       681 {
[2020-05-18T20:17:55.022Z]       682     uct_base_iface_progress_disable(&self->super.super.super,
[2020-05-18T20:17:55.022Z] ==>   683                                     UCT_PROGRESS_SEND | UCT_PROGRESS_RECV);
[2020-05-18T20:17:55.022Z]       684 
[2020-05-18T20:17:55.022Z]       685     /* return all the descriptors that are now 'assigned' to the FIFO,
[2020-05-18T20:17:55.022Z]       686      * to their mpool */
[2020-05-18T20:17:55.022Z] 
[2020-05-18T20:17:55.277Z] ==== backtrace (tid:   8680) ====
[2020-05-18T20:17:55.277Z]  0 0x000000000005bc58 ucs_debug_print_backtrace()  /scrap/jenkins/workspace/ucx-8/contrib/../src/ucs/debug/debug.c:656
[2020-05-18T20:17:55.277Z]  1 0x0000000000019868 uct_mm_iface_t_cleanup()  /scrap/jenkins/workspace/ucx-8/contrib/../src/uct/sm/mm/base/mm_iface.c:683
[2020-05-18T20:17:55.277Z]  2 0x0000000000075688 ucs_class_call_cleanup_chain()  /scrap/jenkins/workspace/ucx-8/contrib/../src/ucs/type/class.c:56
[2020-05-18T20:17:55.277Z]  3 0x0000000000019228 uct_mm_iface_t_delete()  /scrap/jenkins/workspace/ucx-8/contrib/../src/uct/sm/mm/base/mm_iface.c:701
[2020-05-18T20:17:55.277Z]  4 0x000000000001496c uct_iface_close()  /scrap/jenkins/workspace/ucx-8/contrib/../src/uct/base/uct_iface.c:196
[2020-05-18T20:17:55.277Z]  5 0x0000000010328860 ucs::handle<uct_iface*, void*>::release()  /scrap/jenkins/workspace/ucx-8/contrib/../test/gtest/common/test_helpers.h:661
[2020-05-18T20:17:55.277Z]  6 0x0000000010328860 ucs::handle<uct_iface*, void*>::reset()  /scrap/jenkins/workspace/ucx-8/contrib/../test/gtest/common/test_helpers.h:596
[2020-05-18T20:17:55.277Z]  7 0x0000000010328860 ~handle()  /scrap/jenkins/workspace/ucx-8/contrib/../test/gtest/common/test_helpers.h:591
[2020-05-18T20:17:55.277Z]  8 0x0000000010328860 ~entity()  /scrap/jenkins/workspace/ucx-8/contrib/../test/gtest/uct/uct_test.h:123
[2020-05-18T20:17:55.277Z]  9 0x0000000010328860 ucs::ptr_vector_base<uct_test::entity>::release()  /scrap/jenkins/workspace/ucx-8/contrib/../test/gtest/common/test_helpers.h:523
[2020-05-18T20:17:55.277Z] 10 0x0000000010328860 ucs::ptr_vector_base<uct_test::entity>::clear()  /scrap/jenkins/workspace/ucx-8/contrib/../test/gtest/common/test_helpers.h:494
[2020-05-18T20:17:55.277Z] 11 0x0000000010328860 uct_test::cleanup()  /scrap/jenkins/workspace/ucx-8/contrib/../test/gtest/uct/uct_test.cc:450
[2020-05-18T20:17:55.277Z] 12 0x00000000101ed3cc ucs::test_base::TearDownProxy()  /scrap/jenkins/workspace/ucx-8/contrib/../test/gtest/common/test.cc:261
[2020-05-18T20:17:55.277Z] 13 0x000000001020c114 uct_test::TearDown()  /scrap/jenkins/workspace/ucx-8/contrib/../test/gtest/uct/uct_test.h:98
[2020-05-18T20:17:55.277Z] 14 0x00000000101cb43c HandleSehExceptionsInMethodIfSupported<testing::Test, void>()  /scrap/jenkins/workspace/ucx-8/contrib/../test/gtest/common/gtest-all.cc:3562
[2020-05-18T20:17:55.277Z] 15 0x00000000101cb43c testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>()  /scrap/jenkins/workspace/ucx-8/contrib/../test/gtest/common/gtest-all.cc:3598
[2020-05-18T20:17:55.277Z] 16 0x00000000101ba564 testing::Test::Run()  /scrap/jenkins/workspace/ucx-8/contrib/../test/gtest/common/gtest-all.cc:3643
[2020-05-18T20:17:55.277Z] 17 0x00000000101ba72c testing::TestInfo::Run()  /scrap/jenkins/workspace/ucx-8/contrib/../test/gtest/common/gtest-all.cc:3812
[2020-05-18T20:17:55.277Z] 18 0x00000000101ba978 testing::TestCase::Run()  /scrap/jenkins/workspace/ucx-8/contrib/../test/gtest/common/gtest-all.cc:3930
[2020-05-18T20:17:55.277Z] 19 0x00000000101c0978 testing::internal::UnitTestImpl::RunAllTests()  /scrap/jenkins/workspace/ucx-8/contrib/../test/gtest/common/gtest-all.cc:5808
[2020-05-18T20:17:55.277Z] 20 0x00000000101c0dc4 testing::internal::UnitTestImpl::RunAllTests()  /scrap/jenkins/workspace/ucx-8/contrib/../test/gtest/common/gtest-all.cc:5725
[2020-05-18T20:17:55.277Z] 21 0x00000000101c0dc4 HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>()  /scrap/jenkins/workspace/ucx-8/contrib/../test/gtest/common/gtest-all.cc:3562
[2020-05-18T20:17:55.277Z] 22 0x00000000101c0dc4 HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>()  /scrap/jenkins/workspace/ucx-8/contrib/../test/gtest/common/gtest-all.cc:3598
[2020-05-18T20:17:55.277Z] 23 0x00000000101c0dc4 testing::UnitTest::Run()  /scrap/jenkins/workspace/ucx-8/contrib/../test/gtest/common/gtest-all.cc:5422
[2020-05-18T20:17:55.277Z] 24 0x00000000101452e8 RUN_ALL_TESTS()  /scrap/jenkins/workspace/ucx-8/contrib/../test/gtest/common/gtest.h:20059
[2020-05-18T20:17:55.277Z] 25 0x00000000101452e8 main()  /scrap/jenkins/workspace/ucx-8/contrib/../test/gtest/common/main.cc:102
[2020-05-18T20:17:55.277Z] 26 0x0000000000025100 generic_start_main.isra.0()  libc-start.c:0
[2020-05-18T20:17:55.277Z] 27 0x00000000000252f4 __libc_start_main()  ???:0
[2020-05-18T20:17:55.277Z] =================================

http://hpc-master.lab.mtl.com:8080/blue/organizations/jenkins/ucx/detail/ucx/4972/pipeline

dmitrygx commented 3 years ago
[2020-12-08T10:41:39.582Z] [ RUN      ] posix/test_uct_mm.alloc/0 <posix/memory,dir=.>
[2020-12-08T10:41:39.582Z] [     INFO ] Testing component: posix
[2020-12-08T10:41:39.582Z] [     INFO ] deadbeef11111
[2020-12-08T10:41:39.582Z] [     INFO ] deadbeef11111
[2020-12-08T10:41:39.582Z] [     INFO ] deadbeef22222
[2020-12-08T10:41:39.841Z] [r-vmb-ppc-jenkins:28083:0:28083] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
[2020-12-08T10:41:40.097Z] 
[2020-12-08T10:41:40.097Z] /scrap/jenkins/workspace/ucx-5/contrib/../src/uct/sm/mm/base/mm_iface.c: [ uct_mm_iface_t_cleanup() ]
[2020-12-08T10:41:40.097Z]       ...
[2020-12-08T10:41:40.097Z]       708 static UCS_CLASS_CLEANUP_FUNC(uct_mm_iface_t)
[2020-12-08T10:41:40.097Z]       709 {
[2020-12-08T10:41:40.097Z]       710     uct_base_iface_progress_disable(&self->super.super.super,
[2020-12-08T10:41:40.097Z] ==>   711                                     UCT_PROGRESS_SEND | UCT_PROGRESS_RECV);
[2020-12-08T10:41:40.097Z]       712 
[2020-12-08T10:41:40.097Z]       713     /* return all the descriptors that are now 'assigned' to the FIFO,
[2020-12-08T10:41:40.097Z]       714      * to their mpool */
[2020-12-08T10:41:40.097Z] 
[2020-12-08T10:41:40.354Z] ==== backtrace (tid:  28083) ====
[2020-12-08T10:41:40.354Z]  0 0x000000000005efc8 ucs_debug_print_backtrace()  /scrap/jenkins/workspace/ucx-5/contrib/../src/ucs/debug/debug.c:656
[2020-12-08T10:41:40.354Z]  1 0x000000000001ba88 uct_mm_iface_t_cleanup()  /scrap/jenkins/workspace/ucx-5/contrib/../src/uct/sm/mm/base/mm_iface.c:711
[2020-12-08T10:41:40.354Z]  2 0x000000000007b2a8 ucs_class_call_cleanup_chain()  /scrap/jenkins/workspace/ucx-5/contrib/../src/ucs/type/class.c:56
[2020-12-08T10:41:40.354Z]  3 0x000000000001b3f8 uct_mm_iface_t_delete()  /scrap/jenkins/workspace/ucx-5/contrib/../src/uct/sm/mm/base/mm_iface.c:729
[2020-12-08T10:41:40.354Z]  4 0x00000000000169ac uct_iface_close()  /scrap/jenkins/workspace/ucx-5/contrib/../src/uct/base/uct_iface.c:214
[2020-12-08T10:41:40.354Z]  5 0x0000000010363800 ucs::handle<uct_iface*, void*>::release()  /scrap/jenkins/workspace/ucx-5/contrib/../test/gtest/common/test_helpers.h:667
[2020-12-08T10:41:40.354Z]  6 0x0000000010363800 ucs::handle<uct_iface*, void*>::reset()  /scrap/jenkins/workspace/ucx-5/contrib/../test/gtest/common/test_helpers.h:602
[2020-12-08T10:41:40.354Z]  7 0x0000000010363800 ~handle()  /scrap/jenkins/workspace/ucx-5/contrib/../test/gtest/common/test_helpers.h:597
[2020-12-08T10:41:40.354Z]  8 0x0000000010363800 ~entity()  /scrap/jenkins/workspace/ucx-5/contrib/../test/gtest/uct/uct_test.h:125
[2020-12-08T10:41:40.354Z]  9 0x0000000010363800 ucs::ptr_vector_base<uct_test::entity>::release()  /scrap/jenkins/workspace/ucx-5/contrib/../test/gtest/common/test_helpers.h:529
[2020-12-08T10:41:40.354Z] 10 0x0000000010363800 ucs::ptr_vector_base<uct_test::entity>::clear()  /scrap/jenkins/workspace/ucx-5/contrib/../test/gtest/common/test_helpers.h:500
[2020-12-08T10:41:40.354Z] 11 0x0000000010363800 uct_test::cleanup()  /scrap/jenkins/workspace/ucx-5/contrib/../test/gtest/uct/uct_test.cc:454
[2020-12-08T10:41:40.354Z] 12 0x000000001020f5a0 ucs::test_base::TearDownProxy()  /scrap/jenkins/workspace/ucx-5/contrib/../test/gtest/common/test.cc:315
[2020-12-08T10:41:40.354Z] 13 0x0000000010231874 uct_test::TearDown()  /scrap/jenkins/workspace/ucx-5/contrib/../test/gtest/uct/uct_test.h:100
[2020-12-08T10:41:40.354Z] 14 0x00000000101eae5c HandleSehExceptionsInMethodIfSupported<testing::Test, void>()  /scrap/jenkins/workspace/ucx-5/contrib/../test/gtest/common/gtest-all.cc:3562
[2020-12-08T10:41:40.354Z] 15 0x00000000101eae5c testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>()  /scrap/jenkins/workspace/ucx-5/contrib/../test/gtest/common/gtest-all.cc:3598
[2020-12-08T10:41:40.354Z] 16 0x00000000101d9f84 testing::Test::Run()  /scrap/jenkins/workspace/ucx-5/contrib/../test/gtest/common/gtest-all.cc:3643
[2020-12-08T10:41:40.354Z] 17 0x00000000101da14c testing::TestInfo::Run()  /scrap/jenkins/workspace/ucx-5/contrib/../test/gtest/common/gtest-all.cc:3812
[2020-12-08T10:41:40.354Z] 18 0x00000000101da398 testing::TestCase::Run()  /scrap/jenkins/workspace/ucx-5/contrib/../test/gtest/common/gtest-all.cc:3930
[2020-12-08T10:41:40.354Z] 19 0x00000000101e0398 testing::internal::UnitTestImpl::RunAllTests()  /scrap/jenkins/workspace/ucx-5/contrib/../test/gtest/common/gtest-all.cc:5808
[2020-12-08T10:41:40.354Z] 20 0x00000000101e07e4 testing::internal::UnitTestImpl::RunAllTests()  /scrap/jenkins/workspace/ucx-5/contrib/../test/gtest/common/gtest-all.cc:5725
[2020-12-08T10:41:40.354Z] 21 0x00000000101e07e4 HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>()  /scrap/jenkins/workspace/ucx-5/contrib/../test/gtest/common/gtest-all.cc:3562
[2020-12-08T10:41:40.354Z] 22 0x00000000101e07e4 HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>()  /scrap/jenkins/workspace/ucx-5/contrib/../test/gtest/common/gtest-all.cc:3598
[2020-12-08T10:41:40.354Z] 23 0x00000000101e07e4 testing::UnitTest::Run()  /scrap/jenkins/workspace/ucx-5/contrib/../test/gtest/common/gtest-all.cc:5422
[2020-12-08T10:41:40.354Z] 24 0x000000001015d348 RUN_ALL_TESTS()  /scrap/jenkins/workspace/ucx-5/contrib/../test/gtest/common/gtest.h:20059
[2020-12-08T10:41:40.354Z] 25 0x000000001015d348 main()  /scrap/jenkins/workspace/ucx-5/contrib/../test/gtest/common/main.cc:102
[2020-12-08T10:41:40.354Z] 26 0x0000000000025100 generic_start_main.isra.0()  libc-start.c:0
[2020-12-08T10:41:40.354Z] 27 0x00000000000252f4 __libc_start_main()  ???:0
[2020-12-08T10:41:40.354Z] =================================

Link to the job: http://hpc-master.lab.mtl.com:8080/blue/organizations/jenkins/ucx/detail/ucx/8479/pipeline/582/
Link to the log: http://hpc-master.lab.mtl.com:8080/blue/rest/organizations/jenkins/pipelines/ucx/runs/8479/nodes/582/steps/587/log/?start=0

brminich commented 3 years ago

This also happened here: http://hpc-master.lab.mtl.com:8080/blue/organizations/jenkins/ucx/detail/ucx/9462/pipeline/

[2021-01-26T14:45:07.217Z] [ RUN      ] posix/test_uct_mm.alloc/0 <posix/memory,dir=.>
[2021-01-26T14:45:07.217Z] [     INFO ] Testing component: posix
[2021-01-26T14:45:07.217Z] [     INFO ] deadbeef11111
[2021-01-26T14:45:07.217Z] [     INFO ] deadbeef11111
[2021-01-26T14:45:07.217Z] [     INFO ] deadbeef22222
[2021-01-26T14:45:07.474Z] [r-vmb-ppc-jenkins:3833 :0:3833] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
[2021-01-26T14:45:07.789Z] 
[2021-01-26T14:45:07.789Z] /scrap/jenkins/workspace/ucx-2/contrib/../src/uct/sm/mm/base/mm_iface.c: [ uct_mm_iface_t_cleanup() ]
[2021-01-26T14:45:07.789Z]       ...
[2021-01-26T14:45:07.789Z]       709 static UCS_CLASS_CLEANUP_FUNC(uct_mm_iface_t)
[2021-01-26T14:45:07.789Z]       710 {
[2021-01-26T14:45:07.789Z]       711     uct_base_iface_progress_disable(&self->super.super.super,
[2021-01-26T14:45:07.789Z] ==>   712                                     UCT_PROGRESS_SEND | UCT_PROGRESS_RECV);
[2021-01-26T14:45:07.789Z]       713 
[2021-01-26T14:45:07.789Z]       714     /* return all the descriptors that are now 'assigned' to the FIFO,
[2021-01-26T14:45:07.789Z]       715      * to their mpool */
[2021-01-26T14:45:07.789Z] 
[2021-01-26T14:45:08.049Z] ==== backtrace (tid:   3833) ====
[2021-01-26T14:45:08.049Z]  0 0x000000000005f9e8 ucs_debug_print_backtrace()  /scrap/jenkins/workspace/ucx-2/contrib/../src/ucs/debug/debug.c:656
[2021-01-26T14:45:08.049Z]  1 0x000000000001ac48 uct_mm_iface_t_cleanup()  /scrap/jenkins/workspace/ucx-2/contrib/../src/uct/sm/mm/base/mm_iface.c:712
[2021-01-26T14:45:08.049Z]  2 0x000000000007c1e8 ucs_class_call_cleanup_chain()  /scrap/jenkins/workspace/ucx-2/contrib/../src/ucs/type/class.c:56
[2021-01-26T14:45:08.049Z]  3 0x000000000001a5b8 uct_mm_iface_t_delete()  /scrap/jenkins/workspace/ucx-2/contrib/../src/uct/sm/mm/base/mm_iface.c:730
[2021-01-26T14:45:08.049Z]  4 0x000000000001594c uct_iface_close()  /scrap/jenkins/workspace/ucx-2/contrib/../src/uct/base/uct_iface.c:215
[2021-01-26T14:45:08.049Z]  5 0x000000001037e3f0 ucs::handle<uct_iface*, void*>::release()  /scrap/jenkins/workspace/ucx-2/contrib/../test/gtest/common/test_helpers.h:669
[2021-01-26T14:45:08.049Z]  6 0x000000001037e3f0 ucs::handle<uct_iface*, void*>::reset()  /scrap/jenkins/workspace/ucx-2/contrib/../test/gtest/common/test_helpers.h:604
[2021-01-26T14:45:08.049Z]  7 0x000000001037e3f0 ~handle()  /scrap/jenkins/workspace/ucx-2/contrib/../test/gtest/common/test_helpers.h:599
[2021-01-26T14:45:08.049Z]  8 0x000000001037e3f0 ~entity()  /scrap/jenkins/workspace/ucx-2/contrib/../test/gtest/uct/uct_test.h:131
[2021-01-26T14:45:08.049Z]  9 0x000000001037e3f0 ucs::ptr_vector_base<uct_test::entity>::release()  /scrap/jenkins/workspace/ucx-2/contrib/../test/gtest/common/test_helpers.h:531
[2021-01-26T14:45:08.049Z] 10 0x000000001037e3f0 ucs::ptr_vector_base<uct_test::entity>::clear()  /scrap/jenkins/workspace/ucx-2/contrib/../test/gtest/common/test_helpers.h:502
[2021-01-26T14:45:08.049Z] 11 0x000000001037e3f0 uct_test::cleanup()  /scrap/jenkins/workspace/ucx-2/contrib/../test/gtest/uct/uct_test.cc:490
[2021-01-26T14:45:08.049Z] 12 0x0000000010223250 ucs::test_base::TearDownProxy()  /scrap/jenkins/workspace/ucx-2/contrib/../test/gtest/common/test.cc:316
[2021-01-26T14:45:08.049Z] 13 0x000000001024ac94 uct_test::TearDown()  /scrap/jenkins/workspace/ucx-2/contrib/../test/gtest/uct/uct_test.h:106
[2021-01-26T14:45:08.049Z] 14 0x00000000101fe93c HandleSehExceptionsInMethodIfSupported<testing::Test, void>()  /scrap/jenkins/workspace/ucx-2/contrib/../test/gtest/common/gtest-all.cc:3562
[2021-01-26T14:45:08.050Z] 15 0x00000000101fe93c testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>()  /scrap/jenkins/workspace/ucx-2/contrib/../test/gtest/common/gtest-all.cc:3598
[2021-01-26T14:45:08.050Z] 16 0x00000000101eda64 testing::Test::Run()  /scrap/jenkins/workspace/ucx-2/contrib/../test/gtest/common/gtest-all.cc:3643
[2021-01-26T14:45:08.050Z] 17 0x00000000101edc2c testing::TestInfo::Run()  /scrap/jenkins/workspace/ucx-2/contrib/../test/gtest/common/gtest-all.cc:3812
[2021-01-26T14:45:08.050Z] 18 0x00000000101ede78 testing::TestCase::Run()  /scrap/jenkins/workspace/ucx-2/contrib/../test/gtest/common/gtest-all.cc:3930
[2021-01-26T14:45:08.050Z] 19 0x00000000101f3e78 testing::internal::UnitTestImpl::RunAllTests()  /scrap/jenkins/workspace/ucx-2/contrib/../test/gtest/common/gtest-all.cc:5808
[2021-01-26T14:45:08.050Z] 20 0x00000000101f42c4 testing::internal::UnitTestImpl::RunAllTests()  /scrap/jenkins/workspace/ucx-2/contrib/../test/gtest/common/gtest-all.cc:5725
[2021-01-26T14:45:08.050Z] 21 0x00000000101f42c4 HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>()  /scrap/jenkins/workspace/ucx-2/contrib/../test/gtest/common/gtest-all.cc:3562
[2021-01-26T14:45:08.050Z] 22 0x00000000101f42c4 HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>()  /scrap/jenkins/workspace/ucx-2/contrib/../test/gtest/common/gtest-all.cc:3598
[2021-01-26T14:45:08.050Z] 23 0x00000000101f42c4 testing::UnitTest::Run()  /scrap/jenkins/workspace/ucx-2/contrib/../test/gtest/common/gtest-all.cc:5422
[2021-01-26T14:45:08.050Z] 24 0x000000001016ee48 RUN_ALL_TESTS()  /scrap/jenkins/workspace/ucx-2/contrib/../test/gtest/common/gtest.h:20059
[2021-01-26T14:45:08.050Z] 25 0x000000001016ee48 main()  /scrap/jenkins/workspace/ucx-2/contrib/../test/gtest/common/main.cc:102
[2021-01-26T14:45:08.050Z] 26 0x0000000000025100 generic_start_main.isra.0()  libc-start.c:0
[2021-01-26T14:45:08.050Z] 27 0x00000000000252f4 __libc_start_main()  ???:0
[2021-01-26T14:45:08.050Z] =================================
dmitrygx commented 3 years ago
(gdb) bt
#0  0x00003fff88784478 in pause () from /usr/lib64/libpthread.so.0
#1  0x00003fff88bee974 in ucs_debug_freeze () at /scrap/jenkins/workspace/ucx-9/contrib/../src/ucs/debug/debug.c:820
#2  0x00003fff88bf2b5c in ucs_error_freeze (message=0x3fff88d30590 "address not mapped to object") at /scrap/jenkins/workspace/ucx-9/contrib/../src/ucs/debug/debug.c:915
#3  ucs_handle_error (message=0x3fff88d30590 "address not mapped to object") at /scrap/jenkins/workspace/ucx-9/contrib/../src/ucs/debug/debug.c:1078
#4  0x00003fff88bf2fc8 in ucs_debug_handle_error_signal (signo=11, cause=0x3fff88d30590 "address not mapped to object", fmt=0x3fff88d30680 " at address %p")
    at /scrap/jenkins/workspace/ucx-9/contrib/../src/ucs/debug/debug.c:1027
#5  0x00003fff88bf3324 in ucs_error_signal_handler (signo=<optimized out>, info=0x3fff888c4778, context=<optimized out>) at /scrap/jenkins/workspace/ucx-9/contrib/../src/ucs/debug/debug.c:1049
#6  <signal handler called>
#7  ucs_mpool_add_to_freelist (add_to_tail=<optimized out>, elem=<optimized out>, mp=0x0) at /scrap/jenkins/workspace/ucx-9/contrib/../src/ucs/datastruct/mpool.inl:55
#8  ucs_mpool_put_inline (obj=0x3fff86c7a2e8) at /scrap/jenkins/workspace/ucx-9/contrib/../src/ucs/datastruct/mpool.inl:79
#9  ucs_mpool_put (obj=0x3fff86c7a2e8) at /scrap/jenkins/workspace/ucx-9/contrib/../src/ucs/datastruct/mpool.c:171
#10 0x00003fff88b3adcc in uct_mm_iface_free_rx_descs (num_elems=64, iface=0x10013678470) at /scrap/jenkins/workspace/ucx-9/contrib/../src/uct/sm/mm/base/mm_iface.c:466
#11 uct_mm_iface_t_cleanup (self=0x10013678470) at /scrap/jenkins/workspace/ucx-9/contrib/../src/uct/sm/mm/base/mm_iface.c:717
#12 0x00003fff88c0c668 in ucs_class_call_cleanup_chain (cls=<optimized out>, obj=0x10013678470, limit=<optimized out>) at /scrap/jenkins/workspace/ucx-9/contrib/../src/ucs/type/class.c:56
#13 0x00003fff88b3a6f8 in uct_mm_iface_t_delete (self=0x10013678470) at /scrap/jenkins/workspace/ucx-9/contrib/../src/uct/sm/mm/base/mm_iface.c:730
#14 0x00003fff88b35a8c in uct_iface_close (iface=<optimized out>) at /scrap/jenkins/workspace/ucx-9/contrib/../src/uct/base/uct_iface.c:215
#15 0x0000000010382590 in release (this=0x10012bc5d00) at /scrap/jenkins/workspace/ucx-9/contrib/../test/gtest/common/test_helpers.h:669
#16 reset (this=0x10012bc5d00) at /scrap/jenkins/workspace/ucx-9/contrib/../test/gtest/common/test_helpers.h:604
#17 ~handle (this=0x10012bc5d00, __in_chrg=<optimized out>) at /scrap/jenkins/workspace/ucx-9/contrib/../test/gtest/common/test_helpers.h:599
#18 ~entity (this=0x10012bc58a0, __in_chrg=<optimized out>) at /scrap/jenkins/workspace/ucx-9/contrib/../test/gtest/uct/uct_test.h:128
#19 release (this=<optimized out>, ptr=0x10012bc58a0) at /scrap/jenkins/workspace/ucx-9/contrib/../test/gtest/common/test_helpers.h:531
#20 clear (this=<optimized out>) at /scrap/jenkins/workspace/ucx-9/contrib/../test/gtest/common/test_helpers.h:502
#21 uct_test::cleanup (this=0x10012fc9f60) at /scrap/jenkins/workspace/ucx-9/contrib/../test/gtest/uct/uct_test.cc:475
#22 0x0000000010227290 in ucs::test_base::TearDownProxy (this=0x10012fc9f78) at /scrap/jenkins/workspace/ucx-9/contrib/../test/gtest/common/test.cc:316
#23 0x000000001024ecd4 in uct_test::TearDown (this=<error reading variable: value has been optimized out>) at /scrap/jenkins/workspace/ucx-9/contrib/../test/gtest/uct/uct_test.h:109
#24 0x000000001020297c in HandleSehExceptionsInMethodIfSupported<testing::Test, void> (location=0x109106a0 "TearDown()", method=<optimized out>, object=0x10012fc9f60)
    at /scrap/jenkins/workspace/ucx-9/contrib/../test/gtest/common/gtest-all.cc:3562
#25 testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void> (object=0x10012fc9f60, method=<optimized out>, location=0x109106a0 "TearDown()")
    at /scrap/jenkins/workspace/ucx-9/contrib/../test/gtest/common/gtest-all.cc:3598
#26 0x00000000101f1aa4 in testing::Test::Run (this=0x10012fc9f60) at /scrap/jenkins/workspace/ucx-9/contrib/../test/gtest/common/gtest-all.cc:3643
#27 0x00000000101f1c6c in testing::TestInfo::Run (this=0x10012254fd0) at /scrap/jenkins/workspace/ucx-9/contrib/../test/gtest/common/gtest-all.cc:3812
#28 0x00000000101f1eb8 in testing::TestCase::Run (this=0x10012254620) at /scrap/jenkins/workspace/ucx-9/contrib/../test/gtest/common/gtest-all.cc:3930
#29 0x00000000101f7eb8 in testing::internal::UnitTestImpl::RunAllTests (this=0x10012171c00) at /scrap/jenkins/workspace/ucx-9/contrib/../test/gtest/common/gtest-all.cc:5808
#30 0x00000000101f8304 in RunAllTests (this=0x10012171c00) at /scrap/jenkins/workspace/ucx-9/contrib/../test/gtest/common/gtest-all.cc:5725
#31 HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> (location=<optimized out>, method=<optimized out>, object=0x10012171c00)
    at /scrap/jenkins/workspace/ucx-9/contrib/../test/gtest/common/gtest-all.cc:3562
#32 HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> (location=0x10910b60 "auxiliary test code (environments or event listeners)",
    method=(bool (testing::internal::UnitTestImpl::*)(testing::internal::UnitTestImpl * const)) 0x101f81e0 <testing::internal::UnitTestImpl::RunAllTests()>, object=0x10012171c00)
    at /scrap/jenkins/workspace/ucx-9/contrib/../test/gtest/common/gtest-all.cc:3598
#33 testing::UnitTest::Run (this=<optimized out>) at /scrap/jenkins/workspace/ucx-9/contrib/../test/gtest/common/gtest-all.cc:5422
#34 0x0000000010171e88 in RUN_ALL_TESTS () at /scrap/jenkins/workspace/ucx-9/contrib/../test/gtest/common/gtest.h:20059
#35 main (argc=5, argv=<optimized out>) at /scrap/jenkins/workspace/ucx-9/contrib/../test/gtest/common/main.cc:102
(gdb) f 7
#7  ucs_mpool_add_to_freelist (add_to_tail=<optimized out>, elem=<optimized out>, mp=0x0) at /scrap/jenkins/workspace/ucx-9/contrib/../src/ucs/datastruct/mpool.inl:55
55              elem->next = mp->freelist;
(gdb) p mp
$1 = (ucs_mpool_t *) 0x0
(gdb) f 10
#10 0x00003fff88b3adcc in uct_mm_iface_free_rx_descs (num_elems=64, iface=0x10013678470) at /scrap/jenkins/workspace/ucx-9/contrib/../src/uct/sm/mm/base/mm_iface.c:466
466             ucs_mpool_put(desc);
(gdb) p i
$3 = <optimized out>
(gdb) p desc
$4 = <optimized out>
(gdb) p elem
$5 = <optimized out>
(gdb) p iface->recv_fifo_elems
$6 = (void *) 0x3fff87770100
(gdb) p ((ucs_mpool_elem_t*)iface->last_recv_desc - 1)->mpool
$8 = (ucs_mpool_t *) 0x0
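The gdb session above suggests why the crash address is (nil): the pool pointer is read from a small header stored immediately before the descriptor, and `elem->next = mp->freelist` then dereferences it. The following is a simplified model of that mechanism, not the UCX implementation. All `toy_*` names are hypothetical, and the header is assumed to be a union of an owner pointer and a freelist link, which is consistent with the `->mpool` and `->next` accesses shown in the session.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins for illustration; these are not the UCX definitions. */
typedef struct toy_mpool toy_mpool_t;

typedef union toy_mpool_elem {
    toy_mpool_t          *mpool; /* owner, valid while the object is in use */
    union toy_mpool_elem *next;  /* freelist link, valid while it is free */
} toy_mpool_elem_t;

struct toy_mpool {
    toy_mpool_elem_t *freelist;
};

/* One pool slot: the header sits directly in front of the object. */
typedef struct toy_slot {
    toy_mpool_elem_t hdr;
    char             payload[32];
} toy_slot_t;

/* Pop the freelist head and return the object that follows its header. */
static void *toy_mpool_get(toy_mpool_t *mp)
{
    toy_mpool_elem_t *elem = mp->freelist;

    if (elem == NULL) {
        return NULL;
    }
    mp->freelist = elem->next;
    elem->mpool  = mp; /* remember the owner for the later put */
    return elem + 1;
}

/* Same pointer arithmetic as the gdb expression: step back one header from
 * the object to find the owning pool. Returns 0 on success, or -1 if the
 * header was zeroed, which is the point where an unchecked put would fault. */
static int toy_mpool_put_checked(void *obj)
{
    toy_mpool_elem_t *elem;
    toy_mpool_t *mp;

    assert(obj != NULL);
    elem = (toy_mpool_elem_t*)obj - 1;
    mp   = elem->mpool;
    if (mp == NULL) {
        return -1; /* an unchecked put would read mp->freelist here */
    }
    elem->next   = mp->freelist;
    mp->freelist = elem;
    return 0;
}
```

In this model, zeroing the header of an in-use object (for example with `memset(&slot.hdr, 0, sizeof(slot.hdr))`) makes the next put fail in the same place the backtrace points to, without any damage to the object payload itself.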