yugabyte / yugabyte-db

YugabyteDB - the cloud native distributed SQL database for mission-critical applications.
https://www.yugabyte.com

[YCQL Packed Columns][Stress Tests] [2.19.3.0-b71] Segmentation Fault with libyb_util.so during Metric Writing #19046

Open zlareb1-yb opened 1 year ago

zlareb1-yb commented 1 year ago

Jira Link: DB-7863

Description

A segmentation fault (SIGSEGV) was observed in the postgres process while running stress tests. The backtrace points to metric writing: the faulting frame is in libyb_util.so, under yb::PrometheusWriter::WriteSingleEntry.

Note: This is the first time this issue has been observed.

Steps: image

Core dumps:

(lldb) target create "/home/yugabyte/yb-software/yugabyte-2.19.3.0-b71-centos-x86_64/postgres/bin/postgres" --core "/home/yugabyte/cores/core_740077_1694128662_!home!yugabyte!yb-software!yugabyte-2.19.3.0-b71-centos-x86_64!postgres!bin!postgres"
Core file '/home/yugabyte/cores/core_740077_1694128662_!home!yugabyte!yb-software!yugabyte-2.19.3.0-b71-centos-x86_64!postgres!bin!postgres' (x86_64) was loaded.
(lldb) bt all
warning: This version of LLDB has no plugin for the language "assembler". Inspection of frame variables will be limited.
* thread #1, name = 'postgres', stop reason = signal SIGSEGV
  * frame #0: 0x00007f7dc4a4fde7 libyb_util.so`std::__1::__hash_const_iterator<std::__1::__hash_node<std::__1::__hash_value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>, void*>*> std::__1::__hash_table<std::__1::__hash_value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>, std::__1::__unordered_map_hasher<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::__hash_value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>, std::__1::hash<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>, std::__1::equal_to<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>, true>, std::__1::__unordered_map_equal<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::__hash_value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>, std::__1::equal_to<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>, std::__1::hash<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>, true>, std::__1::allocator<std::__1::__hash_value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>>>::find<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>(this=<unavailable>, __k=<unavailable>) const at __hash_table:2343:31
    frame #1: 0x00007f7dc4a4fa1f libyb_util.so`yb::PrometheusWriter::WriteSingleEntry(std::__1::unordered_map<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::hash<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>, std::__1::equal_to<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>>> const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, long, yb::AggregationFunction, char const*, char const*, unsigned int) [inlined] std::__1::unordered_map<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::hash<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>, std::__1::equal_to<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>>>::find[abi:v160006](this=0x00007f7dc3d23e28, __k="metric_type") const at unordered_map:1445:69
    frame #2: 0x00007f7dc4a4fa09 libyb_util.so`yb::PrometheusWriter::WriteSingleEntry(this=0x00007f7db92ed818, attr=0x00007f7dc3d23e28, name="handler_latency_yb_ysqlserver_SQLProcessor_SelectStmt_count", value=4, aggregation_function=kSum, type="", description="", aggregation_levels=1) at metrics_writer.cc:113:30
    frame #3: 0x00007f7dc3d1ee29 libyb_pggate_webserver.so`yb::pggate::PgPrometheusMetricsHandler(req=<unavailable>, resp=<unavailable>) at pgsql_webserver_wrapper.cc:421:5
    frame #4: 0x00007f7dc62a1d70 libserver_process.so`yb::Webserver::Impl::RunPathHandler(yb::Webserver::Impl::PathHandler const&, sq_connection*, sq_request_info*) [inlined] std::__1::__function::__value_func<void (yb::WebCallbackRegistry::WebRequest const&, yb::WebCallbackRegistry::WebResponse*)>::operator(this=0x000026e57fdb9e40, __args=0x00007f7db92efb88, __args=0x00007f7db92efa60)[abi:v160006](yb::WebCallbackRegistry::WebRequest const&, yb::WebCallbackRegistry::WebResponse*&&) const at function.h:510:16
    frame #5: 0x00007f7dc62a1d57 libserver_process.so`yb::Webserver::Impl::RunPathHandler(yb::Webserver::Impl::PathHandler const&, sq_connection*, sq_request_info*) [inlined] std::__1::function<void (yb::WebCallbackRegistry::WebRequest const&, yb::WebCallbackRegistry::WebResponse*)>::operator(this= Function = yb::pggate::PgPrometheusMetricsHandler(yb::WebCallbackRegistry::WebRequest const&, yb::WebCallbackRegistry::WebResponse*) , __arg=0x00007f7db92efb88, __arg=0x00007f7db92eda60)(yb::WebCallbackRegistry::WebRequest const&, yb::WebCallbackRegistry::WebResponse*) const at function.h:1156:12
    frame #6: 0x00007f7dc62a1d57 libserver_process.so`yb::Webserver::Impl::RunPathHandler(this=0x000026e57fd7a000, handler=0x000026e57fd0dcc0, connection=0x000026e57f863000, request_info=<unavailable>) at webserver.cc:625:5
    frame #7: 0x00007f7dc62a18db libserver_process.so`yb::Webserver::Impl::BeginRequestCallback(this=0x000026e57fd7a000, connection=0x000026e57f863000, request_info=0x000026e57f863000) at webserver.cc:557:10
    frame #8: 0x00007f7dc62ae559 libserver_process.so`process_new_connection + 4265
    frame #9: 0x00007f7dc62ad1b8 libserver_process.so`worker_thread + 232
    frame #10: 0x00007f7dc91e3694 libpthread.so.0`start_thread(arg=0x00007f7db92ff700) at pthread_create.c:333
    frame #11: 0x00007f7dc892041d libc.so.6`__clone at clone.S:109
  thread #2, stop reason = signal 0
    frame #0: 0x00007f7dc89175cd libc.so.6`poll at syscall-template.S:84
    frame #1: 0x00007f7dc62acd6d libserver_process.so`master_thread + 573
    frame #2: 0x00007f7dc91e3694 libpthread.so.0`start_thread(arg=0x00007f7db9b00700) at pthread_create.c:333
    frame #3: 0x00007f7dc892041d libc.so.6`__clone at clone.S:109
  thread #3, stop reason = signal 0
    frame #0: 0x00007f7dc886fcac libc.so.6`__cxa_finalize(d=0x00007f7dc4e9b088) at cxa_finalize.c:48
    frame #1: 0x00007f7dc4e735be libyb_common_base.so`__do_fini + 62
    frame #2: 0x00007f7dca84732a ld.so`_dl_fini at dl-fini.c:235
    frame #3: 0x00007f7dc886f969 libc.so.6`__run_exit_handlers(status=1, listp=0x00007f7dc8bd05c0, run_list_atexit=true) at exit.c:82
    frame #4: 0x00007f7dc886f9b5 libc.so.6`__GI_exit(status=<unavailable>) at exit.c:104
    frame #5: 0x00005591e6471072 postgres`proc_exit(code=1) at ipc.c:157:2
    frame #6: 0x00005591e6658b51 postgres`errfinish(dummy=<unavailable>) at elog.c:801:3
    frame #7: 0x00005591e63d066d postgres`bgworker_die(postgres_signal_arg=<unavailable>) at bgworker.c:672:2
    frame #8: 0x00007f7dc91ebba0 libpthread.so.0`__restore_rt
    frame #9: 0x00007f7dc89209f3 libc.so.6`epoll_wait at syscall-template.S:84
    frame #10: 0x00005591e6473066 postgres`WaitEventSetWait [inlined] WaitEventSetWaitBlock(set=0x000026e57fd9fe88, cur_timeout=-1, occurred_events=0x00007fffe66dcc68, nevents=1) at latch.c:1062:7
    frame #11: 0x00005591e6473056 postgres`WaitEventSetWait(set=0x000026e57fd9fe88, timeout=-1, occurred_events=<unavailable>, nevents=1, wait_event_info=<unavailable>) at latch.c:1014:8
    frame #12: 0x00005591e6472b03 postgres`WaitLatchOrSocket(latch=0x00007f7dc2e9088c, wakeEvents=<unavailable>, sock=-1, timeout=<unavailable>, wait_event_info=117440512) at latch.c:399:7
    frame #13: 0x00005591e6472a00 postgres`WaitLatch(latch=<unavailable>, wakeEvents=<unavailable>, timeout=<unavailable>, wait_event_info=<unavailable>) at latch.c:353:9 [artificial]
    frame #14: 0x00007f7dc3d02300 yb_pg_metrics.so`webserver_worker_main(unused=<unavailable>) at yb_pg_metrics.c:375:3
    frame #15: 0x00005591e63d0502 postgres`StartBackgroundWorker at bgworker.c:841:2
    frame #16: 0x00005591e63e950b postgres`maybe_start_bgworkers [inlined] do_start_bgworker(rw=0x000026e57fd7a500) at postmaster.c:6005:4
    frame #17: 0x00005591e63e94ae postgres`maybe_start_bgworkers at postmaster.c:6231:9
    frame #18: 0x00005591e63e5c8d postgres`PostmasterMain(argc=25, argv=0x000026e57fd041a0) at postmaster.c:1429:2
    frame #19: 0x00005591e62e9670 postgres`PostgresServerProcessMain(argc=25, argv=0x000026e57fd041a0) at main.c:234:3
    frame #20: 0x00005591e5fab8d2 postgres`main + 34
    frame #21: 0x00007f7dc885a825 libc.so.6`__libc_start_main(main=(postgres`main), argc=25, argv=0x00007fffe66dd5b8, init=<unavailable>, fini=<unavailable>, rtld_fini=<unavailable>, stack_end=0x00007fffe66dd5a8) at libc-start.c:289
    frame #22: 0x00005591e5fab7e9 postgres`_start at start.S:108
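
One possible reading of the backtrace above, not confirmed anywhere in this issue: thread #1 is still serving a metrics request (the find("metric_type") lookup on the attribute map in WriteSingleEntry) while thread #3 is already inside exit(), running finalizers after bgworker_die. If that is what happened, the usual remedy is to stop and join the serving thread before any state it reads is torn down. The sketch below only illustrates that ordering. `MetricsWorker`, `MetricAttributes`, `RunWorkerOnce`, and the "server" value are hypothetical names invented for this example; none of them come from the YugabyteDB code base, and this is not the fix for this issue.

```cpp
#include <atomic>
#include <cassert>
#include <string>
#include <thread>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for the attribute map that the backtrace shows being
// passed to yb::PrometheusWriter::WriteSingleEntry. Not YugabyteDB code.
using MetricAttributes = std::unordered_map<std::string, std::string>;

// Hypothetical worker that keeps reading the attribute map on its own thread,
// loosely mirroring a webserver thread serving metrics requests.
class MetricsWorker {
 public:
  explicit MetricsWorker(MetricAttributes attrs)
      : attrs_(std::move(attrs)), thread_([this] { Run(); }) {}

  // Joining in the destructor means attrs_ cannot be destroyed while a
  // lookup is still in flight.
  ~MetricsWorker() { Stop(); }

  // Signals the worker to stop and waits for it to finish. Safe to call
  // more than once.
  void Stop() {
    stop_.store(true);
    if (thread_.joinable()) {
      thread_.join();
    }
  }

  long lookups() const { return lookups_.load(); }

 private:
  void Run() {
    // do/while so that at least one lookup happens before the stop check.
    do {
      // Same kind of lookup as frame #1 in the backtrace.
      if (attrs_.find("metric_type") != attrs_.end()) {
        lookups_.fetch_add(1);
      }
    } while (!stop_.load());
  }

  MetricAttributes attrs_;
  std::atomic<bool> stop_{false};
  std::atomic<long> lookups_{0};
  // Declared last so every other member is initialized before Run() starts.
  std::thread thread_;
};

// Starts a worker, stops it, and returns how many lookups it completed.
long RunWorkerOnce() {
  MetricAttributes attrs{{"metric_type", "server"}};
  MetricsWorker worker(attrs);
  worker.Stop();  // Join first; only then is it safe to tear down attrs_.
  const long completed = worker.lookups();
  assert(completed >= 1);
  return completed;
}
```

The point of the design is the ordering in Stop(): the thread is joined before the map it reads can go away. A process that calls exit() from a signal handler while such a thread is still running has no equivalent guarantee.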

Complete Stress Test report - http://stress.dev.yugabyte.com/stress_test/28b9fc60-643c-498d-bec4-f917f7ad5f5c

Build: 2.19.3.0-b71. Last build with a successful stress test run: 2.19.3.0-b60.

cc: @renjith-yb @kripasreenivasan


spolitov commented 1 year ago

This issue is not related to packed rows; it happens purely in the postgres process.

sushantrmishra commented 1 year ago

Possible duplicate of https://github.com/yugabyte/yugabyte-db/issues/18948. We should reverify once the fix for that issue has landed.