yugabyte / yugabyte-db

[YSQL][OOM][Core dump] Segmentation fault SIGSEGV: DW_TAG_variable has an invalid location #22237

Open archit-rastogi opened 6 months ago

archit-rastogi commented 6 months ago

Jira Link: DB-11157

Description

Found a core dump while running the existing test test_batched_join_workload on master.

Build: 2.23.0.0-b265

Exception traceback:

(lldb) target create "/home/yugabyte/tserver/bin/yb-tserver" --core "/home/yugabyte/cores/core_1019_1714575736_!home!yugabyte!yb-software!yugabyte-2.23.0.0-b265-almalinux8-aarch64!bin!yb-server"
Core file '/home/yugabyte/cores/core_1019_1714575736_!home!yugabyte!yb-software!yugabyte-2.23.0.0-b265-almalinux8-aarch64!bin!yb-server' (aarch64) was loaded.
(lldb) bt all
error: yb-tserver 0x0000000001eab82b: DW_TAG_variable has an invalid location: DW_OP_breg23 +0, DW_OP_convert 0x34, DW_OP_convert 0x3a, DW_OP_stack_value
error: yb-tserver 0x0000000001eab83f: DW_TAG_variable has an invalid location: DW_OP_breg21 +0, DW_OP_convert 0x34, DW_OP_convert 0x3a, DW_OP_stack_value
error: yb-tserver 0x0000000001eab96d: DW_TAG_variable has an invalid location: DW_OP_breg26 +0, DW_OP_convert 0x34, DW_OP_convert 0x3a, DW_OP_stack_value
error: yb-tserver 0x0000000001eab981: DW_TAG_variable has an invalid location: DW_OP_breg27 +0, DW_OP_convert 0x34, DW_OP_convert 0x3a, DW_OP_stack_value
error: yb-tserver 0x0000000004b8d0ee: DW_TAG_variable has an invalid location: DW_OP_piece 0x6, DW_OP_breg11 +0, DW_OP_constu 0x30, DW_OP_shr, DW_OP_convert 0x34, DW_OP_convert 0x3a, DW_OP_plus_uconst 0x1, DW_OP_stack_value, DW_OP_piece 0x2
error: yb-tserver 0x000000000691cbcb: DW_TAG_variable has an invalid location: DW_OP_breg21 +0, DW_OP_convert 0x34, DW_OP_convert 0x3a, DW_OP_stack_value
* thread #1, name = 'yb-tserver', stop reason = signal SIGABRT
  * frame #0: 0x0000ffff90b4dc58 libpthread.so.0`pthread_cond_wait@@GLIBC_2.17 + 528
    frame #1: 0x0000aaaacfc62490 yb-tserver`yb::TerminationMonitor::WaitForTermination() [inlined] void std::__1::condition_variable::wait<yb::TerminationMonitor::WaitForTermination()::$_0>(this=0x0000262dbfc280b8, __lk=<unavailable>, __pred=(unnamed class) @ x19) at condition_variable.h:148:5
    frame #2: 0x0000aaaacfc62478 yb-tserver`yb::TerminationMonitor::WaitForTermination(this=0x0000262dbfc28080) at termination_monitor.cc:68:16
    frame #3: 0x0000aaaad1069428 yb-tserver`yb::tserver::TabletServerMain(argc=<unavailable>, argv=<unavailable>) at tablet_server_main_impl.cc:376:24
    frame #4: 0x0000aaaacf85a6d0 yb-tserver`main(argc=3, argv=0x0000ffffee296c08) at master_main.cc:184:12
    frame #5: 0x0000ffff90ba4384 libc.so.6`__libc_start_main + 220
    frame #6: 0x0000aaaacf77b034 yb-tserver`_start + 52

Issue Type

kind/bug

rthallamko3 commented 5 months ago

@archit-rastogi, can you check whether the postgres process crashed and brought down the tserver?

The "stop reason = signal SIGABRT" line indicates that the tserver was brought down by an external entity.
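
To make this kind of triage easier, here is a minimal sketch (not part of the original report) that separates the DW_TAG_variable diagnostics from the per-thread stop reasons in pasted "bt all" output. The function name summarize_bt and the regular expression are assumptions made for illustration; the sample lines are copied from the traceback in this issue. It treats the "error:" lines as debugger diagnostics to be counted separately, which is an assumption about their role rather than something the thread confirms.

```python
import re

# Matches lldb thread header lines such as:
#   * thread #1, name = 'yb-tserver', stop reason = signal SIGABRT
THREAD_RE = re.compile(
    r"^\*?\s*thread #(?P<tid>\d+),.*?stop reason = (?P<reason>.+?)\s*$"
)

# Marker text of the diagnostics seen at the top of the traceback.
DWARF_MARKER = "DW_TAG_variable has an invalid location"


def summarize_bt(text):
    """Return (stop_reasons, dwarf_error_count) for pasted `bt all` output.

    stop_reasons maps thread number -> stop reason string.
    dwarf_error_count counts the 'error: ... DW_TAG_variable ...' lines.
    """
    stop_reasons = {}
    dwarf_errors = 0
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("error:") and DWARF_MARKER in line:
            dwarf_errors += 1
            continue
        match = THREAD_RE.match(line)
        if match:
            stop_reasons[int(match.group("tid"))] = match.group("reason")
    return stop_reasons, dwarf_errors


# Lines copied from the traceback in this issue.
SAMPLE = """\
error: yb-tserver 0x0000000001eab82b: DW_TAG_variable has an invalid location: DW_OP_breg23 +0, DW_OP_convert 0x34, DW_OP_convert 0x3a, DW_OP_stack_value
error: yb-tserver 0x0000000001eab83f: DW_TAG_variable has an invalid location: DW_OP_breg21 +0, DW_OP_convert 0x34, DW_OP_convert 0x3a, DW_OP_stack_value
* thread #1, name = 'yb-tserver', stop reason = signal SIGABRT
  * frame #0: 0x0000ffff90b4dc58 libpthread.so.0`pthread_cond_wait@@GLIBC_2.17 + 528
"""

if __name__ == "__main__":
    reasons, errors = summarize_bt(SAMPLE)
    print(reasons, errors)
```

On the excerpt above, the only thread header carries the SIGABRT stop reason, and its frames show the main thread waiting in yb::TerminationMonitor::WaitForTermination.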

pilshchikov commented 2 months ago

@rthallamko3 The crash was reproduced in another scenario:

  1. Create many tables (1000)
  2. Create PITR schedule
  3. Load data
  4. Restore

After two cycles, on the second restore, this core dump was thrown. Only one nemesis was used, during the first cycle: restarting the master process, which was down 2 times. The second cycle was clean and nothing was done on the cluster. Logs can be found in a comment in the JIRA ticket.
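
Step 1 of the scenario above could be scripted roughly as follows. This is a hedged sketch only: the table name prefix and the two-column schema are hypothetical, since the thread does not say what the 1000 tables looked like, and the PITR schedule, data load, and restore steps are not covered here.

```python
def create_table_statements(count=1000, prefix="pitr_repro_t"):
    """Build one CREATE TABLE statement per table.

    The prefix and the (k, v) schema are illustrative assumptions,
    not the schema used in the original reproduction.
    """
    return [
        f"CREATE TABLE {prefix}_{i:04d} (k INT PRIMARY KEY, v TEXT);"
        for i in range(count)
    ]


if __name__ == "__main__":
    # Print the DDL so it can be piped into a SQL shell.
    for statement in create_table_statements():
        print(statement)
```

Generating the DDL as plain text keeps the sketch independent of any particular client or driver.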