sonic-net / sonic-buildimage

Scripts which perform an installable binary image build for SONiC

[bgpd] Crash in libyang during boot up #20760

Open stepanblyschak opened 2 weeks ago

stepanblyschak commented 2 weeks ago

Description

The BGP docker container crashed (bgpd received SIGSEGV) during a warm-reboot test and did not recover for the rest of the test.

Steps to reproduce the issue:

  1. Run the sonic-mgmt test platform_tests/test_advanced_reboot.py::test_warm_reboot_sad.
  2. bgpd exits with SIGSEGV during the test and the BGP docker container does not recover:
2024 Nov  9 03:21:40.847546 arc-switch1004 INFO bgp#supervisord 2024-11-09 01:21:40,845 WARN exited: bgpd (terminated by SIGSEGV (core dumped); not expected)

Describe the results you received:

Logs:

2024 Nov  9 03:21:37.123288 sonic DEBUG bgp#bgpcfgd: execute command '['vtysh', '-f', '/tmp/tmpfcjtdazx']'.
2024 Nov  9 03:21:37.194414 sonic INFO sonic-ztp[4005]: ZTP is administratively disabled.
2024 Nov  9 03:21:37.445364 sonic CRIT bgp#BGP[60]: Received signal 11 at 1731115297 (si_addr 0x4, PC 0x7fdc3c18748c); aborting...
2024 Nov  9 03:21:37.449344 sonic CRIT bgp#BGP[60]: zlog_signal+0xf5                   7fdc3c61f345     7ffcf5aed3b0 /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0 (mapped at 0x7fdc3c57c000)
2024 Nov  9 03:21:37.449344 sonic CRIT bgp#BGP[60]: PBKDF2_SHA256+0x4b1                7fdc3c64cf81     7ffcf5aed4f0 /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0 (mapped at 0x7fdc3c57c000)
2024 Nov  9 03:21:37.449344 sonic CRIT bgp#BGP[60]: __sigaction+0x40                   7fdc3c2e2050     7ffcf5aed640 /lib/x86_64-linux-gnu/libc.so.6 (mapped at 0x7fdc3c2a6000)
2024 Nov  9 03:21:37.464006 sonic CRIT bgp#BGP[60]:     ---- signal ----
2024 Nov  9 03:21:37.464006 sonic CRIT bgp#BGP[60]: ly_err_print+0xe1c                 7fdc3c18748c     7ffcf5aedae0 /lib/x86_64-linux-gnu/libyang.so.2 (mapped at 0x7fdc3c177000)
2024 Nov  9 03:21:37.464006 sonic CRIT bgp#BGP[60]: lys_ypr_ctx_get_level+0x3af0       7fdc3c21c5f0     7ffcf5aedb50 /lib/x86_64-linux-gnu/libyang.so.2 (mapped at 0x7fdc3c177000)
2024 Nov  9 03:21:37.464006 sonic CRIT bgp#BGP[60]: lys_ypr_ctx_get_level+0x672d       7fdc3c21f22d     7ffcf5aedbb0 /lib/x86_64-linux-gnu/libyang.so.2 (mapped at 0x7fdc3c177000)
2024 Nov  9 03:21:37.464006 sonic CRIT bgp#BGP[60]: lys_ypr_ctx_get_level+0x15a46      7fdc3c22e546     7ffcf5aedc20 /lib/x86_64-linux-gnu/libyang.so.2 (mapped at 0x7fdc3c177000)
2024 Nov  9 03:21:37.464006 sonic CRIT bgp#BGP[60]: lys_ypr_ctx_get_level+0x1364b      7fdc3c22c14b     7ffcf5aedd00 /lib/x86_64-linux-gnu/libyang.so.2 (mapped at 0x7fdc3c177000)
2024 Nov  9 03:21:37.464006 sonic CRIT bgp#BGP[60]: lys_ypr_ctx_get_level+0x12b62      7fdc3c22b662     7ffcf5aede90 /lib/x86_64-linux-gnu/libyang.so.2 (mapped at 0x7fdc3c177000)
2024 Nov  9 03:21:37.464006 sonic CRIT bgp#BGP[60]: lys_ypr_ctx_get_level+0x17664      7fdc3c230164     7ffcf5aee020 /lib/x86_64-linux-gnu/libyang.so.2 (mapped at 0x7fdc3c177000)
2024 Nov  9 03:21:37.469685 sonic CRIT bgp#BGP[60]: lyxp_get_expr+0x1a6                7fdc3c2307e6     7ffcf5aee090 /lib/x86_64-linux-gnu/libyang.so.2 (mapped at 0x7fdc3c177000)
2024 Nov  9 03:21:37.469685 sonic CRIT bgp#BGP[60]: lyxp_get_expr+0x2a97               7fdc3c2330d7     7ffcf5aee1a0 /lib/x86_64-linux-gnu/libyang.so.2 (mapped at 0x7fdc3c177000)
2024 Nov  9 03:21:37.469685 sonic CRIT bgp#BGP[60]: lyxp_get_expr+0x30d7               7fdc3c233717     7ffcf5aee240 /lib/x86_64-linux-gnu/libyang.so.2 (mapped at 0x7fdc3c177000)
2024 Nov  9 03:21:37.469685 sonic CRIT bgp#BGP[60]: lyd_validate_all+0x42              7fdc3c2338e2     7ffcf5aee370 /lib/x86_64-linux-gnu/libyang.so.2 (mapped at 0x7fdc3c177000)
2024 Nov  9 03:21:37.469685 sonic CRIT bgp#BGP[60]: nb_candidate_commit_prepare+0x4e     7fdc3c62ea8e     7ffcf5aee3a0 /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0 (mapped at 0x7fdc3c57c000)
2024 Nov  9 03:21:37.469685 sonic CRIT bgp#BGP[60]: nb_candidate_commit+0x47           7fdc3c62ed97     7ffcf5aee400 /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0 (mapped at 0x7fdc3c57c000)
2024 Nov  9 03:21:37.469685 sonic CRIT bgp#BGP[60]: nb_terminate+0x29f8                7fdc3c631c68     7ffcf5aee450 /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0 (mapped at 0x7fdc3c57c000)
2024 Nov  9 03:21:37.483873 sonic CRIT bgp#BGP[60]: nb_cli_pending_commit_check+0x28     7fdc3c631da8     7ffcf5af04b0 /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0 (mapped at 0x7fdc3c57c000)
2024 Nov  9 03:21:37.483873 sonic CRIT bgp#BGP[60]: cmd_exit+0x28d                     7fdc3c5f169d     7ffcf5af04d0 /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0 (mapped at 0x7fdc3c57c000)
2024 Nov  9 03:21:37.483873 sonic CRIT bgp#BGP[60]: cmd_execute_command+0xd7           7fdc3c5f19f7     7ffcf5af0540 /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0 (mapped at 0x7fdc3c57c000)
2024 Nov  9 03:21:37.491285 sonic CRIT bgp#BGP[60]: cmd_execute+0xd0                   7fdc3c5f1c10     7ffcf5af0590 /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0 (mapped at 0x7fdc3c57c000)
2024 Nov  9 03:21:37.491285 sonic CRIT bgp#BGP[60]: vty_set_include+0x197              7fdc3c664127     7ffcf5af05f0 /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0 (mapped at 0x7fdc3c57c000)
2024 Nov  9 03:21:37.491285 sonic CRIT bgp#BGP[60]: vty_set_include+0x964              7fdc3c6648f4     7ffcf5af26a0 /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0 (mapped at 0x7fdc3c57c000)
2024 Nov  9 03:21:37.491285 sonic CRIT bgp#BGP[60]: vty_close+0x1f08                   7fdc3c667b48     7ffcf5af26e0 /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0 (mapped at 0x7fdc3c57c000)
2024 Nov  9 03:21:37.491285 sonic CRIT bgp#BGP[60]: thread_call+0x7d                   7fdc3c65ee2d     7ffcf5af2930 /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0 (mapped at 0x7fdc3c57c000)
2024 Nov  9 03:21:37.517219 sonic CRIT bgp#BGP[60]: frr_run+0xe8                       7fdc3c617368     7ffcf5af29d0 /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0 (mapped at 0x7fdc3c57c000)
2024 Nov  9 03:21:37.533015 sonic CRIT bgp#BGP[60]: main+0x36b                         55e2769d238b     7ffcf5af2be0 /usr/lib/frr/bgpd (mapped at 0x55e2768e8000)
2024 Nov  9 03:21:37.533015 sonic CRIT bgp#BGP[60]: __libc_init_first+0x8a             7fdc3c2cd24a     7ffcf5af2c40 /lib/x86_64-linux-gnu/libc.so.6 (mapped at 0x7fdc3c2a6000)
2024 Nov  9 03:21:37.533015 sonic CRIT bgp#BGP[60]: __libc_start_main+0x85             7fdc3c2cd305     7ffcf5af2ce0 /lib/x86_64-linux-gnu/libc.so.6 (mapped at 0x7fdc3c2a6000)
2024 Nov  9 03:21:37.533015 sonic CRIT bgp#BGP[60]: _start+0x21                        55e2769d4091     7ffcf5af2d30 /usr/lib/frr/bgpd (mapped at 0x55e2768e8000)
2024 Nov  9 03:21:37.533015 sonic CRIT bgp#BGP[60]: in thread vtysh_read scheduled from ../lib/vty.c:2740 vty_event()
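
The symbol names printed in this log do not always match the ones gdb resolves for the same addresses (for example, PC 0x7fdc3c18748c is printed as ly_err_print+0xe1c here but is lyht_insert_with_resize_cb in the gdb backtrace). A minimal sketch of how a frame line could be reduced to a module-relative offset for a separate symbol lookup follows. The helper names parse_frame and module_offset are hypothetical, and the sketch assumes the "mapped at" value is the load address of the module.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Addresses taken from one frame line of the crash log. */
struct frame_addr {
    uint64_t pc;   /* absolute program counter of the frame */
    uint64_t base; /* address the module was mapped at */
};

/*
 * Parse one frame line of the form
 *   "<syslog prefix>]: <sym>+0x<off>  <pc>  <sp>  <path> (mapped at 0x<base>)"
 * Returns 0 on success, -1 if the line is not a frame line.
 */
int parse_frame(const char *line, struct frame_addr *out)
{
    assert(line != NULL && out != NULL);

    /* Skip the syslog prefix, which ends with "bgp#BGP[60]: ". */
    const char *p = strstr(line, "]: ");
    if (p == NULL) {
        return -1;
    }
    p += 3;

    /* Symbol, stack pointer and path are skipped; PC and base are hex. */
    int n = sscanf(p, "%*s %" SCNx64 " %*s %*s (mapped at %" SCNx64 ")",
                   &out->pc, &out->base);
    return n == 2 ? 0 : -1;
}

/* Offset of the frame inside its module (PC minus mapping base). */
uint64_t module_offset(const struct frame_addr *f)
{
    return f->pc - f->base;
}
```

For the faulting frame above (PC 0x7fdc3c18748c, libyang.so.2 mapped at 0x7fdc3c177000) this gives offset 0x1048c, which could then be passed to a tool such as addr2line together with matching debug symbols.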

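The gdb backtrace shows that the crash occurs in the libyang hash table implementation (lyht_insert_with_resize_cb, src/hash_table.c:697), reached while evaluating the XPath expression of a "when" condition during validation: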

#0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=11, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
#1  0x00007fdc3c330e9f in __pthread_kill_internal (signo=11, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
#2  0x00007fdc3c2e1fb2 in __GI_raise (sig=11) at ../sysdeps/posix/raise.c:26
#3  0x00007fdc3c64cfbc in ?? () from /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0
#4  <signal handler called>
#5  0x00007fdc3c18748c in lyht_insert_with_resize_cb (ht=0x55e2784fb6d0, val_p=0x7ffcf5aedb5c, hash=3760792813, resize_val_equal=resize_val_equal@entry=0x0, match_p=0x0) at ./src/hash_table.c:697
#6  0x00007fdc3c187b5a in lyht_insert (ht=<optimized out>, val_p=<optimized out>, hash=<optimized out>, match_p=<optimized out>) at ./src/hash_table.c:746
#7  0x00007fdc3c21c5f0 in set_insert_node_hash (set=0x55e2784fc970, node=0x55e2784f5160, type=<optimized out>) at ./src/xpath.c:647
#8  0x00007fdc3c21f22d in moveto_node (set=set@entry=0x55e2784fc970, moveto_mod=0x55e277e02270, ncname=ncname@entry=0x55e277dead90 "entry", options=options@entry=2) at ./src/xpath.c:5603
#9  0x00007fdc3c22e546 in eval_name_test_with_predicate (options=2, set=<optimized out>, all_desc=<optimized out>, attr_axis=<optimized out>, tok_idx=0x7ffcf5aee046, exp=0x55e277e3a990) at ./src/xpath.c:7350
#10 eval_relative_location_path (exp=0x55e277e3a990, tok_idx=0x7ffcf5aee046, all_desc=<optimized out>, set=<optimized out>, options=2) at ./src/xpath.c:7522
#11 0x00007fdc3c22ae8d in eval_path_expr (options=21986, set=<optimized out>, tok_idx=0x7ffcf5aee046, exp=0x55e277e3a990) at ./src/xpath.c:8072
#12 0x00007fdc3c22c14b in eval_function_call (options=2, set=0x7ffcf5aee0d0, tok_idx=0x7ffcf5aee046, exp=0x55e277e3a990) at ./src/xpath.c:7772
#13 eval_path_expr (options=2, set=0x7ffcf5aee0d0, tok_idx=0x7ffcf5aee046, exp=0x55e277e3a990) at ./src/xpath.c:8002
#14 eval_expr_select (exp=exp@entry=0x55e277e3a990, tok_idx=tok_idx@entry=0x7ffcf5aee046, etype=etype@entry=LYXP_EXPR_OR, set=set@entry=0x7ffcf5aee0d0, options=options@entry=2) at ./src/xpath.c:8666
#15 0x00007fdc3c22b662 in eval_or_expr (options=2, set=0x7ffcf5aee0d0, repeat=<optimized out>, tok_idx=0x7ffcf5aee046, exp=0x55e277e3a990) at ./src/xpath.c:8558
#16 eval_expr_select (exp=exp@entry=0x55e277e3a990, tok_idx=tok_idx@entry=0x7ffcf5aee046, etype=etype@entry=LYXP_EXPR_NONE, set=set@entry=0x7ffcf5aee0d0, options=options@entry=2) at ./src/xpath.c:8642
#17 0x00007fdc3c230164 in lyxp_eval (ctx=0x55e277db3a00, exp=0x55e277e3a990, cur_mod=0x55e277e0e390, format=format@entry=LY_VALUE_SCHEMA_RESOLVED, prefix_data=<optimized out>, ctx_node=0x55e2784fa6c0, tree=0x55e277e2d6b0, 
    vars=<optimized out>, set=<optimized out>, options=<optimized out>) at ./src/xpath.c:8758
#18 0x00007fdc3c2307e6 in lyd_validate_node_when (tree=0x55e277e2c610, node=node@entry=0x55e2784fbd10, schema=<optimized out>, disabled=disabled@entry=0x7ffcf5aee1f0) at ./src/validation.c:153
#19 0x00007fdc3c2330d7 in lyd_validate_unres_when (diff=0x0, node_types=0x7ffcf5aee2e0, node_when=<optimized out>, mod=0x55e277e02270, tree=0x7ffcf5aee2d0) at ./src/validation.c:206
#20 lyd_validate_unres (tree=0x7ffcf5aee2d0, mod=0x55e277e02270, node_when=<optimized out>, node_exts=0x7ffcf5aee310, node_types=0x7ffcf5aee2e0, meta_types=0x7ffcf5aee2f0, diff=0x0) at ./src/validation.c:322
#21 0x00007fdc3c233717 in lyd_validate (tree=0x55e277dff110, module=module@entry=0x0, ctx=0x55e277db3a00, val_opts=1, validate_subtree=validate_subtree@entry=1 '\001', node_when_p=0x7ffcf5aee300, node_when_p@entry=0x0, 
    node_exts_p=0x7ffcf5aee310, node_types_p=0x7ffcf5aee2e0, meta_types_p=0x7ffcf5aee2f0, diff=0x0) at ./src/validation.c:1577
#22 0x00007fdc3c2338e2 in lyd_validate_all (tree=<optimized out>, ctx=<optimized out>, val_opts=<optimized out>, diff=<optimized out>) at ./src/validation.c:1604
#23 0x00007fdc3c62ea8e in nb_candidate_commit_prepare () from /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0
#24 0x00007fdc3c62ed97 in nb_candidate_commit () from /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0
#25 0x00007fdc3c631c68 in ?? () from /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0
#26 0x00007fdc3c631da8 in nb_cli_pending_commit_check () from /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0
#27 0x00007fdc3c5f169d in ?? () from /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0
#28 0x00007fdc3c5f19f7 in cmd_execute_command () from /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0
#29 0x00007fdc3c5f1c10 in cmd_execute () from /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0
#30 0x00007fdc3c664127 in ?? () from /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0
#31 0x00007fdc3c6648f4 in ?? () from /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0
#32 0x00007fdc3c667b48 in ?? () from /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0
#33 0x00007fdc3c65ee2d in thread_call () from /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0
#34 0x00007fdc3c617368 in frr_run () from /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0
#35 0x000055e2769d238b in main ()

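The crash is a NULL pointer dereference: %rcx is 0, so the load from 0x4(%rcx) faults at address 0x4, which matches si_addr in the log: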

   0x00007fdc3c187484 <+788>:   test   %ebx,%ebx
   0x00007fdc3c187486 <+790>:   jg     0x7fdc3c187609 <lyht_insert_with_resize_cb+1177>
=> 0x00007fdc3c18748c <+796>:   mov    0x4(%rcx),%ebx
   0x00007fdc3c18748f <+799>:   jmp    0x7fdc3c187236 <lyht_insert_with_resize_cb+198>

...

(gdb) p $rcx
$42 = 0

This corresponds to the following code in libyang (src/hash_table.c), where rec is NULL:

LY_ERR
lyht_insert_with_resize_cb(struct hash_table *ht, void *val_p, uint32_t hash, lyht_value_equal_cb resize_val_equal,
        void **match_p)

...

    /* insert it into the returned record */
    assert(rec->hits < 1);
    if (rec->hits < 0) {    /* <========= crashes on this line */
        --ht->invalid;
    }
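
The register value and the fault address are consistent with reading rec->hits through a NULL rec. The sketch below illustrates the arithmetic. The struct rec_sketch and the helper hits_address are hypothetical, and the field order is an assumption modeled on libyang's hash table record, which is not shown in this report.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Assumed record layout, modeled on libyang's hash table record.
 * The field order is an assumption; it is not shown in this report.
 */
struct rec_sketch {
    uint32_t hash;        /* offset 0 */
    int32_t hits;         /* offset 4 */
    unsigned char val[1]; /* start of the stored value */
};

/* The disassembly loads from 0x4(%rcx), so hits must sit at offset 4. */
static_assert(offsetof(struct rec_sketch, hits) == 4,
              "hits is expected at offset 4");

/* Address touched when reading rec->hits for a record located at rec. */
uintptr_t hits_address(uintptr_t rec)
{
    return rec + offsetof(struct rec_sketch, hits);
}
```

With rec equal to 0 (the value of %rcx), hits_address returns 0x4, the same value as si_addr in the crash log. This only confirms that rec was NULL at this point; it does not explain why no record was returned for the insert.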

Describe the results you expected:

Output of show version:

SONiC Software Version: SONiC.202405_RC.45-28a64576c_Internal
SONiC OS Version: 12
Distribution: Debian 12.7
Kernel: 6.1.0-22-2-amd64
Build commit: 28a64576c
Build date: Thu Nov  7 06:41:16 UTC 2024

Output of show techsupport:

(paste your output here or download and attach the file here )

Additional information you deem important (e.g. issue happens only occasionally):

Core dump: bgpd.1731115297.60.core.gz

prgeor commented 1 week ago

@StormLiangMS, can you check with @qiluo-msft whether this is an issue in libyang? According to @stepanblyschak, it cannot be reproduced easily. Please check the core dump.