scylladb / scylla-cluster-tests

Tests for Scylla Clusters
GNU Affero General Public License v3.0

Second reboot of the DB node caused syslog-ng to crash on that node #6217

Open vponomaryov opened 1 year ago

vponomaryov commented 1 year ago

Issue description


How frequently does it reproduce?

1/1

Installation details

Kernel Version: 5.19.0-1024-gcp
Scylla version (or git commit hash): 5.2.2-20230530.9dd70a58c3f9 with build-id d3997884004abbd7d6f60bf507b63050c88c4cf3

Cluster size: 6 nodes (n1-highmem-16)

Scylla Nodes used in this run:

OS / Image: https://www.googleapis.com/compute/v1/projects/scylla-images/global/images/929956698705654922 (gce: us-east1)

Test: longevity-10gb-3h-gce-test
Test id: e97657e6-8081-4001-a90b-347ede0f569d
Test name: scylla-5.2/longevity/longevity-10gb-3h-gce-test
Test config file(s):

Details:

The problem appeared during the multiple_hard_reboot_node nemesis, which consists of the following steps:

After the second reboot of the same node, syslog-ng on that node aborted (signal 6, ABRT) and produced a core dump:

2023-05-30 15:05:04.477 <2023-05-30 15:04:16.000>: (CoreDumpEvent Severity.ERROR) period_type=one-time event_id=463d5ef0-7de2-43f7-ac64-e97f43898f96 during_nemesis=MultipleHardRebootNode node=Node longevity-10gb-3h-5-2-db-node-e97657e6-0-2 [34.23.230.19 | 10.142.15.225] (seed: False)
corefile_url=https://storage.cloud.google.com/upload.scylladb.com/core.syslog-ng.0.0f924711a8d9473b9575d9a077a8e8ab.654.1685459056000000/core.syslog-ng.0.0f924711a8d9473b9575d9a077a8e8ab.654.1685459056000000.gz
backtrace=           PID: 654 (syslog-ng)
UID: 0 (root)
GID: 0 (root)
Signal: 6 (ABRT)
Timestamp: Tue 2023-05-30 15:04:16 UTC (42s ago)
Command Line: /usr/sbin/syslog-ng -F
Executable: /usr/sbin/syslog-ng
Control Group: /system.slice/syslog-ng.service
Unit: syslog-ng.service
Slice: system.slice
Boot ID: 0f924711a8d9473b9575d9a077a8e8ab
Machine ID: 5f141c78be308ba23c1c5ab79c34871b
Hostname: longevity-10gb-3h-5-2-db-node-e97657e6-0-2
Storage: /var/lib/systemd/coredump/core.syslog-ng.0.0f924711a8d9473b9575d9a077a8e8ab.654.1685459056000000 (present)
Disk Size: 132.5M
Message: Process 654 (syslog-ng) of user 0 dumped core.
Found module linux-vdso.so.1 with build-id: 64f6dee2552b009ce985175e6d398ee0f469d702
Found module libdisk-buffer.so with build-id: 5efe71bfeba624fbffd4de160cffbd6f847b5cbf
Found module libresolv.so.2 with build-id: 7fd7253c61aa6fce2b7e13851c15afa14a5ab160
Found module libkeyutils.so.1 with build-id: ff27227afa5eeddccab180dd29bd7fcff94aea7c
Found module libkrb5support.so.0 with build-id: 85c1fccae74910b1afbe878af2202ec6139d8fc2
Found module libcom_err.so.2 with build-id: ce0901f10854b3c9276066b98d9a72303206e0d5
Found module libk5crypto.so.3 with build-id: 8bc1e44d4148b2b533d5a97335114565d94197f8
Found module libkrb5.so.3 with build-id: 62434c49e8118c49a9d60a0795705c806524782d
Found module libgssapi_krb5.so.2 with build-id: a05177e3a955af79b999bbc081b0f7bf9fb21c87
Found module libtirpc.so.3 with build-id: 1a361e188043ff5abfdb655af6186b8a0f9b205f
Found module libnsl.so.2 with build-id: 400d0e5ea8cb66596b2f49bfd0dfe0330ef9f51d
Found module libwrap.so.0 with build-id: 020d6fd39e85ceab9f1667cacd92978054136b2d
Found module libnet.so.1 with build-id: ca57e11a9c25422b20dd7eb267279dc106676b59
Found module libafsocket.so with build-id: 598444cedf68cd067ebe2d28cf8d0ff89da634a5
Found module libafuser.so with build-id: 3f66a974908bc202c8679519526cad5b3e529bab
Found module libaffile.so with build-id: fc40cfb426f3ae582e0ace4b9ffa3188e8c20299
Found module libcsvparser.so with build-id: adda66455d1de5bbf37e8eeb9ba19b94efc07208
Found module libkvformat.so with build-id: 1710e07701dc093b072d4313ac1668d4ca03968c
Found module libsdjournal.so with build-id: 7a6a7fa9da576b3e818f018d254de60a9deec0ef
Found module libsystem-source.so with build-id: 0bf4d71f6ce74a2a9d73b6cf1542fbbd92d76adf
Found module libconfgen.so with build-id: c9308dddb93b3b3a60eb0460cbf74965b17d22b6
Found module libbasicfuncs.so with build-id: 13b700006bf19c2171765630497f05b2945c4148
Found module libjson-c.so.5 with build-id: cd1ad04f5d85001e965fddf7fe021e16ba7a4ba4
Found module libjson-plugin.so with build-id: 9a251846edf2430a47f4972f2794e3949389daea
Found module libgcc_s.so.1 with build-id: 09c4935b79388431a1248f6a98e00d7dc81b8513
Found module libstdc++.so.6 with build-id: f57e02bfadacc0c923c82457d5e18e1830b5faea
Found module libappmodel.so with build-id: 52199027409baa45d1e7a3f37f3bc1b7791f26bf
Found module libgpg-error.so.0 with build-id: 3fbec71c67bee60d8aef00697ee187079b0fb307
Found module libgcrypt.so.20 with build-id: 60a5e524de0ed8323edf33e9eb9127a9eee02359
Found module liblz4.so.1 with build-id: a85971851cd059f1af80d553c8e7170d42ec59a1
Found module libzstd.so.1 with build-id: 5d9d0d946a3154a748e87e17af9d14764519237b
Found module liblzma.so.5 with build-id: b85da6c48eb60a646615392559483b93617ef265
Found module libm.so.6 with build-id: 27e82301dba6c3f644404d504e1bb1c97894b433
Found module ld-linux-x86-64.so.2 with build-id: 61ef896a699bb1c2e4e231642b2e1688b2f1a61e
Found module libcrypto.so.3 with build-id: 62ba5ee88d663a2396160fed1a1864f1f3b60103
Found module libssl.so.3 with build-id: 4bc97c5bb581ccfe9ae803981f527ce321f16c7a
Found module libsecret-storage.so.0 with build-id: c410c372d587d46b223a101af59b1078bb0a3c05
Found module libsystemd.so.0 with build-id: e45f7492c0f62251620378d7224ad0371a8d1f98
Found module libivykis.so.0 with build-id: bd7ccc964e3935becf44f5c0d39f29c01bb93976
Found module libpcre.so.3 with build-id: 3982f316c887e3ad9598015fa5bae8557320476a
Found module libcap.so.2 with build-id: 9e11e3bca4b0a25d047cb36e933e1d727663cf8e
Found module libevtlog-3.35.so.0 with build-id: 3691bc840a7227cb7b7361a831dafad3ecc4d2e7
Found module libgmodule-2.0.so.0 with build-id: 0b98edffeab1f749240487745c728cbb9be665c8
Found module libc.so.6 with build-id: 69389d485a9793dbe873f0ea2c93e02efaa9aa3d
Found module libglib-2.0.so.0 with build-id: 4391c4dcc011997cd19e40aff210bbea196f2c57
Found module libsyslog-ng-3.35.so.0 with build-id: 680862c9519ac2453b7ab3089c35c5dc331301bf
Found module syslog-ng with build-id: 22297a2c99103655728606cf732658f9fa426060
Stack trace of thread 889:
#0  0x00007fb8c8496a7c pthread_kill (libc.so.6 + 0x96a7c)
#1  0x00007fb8c8442476 raise (libc.so.6 + 0x42476)
#2  0x00007fb8c84287f3 abort (libc.so.6 + 0x287f3)
#3  0x00007fb8c8656777 iv_fatal (libivykis.so.0 + 0x4777)
#4  0x00007fb8c865c070 n/a (libivykis.so.0 + 0xa070)
#5  0x00007fb8c8875786 log_source_wakeup (libsyslog-ng-3.35.so.0 + 0x41786)
#6  0x00007fb8c8875968 n/a (libsyslog-ng-3.35.so.0 + 0x41968)
#7  0x00007fb8c887818f log_source_flow_control_adjust (libsyslog-ng-3.35.so.0 + 0x4418f)
#8  0x00007fb8c889783e n/a (libsyslog-ng-3.35.so.0 + 0x6383e)
#9  0x00007fb8c88b7170 log_msg_refcache_stop (libsyslog-ng-3.35.so.0 + 0x83170)
#10 0x00007fb8c887e9bb n/a (libsyslog-ng-3.35.so.0 + 0x4a9bb)
#11 0x00007fb8c886fefa n/a (libsyslog-ng-3.35.so.0 + 0x3befa)
#12 0x00007fb8c887d9e5 n/a (libsyslog-ng-3.35.so.0 + 0x499e5)
#13 0x00007fb8c86578fa n/a (libivykis.so.0 + 0x58fa)
#14 0x00007fb8c86562a3 n/a (libivykis.so.0 + 0x42a3)
#15 0x00007fb8c865c3e5 n/a (libivykis.so.0 + 0xa3e5)
#16 0x00007fb8c8658531 n/a (libivykis.so.0 + 0x6531)
#17 0x00007fb8c8659baa iv_main (libivykis.so.0 + 0x7baa)
#18 0x00007fb8c86576c7 n/a (libivykis.so.0 + 0x56c7)
#19 0x00007fb8c865aef9 n/a (libivykis.so.0 + 0x8ef9)
#20 0x00007fb8c8494b43 n/a (libc.so.6 + 0x94b43)
#21 0x00007fb8c8526a00 n/a (libc.so.6 + 0x126a00)
Stack trace of thread 885:
#0  0x00007fb8c88bc1b8 nv_table_unref (libsyslog-ng-3.35.so.0 + 0x881b8)
#1  0x00007fb8c88b6dc9 n/a (libsyslog-ng-3.35.so.0 + 0x82dc9)
#2  0x00007fb8c8869132 afinter_message_posted (libsyslog-ng-3.35.so.0 + 0x35132)
#3  0x00007fb8c887e7d7 msg_event_suppress_recursions_and_send (libsyslog-ng-3.35.so.0 + 0x4a7d7)
#4  0x00007fb8c7816098 n/a (libdisk-buffer.so + 0xb098)
#5  0x00007fb8c8878f62 n/a (libsyslog-ng-3.35.so.0 + 0x44f62)
#6  0x00007fb8c886e6da n/a (libsyslog-ng-3.35.so.0 + 0x3a6da)
#7  0x00007fb8c88738ea n/a (libsyslog-ng-3.35.so.0 + 0x3f8ea)
#8  0x00007fb8c8873ac5 n/a (libsyslog-ng-3.35.so.0 + 0x3fac5)
#9  0x00007fb8c88738ea n/a (libsyslog-ng-3.35.so.0 + 0x3f8ea)
#10 0x00007fb8c8873ac5 n/a (libsyslog-ng-3.35.so.0 + 0x3fac5)
#11 0x00007fb8c88738ea n/a (libsyslog-ng-3.35.so.0 + 0x3f8ea)
#12 0x00007fb8c887399c n/a (libsyslog-ng-3.35.so.0 + 0x3f99c)
#13 0x00007fb8c887399c n/a (libsyslog-ng-3.35.so.0 + 0x3f99c)
#14 0x00007fb8c8873ac5 n/a (libsyslog-ng-3.35.so.0 + 0x3fac5)
#15 0x00007fb8c88738ea n/a (libsyslog-ng-3.35.so.0 + 0x3f8ea)
#16 0x00007fb8c887399c n/a (libsyslog-ng-3.35.so.0 + 0x3f99c)
#17 0x00007fb8c887399c n/a (libsyslog-ng-3.35.so.0 + 0x3f99c)
#18 0x00007fb8c887399c n/a (libsyslog-ng-3.35.so.0 + 0x3f99c)
#19 0x00007fb8c887399c n/a (libsyslog-ng-3.35.so.0 + 0x3f99c)
#20 0x00007fb8c887399c n/a (libsyslog-ng-3.35.so.0 + 0x3f99c)
#21 0x00007fb8c8873ac5 n/a (libsyslog-ng-3.35.so.0 + 0x3fac5)
#22 0x00007fb8c88738ea n/a (libsyslog-ng-3.35.so.0 + 0x3f8ea)
#23 0x00007fb8c8873ac5 n/a (libsyslog-ng-3.35.so.0 + 0x3fac5)
#24 0x00007fb8c886e6da n/a (libsyslog-ng-3.35.so.0 + 0x3a6da)
#25 0x00007fb8c886e78c n/a (libsyslog-ng-3.35.so.0 + 0x3a78c)
#26 0x00007fb8c886e78c n/a (libsyslog-ng-3.35.so.0 + 0x3a78c)
#27 0x00007fb8c886e78c n/a (libsyslog-ng-3.35.so.0 + 0x3a78c)
#28 0x00007fb8c8873e3a n/a (libsyslog-ng-3.35.so.0 + 0x3fe3a)
#29 0x00007fb8c887773a n/a (libsyslog-ng-3.35.so.0 + 0x4373a)
#30 0x00007fb8c8873e3a n/a (libsyslog-ng-3.35.so.0 + 0x3fe3a)
#31 0x00007fb8c8876f95 log_source_post (libsyslog-ng-3.35.so.0 + 0x42f95)
#32 0x00007fb8c7a8db40 n/a (libsdjournal.so + 0x7b40)
#33 0x00007fb8c7a8dd6d n/a (libsdjournal.so + 0x7d6d)
#34 0x00007fb8c887d9e5 n/a (libsyslog-ng-3.35.so.0 + 0x499e5)
#35 0x00007fb8c86578fa n/a (libivykis.so.0 + 0x58fa)
#36 0x00007fb8c86562a3 n/a (libivykis.so.0 + 0x42a3)
#37 0x00007fb8c865c3e5 n/a (libivykis.so.0 + 0xa3e5)
#38 0x00007fb8c8658531 n/a (libivykis.so.0 + 0x6531)
#39 0x00007fb8c8659baa iv_main (libivykis.so.0 + 0x7baa)
#40 0x00007fb8c86576c7 n/a (libivykis.so.0 + 0x56c7)
#41 0x00007fb8c865aef9 n/a (libivykis.so.0 + 0x8ef9)
#42 0x00007fb8c8494b43 n/a (libc.so.6 + 0x94b43)
#43 0x00007fb8c8526a00 n/a (libc.so.6 + 0x126a00)
Stack trace of thread 654:
#0  0x00007fb8c8525fde epoll_wait (libc.so.6 + 0x125fde)
#1  0x00007fb8c865c1f7 n/a (libivykis.so.0 + 0xa1f7)
#2  0x00007fb8c8658531 n/a (libivykis.so.0 + 0x6531)
#3  0x00007fb8c8659baa iv_main (libivykis.so.0 + 0x7baa)
#4  0x00007fb8c887ffad main_loop_run (libsyslog-ng-3.35.so.0 + 0x4bfad)
#5  0x00005576042cc844 main (syslog-ng + 0x2844)
#6  0x00007fb8c8429d90 n/a (libc.so.6 + 0x29d90)
#7  0x00007fb8c8429e40 __libc_start_main (libc.so.6 + 0x29e40)
#8  0x00005576042cc9d5 _start (syslog-ng + 0x29d5)
Stack trace of thread 884:
#0  0x00007fb8c8491398 __lll_lock_wake_private (libc.so.6 + 0x91398)
#1  0x00007fb8c84a572d n/a (libc.so.6 + 0xa572d)
#2  0x00007fb8c84949cf n/a (libc.so.6 + 0x949cf)
#3  0x00007fb8c8526a00 n/a (libc.so.6 + 0x126a00)
Stack trace of thread 887:
#0  0x00007fb8c8525fde epoll_wait (libc.so.6 + 0x125fde)
#1  0x00007fb8c865c1f7 n/a (libivykis.so.0 + 0xa1f7)
#2  0x00007fb8c8658531 n/a (libivykis.so.0 + 0x6531)
#3  0x00007fb8c8659baa iv_main (libivykis.so.0 + 0x7baa)
#4  0x00007fb8c86576c7 n/a (libivykis.so.0 + 0x56c7)
#5  0x00007fb8c865aef9 n/a (libivykis.so.0 + 0x8ef9)
#6  0x00007fb8c8494b43 n/a (libc.so.6 + 0x94b43)
#7  0x00007fb8c8526a00 n/a (libc.so.6 + 0x126a00)
Stack trace of thread 888:
#0  0x00007fb8c8525fde epoll_wait (libc.so.6 + 0x125fde)
#1  0x00007fb8c865c1f7 n/a (libivykis.so.0 + 0xa1f7)
#2  0x00007fb8c8658531 n/a (libivykis.so.0 + 0x6531)
#3  0x00007fb8c8659baa iv_main (libivykis.so.0 + 0x7baa)
#4  0x00007fb8c86576c7 n/a (libivykis.so.0 + 0x56c7)
#5  0x00007fb8c865aef9 n/a (libivykis.so.0 + 0x8ef9)
#6  0x00007fb8c8494b43 n/a (libc.so.6 + 0x94b43)
#7  0x00007fb8c8526a00 n/a (libc.so.6 + 0x126a00)
Stack trace of thread 890:
#0  0x00007fb8c8525fde epoll_wait (libc.so.6 + 0x125fde)
#1  0x00007fb8c865c1f7 n/a (libivykis.so.0 + 0xa1f7)
#2  0x00007fb8c8658531 n/a (libivykis.so.0 + 0x6531)
#3  0x00007fb8c8659baa iv_main (libivykis.so.0 + 0x7baa)
#4  0x00007fb8c86576c7 n/a (libivykis.so.0 + 0x56c7)
#5  0x00007fb8c865aef9 n/a (libivykis.so.0 + 0x8ef9)
#6  0x00007fb8c8494b43 n/a (libc.so.6 + 0x94b43)
#7  0x00007fb8c8526a00 n/a (libc.so.6 + 0x126a00)
Stack trace of thread 1005:
#0  0x00007fb8c8525fde epoll_wait (libc.so.6 + 0x125fde)
#1  0x00007fb8c865c1f7 n/a (libivykis.so.0 + 0xa1f7)
#2  0x00007fb8c8658531 n/a (libivykis.so.0 + 0x6531)
#3  0x00007fb8c8659baa iv_main (libivykis.so.0 + 0x7baa)
#4  0x00007fb8c86576c7 n/a (libivykis.so.0 + 0x56c7)
#5  0x00007fb8c865aef9 n/a (libivykis.so.0 + 0x8ef9)
#6  0x00007fb8c8494b43 n/a (libc.so.6 + 0x94b43)
#7  0x00007fb8c8526a00 n/a (libc.so.6 + 0x126a00)
Stack trace of thread 1001:
#0  0x00007fb8c8525fde epoll_wait (libc.so.6 + 0x125fde)
#1  0x00007fb8c865c1f7 n/a (libivykis.so.0 + 0xa1f7)
#2  0x00007fb8c8658531 n/a (libivykis.so.0 + 0x6531)
#3  0x00007fb8c8659baa iv_main (libivykis.so.0 + 0x7baa)
#4  0x00007fb8c86576c7 n/a (libivykis.so.0 + 0x56c7)
#5  0x00007fb8c865aef9 n/a (libivykis.so.0 + 0x8ef9)
#6  0x00007fb8c8494b43 n/a (libc.so.6 + 0x94b43)
#7  0x00007fb8c8526a00 n/a (libc.so.6 + 0x126a00)
Stack trace of thread 1002:
#0  0x00007fb8c8525fde epoll_wait (libc.so.6 + 0x125fde)
#1  0x00007fb8c865c1f7 n/a (libivykis.so.0 + 0xa1f7)
#2  0x00007fb8c8658531 n/a (libivykis.so.0 + 0x6531)
#3  0x00007fb8c8659baa iv_main (libivykis.so.0 + 0x7baa)
#4  0x00007fb8c86576c7 n/a (libivykis.so.0 + 0x56c7)
#5  0x00007fb8c865aef9 n/a (libivykis.so.0 + 0x8ef9)
#6  0x00007fb8c8494b43 n/a (libc.so.6 + 0x94b43)
#7  0x00007fb8c8526a00 n/a (libc.so.6 + 0x126a00)
Stack trace of thread 1003:
#0  0x00007fb8c8525fde epoll_wait (libc.so.6 + 0x125fde)
#1  0x00007fb8c865c1f7 n/a (libivykis.so.0 + 0xa1f7)
#2  0x00007fb8c8658531 n/a (libivykis.so.0 + 0x6531)
#3  0x00007fb8c8659baa iv_main (libivykis.so.0 + 0x7baa)
#4  0x00007fb8c86576c7 n/a (libivykis.so.0 + 0x56c7)
#5  0x00007fb8c865aef9 n/a (libivykis.so.0 + 0x8ef9)
#6  0x00007fb8c8494b43 n/a (libc.so.6 + 0x94b43)
#7  0x00007fb8c8526a00 n/a (libc.so.6 + 0x126a00)
Stack trace of thread 1004:
#0  0x00007fb8c8525fde epoll_wait (libc.so.6 + 0x125fde)
#1  0x00007fb8c865c1f7 n/a (libivykis.so.0 + 0xa1f7)
#2  0x00007fb8c8658531 n/a (libivykis.so.0 + 0x6531)
#3  0x00007fb8c8659baa iv_main (libivykis.so.0 + 0x7baa)
#4  0x00007fb8c86576c7 n/a (libivykis.so.0 + 0x56c7)
#5  0x00007fb8c865aef9 n/a (libivykis.so.0 + 0x8ef9)
#6  0x00007fb8c8494b43 n/a (libc.so.6 + 0x94b43)
#7  0x00007fb8c8526a00 n/a (libc.so.6 + 0x126a00)
download_instructions=gsutil cp gs://upload.scylladb.com/core.syslog-ng.0.0f924711a8d9473b9575d9a077a8e8ab.654.1685459056000000/core.syslog-ng.0.0f924711a8d9473b9575d9a077a8e8ab.654.1685459056000000.gz .
gunzip /var/lib/systemd/coredump/core.syslog-ng.0.0f924711a8d9473b9575d9a077a8e8ab.654.1685459056000000.gz
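When triaging a report like this, two pieces of the event can be cross-checked mechanically: the core file name and the per-thread backtraces. The following is a minimal Python sketch, assuming the systemd-coredump naming scheme `core.<comm>.<uid>.<boot_id>.<pid>.<usec>` (here with a `.gz` suffix added before upload) and the `Stack trace of thread N:` / `#N 0x... func (module + 0x...)` text layout shown above. The helper names (`decode_core_name`, `parse_threads`, `aborting_thread`) are hypothetical and not part of SCT; `SAMPLE` is a verbatim excerpt of the backtrace above.

```python
import re
from datetime import datetime, timezone

# Hypothetical triage helpers (not part of SCT); they assume the
# systemd-coredump text layout seen in this report.

# core.<comm>.<uid>.<boot_id>.<pid>.<usec>, optionally followed by a
# compression suffix such as ".gz", which is ignored.
CORE_NAME_RE = re.compile(
    r"^core\.(?P<comm>.+)\.(?P<uid>\d+)\.(?P<boot_id>[0-9a-f]{32})"
    r"\.(?P<pid>\d+)\.(?P<usec>\d+)(?:\.\w+)?$"
)
THREAD_RE = re.compile(r"^Stack trace of thread (\d+):$")
# e.g. "#3  0x00007fb8c8656777 iv_fatal (libivykis.so.0 + 0x4777)"
FRAME_RE = re.compile(r"^#\d+\s+0x[0-9a-f]+\s+(\S+)\s+\((\S+) \+ 0x[0-9a-f]+\)$")


def decode_core_name(name):
    """Return comm, uid, boot_id, pid and dump time (UTC) encoded in a core file name."""
    m = CORE_NAME_RE.match(name)
    if m is None:
        raise ValueError(f"unexpected core file name: {name}")
    return {
        "comm": m["comm"],
        "uid": int(m["uid"]),
        "boot_id": m["boot_id"],
        "pid": int(m["pid"]),
        # The last numeric field is microseconds since the epoch; keep whole seconds.
        "time": datetime.fromtimestamp(int(m["usec"]) // 1_000_000, tz=timezone.utc),
    }


def parse_threads(backtrace):
    """Map each thread ID to its frames as (function, module) tuples, innermost first."""
    threads = {}
    frames = None
    for raw in backtrace.splitlines():
        line = raw.strip()
        if m := THREAD_RE.match(line):
            frames = threads.setdefault(int(m[1]), [])
        elif (m := FRAME_RE.match(line)) and frames is not None:
            frames.append((m[1], m[2]))
    return threads


def aborting_thread(threads):
    """Return (thread_id, symbolized callers of abort()) for the first thread that aborted."""
    for tid, frames in threads.items():
        names = [fn for fn, _ in frames]
        if "abort" in names:
            callers = frames[names.index("abort") + 1:]
            # Frames without symbols are printed as "n/a"; skip them.
            return tid, [f"{fn} ({mod})" for fn, mod in callers if fn != "n/a"]
    return None


SAMPLE = """\
Stack trace of thread 889:
#0  0x00007fb8c8496a7c pthread_kill (libc.so.6 + 0x96a7c)
#1  0x00007fb8c8442476 raise (libc.so.6 + 0x42476)
#2  0x00007fb8c84287f3 abort (libc.so.6 + 0x287f3)
#3  0x00007fb8c8656777 iv_fatal (libivykis.so.0 + 0x4777)
#4  0x00007fb8c865c070 n/a (libivykis.so.0 + 0xa070)
#5  0x00007fb8c8875786 log_source_wakeup (libsyslog-ng-3.35.so.0 + 0x41786)
#6  0x00007fb8c8875968 n/a (libsyslog-ng-3.35.so.0 + 0x41968)
#7  0x00007fb8c887818f log_source_flow_control_adjust (libsyslog-ng-3.35.so.0 + 0x4418f)
Stack trace of thread 654:
#0  0x00007fb8c8525fde epoll_wait (libc.so.6 + 0x125fde)
#1  0x00007fb8c865c1f7 n/a (libivykis.so.0 + 0xa1f7)
#2  0x00007fb8c8658531 n/a (libivykis.so.0 + 0x6531)
#3  0x00007fb8c8659baa iv_main (libivykis.so.0 + 0x7baa)
"""

if __name__ == "__main__":
    print(decode_core_name(
        "core.syslog-ng.0.0f924711a8d9473b9575d9a077a8e8ab.654.1685459056000000.gz"))
    print(aborting_thread(parse_threads(SAMPLE)))
```

Applied to this report, the decoded PID (654), boot ID and time (2023-05-30 15:04:16 UTC) line up with the `PID`, `Boot ID` and `Timestamp` fields of the event, and the aborting thread is 889, which reached `abort()` through ivykis's `iv_fatal()` called from `log_source_wakeup()` and `log_source_flow_control_adjust()`. Note that the `gsutil cp ... .` command downloads the archive into the current directory, so a local `gunzip` would need to point at that copy rather than at the `/var/lib/systemd/coredump/` path.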

Note that no stress load was running at that time. [Screenshot from 2023-06-02 15-04-04]

Monitoring:

Logs:

Jenkins job URL Argus

vponomaryov commented 1 year ago

The issue was originally filed at https://github.com/scylladb/scylladb/issues/14120. Moving it here for further investigation on the QA side. As a first step, I triggered a rerun to check whether it reproduces.

vponomaryov commented 1 year ago

The rerun passed successfully: https://jenkins.scylladb.com/job/scylla-5.2/job/longevity/job/longevity-10gb-3h-gce-test/19/console