scylladb / scylla-cluster-tests

Tests for Scylla Clusters
GNU Affero General Public License v3.0

dockerd crashed on loader during gemini run #5513

Open fruch opened 1 year ago

fruch commented 1 year ago

Issue description

Dec 01 07:50:26 gemini-with-nemesis-3h-normal-5-1-loader-node-9046ca31-1 dockerd[1366]: runtime: gp: gp=0xc0006f4300, goid=0, gp->atomicstatus=0
Dec 01 07:50:26 gemini-with-nemesis-3h-normal-5-1-loader-node-9046ca31-1 dockerd[1366]: runtime:  g:  g=0xc000001380, goid=0,  g->atomicstatus=0
Dec 01 07:50:26 gemini-with-nemesis-3h-normal-5-1-loader-node-9046ca31-1 dockerd[1366]: fatal error: bad g->status in ready
Dec 01 07:50:26 gemini-with-nemesis-3h-normal-5-1-loader-node-9046ca31-1 dockerd[1366]: runtime stack:
Dec 01 07:50:26 gemini-with-nemesis-3h-normal-5-1-loader-node-9046ca31-1 dockerd[1366]: runtime.throw(0x55cf7aa9c446, 0x16)
Dec 01 07:50:26 gemini-with-nemesis-3h-normal-5-1-loader-node-9046ca31-1 dockerd[1366]: /usr/local/go/src/runtime/panic.go:774 +0x74
Dec 01 07:50:26 gemini-with-nemesis-3h-normal-5-1-loader-node-9046ca31-1 dockerd[1366]: runtime.ready(0xc0006f4300, 0x4, 0xc000015e01)
Dec 01 07:50:26 gemini-with-nemesis-3h-normal-5-1-loader-node-9046ca31-1 dockerd[1366]: /usr/local/go/src/runtime/proc.go:659 +0x2bd
Dec 01 07:50:26 gemini-with-nemesis-3h-normal-5-1-loader-node-9046ca31-1 dockerd[1366]: runtime.goready.func1()
Dec 01 07:50:26 gemini-with-nemesis-3h-normal-5-1-loader-node-9046ca31-1 dockerd[1366]: /usr/local/go/src/runtime/proc.go:315 +0x3a
Dec 01 07:50:26 gemini-with-nemesis-3h-normal-5-1-loader-node-9046ca31-1 dockerd[1366]: runtime.systemstack(0x0)
Dec 01 07:50:26 gemini-with-nemesis-3h-normal-5-1-loader-node-9046ca31-1 dockerd[1366]: /usr/local/go/src/runtime/asm_amd64.s:370 +0x63
Dec 01 07:50:26 gemini-with-nemesis-3h-normal-5-1-loader-node-9046ca31-1 dockerd[1366]: runtime.mstart()
Dec 01 07:50:26 gemini-with-nemesis-3h-normal-5-1-loader-node-9046ca31-1 dockerd[1366]: /usr/local/go/src/runtime/proc.go:1146
Dec 01 07:50:26 gemini-with-nemesis-3h-normal-5-1-loader-node-9046ca31-1 dockerd[1366]: goroutine 36 [running]:
Dec 01 07:50:26 gemini-with-nemesis-3h-normal-5-1-loader-node-9046ca31-1 dockerd[1366]: runtime.systemstack_switch()
Dec 01 07:50:26 gemini-with-nemesis-3h-normal-5-1-loader-node-9046ca31-1 dockerd[1366]: /usr/local/go/src/runtime/asm_amd64.s:330 fp=0xc000015e08 sp=0xc000015e00 pc=0x55cf79201140
Dec 01 07:50:26 gemini-with-nemesis-3h-normal-5-1-loader-node-9046ca31-1 dockerd[1366]: runtime.goready(0xc0006f4300, 0x4)
Dec 01 07:50:26 gemini-with-nemesis-3h-normal-5-1-loader-node-9046ca31-1 dockerd[1366]: /usr/local/go/src/runtime/proc.go:314 +0x5e fp=0xc000015e38 sp=0xc000015e08 pc=0x55cf791d461e
Dec 01 07:50:26 gemini-with-nemesis-3h-normal-5-1-loader-node-9046ca31-1 dockerd[1366]: runtime.send(0xc000094720, 0xc000094000, 0xc000015f38, 0xc000015ec8, 0x3)
Dec 01 07:50:26 gemini-with-nemesis-3h-normal-5-1-loader-node-9046ca31-1 dockerd[1366]: /usr/local/go/src/runtime/chan.go:299 +0x7e fp=0xc000015e68 sp=0xc000015e38 pc=0x55cf791a7fce
Dec 01 07:50:26 gemini-with-nemesis-3h-normal-5-1-loader-node-9046ca31-1 dockerd[1366]: runtime.chansend(0xc000094720, 0xc000015f38, 0x63885c00, 0x55cf79269b0e, 0x6a76552ec12)
Dec 01 07:50:26 gemini-with-nemesis-3h-normal-5-1-loader-node-9046ca31-1 dockerd[1366]: /usr/local/go/src/runtime/chan.go:193 +0x51e fp=0xc000015ee8 sp=0xc000015e68 pc=0x55cf791a7e2e
Dec 01 07:50:26 gemini-with-nemesis-3h-normal-5-1-loader-node-9046ca31-1 dockerd[1366]: runtime.selectnbsend(0xc000094720, 0xc000015f38, 0x55cf7cceb740)
Dec 01 07:50:26 gemini-with-nemesis-3h-normal-5-1-loader-node-9046ca31-1 dockerd[1366]: /usr/local/go/src/runtime/chan.go:615 +0x46 fp=0xc000015f20 sp=0xc000015ee8 pc=0x55cf791a8d56
Dec 01 07:50:26 gemini-with-nemesis-3h-normal-5-1-loader-node-9046ca31-1 dockerd[1366]: time.sendTime(0x55cf7b622960, 0xc000094720, 0x0)
Dec 01 07:50:26 gemini-with-nemesis-3h-normal-5-1-loader-node-9046ca31-1 dockerd[1366]: /usr/local/go/src/time/sleep.go:137 +0x6e fp=0xc000015f60 sp=0xc000015f20 pc=0x55cf79269b0e
Dec 01 07:50:26 gemini-with-nemesis-3h-normal-5-1-loader-node-9046ca31-1 dockerd[1366]: runtime.timerproc(0x55cf7cceffe0)
Dec 01 07:50:26 gemini-with-nemesis-3h-normal-5-1-loader-node-9046ca31-1 dockerd[1366]: /usr/local/go/src/runtime/time.go:297 +0x72 fp=0xc000015fd8 sp=0xc000015f60 pc=0x55cf791f1af2
Dec 01 07:50:26 gemini-with-nemesis-3h-normal-5-1-loader-node-9046ca31-1 dockerd[1366]: runtime.goexit()
Dec 01 07:50:26 gemini-with-nemesis-3h-normal-5-1-loader-node-9046ca31-1 dockerd[1366]: /usr/local/go/src/runtime/asm_amd64.s:1357 +0x1 fp=0xc000015fe0 sp=0xc000015fd8 pc=0x55cf79203241
Dec 01 07:50:26 gemini-with-nemesis-3h-normal-5-1-loader-node-9046ca31-1 dockerd[1366]: created by runtime.(*timersBucket).addtimerLocked
Dec 01 07:50:26 gemini-with-nemesis-3h-normal-5-1-loader-node-9046ca31-1 dockerd[1366]: /usr/local/go/src/runtime/time.go:169 +0x110
Dec 01 07:50:26 gemini-with-nemesis-3h-normal-5-1-loader-node-9046ca31-1 dockerd[1366]: goroutine 1 [chan receive, 120 minutes]:
Dec 01 07:50:26 gemini-with-nemesis-3h-normal-5-1-loader-node-9046ca31-1 dockerd[1366]: main.(*DaemonCli).start(0xc0007b4240, 0xc0000958c0, 0x0, 0x0)
Dec 01 07:50:26 gemini-with-nemesis-3h-normal-5-1-loader-node-9046ca31-1 dockerd[1366]: /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/cmd/dockerd/daemon.go:253 +0xc03
Dec 01 07:50:26 gemini-with-nemesis-3h-normal-5-1-loader-node-9046ca31-1 dockerd[1366]: main.runDaemon(...)
Dec 01 07:50:26 gemini-with-nemesis-3h-normal-5-1-loader-node-9046ca31-1 dockerd[1366]: /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/cmd/dockerd/docker_unix.go:13
Dec 01 07:50:26 gemini-with-nemesis-3h-normal-5-1-loader-node-9046ca31-1 systemd[1]: docker.service: main process exited, code=exited, status=2/INVALIDARGUMENT
Dec 01 07:50:26 gemini-with-nemesis-3h-normal-5-1-loader-node-9046ca31-1 dockerd[1366]: main.newDaemonCommand.func1(0xc000730f00, 0xc0007b41e0, 0x0, 0x3, 0x0, 0x0)
Dec 01 07:50:26 gemini-with-nemesis-3h-normal-5-1-loader-node-9046ca31-1 dockerd[1366]: /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/cmd/dockerd/docker.go:34 +0x7c
Dec 01 07:50:26 gemini-with-nemesis-3h-normal-5-1-loader-node-9046ca31-1 dockerd[1366]: github.com/docker/docker/vendor/github.com/spf13/cobra.(*Command).execute(0xc000730f00, 0xc0000d4010, 0x3, 0x3, 0xc000730f00, 0xc0000d4010)
Dec 01 07:50:26 gemini-with-nemesis-3h-normal-5-1-loader-node-9046ca31-1 dockerd[1366]: /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/github.com/spf13/cobra/command.go:762 +0x462
Dec 01 07:50:26 gemini-with-nemesis-3h-normal-5-1-loader-node-9046ca31-1 dockerd[1366]: github.com/docker/docker/vendor/github.com/spf13/cobra.(*Command).ExecuteC(0xc000730f00, 0x0, 0x0, 0x10)
Dec 01 07:50:26 gemini-with-nemesis-3h-normal-5-1-loader-node-9046ca31-1 dockerd[1366]: /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/github.com/spf13/cobra/command.go:852 +0x2ec
Dec 01 07:50:26 gemini-with-nemesis-3h-normal-5-1-loader-node-9046ca31-1 dockerd[1366]: github.com/docker/docker/vendor/github.com/spf13/cobra.(*Command).Execute(...)
Dec 01 07:50:26 gemini-with-nemesis-3h-normal-5-1-loader-node-9046ca31-1 dockerd[1366]: /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/github.com/spf13/cobra/command.go:800
Dec 01 07:50:26 gemini-with-nemesis-3h-normal-5-1-loader-node-9046ca31-1 dockerd[1366]: main.main()
Dec 01 07:50:26 gemini-with-nemesis-3h-normal-5-1-loader-node-9046ca31-1 dockerd[1366]: /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/cmd/dockerd/docker.go:97 +0x191
Dec 01 07:50:26 gemini-with-nemesis-3h-normal-5-1-loader-node-9046ca31-1 dockerd[1366]: goroutine 19 [syscall, 91 minutes]:
Dec 01 07:50:26 gemini-with-nemesis-3h-normal-5-1-loader-node-9046ca31-1 dockerd[1366]: os/signal.signal_recv(0x55cf7bb413a0)
Dec 01 07:50:26 gemini-with-nemesis-3h-normal-5-1-loader-node-9046ca31-1 dockerd[1366]: /usr/local/go/src/runtime/sigqueue.go:147 +0x9e
Dec 01 07:50:26 gemini-with-nemesis-3h-normal-5-1-loader-node-9046ca31-1 dockerd[1366]: os/signal.loop()
Dec 01 07:50:26 gemini-with-nemesis-3h-normal-5-1-loader-node-9046ca31-1 dockerd[1366]: /usr/local/go/src/os/signal/signal_unix.go:23 +0x24
Dec 01 07:50:26 gemini-with-nemesis-3h-normal-5-1-loader-node-9046ca31-1 dockerd[1366]: created by os/signal.init.0
Dec 01 07:50:26 gemini-with-nemesis-3h-normal-5-1-loader-node-9046ca31-1 dockerd[1366]: /usr/local/go/src/os/signal/signal_unix.go:29 +0x43
Dec 01 07:50:26 gemini-with-nemesis-3h-normal-5-1-loader-node-9046ca31-1 dockerd[1366]: goroutine 0 [idle]:
Dec 01 07:50:26 gemini-with-nemesis-3h-normal-5-1-loader-node-9046ca31-1 dockerd[1366]: fatal error: unexpected signal during runtime execution
Dec 01 07:50:26 gemini-with-nemesis-3h-normal-5-1-loader-node-9046ca31-1 dockerd[1366]: panic during panic
Dec 01 07:50:26 gemini-with-nemesis-3h-normal-5-1-loader-node-9046ca31-1 dockerd[1366]: [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x55cf791f5e7a]
Dec 01 07:50:26 gemini-with-nemesis-3h-normal-5-1-loader-node-9046ca31-1 dockerd[1366]: runtime stack:
Dec 01 07:50:26 gemini-with-nemesis-3h-normal-5-1-loader-node-9046ca31-1 dockerd[1366]: runtime.throw(0x55cf7aad6c78, 0x2a)
Dec 01 07:50:26 gemini-with-nemesis-3h-normal-5-1-loader-node-9046ca31-1 dockerd[1366]: /usr/local/go/src/runtime/panic.go:774 +0x74
Dec 01 07:50:26 gemini-with-nemesis-3h-normal-5-1-loader-node-9046ca31-1 dockerd[1366]: runtime.sigpanic()
Dec 01 07:50:26 gemini-with-nemesis-3h-normal-5-1-loader-node-9046ca31-1 dockerd[1366]: /usr/local/go/src/runtime/signal_unix.go:378 +0x480
Dec 01 07:50:26 gemini-with-nemesis-3h-normal-5-1-loader-node-9046ca31-1 dockerd[1366]: runtime.gentraceback(0xffffffffffffffff, 0xffffffffffffffff, 0x0, 0xc0006f4000, 0x0, 0x0, 0x64, 0x0, 0x0, 0x0, ...)
Dec 01 07:50:26 gemini-with-nemesis-3h-normal-5-1-loader-node-9046ca31-1 dockerd[1366]: /usr/local/go/src/runtime/traceback.go:159 +0x15a
Dec 01 07:50:26 gemini-with-nemesis-3h-normal-5-1-loader-node-9046ca31-1 dockerd[1366]: runtime.traceback1(0xffffffffffffffff, 0xffffffffffffffff, 0x0, 0xc0006f4000, 0x0)
Dec 01 07:50:26 gemini-with-nemesis-3h-normal-5-1-loader-node-9046ca31-1 dockerd[1366]: /usr/local/go/src/runtime/traceback.go:722 +0xf2
Dec 01 07:50:26 gemini-with-nemesis-3h-normal-5-1-loader-node-9046ca31-1 dockerd[1366]: runtime.traceback(0xffffffffffffffff, 0xffffffffffffffff, 0x0, 0xc0006f4000)
Dec 01 07:50:26 gemini-with-nemesis-3h-normal-5-1-loader-node-9046ca31-1 dockerd[1366]: /usr/local/go/src/runtime/traceback.go:676 +0x54
Dec 01 07:50:26 gemini-with-nemesis-3h-normal-5-1-loader-node-9046ca31-1 systemd[1]: Unit docker.service entered failed state.
Dec 01 07:50:26 gemini-with-nemesis-3h-normal-5-1-loader-node-9046ca31-1 dockerd[1366]: runtime.tracebackothers(0xc000001380)
Dec 01 07:50:26 gemini-with-nemesis-3h-normal-5-1-loader-node-9046ca31-1 dockerd[1366]: /usr/local/go/src/runtime/traceback.go:929 +0x1ac
Dec 01 07:50:26 gemini-with-nemesis-3h-normal-5-1-loader-node-9046ca31-1 dockerd[1366]: runtime.dopanic_m(0xc000001380, 0x55cf791d27f4, 0x7f10903ebcb0, 0x1)
Dec 01 07:50:26 gemini-with-nemesis-3h-normal-5-1-loader-node-9046ca31-1 dockerd[1366]: /usr/local/go/src/runtime/panic.go:974 +0x2a4
Dec 01 07:50:26 gemini-with-nemesis-3h-normal-5-1-loader-node-9046ca31-1 dockerd[1366]: runtime.fatalthrow.func1()
Dec 01 07:50:26 gemini-with-nemesis-3h-normal-5-1-loader-node-9046ca31-1 dockerd[1366]: /usr/local/go/src/runtime/panic.go:829 +0x61
Dec 01 07:50:26 gemini-with-nemesis-3h-normal-5-1-loader-node-9046ca31-1 dockerd[1366]: runtime.fatalthrow()
Dec 01 07:50:26 gemini-with-nemesis-3h-normal-5-1-loader-node-9046ca31-1 dockerd[1366]: /usr/local/go/src/runtime/panic.go:826 +0x59
Dec 01 07:50:26 gemini-with-nemesis-3h-normal-5-1-loader-node-9046ca31-1 dockerd[1366]: runtime.throw(0x55cf7aa9c446, 0x16)
Dec 01 07:50:26 gemini-with-nemesis-3h-normal-5-1-loader-node-9046ca31-1 dockerd[1366]: /usr/local/go/src/runtime/panic.go:774 +0x74
Dec 01 07:50:26 gemini-with-nemesis-3h-normal-5-1-loader-node-9046ca31-1 dockerd[1366]: runtime.ready(0xc0006f4300, 0x4, 0xc000015e01)
Dec 01 07:50:26 gemini-with-nemesis-3h-normal-5-1-loader-node-9046ca31-1 dockerd[1366]: /usr/local/go/src/runtime/proc.go:659 +0x2bd
Dec 01 07:50:26 gemini-with-nemesis-3h-normal-5-1-loader-node-9046ca31-1 dockerd[1366]: runtime.goready.func1()
Dec 01 07:50:26 gemini-with-nemesis-3h-normal-5-1-loader-node-9046ca31-1 dockerd[1366]: /usr/local/go/src/runtime/proc.go:315 +0x3a
Dec 01 07:50:26 gemini-with-nemesis-3h-normal-5-1-loader-node-9046ca31-1 dockerd[1366]: runtime.systemstack(0x0)
Dec 01 07:50:26 gemini-with-nemesis-3h-normal-5-1-loader-node-9046ca31-1 dockerd[1366]: /usr/local/go/src/runtime/asm_amd64.s:370 +0x63
Dec 01 07:50:26 gemini-with-nemesis-3h-normal-5-1-loader-node-9046ca31-1 dockerd[1366]: runtime.mstart()
Dec 01 07:50:26 gemini-with-nemesis-3h-normal-5-1-loader-node-9046ca31-1 dockerd[1366]: /usr/local/go/src/runtime/proc.go:1146
Dec 01 07:50:26 gemini-with-nemesis-3h-normal-5-1-loader-node-9046ca31-1 systemd[1]: docker.service failed.
Dec 01 07:50:28 gemini-with-nemesis-3h-normal-5-1-loader-node-9046ca31-1 systemd[1]: docker.service holdoff time over, scheduling restart.
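Before deciding how often this happens, it helps to pull the crash signature out of a loader's journal automatically instead of reading the whole trace. Below is a minimal sketch in Python. It assumes the journal has been exported as plain text, for example the output of `journalctl -u docker.service` on the loader. The function name `find_dockerd_crash` is hypothetical and is not part of SCT.

```python
import re

# Matches journal lines such as
# "Dec 01 07:50:26 <host> dockerd[1366]: fatal error: bad g->status in ready"
FATAL_RE = re.compile(r"dockerd\[(?P<pid>\d+)\]: fatal error: (?P<msg>.+)$")

# Matches the systemd line recording how docker.service's main process ended, e.g.
# "... systemd[1]: docker.service: main process exited, code=exited, status=2/INVALIDARGUMENT"
EXIT_RE = re.compile(r"systemd\[1\]: docker\.service: main process exited, (?P<detail>.+)$")


def find_dockerd_crash(journal_lines):
    """Scan journal lines and return (pid, fatal_messages, exit_detail).

    fatal_messages lists every Go runtime "fatal error:" message in order; a
    "panic during panic" (as in this report) produces more than one.
    exit_detail is the text after "main process exited, " on the first
    matching systemd line, or None if there is none.
    """
    pid = None
    fatal_messages = []
    exit_detail = None
    for line in journal_lines:
        line = line.rstrip("\n")
        fatal = FATAL_RE.search(line)
        if fatal:
            # Remember the PID of the first dockerd that reported a fatal error.
            if pid is None:
                pid = int(fatal.group("pid"))
            fatal_messages.append(fatal.group("msg"))
            continue
        exited = EXIT_RE.search(line)
        if exited and exit_detail is None:
            exit_detail = exited.group("detail")
    return pid, fatal_messages, exit_detail
```

Fed the lines of this report, the sketch would return PID 1366 and the two fatal errors (`bad g->status in ready` and `unexpected signal during runtime execution`). It would also return `code=exited, status=2/INVALIDARGUMENT` as the exit detail.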

Installation details

Kernel Version: 5.15.0-1026-aws

Scylla version (or git commit hash): 5.1.0-20221201.fde4a6e92d83 with build-id 3f51aa5a5121e5f42755a4cc669ae3dfc2e3b2dd

Relocatable Package: http://downloads.scylladb.com/unstable/scylla/branch-5.1/relocatable/2022-12-01T02:29:23Z/scylla-x86_64-package.tar.gz

Cluster size: 3 nodes (i3.large)

Scylla Nodes used in this run:

OS / Image: ami-00c21f4517ae026a4 (aws: us-east-1)

Test: gemini-3h-with-nemesis-test

Test id: 9046ca31-2d20-4cdc-9c16-993a8561b282

Test name: scylla-5.1/gemini-/gemini-3h-with-nemesis-test

Test config file(s):

Logs:

Jenkins job URL

fruch commented 1 year ago
roydahan commented 1 year ago

We will need to adjust all the places where we install things on the loader. Maybe we can move to CentOS 8 more easily, if you think it's relevant to this issue.

fruch commented 1 year ago

> We will need to adjust all the places where we install things on the loader. Maybe we can move to CentOS 8 more easily, if you think it's relevant to this issue.

Why not move to a more stable distro (for the same reason we did for the Scylla images)? Also, for a Docker-based loader, we only care that Docker is installed, nothing else.

roydahan commented 1 year ago

Just to avoid having to hunt down all the places where we assumed it's CentOS. But if it's simple with the Docker backend, it should be easier. The only open question is what to do about future performance branches.

fruch commented 1 year ago

> Just to avoid having to hunt down all the places where we assumed it's CentOS. But if it's simple with the Docker backend, it should be easier. The only open question is what to do about future performance branches.

I don't think we have any such assumption for the loader, and if we do, it would be easy to flush out.

The perf branches can be "calibrated" with Ubuntu + Docker, or keep the old CentOS 7 images.

fruch commented 1 year ago

Anyhow, let's wait for more incidents of this before deciding to move in either direction.

fruch commented 1 year ago

> Anyhow, let's wait for more incidents of this before deciding to move in either direction.

The only downside of waiting is that it might affect the next release (5.2). So far we've had it in for two weeks, and we've only seen this once.

soyacz commented 1 year ago

Another reproduction in run:

Installation details

Kernel Version: 5.15.0-1028-aws

Scylla version (or git commit hash): 5.3.0~dev-20230131.5d914adcef1f with build-id c0fd94703025292798832fd91f1b88ffe64025d7

Cluster size: 3 nodes (i3.large)

Scylla Nodes used in this run:

OS / Image: ami-07a90f071421efaed (aws: us-east-1)

Test: gemini-3h-with-nemesis-test

Test id: 08d92363-4120-4119-9410-ecfcc25d4739

Test name: scylla-staging/lukasz/gemini-3h-with-nemesis-test

Test config file(s):

Logs and commands

- Restore Monitor Stack command: `$ hydra investigate show-monitor 08d92363-4120-4119-9410-ecfcc25d4739`
- Restore monitor on AWS instance using [Jenkins job](https://jenkins.scylladb.com/view/QA/job/QA-tools/job/hydra-show-monitor/parambuild/?test_id=08d92363-4120-4119-9410-ecfcc25d4739)
- Show all stored logs command: `$ hydra investigate show-logs 08d92363-4120-4119-9410-ecfcc25d4739`

## Logs:

- **db-cluster-08d92363.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/08d92363-4120-4119-9410-ecfcc25d4739/20230202_123655/db-cluster-08d92363.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/08d92363-4120-4119-9410-ecfcc25d4739/20230202_123655/db-cluster-08d92363.tar.gz)
- **sct-runner-08d92363.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/08d92363-4120-4119-9410-ecfcc25d4739/20230202_123655/sct-runner-08d92363.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/08d92363-4120-4119-9410-ecfcc25d4739/20230202_123655/sct-runner-08d92363.tar.gz)
- **monitor-set-08d92363.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/08d92363-4120-4119-9410-ecfcc25d4739/20230202_123655/monitor-set-08d92363.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/08d92363-4120-4119-9410-ecfcc25d4739/20230202_123655/monitor-set-08d92363.tar.gz)
- **loader-set-08d92363.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/08d92363-4120-4119-9410-ecfcc25d4739/20230202_123655/loader-set-08d92363.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/08d92363-4120-4119-9410-ecfcc25d4739/20230202_123655/loader-set-08d92363.tar.gz)
- **parallel-timelines-report-08d92363.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/08d92363-4120-4119-9410-ecfcc25d4739/20230202_123655/parallel-timelines-report-08d92363.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/08d92363-4120-4119-9410-ecfcc25d4739/20230202_123655/parallel-timelines-report-08d92363.tar.gz)

[Jenkins job URL](https://jenkins.scylladb.com/job/scylla-staging/job/lukasz/job/gemini-3h-with-nemesis-test/5/)

fruch commented 1 year ago

I'm guessing this is related. The docker service dumped core:

Mar 27 22:50:25 gemini-with-nemesis-3h-normal-maste-loader-node-59828a1f-1 systemd-coredump[24575]: Failed to create coredump file /var/lib/systemd/coredump/.#core.dockerd.0.f407a0f9fb9244698afc85d58475c463.1364.167995742400000010c8b3daaeed3160: No such file or directory
Mar 27 22:50:25 gemini-with-nemesis-3h-normal-maste-loader-node-59828a1f-1 systemd-coredump[24575]: Process 1364 (dockerd) of user 0 dumped core.
Mar 27 22:50:25 gemini-with-nemesis-3h-normal-maste-loader-node-59828a1f-1 systemd[1]: docker.service: main process exited, code=killed, status=11/SEGV
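The two reports end differently. In the first report, systemd logged `code=exited, status=2/INVALIDARGUMENT`. That means dockerd exited on its own with status 2, which is the status the Go runtime uses after a fatal error. `INVALIDARGUMENT` is only systemd's generic label for exit status 2. In this report, systemd logged `code=killed, status=11/SEGV`, meaning dockerd was terminated by signal 11.

When comparing many loader journals, it may help to normalize these two forms. Below is a minimal sketch, assuming the `code=..., status=...` wording shown in these logs. `classify_exit` is a hypothetical helper and is not part of SCT.

```python
import re
import signal

# systemd describes how a unit's main process ended as, for example,
# "code=exited, status=2/INVALIDARGUMENT" or "code=killed, status=11/SEGV".
DETAIL_RE = re.compile(r"code=(?P<code>\w+), status=(?P<num>\d+)")


def classify_exit(detail):
    """Return ("exit", status) for a normal exit, or ("signal", name) for a kill."""
    match = DETAIL_RE.search(detail)
    if match is None:
        raise ValueError(f"unrecognized exit detail: {detail!r}")
    code, num = match.group("code"), int(match.group("num"))
    if code == "exited":
        # The process called exit() itself; num is its exit status.
        return ("exit", num)
    if code in ("killed", "dumped"):
        # The process was terminated by a signal; num is the signal number.
        return ("signal", signal.Signals(num).name)
    # Any other code (e.g. "trapped") is returned unchanged.
    return (code, num)
```

For the two reports in this issue, the sketch gives `("exit", 2)` and `("signal", "SIGSEGV")` respectively.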

We should move the loader AMIs to a newer distro and a newer Docker version.

Installation details

Kernel Version: 5.15.0-1031-aws

Scylla version (or git commit hash): 5.3.0~dev-20230325.e8fb718e4ad4 with build-id 6eed28a1ac2addc02aceea60af4d6ee4acd56955

Cluster size: 3 nodes (i3.large)

Scylla Nodes used in this run:

OS / Image: ami-04226bde2b30a3d2d (aws: eu-west-1)

Test: gemini-3h-with-nemesis-test

Test id: 59828a1f-6c16-47cd-a63b-9ceb43a0c844

Test name: scylla-master/gemini-/gemini-3h-with-nemesis-test

Test config file(s):

Logs and commands

- Restore Monitor Stack command: `$ hydra investigate show-monitor 59828a1f-6c16-47cd-a63b-9ceb43a0c844`
- Restore monitor on AWS instance using [Jenkins job](https://jenkins.scylladb.com/view/QA/job/QA-tools/job/hydra-show-monitor/parambuild/?test_id=59828a1f-6c16-47cd-a63b-9ceb43a0c844)
- Show all stored logs command: `$ hydra investigate show-logs 59828a1f-6c16-47cd-a63b-9ceb43a0c844`

## Logs:

- **db-cluster-59828a1f.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/59828a1f-6c16-47cd-a63b-9ceb43a0c844/20230327_232636/db-cluster-59828a1f.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/59828a1f-6c16-47cd-a63b-9ceb43a0c844/20230327_232636/db-cluster-59828a1f.tar.gz)
- **sct-runner-events-59828a1f.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/59828a1f-6c16-47cd-a63b-9ceb43a0c844/20230327_232636/sct-runner-events-59828a1f.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/59828a1f-6c16-47cd-a63b-9ceb43a0c844/20230327_232636/sct-runner-events-59828a1f.tar.gz)
- **sct-59828a1f.log.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/59828a1f-6c16-47cd-a63b-9ceb43a0c844/20230327_232636/sct-59828a1f.log.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/59828a1f-6c16-47cd-a63b-9ceb43a0c844/20230327_232636/sct-59828a1f.log.tar.gz)
- **monitor-set-59828a1f.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/59828a1f-6c16-47cd-a63b-9ceb43a0c844/20230327_232636/monitor-set-59828a1f.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/59828a1f-6c16-47cd-a63b-9ceb43a0c844/20230327_232636/monitor-set-59828a1f.tar.gz)
- **loader-set-59828a1f.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/59828a1f-6c16-47cd-a63b-9ceb43a0c844/20230327_232636/loader-set-59828a1f.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/59828a1f-6c16-47cd-a63b-9ceb43a0c844/20230327_232636/loader-set-59828a1f.tar.gz)
- **parallel-timelines-report-59828a1f.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/59828a1f-6c16-47cd-a63b-9ceb43a0c844/20230327_232636/parallel-timelines-report-59828a1f.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/59828a1f-6c16-47cd-a63b-9ceb43a0c844/20230327_232636/parallel-timelines-report-59828a1f.tar.gz)

[Jenkins job URL](https://jenkins.scylladb.com/job/scylla-master/job/gemini-/job/gemini-3h-with-nemesis-test/366/)