Open fruch opened 1 year ago
We will need to adjust all places we install things on the loader. We can maybe progress to CentOS8 more easily if you think it's relevant to this issue.
Why not move to a more stable distro (for the same reason we did for the Scylla images)? Also, for a docker-based loader, we only care that docker is installed, nothing else.
Just to avoid having to catch all the places where we assumed it's CentOS. But with the docker backend it should be easier. The only thing left is what to do about future performance branches.
I don't think we have any such assumption for the loader, and it would be easy to flush out.
The perf branches can be "calibrated" with Ubuntu+docker, or keep the old CentOS7 images.
Anyhow, let's wait for more incidents of this before deciding whether we should move in either of these directions.
The only downside of waiting is that it might affect the next release (5.2). So far we have had it in for two weeks, and we have only seen this once.
Another reproduction in run:
Kernel Version: 5.15.0-1028-aws
Scylla version (or git commit hash): 5.3.0~dev-20230131.5d914adcef1f
with build-id c0fd94703025292798832fd91f1b88ffe64025d7
Cluster size: 3 nodes (i3.large)
Scylla Nodes used in this run:
OS / Image: ami-07a90f071421efaed
(aws: us-east-1)
Test: gemini-3h-with-nemesis-test
Test id: 08d92363-4120-4119-9410-ecfcc25d4739
Test name: scylla-staging/lukasz/gemini-3h-with-nemesis-test
Test config file(s):
I'm guessing kind of related:
docker service, coredumped:
Mar 27 22:50:25 gemini-with-nemesis-3h-normal-maste-loader-node-59828a1f-1 systemd-coredump[24575]: Failed to create coredump file /var/lib/systemd/coredump/.#core.dockerd.0.f407a0f9fb9244698afc85d58475c463.1364.167995742400000010c8b3daaeed3160: No such file or directory
Mar 27 22:50:25 gemini-with-nemesis-3h-normal-maste-loader-node-59828a1f-1 systemd-coredump[24575]: Process 1364 (dockerd) of user 0 dumped core.
Mar 27 22:50:25 gemini-with-nemesis-3h-normal-maste-loader-node-59828a1f-1 systemd[1]: docker.service: main process exited, code=killed, status=11/SEGV
We should change the loader AMIs to a newer distro and a newer docker version.
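To make it easier to count further incidents, the journal on a loader could be scanned for the same signature. The following is a minimal sketch, not part of SCT: the function name `find_killed_units` and the regular expression are assumptions modelled only on the journal excerpt above, and other systemd versions may word the message differently.

```python
import re

# Assumed pattern, derived from the journal excerpt above:
# "systemd[1]: docker.service: main process exited, code=killed, status=11/SEGV"
UNIT_KILLED = re.compile(
    r"systemd\[\d+\]: (?P<unit>\S+): main process exited, "
    r"code=killed, status=(?P<signo>\d+)/(?P<signame>\w+)"
)


def find_killed_units(journal_lines):
    """Return (unit, signal name) pairs for each line reporting a killed main process."""
    hits = []
    for line in journal_lines:
        match = UNIT_KILLED.search(line)
        if match:
            hits.append((match.group("unit"), match.group("signame")))
    return hits


# Two lines copied from the excerpt above; only the second one should match.
sample = [
    "Mar 27 22:50:25 gemini-with-nemesis-3h-normal-maste-loader-node-59828a1f-1 "
    "systemd-coredump[24575]: Process 1364 (dockerd) of user 0 dumped core.",
    "Mar 27 22:50:25 gemini-with-nemesis-3h-normal-maste-loader-node-59828a1f-1 "
    "systemd[1]: docker.service: main process exited, code=killed, status=11/SEGV",
]
print(find_killed_units(sample))
```

The input can be any iterable of lines, for example the saved system log of a loader node read line by line.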
Kernel Version: 5.15.0-1031-aws
Scylla version (or git commit hash): 5.3.0~dev-20230325.e8fb718e4ad4
with build-id 6eed28a1ac2addc02aceea60af4d6ee4acd56955
Cluster size: 3 nodes (i3.large)
Scylla Nodes used in this run:
OS / Image: ami-04226bde2b30a3d2d
(aws: eu-west-1)
Test: gemini-3h-with-nemesis-test
Test id: 59828a1f-6c16-47cd-a63b-9ceb43a0c844
Test name: scylla-master/gemini-/gemini-3h-with-nemesis-test
Test config file(s):
Issue description
Installation details
Kernel Version: 5.15.0-1026-aws
Scylla version (or git commit hash): 5.1.0-20221201.fde4a6e92d83
with build-id 3f51aa5a5121e5f42755a4cc669ae3dfc2e3b2dd
Relocatable Package: http://downloads.scylladb.com/unstable/scylla/branch-5.1/relocatable/2022-12-01T02:29:23Z/scylla-x86_64-package.tar.gz
Cluster size: 3 nodes (i3.large)
Scylla Nodes used in this run:
OS / Image: ami-00c21f4517ae026a4
(aws: us-east-1)
Test: gemini-3h-with-nemesis-test
Test id: 9046ca31-2d20-4cdc-9c16-993a8561b282
Test name: scylla-5.1/gemini-/gemini-3h-with-nemesis-test
Test config file(s): gemini-3h-with-nemesis.yaml
Restore Monitor Stack command:
$ hydra investigate show-monitor 9046ca31-2d20-4cdc-9c16-993a8561b282
Restore monitor on AWS instance using Jenkins job
Show all stored logs command:
$ hydra investigate show-logs 9046ca31-2d20-4cdc-9c16-993a8561b282
Logs:
Jenkins job URL