Closed by roydahan 6 months ago
This issue is stale because it has been open 2 years with no activity. Remove stale label or comment or this will be closed in 2 days.
This issue was closed because it has been stalled for 2 days with no activity.
@vponomaryov commented on Thu Dec 16 2021
Installation details
Kernel version: 5.11.0-1022-aws
Scylla version (or git commit hash): 4.7.dev-0.20211215.3ac622bdd with build-id c19c7740f10f82f554c1b67da69f82fc48f9386f
Cluster size: 3 nodes (i3.large)
OS (RHEL/CentOS/Ubuntu/AWS AMI): ami-045cd00d68f3af143 (eu-north-1)
Scylla running with shards number (live nodes):
gemini-with-nemesis-3h-normal-maste-db-node-a9ab70d5-1 (13.48.5.235 | 10.0.3.26): 2 shards
gemini-with-nemesis-3h-normal-maste-db-node-a9ab70d5-2 (16.170.237.202 | 10.0.2.183): 2 shards
gemini-with-nemesis-3h-normal-maste-db-node-a9ab70d5-4 (13.51.156.207 | 10.0.2.151): 2 shards
Dead Scylla nodes:
gemini-with-nemesis-3h-normal-maste-db-node-a9ab70d5-3 (13.48.190.179 | 10.0.0.129): 2 shards, terminated at 2021-12-15 11:47:05.662660
Oracle AMI: ami-0118e6d3cacc225b4
Oracle instance type: i3.8xlarge
Oracle nodes:
gemini-with-nemesis-3h-normal-maste-oracle-db-node-a9ab70d5-1 (13.51.233.71 | 10.0.1.23)
Test:
gemini-3h-with-nemesis-test
Test name: longevity_test.LongevityTest.test_custom_time
Test config file(s):
Issue description
StopWaitStartScyllaServer
The nemesis stops Scylla, sleeps for 5 minutes, and then starts it. It failed with the following error:
If we look at node 4, where the nemesis took place, we see the following:
So the reshape of 63Gb took longer than the timeout: 502s > 500s. As a result, we get a discrepancy between the test and oracle clusters:
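To illustrate the timing problem, here is a minimal sketch of a start-and-wait check with a fixed timeout. It is not the actual SCT code: `wait_until_up`, `StartTimeout`, and `FakeClock` are hypothetical names, and only the 500s timeout and the 502s reshape duration come from the numbers above. A simulated clock is used so the example runs instantly.

```python
import time
from typing import Callable


class StartTimeout(Exception):
    """Raised when the service did not come up within the timeout."""


class FakeClock:
    """Simulated clock so the example does not really sleep (hypothetical helper)."""

    def __init__(self) -> None:
        self.now = 0.0

    def time(self) -> float:
        return self.now

    def sleep(self, seconds: float) -> None:
        self.now += seconds


def wait_until_up(
    is_up: Callable[[], bool],
    timeout: float,
    poll_interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Poll is_up() until it returns True; return the elapsed seconds.

    Raises StartTimeout once `timeout` seconds have passed without success.
    """
    start = clock()
    while True:
        if is_up():
            return clock() - start
        elapsed = clock() - start
        if elapsed >= timeout:
            raise StartTimeout(f"not up after {elapsed:.0f}s (timeout {timeout:.0f}s)")
        sleep(poll_interval)


def simulate(ready_after: float, timeout: float) -> str:
    """Simulate a node that becomes ready `ready_after` seconds after start."""
    fake = FakeClock()
    try:
        took = wait_until_up(
            lambda: fake.now >= ready_after,
            timeout=timeout,
            clock=fake.time,
            sleep=fake.sleep,
        )
        return f"up after {took:.0f}s"
    except StartTimeout as exc:
        return f"failed: {exc}"


if __name__ == "__main__":
    # Reshape takes 502s, start timeout is 500s (values from this issue).
    print(simulate(ready_after=502, timeout=500))
    # A larger timeout (600s is an arbitrary illustrative value) would pass.
    print(simulate(ready_after=502, timeout=600))
```

With these inputs the first call times out at the 500s mark even though the node would have been ready 2 seconds later, which is the kind of near miss described above. Whether the right fix is a longer timeout or a faster reshape is a separate question.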
Restore Monitor Stack command:
$ hydra investigate show-monitor a9ab70d5-f7f1-4c1b-8f26-cd5700843d1c
Restore monitor on AWS instance using Jenkins job
Show all stored logs command:
$ hydra investigate show-logs a9ab70d5-f7f1-4c1b-8f26-cd5700843d1c
Test id:
a9ab70d5-f7f1-4c1b-8f26-cd5700843d1c
Logs:
db-cluster - https://cloudius-jenkins-test.s3.amazonaws.com/a9ab70d5-f7f1-4c1b-8f26-cd5700843d1c/20211215_124608/db-cluster-a9ab70d5.tar.gz
loader-set - https://cloudius-jenkins-test.s3.amazonaws.com/a9ab70d5-f7f1-4c1b-8f26-cd5700843d1c/20211215_124608/loader-set-a9ab70d5.tar.gz
monitor-set - https://cloudius-jenkins-test.s3.amazonaws.com/a9ab70d5-f7f1-4c1b-8f26-cd5700843d1c/20211215_124608/monitor-set-a9ab70d5.tar.gz
sct-runner - https://cloudius-jenkins-test.s3.amazonaws.com/a9ab70d5-f7f1-4c1b-8f26-cd5700843d1c/20211215_124608/sct-runner-a9ab70d5.tar.gz
Jenkins job URL
@slivne commented on Mon Dec 20 2021
@roydahan not sure this is not a test issue