yarongilor opened this issue 2 months ago
@fruch do you have any idea whether SCT has already bumped into similar issues? Or is there already a suggested improvement?
Are you sure about the order of things?
The test isn't supposed to end before the stress commands have finished.
If it stopped because of the test timeout, then either something isn't working as expected, the stress ran longer than it was asked to, or the test timeout is too small.
If stress is running during teardown, that is also not a reason for nodes to be gone.
You are barking up the wrong tree completely: this is an abort during the test, while a nemesis that changes topology was running.
You clearly lost quorum, and SCT has nothing to do with it.
This is not an issue with SCT.
Gemini reports its failure once it finishes; whether that happens during teardown or not is irrelevant.
DB nodes are not stopped on teardown
Looking at it again, one node was lost in disrupt_remove_node_then_add_node
and wasn't replaced, because of a failure in removenode,
and then one more node was stopped during the enospc nemesis.
This case has only 3 nodes, and two of them are gone; guess what, Gemini would fail...
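To make the quorum argument concrete, here is a minimal sketch of the arithmetic. It assumes a single datacenter with replication factor 3 on the 3-node cluster (so every node holds a replica of every partition) and that Gemini uses QUORUM consistency; neither setting is stated in this issue. The helper names are hypothetical.

```python
# Minimal sketch: can a QUORUM operation still succeed after losing nodes?
# Assumption: single DC, RF=3 on a 3-node cluster, so each node holds a
# replica of every partition. Gemini's actual keyspace/consistency settings
# are not given in the issue.


def quorum(replication_factor: int) -> int:
    """Number of replicas that must respond for a QUORUM operation."""
    return replication_factor // 2 + 1


def quorum_available(replication_factor: int, live_replicas: int) -> bool:
    """True if enough replicas are alive to satisfy QUORUM."""
    return live_replicas >= quorum(replication_factor)


if __name__ == "__main__":
    rf = 3
    for live in (3, 2, 1):
        print(f"RF={rf}, live replicas={live}, quorum={quorum(rf)}, "
              f"QUORUM possible={quorum_available(rf, live)}")
```

With RF=3, QUORUM needs 2 replicas. After two of the three nodes are gone only one replica is left, so QUORUM reads and writes cannot succeed, which is consistent with Gemini failing here.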
Packages
Scylla version: 6.0.3-20240808.a56f7ce21ad4 with build-id 00ad3169bb53c452cf2ab93d97785dc56117ac3e
Kernel Version: 5.15.0-1067-aws
Issue description
Perhaps it would be best if teardown first notified the other threads, or stopped them, in order to avoid such collisions.
Impact
How frequently does it reproduce?
Installation details
Cluster size: 3 nodes (i4i.2xlarge)
Scylla Nodes used in this run:
OS / Image: ami-0c6a6957b89f8504f (aws: undefined_region)
Test: gemini-3h-with-nemesis-test
Test id: 5d11f833-59fd-4573-ba63-afec8d1b175b
Test name: scylla-6.0/gemini/gemini-3h-with-nemesis-test
Test method: gemini_test.GeminiTest.test_load_random_with_nemesis
Test config file(s):
Logs and commands
- Restore Monitor Stack command: `$ hydra investigate show-monitor 5d11f833-59fd-4573-ba63-afec8d1b175b`
- Restore monitor on AWS instance using [Jenkins job](https://jenkins.scylladb.com/view/QA/job/QA-tools/job/hydra-show-monitor/parambuild/?test_id=5d11f833-59fd-4573-ba63-afec8d1b175b)
- Show all stored logs command: `$ hydra investigate show-logs 5d11f833-59fd-4573-ba63-afec8d1b175b`

## Logs:

- **db-cluster-5d11f833.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/5d11f833-59fd-4573-ba63-afec8d1b175b/20240811_154920/db-cluster-5d11f833.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/5d11f833-59fd-4573-ba63-afec8d1b175b/20240811_154920/db-cluster-5d11f833.tar.gz)
- **sct-runner-events-5d11f833.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/5d11f833-59fd-4573-ba63-afec8d1b175b/20240811_154920/sct-runner-events-5d11f833.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/5d11f833-59fd-4573-ba63-afec8d1b175b/20240811_154920/sct-runner-events-5d11f833.tar.gz)
- **sct-5d11f833.log.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/5d11f833-59fd-4573-ba63-afec8d1b175b/20240811_154920/sct-5d11f833.log.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/5d11f833-59fd-4573-ba63-afec8d1b175b/20240811_154920/sct-5d11f833.log.tar.gz)
- **loader-set-5d11f833.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/5d11f833-59fd-4573-ba63-afec8d1b175b/20240811_154920/loader-set-5d11f833.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/5d11f833-59fd-4573-ba63-afec8d1b175b/20240811_154920/loader-set-5d11f833.tar.gz)
- **monitor-set-5d11f833.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/5d11f833-59fd-4573-ba63-afec8d1b175b/20240811_154920/monitor-set-5d11f833.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/5d11f833-59fd-4573-ba63-afec8d1b175b/20240811_154920/monitor-set-5d11f833.tar.gz)
- **parallel-timelines-report-5d11f833.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/5d11f833-59fd-4573-ba63-afec8d1b175b/20240811_154920/parallel-timelines-report-5d11f833.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/5d11f833-59fd-4573-ba63-afec8d1b175b/20240811_154920/parallel-timelines-report-5d11f833.tar.gz)

[Jenkins job URL](https://jenkins.scylladb.com/job/scylla-6.0/job/gemini/job/gemini-3h-with-nemesis-test/16/)

[Argus](https://argus.scylladb.com/test/2873b203-404e-492c-957e-8cf49830f0f5/runs?additionalRuns[]=5d11f833-59fd-4573-ba63-afec8d1b175b)