It looks like the recent change #6442 causes the following failure, since it stops the nemesis thread while it is still waiting for `NEMESIS_LOCK`:
```
2023-08-13 02:47:48.904: (ThreadFailedEvent Severity.ERROR) period_type=one-time event_id=0348cf22-7a75-4268-a9ac-d9328194d214: message='140297676756048--disrupt_hard_reboot_node'
Traceback (most recent call last):
  File "/home/ubuntu/scylla-cluster-tests/sdcm/nemesis.py", line 4868, in wrapper
    NEMESIS_LOCK.acquire()  # pylint: disable=consider-using-with
sdcm.exceptions.KillNemesis

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ubuntu/scylla-cluster-tests/sdcm/sct_events/decorators.py", line 26, in wrapper
    return func(*args, **kwargs)
  File "/home/ubuntu/scylla-cluster-tests/sdcm/nemesis.py", line 362, in run
    self.disrupt()
  File "/home/ubuntu/scylla-cluster-tests/sdcm/nemesis.py", line 6013, in disrupt
    self.call_next_nemesis()
  File "/home/ubuntu/scylla-cluster-tests/sdcm/nemesis.py", line 1792, in call_next_nemesis
    self.execute_disrupt_method(disrupt_method=self.disruptions_list.pop())
  File "/home/ubuntu/scylla-cluster-tests/sdcm/nemesis.py", line 1720, in execute_disrupt_method
    disrupt_method()
  File "/home/ubuntu/scylla-cluster-tests/sdcm/nemesis.py", line 4970, in wrapper
    NEMESIS_RUN_INFO.pop(nemesis_run_info_key)
KeyError: '140297676756048--disrupt_hard_reboot_node'
```
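What seems to happen, sketched as a minimal reproduction (hypothetical simplification, not the actual SCT code — `InterruptibleLock` and `run_disruption` are stand-ins invented here): the wrapper registers the run-info entry only after `NEMESIS_LOCK.acquire()` returns, but its cleanup pops the entry unconditionally, so when `KillNemesis` interrupts the thread while it is still blocked in `acquire()`, the `pop` raises the `KeyError` seen above and shadows the original `KillNemesis`:

```python
import threading

NEMESIS_RUN_INFO = {}


class KillNemesis(BaseException):
    """Stand-in for sdcm.exceptions.KillNemesis, raised to stop the nemesis thread."""


class InterruptibleLock:
    """Toy lock whose acquire() can be interrupted mid-wait; a simplified
    stand-in for NEMESIS_LOCK, not the real implementation."""

    def __init__(self, interrupt=False):
        self._lock = threading.Lock()
        self._interrupt = interrupt

    def acquire(self):
        if self._interrupt:
            raise KillNemesis()  # thread is killed while waiting for the lock
        self._lock.acquire()

    def release(self):
        self._lock.release()


def run_disruption(lock, key):
    # Sketch of the wrapper's control flow: the run-info entry is registered
    # only after the lock is held, but cleanup pops the key unconditionally.
    try:
        lock.acquire()
        NEMESIS_RUN_INFO[key] = "running"  # never reached when acquire() raises
        # ... disruption body (e.g. disrupt_hard_reboot_node) would run here ...
    finally:
        NEMESIS_RUN_INFO.pop(key)  # KeyError shadows the original KillNemesis


try:
    run_disruption(InterruptibleLock(interrupt=True),
                   "140297676756048--disrupt_hard_reboot_node")
except KeyError as exc:
    # The KeyError is raised "during handling" of KillNemesis, matching the
    # chained traceback above.
    print("KeyError:", exc, "| context:", type(exc.__context__).__name__)
```

Under this reading, a defensive mitigation would be `NEMESIS_RUN_INFO.pop(nemesis_run_info_key, None)` (or registering the key before acquiring the lock), though the proper fix depends on how #6442 stops the thread.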
Impact

How frequently does it reproduce?
Installation details

Kernel Version: 5.10.184-175.749.amzn2.x86_64
Scylla version (or git commit hash): `5.4.0~dev-20230812.d1d1b6cf6e01` with build-id `6c4f55c26164d6fe2cd25d38f5022795ce696d9c`
Operator Image: scylladb/scylla-operator:latest
Operator Helm Version: v1.10.0-alpha.0-28-gd131f8e
Operator Helm Repository: https://storage.googleapis.com/scylla-operator-charts/latest
Cluster size: 4 nodes (i4i.4xlarge)
Scylla Nodes used in this run: No resources left at the end of the run
OS / Image: `` (k8s-eks: undefined_region)
Test: longevity-scylla-operator-3h-multitenant-eks
Test id: 95e46710-c84c-48a2-9ef9-6366e2a664cf
Test name: scylla-operator/operator-master/eks/longevity-scylla-operator-3h-multitenant-eks
Test config file(s):

Logs and commands
- Restore Monitor Stack command: `$ hydra investigate show-monitor 95e46710-c84c-48a2-9ef9-6366e2a664cf`
- Restore monitor on AWS instance using [Jenkins job](https://jenkins.scylladb.com/view/QA/job/QA-tools/job/hydra-show-monitor/parambuild/?test_id=95e46710-c84c-48a2-9ef9-6366e2a664cf)
- Show all stored logs command: `$ hydra investigate show-logs 95e46710-c84c-48a2-9ef9-6366e2a664cf`

Logs:

- **kubernetes-95e46710.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/95e46710-c84c-48a2-9ef9-6366e2a664cf/20230813_073220/kubernetes-95e46710.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/95e46710-c84c-48a2-9ef9-6366e2a664cf/20230813_073220/kubernetes-95e46710.tar.gz)
- **db-cluster-95e46710.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/95e46710-c84c-48a2-9ef9-6366e2a664cf/20230813_073220/db-cluster-95e46710.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/95e46710-c84c-48a2-9ef9-6366e2a664cf/20230813_073220/db-cluster-95e46710.tar.gz)
- **sct-runner-events-95e46710.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/95e46710-c84c-48a2-9ef9-6366e2a664cf/20230813_073220/sct-runner-events-95e46710.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/95e46710-c84c-48a2-9ef9-6366e2a664cf/20230813_073220/sct-runner-events-95e46710.tar.gz)
- **sct-95e46710.log.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/95e46710-c84c-48a2-9ef9-6366e2a664cf/20230813_073220/sct-95e46710.log.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/95e46710-c84c-48a2-9ef9-6366e2a664cf/20230813_073220/sct-95e46710.log.tar.gz)
- **loader-set-95e46710.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/95e46710-c84c-48a2-9ef9-6366e2a664cf/20230813_073220/loader-set-95e46710.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/95e46710-c84c-48a2-9ef9-6366e2a664cf/20230813_073220/loader-set-95e46710.tar.gz)
- **monitor-set-95e46710.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/95e46710-c84c-48a2-9ef9-6366e2a664cf/20230813_073220/monitor-set-95e46710.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/95e46710-c84c-48a2-9ef9-6366e2a664cf/20230813_073220/monitor-set-95e46710.tar.gz)
- **parallel-timelines-report-95e46710.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/95e46710-c84c-48a2-9ef9-6366e2a664cf/20230813_073220/parallel-timelines-report-95e46710.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/95e46710-c84c-48a2-9ef9-6366e2a664cf/20230813_073220/parallel-timelines-report-95e46710.tar.gz)

[Jenkins job URL](https://jenkins.scylladb.com/job/scylla-operator/job/operator-master/job/eks/job/longevity-scylla-operator-3h-multitenant-eks/72/)
[Argus](https://argus.scylladb.com/test/7774c24a-b749-4528-97a4-22785e7e5b6f/runs?additionalRuns[]=95e46710-c84c-48a2-9ef9-6366e2a664cf)