Open aleksbykov opened 2 hours ago
@kbr-scylla @patjed41, this bug is not related to Scylla directly (at least I didn't find any issue in the Scylla logs), but the Scylla Manager repair task failed, and it looks like it could be related to zero-token nodes.
It could be that support for zero-token nodes needs to be explicitly implemented in Scylla Manager. Maybe it assumes that every node has tokens.
Packages
Scylla version:
6.2.0-20241013.b8a9fd4e49e8
with build-id a61f658b0408ba10663812f7a3b4d6aea7714fac
Kernel Version:
6.8.0-1016-aws
Scylla Manager Agent 3.3.3-0.20240912.924034e0d
Issue description
The cluster is configured with zero-token nodes in a multi-DC setup: DC "eu-west-1" has 3 data nodes, DC "eu-west-2" has 3 data nodes and 1 zero-token node, and DC "eu-north-1" has 1 zero-token node.
The nemesis 'disrupt_mgmt_corrupt_then_repair' failed. This nemesis stops Scylla, removes several sstables, starts Scylla, and then triggers a repair from Scylla Manager. The nemesis chose node4 (a data node) as the target node. It removed sstables after Scylla was stopped, and after Scylla was started again it triggered a repair from Scylla Manager. The repair task failed after an hour:
The following error was found in the Scylla Manager log in "monitor-set-2bc4de73.tar.gz":
This could be related to the zero-token nodes in the configuration.
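To make the suspicion concrete, here is a minimal Python sketch of the topology described above. It is not Scylla Manager code: the `Node` type, the node names, and the token count of 256 are all hypothetical, chosen only to illustrate which nodes and DCs would break an "every node owns tokens" assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str          # hypothetical node name, not taken from the test run
    dc: str
    token_count: int   # 0 marks a zero-token node; 256 is an assumed value


# Topology from the issue description: 3 + 3 data nodes, 2 zero-token nodes.
CLUSTER = (
    [Node(f"eu-west-1-data-{i}", "eu-west-1", 256) for i in range(1, 4)]
    + [Node(f"eu-west-2-data-{i}", "eu-west-2", 256) for i in range(1, 4)]
    + [
        Node("eu-west-2-zero-1", "eu-west-2", 0),
        Node("eu-north-1-zero-1", "eu-north-1", 0),
    ]
)


def token_owners(nodes):
    """Return only the nodes that own at least one token."""
    return [n for n in nodes if n.token_count > 0]


def dcs_without_token_owners(nodes):
    """Return the sorted names of DCs in which no node owns tokens."""
    all_dcs = {n.dc for n in nodes}
    dcs_with_owners = {n.dc for n in token_owners(nodes)}
    return sorted(all_dcs - dcs_with_owners)


if __name__ == "__main__":
    print(len(token_owners(CLUSTER)), "of", len(CLUSTER), "nodes own tokens")
    print("DCs with no token owners:", dcs_without_token_owners(CLUSTER))
```

In this model, "eu-north-1" contains no token-owning node at all, so any repair planning step that expects a replica in every DC or on every host would have nothing to select there.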
Impact
The repair started from Scylla Manager failed.
Installation details
Cluster size: 6 nodes (i4i.4xlarge)
Scylla Nodes used in this run:
OS / Image:
ami-01f5cd2cb7c8dbd6f ami-0a32db7034cf41d95 ami-0b2b4e9fba26c7618
(aws: undefined_region)
Test:
longevity-multi-dc-rack-aware-zero-token-dc
Test id: 2bc4de73-4328-4444-b601-6bd88060fa4d
Test name: scylla-staging/abykov/longevity-multi-dc-rack-aware-zero-token-dc
Test method: longevity_test.LongevityTest.test_custom_time
Test config file(s):
Logs and commands
- Restore Monitor Stack command: `$ hydra investigate show-monitor 2bc4de73-4328-4444-b601-6bd88060fa4d`
- Restore monitor on AWS instance using [Jenkins job](https://jenkins.scylladb.com/view/QA/job/QA-tools/job/hydra-show-monitor/parambuild/?test_id=2bc4de73-4328-4444-b601-6bd88060fa4d)
- Show all stored logs command: `$ hydra investigate show-logs 2bc4de73-4328-4444-b601-6bd88060fa4d`

## Logs:

- **db-cluster-2bc4de73.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/2bc4de73-4328-4444-b601-6bd88060fa4d/20241017_044804/db-cluster-2bc4de73.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/2bc4de73-4328-4444-b601-6bd88060fa4d/20241017_044804/db-cluster-2bc4de73.tar.gz)
- **sct-runner-events-2bc4de73.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/2bc4de73-4328-4444-b601-6bd88060fa4d/20241017_044804/sct-runner-events-2bc4de73.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/2bc4de73-4328-4444-b601-6bd88060fa4d/20241017_044804/sct-runner-events-2bc4de73.tar.gz)
- **sct-2bc4de73.log.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/2bc4de73-4328-4444-b601-6bd88060fa4d/20241017_044804/sct-2bc4de73.log.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/2bc4de73-4328-4444-b601-6bd88060fa4d/20241017_044804/sct-2bc4de73.log.tar.gz)
- **loader-set-2bc4de73.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/2bc4de73-4328-4444-b601-6bd88060fa4d/20241017_044804/loader-set-2bc4de73.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/2bc4de73-4328-4444-b601-6bd88060fa4d/20241017_044804/loader-set-2bc4de73.tar.gz)
- **monitor-set-2bc4de73.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/2bc4de73-4328-4444-b601-6bd88060fa4d/20241017_044804/monitor-set-2bc4de73.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/2bc4de73-4328-4444-b601-6bd88060fa4d/20241017_044804/monitor-set-2bc4de73.tar.gz)

[Jenkins job URL](https://jenkins.scylladb.com/job/scylla-staging/job/abykov/job/longevity-multi-dc-rack-aware-zero-token-dc/26/)
[Argus](https://argus.scylladb.com/test/bbd702fb-2f87-4b0b-a068-c2c83d74cb77/runs?additionalRuns[]=2bc4de73-4328-4444-b601-6bd88060fa4d)