Open ilya-rarov opened 1 year ago
From the settings of failed c-s thread
Node:
Nodes: [10.12.11.235, 10.12.11.13, 10.12.11.209, 10.12.10.64, 10.12.9.147, 10.12.10.222]
Is White List: false
Datacenter: null
Schema:
Keyspace: drop_table_during_repair_ks_2
Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
Replication Strategy Options: {replication_factor=6}
Table Compression: null
Table Compaction Strategy: null
Table Compaction Strategy Options: {}
We have 6-node cluster and RF=6. I guess, @roydahan, you suggested to change the RF, right?
So https://github.com/scylladb/scylla-cluster-tests/pull/5422 workaround didn't really wait correctly to confirm the creation of keyspace ?
I think there are few scylla issues around slow keyspace propagation, I think we should report it there.
Seems like #5422 didn't solve the issue, because it happened within the prepare (the time between creating one ks,cf till the time c-s gave up on it).
@ilya-rarov let's try to reproduce this by running the single nemesis job of it. If it reproduces, we will open a new scylla issue. If not, we may increase the load in this job until it reproduces.
Issue description
During the
disrupt_no_corrupt_repair nemesis
, we create keyspacesdrop_table_during_repair_ks_X
by triggering the corresponding c-s threads. The thread that was supposed to create the ksdrop_table_during_repair_ks_2
started throwing errors and failed the very beginningI checked in the logs of the nodes when exactly the creation of the keyspace was triggered
It looks like when c-s thread queried the node-9 (10.12.10.64) the keyspace had not existed there yet. The node started creating the keyspace only at 16:10:14 (the c-s had already failed by that time).
This case might correlate with https://github.com/scylladb/scylla-cluster-tests/issues/5413
How frequently does it reproduce?
I don't know.
Installation details
Kernel Version: 5.15.0-1031-aws Scylla version (or git commit hash):
2023.1.0~rc2-20230302.726f8a090337
with build-idf8b55007dc790d0c224d5e2d4b422786ffeb918e
Cluster size: 6 nodes (im4gn.4xlarge)
Scylla Nodes used in this run:
OS / Image:
ami-0734389f2aed090ad
(aws: us-east-1)Test:
longevity-50gb-3days-arm-test
Test id:1dfc319c-b33c-4b80-a705-e413ac1f0b1e
Test name:enterprise-2023.1/longevity/longevity-50gb-3days-arm-test
Test config file(s):Logs and commands
- Restore Monitor Stack command: `$ hydra investigate show-monitor 1dfc319c-b33c-4b80-a705-e413ac1f0b1e` - Restore monitor on AWS instance using [Jenkins job](https://jenkins.scylladb.com/view/QA/job/QA-tools/job/hydra-show-monitor/parambuild/?test_id=1dfc319c-b33c-4b80-a705-e413ac1f0b1e) - Show all stored logs command: `$ hydra investigate show-logs 1dfc319c-b33c-4b80-a705-e413ac1f0b1e` ## Logs: - **db-cluster-1dfc319c.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/1dfc319c-b33c-4b80-a705-e413ac1f0b1e/20230317_175824/db-cluster-1dfc319c.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/1dfc319c-b33c-4b80-a705-e413ac1f0b1e/20230317_175824/db-cluster-1dfc319c.tar.gz) - **email_data-1dfc319c.json.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/1dfc319c-b33c-4b80-a705-e413ac1f0b1e/20230317_175824/email_data-1dfc319c.json.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/1dfc319c-b33c-4b80-a705-e413ac1f0b1e/20230317_175824/email_data-1dfc319c.json.tar.gz) - **output-1dfc319c.log.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/1dfc319c-b33c-4b80-a705-e413ac1f0b1e/20230317_175824/output-1dfc319c.log.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/1dfc319c-b33c-4b80-a705-e413ac1f0b1e/20230317_175824/output-1dfc319c.log.tar.gz) - **debug-1dfc319c.log.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/1dfc319c-b33c-4b80-a705-e413ac1f0b1e/20230317_175824/debug-1dfc319c.log.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/1dfc319c-b33c-4b80-a705-e413ac1f0b1e/20230317_175824/debug-1dfc319c.log.tar.gz) - **events-1dfc319c.log.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/1dfc319c-b33c-4b80-a705-e413ac1f0b1e/20230317_175824/events-1dfc319c.log.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/1dfc319c-b33c-4b80-a705-e413ac1f0b1e/20230317_175824/events-1dfc319c.log.tar.gz) - **sct-1dfc319c.log.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/1dfc319c-b33c-4b80-a705-e413ac1f0b1e/20230317_175824/sct-1dfc319c.log.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/1dfc319c-b33c-4b80-a705-e413ac1f0b1e/20230317_175824/sct-1dfc319c.log.tar.gz) - **normal-1dfc319c.log.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/1dfc319c-b33c-4b80-a705-e413ac1f0b1e/20230317_175824/normal-1dfc319c.log.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/1dfc319c-b33c-4b80-a705-e413ac1f0b1e/20230317_175824/normal-1dfc319c.log.tar.gz) - **argus-1dfc319c.log.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/1dfc319c-b33c-4b80-a705-e413ac1f0b1e/20230317_175824/argus-1dfc319c.log.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/1dfc319c-b33c-4b80-a705-e413ac1f0b1e/20230317_175824/argus-1dfc319c.log.tar.gz) - **raw_events-1dfc319c.log.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/1dfc319c-b33c-4b80-a705-e413ac1f0b1e/20230317_175824/raw_events-1dfc319c.log.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/1dfc319c-b33c-4b80-a705-e413ac1f0b1e/20230317_175824/raw_events-1dfc319c.log.tar.gz) - **critical-1dfc319c.log.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/1dfc319c-b33c-4b80-a705-e413ac1f0b1e/20230317_175824/critical-1dfc319c.log.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/1dfc319c-b33c-4b80-a705-e413ac1f0b1e/20230317_175824/critical-1dfc319c.log.tar.gz) - **warning-1dfc319c.log.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/1dfc319c-b33c-4b80-a705-e413ac1f0b1e/20230317_175824/warning-1dfc319c.log.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/1dfc319c-b33c-4b80-a705-e413ac1f0b1e/20230317_175824/warning-1dfc319c.log.tar.gz) - **summary-1dfc319c.log.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/1dfc319c-b33c-4b80-a705-e413ac1f0b1e/20230317_175824/summary-1dfc319c.log.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/1dfc319c-b33c-4b80-a705-e413ac1f0b1e/20230317_175824/summary-1dfc319c.log.tar.gz) - **left_processes-1dfc319c.log.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/1dfc319c-b33c-4b80-a705-e413ac1f0b1e/20230317_175824/left_processes-1dfc319c.log.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/1dfc319c-b33c-4b80-a705-e413ac1f0b1e/20230317_175824/left_processes-1dfc319c.log.tar.gz) - **error-1dfc319c.log.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/1dfc319c-b33c-4b80-a705-e413ac1f0b1e/20230317_175824/error-1dfc319c.log.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/1dfc319c-b33c-4b80-a705-e413ac1f0b1e/20230317_175824/error-1dfc319c.log.tar.gz) - **monitor-set-1dfc319c.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/1dfc319c-b33c-4b80-a705-e413ac1f0b1e/20230317_175824/monitor-set-1dfc319c.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/1dfc319c-b33c-4b80-a705-e413ac1f0b1e/20230317_175824/monitor-set-1dfc319c.tar.gz) - **loader-set-1dfc319c.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/1dfc319c-b33c-4b80-a705-e413ac1f0b1e/20230317_175824/loader-set-1dfc319c.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/1dfc319c-b33c-4b80-a705-e413ac1f0b1e/20230317_175824/loader-set-1dfc319c.tar.gz) [Jenkins job URL](https://jenkins.scylladb.com/job/enterprise-2023.1/job/longevity/job/longevity-50gb-3days-arm-test/2/)