scylladb / scylla-cluster-tests

Tests for Scylla Clusters
GNU Affero General Public License v3.0

Filter out "Error applying view update" error when a node is down as it is expected #6907

Open juliayakovlev opened 10 months ago

juliayakovlev commented 10 months ago

Issue description

The "Error applying view update" error is expected when a node is down. In this case, it happened during the `disrupt_multiple_hard_reboot_node` nemesis:

2023-12-04 10:19:32.928 <2023-12-04 10:19:09.617>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=bc1654e9-af3d-4ea5-a113-a374728ae233: type=DATABASE_ERROR regex=(^ERROR|!\s*?ERR).*\[shard.*\] line_number=6711 node=longevity-large-partitions-3h-2023--db-node-a022fb3f-1
2023-12-04T10:19:09.617+00:00 longevity-large-partitions-3h-2023--db-node-a022fb3f-1      !ERR | scylla[5413]:  [shard 1] view - Error applying view update to 10.12.2.101 (view: scylla_bench.si_text_index, base token: -6056332791263634639, view token: -8504585955434838350): exceptions::unavailable_exception (Cannot achieve consistency level for cl ONE. Requires 1, alive 0)

We need to filter out this message on reboot.
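The `regex=` field in the event above shows the pattern that classified this line as `DATABASE_ERROR`. Below is a minimal Python sketch of how the expected error could be recognized more narrowly, assuming the quoted pattern is applied with `re.search`. `VIEW_UPDATE_UNAVAILABLE_RE`, `parse_view_update_unavailable` and `is_expected_view_update_error` are hypothetical names for illustration, not part of SCT.

```python
import re
from typing import Dict, Optional

# Pattern copied from the `regex=` field of the DatabaseLogEvent above: the line
# starts with ERROR or contains "!ERR", followed later by a [shard N] tag.
DATABASE_ERROR_RE = re.compile(r"(^ERROR|!\s*?ERR).*\[shard.*\]")

# Hypothetical, narrower pattern for the expected error: a view update that could
# not be applied because no view replica was alive for the consistency level.
VIEW_UPDATE_UNAVAILABLE_RE = re.compile(
    r"view - Error applying view update to (?P<target>\S+) "
    r"\(view: (?P<view>[\w.]+),.*\): "
    r"exceptions::unavailable_exception "
    r"\(Cannot achieve consistency level for cl (?P<cl>\w+)\. "
    r"Requires (?P<required>\d+), alive (?P<alive>\d+)\)"
)


def parse_view_update_unavailable(line: str) -> Optional[Dict[str, str]]:
    """Return target, view, CL and replica counts, or None if the line is
    not the view-update unavailable error (hypothetical helper)."""
    match = VIEW_UPDATE_UNAVAILABLE_RE.search(line)
    return match.groupdict() if match else None


def is_expected_view_update_error(line: str) -> bool:
    """True only for lines matching the DATABASE_ERROR pattern *and* the
    view-update unavailable error; any other error stays reportable."""
    return (
        DATABASE_ERROR_RE.search(line) is not None
        and parse_view_update_unavailable(line) is not None
    )


if __name__ == "__main__":
    sample = (
        "!ERR | scylla[5413]:  [shard 1] view - Error applying view update to 10.12.2.101 "
        "(view: scylla_bench.si_text_index, base token: -6056332791263634639, "
        "view token: -8504585955434838350): exceptions::unavailable_exception "
        "(Cannot achieve consistency level for cl ONE. Requires 1, alive 0)"
    )
    print(is_expected_view_update_error(sample))
    print(parse_view_update_unavailable(sample))
```

Matching on `unavailable_exception` rather than on the "Error applying view update" prefix alone is a design choice of this sketch: view-update failures with other causes would still surface as errors.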

Kernel Version: 5.15.0-1050-aws

Scylla version (or git commit hash): 2023.1.3-20231204.e79ea9835df7 with build-id e210e1f13f58542da05a9f2aca55ee89de1d1d64

Cluster size: 5 nodes (i3.2xlarge)

Scylla Nodes used in this run:

OS / Image: ami-03d838f35dcf5a090 (aws: us-east-1)

Test: longevity-large-partition-asymmetric-cluster-3h-test

Test id: a022fb3f-e134-4405-a55a-dfd80e82bcd5

Test name: enterprise-2023.1/longevity/longevity-large-partition-asymmetric-cluster-3h-test

Test config file(s):

Logs and commands

- Restore Monitor Stack command: `$ hydra investigate show-monitor a022fb3f-e134-4405-a55a-dfd80e82bcd5`
- Restore monitor on AWS instance using [Jenkins job](https://jenkins.scylladb.com/view/QA/job/QA-tools/job/hydra-show-monitor/parambuild/?test_id=a022fb3f-e134-4405-a55a-dfd80e82bcd5)
- Show all stored logs command: `$ hydra investigate show-logs a022fb3f-e134-4405-a55a-dfd80e82bcd5`

## Logs:

- **db-cluster-a022fb3f.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/a022fb3f-e134-4405-a55a-dfd80e82bcd5/20231204_135349/db-cluster-a022fb3f.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/a022fb3f-e134-4405-a55a-dfd80e82bcd5/20231204_135349/db-cluster-a022fb3f.tar.gz)
- **sct-runner-events-a022fb3f.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/a022fb3f-e134-4405-a55a-dfd80e82bcd5/20231204_135349/sct-runner-events-a022fb3f.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/a022fb3f-e134-4405-a55a-dfd80e82bcd5/20231204_135349/sct-runner-events-a022fb3f.tar.gz)
- **sct-a022fb3f.log.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/a022fb3f-e134-4405-a55a-dfd80e82bcd5/20231204_135349/sct-a022fb3f.log.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/a022fb3f-e134-4405-a55a-dfd80e82bcd5/20231204_135349/sct-a022fb3f.log.tar.gz)
- **monitor-set-a022fb3f.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/a022fb3f-e134-4405-a55a-dfd80e82bcd5/20231204_135349/monitor-set-a022fb3f.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/a022fb3f-e134-4405-a55a-dfd80e82bcd5/20231204_135349/monitor-set-a022fb3f.tar.gz)
- **loader-set-a022fb3f.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/a022fb3f-e134-4405-a55a-dfd80e82bcd5/20231204_135349/loader-set-a022fb3f.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/a022fb3f-e134-4405-a55a-dfd80e82bcd5/20231204_135349/loader-set-a022fb3f.tar.gz)
- **parallel-timelines-report-a022fb3f.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/a022fb3f-e134-4405-a55a-dfd80e82bcd5/20231204_135349/parallel-timelines-report-a022fb3f.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/a022fb3f-e134-4405-a55a-dfd80e82bcd5/20231204_135349/parallel-timelines-report-a022fb3f.tar.gz)

[Jenkins job URL](https://jenkins.scylladb.com/job/enterprise-2023.1/job/longevity/job/longevity-large-partition-asymmetric-cluster-3h-test/18/)

[Argus](https://argus.scylladb.com/test/66353c83-2ec2-40e1-a9cf-27abf318ae52/runs?additionalRuns[]=a022fb3f-e134-4405-a55a-dfd80e82bcd5)
juliayakovlev commented 10 months ago

It also happened during the drain nemesis:

2023-12-07 09:30:00.764 <2023-12-07 09:30:00.689>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=2dc02ce7-6105-464f-919e-9a2f81a31c98: type=DATABASE_ERROR regex=(^ERROR|!\s*?ERR).*\[shard.*\] line_number=8117 node=longevity-parallel-topology-schema--db-node-b4619490-2
2023-12-07T09:30:00.689+00:00 longevity-parallel-topology-schema--db-node-b4619490-2      !ERR | scylla[5406]:  [shard 4] view - Error applying view update to 10.12.2.162 (view: keyspace1.standard1_c6_nemesis_index, base token: -9202783624745066043, view token: -6826110454302464931): exceptions::unavailable_exception (Cannot achieve consistency level for cl ONE. Requires 1, alive 0)

Installation details

Kernel Version: 5.15.0-1050-aws

Scylla version (or git commit hash): 2023.1.3-20231204.e79ea9835df7 with build-id e210e1f13f58542da05a9f2aca55ee89de1d1d64

Cluster size: 5 nodes (i4i.2xlarge)

Scylla Nodes used in this run:

OS / Image: ami-03d838f35dcf5a090 (aws: us-east-1)

Test: longevity-schema-topology-changes-12h-test

Test id: b4619490-d05a-4a2d-ba66-2fa62895e00a

Test name: enterprise-2023.1/longevity/longevity-schema-topology-changes-12h-test

Test config file(s):

Logs and commands

- Restore Monitor Stack command: `$ hydra investigate show-monitor b4619490-d05a-4a2d-ba66-2fa62895e00a`
- Restore monitor on AWS instance using [Jenkins job](https://jenkins.scylladb.com/view/QA/job/QA-tools/job/hydra-show-monitor/parambuild/?test_id=b4619490-d05a-4a2d-ba66-2fa62895e00a)
- Show all stored logs command: `$ hydra investigate show-logs b4619490-d05a-4a2d-ba66-2fa62895e00a`

## Logs:

- **db-cluster-b4619490.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/b4619490-d05a-4a2d-ba66-2fa62895e00a/20231207_130836/db-cluster-b4619490.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/b4619490-d05a-4a2d-ba66-2fa62895e00a/20231207_130836/db-cluster-b4619490.tar.gz)
- **sct-runner-events-b4619490.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/b4619490-d05a-4a2d-ba66-2fa62895e00a/20231207_130836/sct-runner-events-b4619490.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/b4619490-d05a-4a2d-ba66-2fa62895e00a/20231207_130836/sct-runner-events-b4619490.tar.gz)
- **sct-b4619490.log.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/b4619490-d05a-4a2d-ba66-2fa62895e00a/20231207_130836/sct-b4619490.log.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/b4619490-d05a-4a2d-ba66-2fa62895e00a/20231207_130836/sct-b4619490.log.tar.gz)
- **monitor-set-b4619490.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/b4619490-d05a-4a2d-ba66-2fa62895e00a/20231207_130836/monitor-set-b4619490.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/b4619490-d05a-4a2d-ba66-2fa62895e00a/20231207_130836/monitor-set-b4619490.tar.gz)
- **loader-set-b4619490.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/b4619490-d05a-4a2d-ba66-2fa62895e00a/20231207_130836/loader-set-b4619490.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/b4619490-d05a-4a2d-ba66-2fa62895e00a/20231207_130836/loader-set-b4619490.tar.gz)

[Jenkins job URL](https://jenkins.scylladb.com/job/enterprise-2023.1/job/longevity/job/longevity-schema-topology-changes-12h-test/7/)

[Argus](https://argus.scylladb.com/test/079cddb4-4eac-4594-a2f0-8452dab5eb40/runs?additionalRuns[]=b4619490-d05a-4a2d-ba66-2fa62895e00a)