Similar issue got in the longevity-multi-keyspaces-60h-test running on the branch 4.5.rc5
Cassandra-stress exception got during rebuild (CorruptThenRebuild nemesis)
Cassandra-stress failes (failed to write/read the keys) started at 2021-07-15 23:51:52:
< t:2021-07-15 23:51:52,922 f:base.py l:220 c:RemoteCmdRunner p:DEBUG > java.io.IOException: Operation x10 on key(s) [36394f4c313538503131]: Error executing: (WriteFailureException): Cassandra failure during write query at consistency QUORUM (2 responses were required but only 1 replica responded, 1 failed)
< t:2021-07-15 23:52:37,350 f:file_logger.py l:80 c:sdcm.sct_events.file_logger p:INFO > java.io.IOException: Operation x10 on key(s) [36394f4c313538503131]: Error executing: (WriteFailureException): Cassandra failure during write query at consistency QUORUM (2 responses were required but only 1 replica responded, 1 failed)
< t:2021-07-16 00:10:46,628 f:base.py l:220 c:RemoteCmdRunner p:DEBUG > java.io.IOException: Operation x10 on key(s) [365033374e3636333630]: Error executing: (ReadTimeoutException): Cassandra timeout during read query at consistency QUORUM (2 responses were required but only 1 replica responded)
< t:2021-07-16 00:10:53,433 f:file_logger.py l:80 c:sdcm.sct_events.file_logger p:INFO > com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout durinjava.io.IOException: Operation x10 on key(s) [365033374e3636333630]: Error executing: (ReadTimeoutException): Cassandra timeout during read query at consistency QUORUM (2 responses were required but only 1 replica responded)
< t:2021-07-16 00:12:54,263 f:base.py l:220 c:RemoteCmdRunner p:DEBUG > java.io.IOException: Operation x10 on key(s) [50354c343631374f3331]: Error executing: (ReadTimeoutException): Cassandra timeout during read query at consistency QUORUM (2 responses were required but only 1 replica responded)
< t:2021-07-16 00:12:55,664 f:file_logger.py l:80 c:sdcm.sct_events.file_logger p:INFO > java.io.IOException: Operation x10 on key(s) [50354c343631374f3331]: Error executing: (ReadTimeoutException): Cassandra timeout during read query at consistency QUORUM (2 responses were required but only 1 replica responded)
< t:2021-07-16 00:49:38,067 f:base.py l:220 c:RemoteCmdRunner p:DEBUG > java.io.IOException: Operation x10 on key(s) [3532373938334b4c3631]: Error executing: (WriteFailureException): Cassandra failure during write query at consistency QUORUM (2 responses were required but only 1 replica responded, 1 failed)
< t:2021-07-16 00:49:38,485 f:file_logger.py l:80 c:sdcm.sct_events.file_logger p:INFO > com.datastax.driver.core.exceptions.ReadTimejava.io.IOException: Operation x10 on key(s) [3532373938334b4c3631]: Error executing: (WriteFailureException): Cassandra failure during write query at consistency QUORUM (2 responses were required but only 1 replica responded, 1 failed)
< t:2021-07-16 00:50:25,817 f:base.py l:220 c:RemoteCmdRunner p:DEBUG > java.io.IOException: Operation x10 on key(s) [4f3530394e3533373430]: Error executing: (ReadTimeoutException): Cassandra timeout during read query at consistency QUORUM (2 responses were required but only 1 replica responded)
< t:2021-07-16 00:50:37,092 f:file_logger.py l:80 c:sdcm.sct_events.file_logger p:INFO > com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout during read query at consistency QUORUM (2 responses were required but only 1 replica respondjava.io.IOException: Operation x10 on key(s) [4f3530394e3533373430]: Error executing: (ReadTimeoutException): Cassandra timeout during read query at consistency QUORUM (2 responses were required but only 1 replica responded)
All missed keys were found on the 3 nodes (RF=3)
The node is marked by green - node that runs rebuild
Similar issue got in the
longevity-multi-keyspaces-60h-test
running on the branch4.5.rc5
Cassandra-stress exception got during rebuild (CorruptThenRebuild
nemesis)After Scylla restart (stop and start):
A few sstables were deleted. Rebuild was started at
2021-07-15 16:11:40
:Failed to apply mutation (named_semaphore_timed_out) started at
2021-07-15 18:07:33
Cassandra-stress failes (failed to write/read the keys) started at
2021-07-15 23:51:52
:All missed keys were found on the 3 nodes (RF=3)
The node is marked by green - node that runs rebuild
Live monitor
Installation details Kernel version:
5.4.0-1035-aws
Scylla version (or git commit hash):4.5.rc5-0.20210711.7f96719c5
Cluster size: 6 nodes (i3.8xlarge) Scylla running with shards number (live nodes): longevity-1000-keyspaces-4-5-db-node-3a9de768-1 (13.51.167.63 | 10.0.2.120): 30 shards longevity-1000-keyspaces-4-5-db-node-3a9de768-2 (13.51.69.18 | 10.0.3.247): 30 shards longevity-1000-keyspaces-4-5-db-node-3a9de768-3 (13.48.71.98 | 10.0.3.238): 30 shards longevity-1000-keyspaces-4-5-db-node-3a9de768-4 (13.53.38.98 | 10.0.0.10): 30 shards longevity-1000-keyspaces-4-5-db-node-3a9de768-5 (13.51.193.145 | 10.0.1.242): 30 shards longevity-1000-keyspaces-4-5-db-node-3a9de768-6 (13.48.45.37 | 10.0.1.146): 30 shards OS (RHEL/CentOS/Ubuntu/AWS AMI):ami-09a138d84af5dd9e9
(aws: eu-north-1)Test:
longevity-multi-keyspaces-60h-test
Test name:longevity_test.LongevityTest.test_batch_custom_time
Test config file(s):Restore Monitor Stack command:
$ hydra investigate show-monitor 3a9de768-c86f-4b82-aafd-b0f9d35085dd
Show all stored logs command:$ hydra investigate show-logs 3a9de768-c86f-4b82-aafd-b0f9d35085dd
Test id:
3a9de768-c86f-4b82-aafd-b0f9d35085dd
Logs: db-cluster - https://cloudius-jenkins-test.s3.amazonaws.com/3a9de768-c86f-4b82-aafd-b0f9d35085dd/20210717_033618/db-cluster-3a9de768.tar.gz loader-set - https://cloudius-jenkins-test.s3.amazonaws.com/3a9de768-c86f-4b82-aafd-b0f9d35085dd/20210717_033618/loader-set-3a9de768.tar.gz monitor-set - https://cloudius-jenkins-test.s3.amazonaws.com/3a9de768-c86f-4b82-aafd-b0f9d35085dd/20210717_033618/monitor-set-3a9de768.tar.gz sct-runner - https://cloudius-jenkins-test.s3.amazonaws.com/3a9de768-c86f-4b82-aafd-b0f9d35085dd/20210717_033618/sct-runner-3a9de768.tar.gz
Jenkins job URL