Open timtimb0t opened 1 day ago
@timtimb0t
The logs you referred to are irrelevant; it's a constant error we get since the node doesn't have AWS credentials, and it's OK.
This is the relevant information:
2024-11-09 08:48:25.137: (DisruptionEvent Severity.ERROR) period_type=end event_id=0599554f-90cb-4cca-be75-25060b00ec34 duration=3h23m27s: nemesis_name=CreateIndex target_node=Node longevity-twcs-48h-master-db-node-b7272755-2 [63.35.171.254 | 10.4.10.199] errors=errors={'10.4.11.51:9042': 'Client request timeout. See Session.execute[_async](timeout)'}, last_host=10.4.11.51:9042
Traceback (most recent call last):
  File "/home/ubuntu/scylla-cluster-tests/sdcm/nemesis.py", line 5354, in wrapper
    result = method(*args[1:], **kwargs)
  File "/home/ubuntu/scylla-cluster-tests/sdcm/nemesis.py", line 4999, in disrupt_create_index
    drop_index(session, ks, index_name)
  File "/home/ubuntu/scylla-cluster-tests/sdcm/utils/nemesis_utils/indexes.py", line 116, in drop_index
    session.execute(SimpleStatement(f'DROP INDEX {ks}.{index_name}'), timeout=300)
  File "/home/ubuntu/scylla-cluster-tests/sdcm/utils/common.py", line 1318, in execute_verbose
    return execute_orig(*args, **kwargs)
  File "cassandra/cluster.py", line 2729, in cassandra.cluster.Session.execute
  File "cassandra/cluster.py", line 5120, in cassandra.cluster.ResponseFuture.result
cassandra.OperationTimedOut: errors={'10.4.11.51:9042': 'Client request timeout. See Session.execute[_async](timeout)'}, last_host=10.4.11.51:9042
the nemesis ran for more than 3h, from 2024-11-09 05:24:57 to 2024-11-09 08:48:25,
and the load during it dropped like crazy:
there are multiple reports of that in scylla issue: https://github.com/scylladb/scylladb/issues/16661
i.e. it sounds like it's not something new, and I'm not sure SCT can do anything about it.
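For anyone who wants to double-check the nemesis duration quoted above, here is a minimal sketch that recomputes it from the two timestamps and compares it with the `duration=3h23m27s` field of the DisruptionEvent line. The `parse_duration` helper is hypothetical (it is not part of SCT), and the `XhYmZs` format is only assumed from that single event line.

```python
import re
from datetime import datetime, timedelta

# Values copied from the comment and the DisruptionEvent line above.
START = "2024-11-09 05:24:57"
END = "2024-11-09 08:48:25"
REPORTED = "3h23m27s"


def parse_duration(text: str) -> timedelta:
    """Parse a duration such as '3h23m27s' (hypothetical helper; format assumed)."""
    match = re.fullmatch(r"(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?", text)
    if match is None or not any(match.groups()):
        raise ValueError(f"unrecognized duration: {text!r}")
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


fmt = "%Y-%m-%d %H:%M:%S"
elapsed = datetime.strptime(END, fmt) - datetime.strptime(START, fmt)
reported = parse_duration(REPORTED)

print(elapsed)   # elapsed time between the two timestamps
print(reported)  # duration as reported in the event
# The timestamps are truncated to whole seconds, so allow a one-second difference.
print(abs(elapsed - reported) <= timedelta(seconds=1))
```

The two values agree to within a second, so the "more than 3h" figure matches what the event itself reports.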
Packages
Scylla version: 6.3.0~dev-20241108.aebb5329068e with build-id f25ba153fbf85f1e556539e48f980dd93e3ab285
Kernel Version: 6.8.0-1018-aws
Issue description
Not sure what the root cause of this problem is, but the following error appeared right before the nemesis failure:
Impact
No direct impact on Scylla; this seems to be an SCT issue.
How frequently does it reproduce?
Describe the frequency with how this issue can be reproduced.
Installation details
Cluster size: 4 nodes (i3en.2xlarge)
Scylla Nodes used in this run:
OS / Image: ami-07f847bea92dccb9a (aws: undefined_region)
Test: longevity-twcs-48h-test
Test id: b7272755-2d70-4e84-8a05-7cb0559db73d
Test name: scylla-master/tier1/longevity-twcs-48h-test
Test method: longevity_twcs_test.TWCSLongevityTest.test_custom_time
Test config file(s):
Logs and commands
- Restore Monitor Stack command: `$ hydra investigate show-monitor b7272755-2d70-4e84-8a05-7cb0559db73d`
- Restore monitor on AWS instance using [Jenkins job](https://jenkins.scylladb.com/view/QA/job/QA-tools/job/hydra-show-monitor/parambuild/?test_id=b7272755-2d70-4e84-8a05-7cb0559db73d)
- Show all stored logs command: `$ hydra investigate show-logs b7272755-2d70-4e84-8a05-7cb0559db73d`

## Logs:

- **longevity-twcs-48h-master-db-node-b7272755-1** - [https://cloudius-jenkins-test.s3.amazonaws.com/b7272755-2d70-4e84-8a05-7cb0559db73d/20241109_040647/longevity-twcs-48h-master-db-node-b7272755-1-b7272755.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/b7272755-2d70-4e84-8a05-7cb0559db73d/20241109_040647/longevity-twcs-48h-master-db-node-b7272755-1-b7272755.tar.gz)
- **longevity-twcs-48h-master-db-node-b7272755-2** - [https://cloudius-jenkins-test.s3.amazonaws.com/b7272755-2d70-4e84-8a05-7cb0559db73d/20241109_040647/longevity-twcs-48h-master-db-node-b7272755-2-b7272755.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/b7272755-2d70-4e84-8a05-7cb0559db73d/20241109_040647/longevity-twcs-48h-master-db-node-b7272755-2-b7272755.tar.gz)
- **db-cluster-b7272755.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/b7272755-2d70-4e84-8a05-7cb0559db73d/20241110_041622/db-cluster-b7272755.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/b7272755-2d70-4e84-8a05-7cb0559db73d/20241110_041622/db-cluster-b7272755.tar.gz)
- **sct-runner-events-b7272755.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/b7272755-2d70-4e84-8a05-7cb0559db73d/20241110_041622/sct-runner-events-b7272755.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/b7272755-2d70-4e84-8a05-7cb0559db73d/20241110_041622/sct-runner-events-b7272755.tar.gz)
- **sct-b7272755.log.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/b7272755-2d70-4e84-8a05-7cb0559db73d/20241110_041622/sct-b7272755.log.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/b7272755-2d70-4e84-8a05-7cb0559db73d/20241110_041622/sct-b7272755.log.tar.gz)
- **loader-set-b7272755.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/b7272755-2d70-4e84-8a05-7cb0559db73d/20241110_041622/loader-set-b7272755.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/b7272755-2d70-4e84-8a05-7cb0559db73d/20241110_041622/loader-set-b7272755.tar.gz)
- **monitor-set-b7272755.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/b7272755-2d70-4e84-8a05-7cb0559db73d/20241110_041622/monitor-set-b7272755.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/b7272755-2d70-4e84-8a05-7cb0559db73d/20241110_041622/monitor-set-b7272755.tar.gz)

[Jenkins job URL](https://jenkins.scylladb.com/job/scylla-master/job/tier1/job/longevity-twcs-48h-test/43/)
[Argus](https://argus.scylladb.com/test/ecd497c0-82d6-4269-b053-f5c2157e04ae/runs?additionalRuns[]=b7272755-2d70-4e84-8a05-7cb0559db73d)