Open fgelcer opened 2 years ago
Is this the same as issue #9812? And/or #10257?
could be, but @slivne asked to split into a new issue here
For reference, this job, running with 4.5.5 (ami-0276e2bbc13093d95 on eu-north-1), passed without a similar issue. There are also previous 4.5 builds that passed...
Reproduced in 5.0:
Kernel Version: 5.13.0-1021-aws
Scylla version (or git commit hash): 5.0~rc3-20220406.f92622e0d
with build-id 2b79c4744216b294fdbd2f277940044c899156ea
Cluster size: 4 nodes (i3.4xlarge)
Scylla Nodes used in this run:
OS / Image: ami-07835983d717b1ea3 (aws: eu-west-1)
Test: longevity-alternator-200gb-48h-test
Test id: d918f269-fa5e-4057-b74f-88062e9d5d0e
Test name: scylla-5.0/longevity/longevity-alternator-200gb-48h-test
Test config file(s):
>>>>>>>
Happened during the disrupt_no_corrupt_repair nemesis on alternator-48h-5-0-db-node-d918f269-6 (Succeeded, 2022-05-03 08:38:06 to 2022-05-03 09:34:31).
Got YCSB errors of:
2022-05-03 09:05:03.622: (YcsbStressEvent Severity.ERROR) period_type=not-set event_id=fe4fd1ed-ec5e-427d-8a2b-0c29bd8989fa: type=error node=Node alternator-48h-5-0-loader-node-d918f269-1 [34.251.53.215 | 10.0.1.211] (seed: False)
stress_cmd=bin/ycsb run dynamodb -P workloads/workloadc -threads 20 -p readproportion=0.5 -p updateproportion=0.5 -p recordcount=200200300 -p fieldcount=8 -p fieldlength=128 -p operationcount=2140000000 -p dataintegrity=true -p maxexecutiontime=147600 -s -P /tmp/dynamodb.properties
errors:
31230561 [Thread-4] ERROR site.ycsb.db.DynamoDBClient -com.amazonaws.services.dynamodbv2.model.InternalServerErrorException: Internal server error: exceptions::read_timeout_exception (Operation timed out for alternator_usertable.usertable - received only 1 responses from 2 CL=LOCAL_QUORUM.) (Service: AmazonDynamoDBv2; Status Code: 500; Error Code: InternalServerError; Request ID: null; Proxy: null)
2022-05-03 09:05:07.542: (YcsbStressEvent Severity.ERROR) period_type=not-set event_id=68350e77-ea80-4e3e-8511-17b753e44ee1: type=error node=Node alternator-48h-5-0-loader-node-d918f269-3 [34.245.62.224 | 10.0.2.148] (seed: False)
stress_cmd=bin/ycsb run dynamodb -P workloads/workloadc -threads 20 -p readproportion=0.5 -p updateproportion=0.5 -p recordcount=200200300 -p fieldcount=8 -p fieldlength=128 -p operationcount=2140000000 -p dataintegrity=true -p maxexecutiontime=147600 -s -P /tmp/dynamodb.properties
errors:
31212917 [Thread-10] ERROR site.ycsb.db.DynamoDBClient -com.amazonaws.services.dynamodbv2.model.InternalServerErrorException: Internal server error: exceptions::read_timeout_exception (Operation timed out for alternator_usertable.usertable - received only 1 responses from 2 CL=LOCAL_QUORUM.) (Service: AmazonDynamoDBv2; Status Code: 500; Error Code: InternalServerError; Request ID: null; Proxy: null)
2022-05-03 09:05:12.079: (YcsbStressEvent Severity.ERROR) period_type=not-set event_id=72c25dc0-f3e7-4937-83c6-e0a8f98ae939: type=error node=Node alternator-48h-5-0-loader-node-d918f269-2 [3.248.217.20 | 10.0.0.25] (seed: False)
stress_cmd=bin/ycsb run dynamodb -P workloads/workloadc -threads 20 -p readproportion=0.5 -p updateproportion=0.5 -p recordcount=200200300 -p fieldcount=8 -p fieldlength=128 -p operationcount=2140000000 -p dataintegrity=true -p maxexecutiontime=147600 -s -P /tmp/dynamodb.properties
errors:
31228768 [Thread-2] ERROR site.ycsb.db.DynamoDBClient -com.amazonaws.services.dynamodbv2.model.InternalServerErrorException: Internal server error: exceptions::read_timeout_exception (Operation timed out for alternator_usertable.usertable - received only 1 responses from 2 CL=LOCAL_QUORUM.) (Service: AmazonDynamoDBv2; Status Code: 500; Error Code: InternalServerError; Request ID: null; Proxy: null)
<<<<<<<
$ hydra investigate show-monitor d918f269-fa5e-4057-b74f-88062e9d5d0e
$ hydra investigate show-logs d918f269-fa5e-4057-b74f-88062e9d5d0e
@fgelcer as far as I recall you said this did not happen in scylla version < 4.5.5 and only started happening on 4.5.6 - is that correct ?
I don't see why any patch in 4.5.5.. would cause this
[shlomi@shlomi-scylladb-com scylla]$ git shortlog scylla-4.5.5..
Avi Kivity (5):
Update seastar submodule (pidof command not installed)
Update seastar submodule (logger deadlock with large messages)
transport: return correct error codes when downgrading v4 {WRITE,READ}_FAILURE to {WRITE,READ}_TIMEOUT
Update tools/java submodule (bad IPv6 addresses in nodetool)
Merge 'replica/database: drop_column_family(): properly cleanup stale querier cache entries' from Botond Dénes
Benny Halevy (1):
table: clear: serialize with ongoing flush
Calle Wilund (1):
cdc: Ensure columns removed from log table are registered as dropped
Eliran Sinvani (2):
cql3 statements: Change dependency test API to express better it's purpose
prepared_statements: Invalidate batch statement too
Juliusz Stasiewicz (1):
CQL: Replace assert by exception on invalid auth opcode
Piotr Sarna (1):
cql3: fix qualifying restrictions with IN for indexing
Raphael S. Carvalho (1):
compaction: LCS: don't write to disengaged optional on compaction completion
Tomasz Grabiec (2):
utils/chunked_vector: Fix sigsegv during reserve()
loading_cache: Make invalidation take immediate effect
Vlad Zolotarov (1):
loading_shared_values/loading_cache: get rid of iterators interface and return value_ptr from find(...) instead
Yaron Kaikov (1):
release: prepare for 4.5.6
If the answer is yes - this reproduces on 4.5.6 and not on earlier releases - let's bisect, since it's not clear what caused this change.
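A minimal bisect sketch (assuming the scylla-4.5.5 and scylla-4.5.6 tags are available locally, and that a run of this longevity test can be scripted as a pass/fail check per candidate build):
$ git bisect start
$ git bisect bad scylla-4.5.6    # first release where the timeouts were seen
$ git bisect good scylla-4.5.5   # last release known to pass
# for each commit git checks out: build, run the reproducer, then mark the result
$ git bisect good    # or: git bisect bad
$ git bisect reset   # when the first bad commit is found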
@fgelcer as far as I recall you said this did not happen in scylla version < 4.5.5 and only started happening on 4.5.6 - is that correct ?
When you asked, I searched for such a failure in the 4.5 results (prior to 4.5.6) and didn't see any.
Reproduced in enterprise-2022.1 rc4:
Kernel Version: 5.13.0-1022-aws
Scylla version (or git commit hash): 2022.1~rc4-20220510.0007e0625
with build-id 41d9ba21416afba85744e3e5308d1f2e5403ce65
Cluster size: 6 nodes (i3.4xlarge)
Scylla Nodes used in this run:
OS / Image: ami-06f239bc4ddf227d3 (aws: us-east-1)
Test: longevity-alternator-3h-test
Test id: d64b2ff7-9e5b-4c78-975c-ea92c3811753
Test name: enterprise-2022.1/longevity/longevity-alternator-3h-test
Test config file(s):
>>>>>>>
These errors happened in between our nemeses, after the ToggleCdcFeaturePropertiesOnTable nemesis, during cluster health validation.
2022-05-11 02:03:27.274: (YcsbStressEvent Severity.ERROR) period_type=not-set event_id=5c7ed44e-f423-4ed3-b23e-93d800cca2eb: type=error node=Node alternator-3h-2022-1-loader-node-d64b2ff7-1 [44.204.210.209 | 10.0.0.124] (seed: False)
stress_cmd=bin/ycsb run dynamodb -P workloads/workloadc -threads 20 -p readproportion=0.5 -p updateproportion=0.5 -p recordcount=6990500 -p fieldcount=10 -p fieldlength=512 -p operationcount=200200300 -p dataintegrity=true -p table=usertable_no_lwt -p maxexecutiontime=10800 -s -P /tmp/dynamodb.properties
errors:
1448182 [Thread-6] ERROR site.ycsb.db.DynamoDBClient -com.amazonaws.services.dynamodbv2.model.InternalServerErrorException: Internal server error: exceptions::read_timeout_exception (Operation timed out for alternator_usertable_no_lwt.usertable_no_lwt - received only 1 responses from 2 CL=LOCAL_QUORUM.) (Service: AmazonDynamoDBv2; Status Code: 500; Error Code: InternalServerError; Request ID: null; Proxy: null)
2022-05-11 02:03:31.043: (YcsbStressEvent Severity.ERROR) period_type=not-set event_id=6dcd5221-b759-48d7-b60f-0aa303cca4d7: type=error node=Node alternator-3h-2022-1-loader-node-d64b2ff7-4 [3.234.206.152 | 10.0.3.216] (seed: False)
stress_cmd=bin/ycsb run dynamodb -P workloads/workloadc -threads 20 -p readproportion=0.5 -p updateproportion=0.5 -p recordcount=6990500 -p fieldcount=10 -p fieldlength=512 -p operationcount=200200300 -p dataintegrity=true -p maxexecutiontime=10800 -s -P /tmp/dynamodb.properties
errors:
1463410 [Thread-21] ERROR site.ycsb.db.DynamoDBClient -com.amazonaws.services.dynamodbv2.model.InternalServerErrorException: Internal server error: exceptions::read_timeout_exception (Operation timed out for alternator_usertable.usertable - received only 1 responses from 2 CL=LOCAL_QUORUM.) (Service: AmazonDynamoDBv2; Status Code: 500; Error Code: InternalServerError; Request ID: null; Proxy: null)
<<<<<<<
$ hydra investigate show-monitor d64b2ff7-9e5b-4c78-975c-ea92c3811753
$ hydra investigate show-logs d64b2ff7-9e5b-4c78-975c-ea92c3811753
Reproduced on the same test as above, in 5.0 RC7.
Kernel Version: 5.13.0-1025-aws
Scylla version (or git commit hash): 5.0~rc7-20220602.41a00c744
with build-id 12671ee67c8f9a992f1e64f5647ff883e5f495e0
Cluster size: 4 nodes (i3.4xlarge)
Scylla Nodes used in this run:
OS / Image: ami-09ad25bdd94f417a5 (aws: eu-north-1)
Test: longevity-alternator-200gb-48h-test
Test id: 3253fd7f-ddb9-4af7-8d9e-f8b2e9f69fd7
Test name: scylla-5.0/longevity/longevity-alternator-200gb-48h-test
Test config file(s):
>>>>>>>
2022-06-09 19:47:05.753: (YcsbStressEvent Severity.ERROR) period_type=not-set event_id=4426b336-8e3e-4024-a1cf-cc31ed4e226b: type=error node=Node alternator-48h-5-0-loader-node-3253fd7f-4 [13.49.224.238 | 10.0.0.146] (seed: False)
stress_cmd=bin/ycsb run dynamodb -P workloads/workloadc -threads 20 -p readproportion=0.5 -p updateproportion=0.5 -p recordcount=200200300 -p fieldcount=8 -p fieldlength=128 -p operationcount=2140000000 -p dataintegrity=true -p maxexecutiontime=147600 -s -P /tmp/dynamodb.properties
errors:
20431291 [Thread-19] ERROR site.ycsb.db.DynamoDBClient -com.amazonaws.SdkClientException: Unable to execute HTTP request: Connect to alternator:8080 [alternator/10.0.1.146] failed: Connection refused (Connection refused)
2022-06-09 20:57:14.691: (YcsbStressEvent Severity.ERROR) period_type=not-set event_id=c8846e42-399f-4582-9103-fdefdaa2b69a: type=error node=Node alternator-48h-5-0-loader-node-3253fd7f-3 [13.53.198.225 | 10.0.2.138] (seed: False)
stress_cmd=bin/ycsb run dynamodb -P workloads/workloadc -threads 20 -p readproportion=0.5 -p updateproportion=0.5 -p recordcount=200200300 -p fieldcount=8 -p fieldlength=128 -p operationcount=2140000000 -p dataintegrity=true -p maxexecutiontime=147600 -s -P /tmp/dynamodb.properties
errors:
24651292 [Thread-7] ERROR site.ycsb.db.DynamoDBClient -com.amazonaws.services.dynamodbv2.model.InternalServerErrorException: Internal server error: exceptions::read_timeout_exception (Operation timed out for alternator_usertable.usertable - received only 1 responses from 2 CL=LOCAL_QUORUM.) (Service: AmazonDynamoDBv2; Status Code: 500; Error Code: InternalServerError; Request ID: null; Proxy: null)
2022-06-09 20:57:16.463: (YcsbStressEvent Severity.ERROR) period_type=not-set event_id=21588754-e2fc-4af4-995e-6e0101992f52: type=error node=Node alternator-48h-5-0-loader-node-3253fd7f-2 [13.48.48.229 | 10.0.2.55] (seed: False)
stress_cmd=bin/ycsb run dynamodb -P workloads/workloadc -threads 20 -p readproportion=0.5 -p updateproportion=0.5 -p recordcount=200200300 -p fieldcount=8 -p fieldlength=128 -p operationcount=2140000000 -p dataintegrity=true -p maxexecutiontime=147600 -s -P /tmp/dynamodb.properties
errors:
24664018 [Thread-14] ERROR site.ycsb.db.DynamoDBClient -com.amazonaws.services.dynamodbv2.model.InternalServerErrorException: Internal server error: exceptions::read_timeout_exception (Operation timed out for alternator_usertable.usertable - received only 1 responses from 2 CL=LOCAL_QUORUM.) (Service: AmazonDynamoDBv2; Status Code: 500; Error Code: InternalServerError; Request ID: null; Proxy: null)
2022-06-09 20:57:18.583: (YcsbStressEvent Severity.ERROR) period_type=not-set event_id=447550ef-cb40-476e-913d-6b12abac051a: type=error node=Node alternator-48h-5-0-loader-node-3253fd7f-1 [13.51.157.248 | 10.0.2.228] (seed: False)
stress_cmd=bin/ycsb run dynamodb -P workloads/workloadc -threads 20 -p readproportion=0.5 -p updateproportion=0.5 -p recordcount=200200300 -p fieldcount=8 -p fieldlength=128 -p operationcount=2140000000 -p dataintegrity=true -p maxexecutiontime=147600 -s -P /tmp/dynamodb.properties
errors:
24677292 [Thread-12] ERROR site.ycsb.db.DynamoDBClient -com.amazonaws.services.dynamodbv2.model.InternalServerErrorException: Internal server error: exceptions::read_timeout_exception (Operation timed out for alternator_usertable.usertable - received only 1 responses from 2 CL=LOCAL_QUORUM.) (Service: AmazonDynamoDBv2; Status Code: 500; Error Code: InternalServerError; Request ID: null; Proxy: null)
<<<<<<<
$ hydra investigate show-monitor 3253fd7f-ddb9-4af7-8d9e-f8b2e9f69fd7
$ hydra investigate show-logs 3253fd7f-ddb9-4af7-8d9e-f8b2e9f69fd7
reproduced a similar error (after an add-remove-dc nemesis):
2023-02-09 14:24:17.714: (YcsbStressEvent Severity.ERROR) period_type=not-set event_id=627fc2d9-36b8-4c72-a8c9-8ce1667e00d6: type=error node=Node alternator-3h-5-2-loader-node-6bb4b3d6-1 [3.92.42.94 | 10.12.2.210] (seed: False)
stress_cmd=bin/ycsb run dynamodb -P workloads/workloadc -threads 20 -p readproportion=0.5 -p updateproportion=0.5 -p recordcount=6990500 -p fieldcount=10 -p fieldlength=512 -p operationcount=200200300 -p dataintegrity=true -p table=usertable_no_lwt -p maxexecutiontime=10800 -s -P /tmp/dynamodb.properties
errors:
5259371 [Thread-2] ERROR site.ycsb.db.DynamoDBClient -com.amazonaws.services.dynamodbv2.model.InternalServerErrorException: Internal server error: exceptions::read_timeout_exception (Operation timed out for alternator_usertable_no_lwt.usertable_no_lwt - received only 1 responses from 2 CL=LOCAL_QUORUM.) (Service: AmazonDynamoDBv2; Status Code: 500; Error Code: InternalServerError; Request ID: null; Proxy: null)
Kernel Version: 5.15.0-1028-aws
Scylla version (or git commit hash): 5.2.0~rc1-20230207.8ff4717fd010
with build-id 78fbb2c25e9244a62f57988313388a0260084528
Cluster size: 6 nodes (i3.4xlarge)
Scylla Nodes used in this run:
OS / Image: ami-08975a99197206651 (aws: us-east-1)
Test: longevity-alternator-3h-test
Test id: 6bb4b3d6-fa67-4123-a5c9-d87340dd6392
Test name: scylla-5.2/longevity/longevity-alternator-3h-test
Test config file(s):
I wonder how we know that we do not overload the cluster? @yarongilor?
$ hydra investigate show-logs 6bb4b3d6-fa67-4123-a5c9-d87340dd6392
@mykaul , the cluster is around 50% loaded at this time, according to Grafana, as can be seen in more detail in this snapshot: https://snapshots.raintank.io/dashboard/snapshot/NqS3FJ9fYaUh5TueLBOHxoHP1GoOLQCE
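If needed, the load figure can also be double-checked directly against the monitor's Prometheus (a sketch; the metric name and port assume a standard scylla-monitoring setup, and <monitor-ip> is a placeholder):
$ curl -s 'http://<monitor-ip>:9090/api/v1/query?query=avg(scylla_reactor_utilization)'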
I know it's easy to mark this failure "Alternator" and assign it to me because the reads are done via the Alternator API, but Alternator reads translate to simple CQL CL=2 reads. Do we have exactly the same issue for CQL reads? If we do, let's close this one - it doesn't help to complicate this issue by yet another subsystem (Alternator) if it can be reproduced without it.
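One way to check that (a sketch, not part of the existing test; the cassandra-stress invocation and node IP are illustrative) would be to run an equivalent plain-CQL read/write mix at LOCAL_QUORUM against the same cluster and see whether the same read timeouts appear:
$ cassandra-stress mixed ratio\(write=1,read=1\) duration=3h cl=LOCAL_QUORUM -mode native cql3 -rate threads=20 -node 10.0.1.1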
@yarongilor said the cluster is only 50% loaded at the time, so "obvious" guesses like some machines being overloaded by other things (repair, hints, compactions, materialized views, or who knows what) don't work.
So maybe the replica read timeout is real, and caused by one of the nemeses? For example, if we kill a replica after a read has already started from it, we can get a timeout, and it is justified. Is speculative_retry enabled in this test? Maybe for some reason it isn't working well? If we have a bug in speculative retry and this issue happens when nodes were going down, then it can explain what we see.
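For reference, the effective speculative_retry setting on the table the errors point at can be checked (and changed) like this - a sketch, assuming cqlsh access to one of the nodes:
$ cqlsh -e "SELECT speculative_retry FROM system_schema.tables WHERE keyspace_name = 'alternator_usertable' AND table_name = 'usertable';"
$ cqlsh -e "ALTER TABLE alternator_usertable.usertable WITH speculative_retry = '99.0PERCENTILE';"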
Another wild guess (with absolutely no evidence to back it up) is that maybe the cluster isn't CPU-bound but IO-bound, and is at 100% IO load. I don't know how to check that. If we reach 100% IO load, reads may be very slow and time out because of issues like https://github.com/scylladb/scylladb/issues/8873 or a lot of other things.
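A crude way to check the IO-bound theory (a sketch; run on each DB node while the load is running, and cross-check with the IO queue panels in the monitoring dashboards):
$ iostat -x 1 10    # sustained %util near 100 and a growing request queue suggest the disks are the bottleneck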
P.S. How can I see in the metrics the exact time points where you suspect there was a problem? Do we know if something of interest happened right before this time - maybe some nemesis added a new node or killed a node or deleted sstables or rebooted a node or I don't know what?
As usual, it would have been nice if there was some attempt to reproduce this issue with fewer and fewer nemeses to rule out some of the subsystems. We can always blame Alternator/Repair/Compaction/MV/SI/RBNO/etc. for this issue, but if we can reproduce it without some of these subsystems, we will know it is not to blame.
Without a reproduction, I suggest closing it. It's been ~half a year since it was run; we need to be more responsive (and I'll make sure we are) with issues.
How is the "half a year" relevant? QA keeps posting comments about this issue still happening. What will we gain if we close this issue and they open a new one, with less historic context?
If you have ideas how to be more responsive, I'm all ears. I tried to explain in my comment today why I have no idea how to proceed with this issue without dropping everything else I do and starting to learn what this test was doing when it experienced the read timeouts. Is this really what you want me to do? And even then, I'm 99% sure it has nothing to do with Alternator (but I'd like to see confirmation or denial from QA - whether the same issue is seen also in a CQL workload).
Oh, I see that @yarongilor posted today, but his comment was indeed based on a half-year old run. I agree then - if this hasn't reproduced in half a year, let's close it.
@yarongilor , check the last runs of this test to see whether they reproduced the issue, and if not, close it.
Looks like the issue was reproduced with:
Scylla version: 6.0.0~rc1-20240530.b25dd2696f6a
with build-id 944ecf73fdff430d96c5c36992b54afa46a25c2f
Kernel Version: 5.15.0-1062-aws
After a series of nemeses (node cleanup and then major compaction), the ycsb command failed with the same error:
2024-06-04 13:55:41.673: (YcsbStressEvent Severity.ERROR) period_type=not-set event_id=f7194fa6-8fdd-48eb-866a-1dad7a293b01: type=error node=Node alternator-ttl-4-loaders-no-lwt-sis-loader-node-8135673d-1 [3.252.193.136 | 10.4.0.208]
stress_cmd=bin/ycsb load dynamodb -P workloads/workloadc -threads 13 -p recordcount=8589934401 -p fieldcount=2 -p fieldlength=16 -p insertstart=4294967200 -p insertcount=2147483600 -p table=usertable_no_lwt -p dynamodb.ttlKey=ttl -p dynamodb.ttlDuration=43200 -s -P /tmp/dynamodb.properties -p maxexecutiontime=180600
errors:
96111926 [Thread-5] ERROR site.ycsb.db.DynamoDBClient -com.amazonaws.services.dynamodbv2.model.InternalServerErrorException: Internal server error: exceptions::unavailable_exception (Cannot achieve consistency level for cl LOCAL_QUORUM. Requires 2, alive 1) (Service: AmazonDynamoDBv2; Status Code: 500; Error Code: InternalServerError; Request ID: null; Proxy: null)
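Note that this one is an unavailable_exception ("Requires 2, alive 1") rather than a read timeout, i.e. the coordinator already considered one of the replicas down when the request arrived. A quick way to correlate (a sketch, assuming shell access to a live node around that timestamp) is to check which nodes were marked DN:
$ nodetool status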
Monitoring:
Cluster size: 4 nodes (i4i.4xlarge)
Scylla Nodes used in this run:
OS / Image: ami-0c402c606e66f6f77 (aws: undefined_region)
Test: longevity-alternator-1h-scan-12h-ttl-no-lwt-2h-grace-4loaders-sisyphus-test
Test id: 8135673d-9fc5-40c5-aa1e-615fc9996d3b
Test name: scylla-6.0/alternator/longevity-alternator-1h-scan-12h-ttl-no-lwt-2h-grace-4loaders-sisyphus-test
Test config file(s):
Reproduced in 2024.2. Perhaps it also requires a deeper look into Grafana.
YcsbStressEvent ERROR during the disrupt_run_unique_sequence nemesis, 2024-09-10 08:26:09.490, on Node alternator-48h-2024-2-loader-node-e91c7bc4-3 [54.247.17.97 | 10.4.3.37]:
2024-09-10 08:26:09.490: (YcsbStressEvent Severity.ERROR) period_type=not-set event_id=ac64d07b-39fa-46a6-93b0-fb30fc85ae70: type=error node=Node alternator-48h-2024-2-loader-node-e91c7bc4-3 [54.247.17.97 | 10.4.3.37]
stress_cmd=bin/ycsb run dynamodb -P workloads/workloadc -threads 20 -p readproportion=0.5 -p updateproportion=0.5 -p recordcount=200200300 -p fieldcount=8 -p fieldlength=128 -p operationcount=2140000000 -p dataintegrity=true -p maxexecutiontime=147600 -s -P /tmp/dynamodb.properties
errors:
59978049 [Thread-4] ERROR site.ycsb.db.DynamoDBClient -com.amazonaws.services.dynamodbv2.model.InternalServerErrorException: Internal server error: exceptions::read_timeout_exception (Operation timed out for alternator_usertable.usertable - received only 1 responses from 2 CL=LOCAL_QUORUM.) (Service: AmazonDynamoDBv2; Status Code: 500; Error Code: InternalServerError; Request ID: null; Proxy: null)
Scylla version: 2024.2.0~rc2-20240904.4c26004e5311
with build-id a8549197de3c826053f88ddfd045b365b9cd8692
Kernel Version: 5.15.0-1068-aws
Cluster size: 4 nodes (i4i.4xlarge)
Scylla Nodes used in this run:
OS / Image: ami-0555cb82c50d0d5f1 (aws: undefined_region)
Test: longevity-alternator-200gb-48h-test
Test id: e91c7bc4-6235-4649-9874-627c46f1c8f8
Test name: enterprise-2024.2/alternator/longevity-alternator-200gb-48h-test
Test method: longevity_test.LongevityTest.test_custom_time
Test config file(s):
I'm seeing it on the Alternator longevity test for 4.5.6:
Installation details
Kernel Version: 5.4.0-1035-aws
Scylla version (or git commit hash): 4.5.6-0.20220424.85d3fe744
Cluster size: 6 nodes (i3.4xlarge)
Scylla running with shards number (live nodes):
alternator-3h-4-5-db-node-6ef7edc2-1 (16.16.66.22 | 10.0.0.60): 14 shards
alternator-3h-4-5-db-node-6ef7edc2-2 (13.51.13.163 | 10.0.2.56): 14 shards
alternator-3h-4-5-db-node-6ef7edc2-4 (16.171.65.166 | 10.0.2.84): 14 shards
alternator-3h-4-5-db-node-6ef7edc2-5 (13.48.70.197 | 10.0.2.128): 14 shards
alternator-3h-4-5-db-node-6ef7edc2-6 (13.48.104.229 | 10.0.0.192): 14 shards
alternator-3h-4-5-db-node-6ef7edc2-7 (16.170.237.120 | 10.0.3.51): 14 shards
alternator-3h-4-5-db-node-6ef7edc2-8 (13.49.72.158 | 10.0.1.220): 14 shards
Scylla running with shards number (terminated nodes):
alternator-3h-4-5-db-node-6ef7edc2-3 (13.48.26.23 | 10.0.1.69): 14 shards
OS (RHEL/CentOS/Ubuntu/AWS AMI): ami-05a25f88ebc3136e6 (aws: eu-north-1)
Test: longevity-alternator-3h-test
Test name: longevity_test.LongevityTest.test_custom_time
Test config file(s):
Issue description
Restore Monitor Stack command:
$ hydra investigate show-monitor 6ef7edc2-475d-4ede-ae31-4a8a54dd1a72
Restore monitor on AWS instance using Jenkins job
Show all stored logs command:
$ hydra investigate show-logs 6ef7edc2-475d-4ede-ae31-4a8a54dd1a72
Test id:
6ef7edc2-475d-4ede-ae31-4a8a54dd1a72
Logs:
grafana - https://cloudius-jenkins-test.s3.amazonaws.com/6ef7edc2-475d-4ede-ae31-4a8a54dd1a72/20220427_093744/grafana-screenshot-alternator-20220427_094238-alternator-3h-4-5-monitor-node-6ef7edc2-1.png
grafana - https://cloudius-jenkins-test.s3.amazonaws.com/6ef7edc2-475d-4ede-ae31-4a8a54dd1a72/20220427_093744/grafana-screenshot-longevity-alternator-3h-test-scylla-per-server-metrics-nemesis-20220427_094222-alternator-3h-4-5-monitor-node-6ef7edc2-1.png
db-cluster - https://cloudius-jenkins-test.s3.amazonaws.com/6ef7edc2-475d-4ede-ae31-4a8a54dd1a72/20220427_094808/db-cluster-6ef7edc2.tar.gz
loader-set - https://cloudius-jenkins-test.s3.amazonaws.com/6ef7edc2-475d-4ede-ae31-4a8a54dd1a72/20220427_094808/loader-set-6ef7edc2.tar.gz
monitor-set - https://cloudius-jenkins-test.s3.amazonaws.com/6ef7edc2-475d-4ede-ae31-4a8a54dd1a72/20220427_094808/monitor-set-6ef7edc2.tar.gz
sct-runner - https://cloudius-jenkins-test.s3.amazonaws.com/6ef7edc2-475d-4ede-ae31-4a8a54dd1a72/20220427_094808/sct-runner-6ef7edc2.tar.gz
Jenkins job URL