I'm trying this https://github.com/scylladb/scylla-cluster-tests/pull/5783/commits/b3078860de48ceadc5463a9bb132a0709dc64ea7 to configure the hard-stop-after to a longer period.
I can report that with 5min those disconnections went away, and it was working stably for 12h: https://jenkins.scylladb.com/job/scylla-staging/job/fruch/job/longevity-scylla-operator-12h-multitenant-eks/30/
@zimnx and I are trying to figure out where the default for that value comes from.
One suggestion was https://github.com/haproxytech/kubernetes-ingress/blob/c38aa87d9d03d3f8522749ca49363ff90b81eda0/pkg/haproxy/env/defaults.go#L68, which says 30m, but it's not clear whether that is actually used.
We need to confirm on a running instance what the configuration says.
The default value is indeed 30mins.
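For reference, the haproxytech kubernetes-ingress controller appears to read this setting from its ConfigMap (per the defaults.go linked above). A minimal sketch of how the timeout could be raised, assuming a `hard-stop-after` ConfigMap key; the ConfigMap name, namespace and value below are placeholders, not taken from this deployment:

```yaml
# Hypothetical sketch: raising hard-stop-after for the haproxy ingress controller.
apiVersion: v1
kind: ConfigMap
metadata:
  name: haproxy-kubernetes-ingress   # placeholder name
  namespace: haproxy-controller      # placeholder namespace
data:
  # how long an old haproxy worker may keep serving existing connections
  # after a reload before it is hard-stopped (default is 30m per the discussion above)
  hard-stop-after: "2h"
```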
I found one weird thing in your configuration while browsing through the logs and the haproxy controller logs. ~30 mins (the default hard-stop-after) before the hard stop which caused the connection drop, there are the following logs:
2023/03/22 17:46:07 TRACE service/endpoints.go:110 backend scylla_sct-cluster-us-east1-b-us-east1-0_cql-ssl: number of slots 1
2023/03/22 17:46:07 DEBUG service/endpoints.go:123 Server slots in backend 'scylla_sct-cluster-us-east1-b-us-east1-0_cql-ssl' scaled to match scale-server-slots value: 1, reload required
then the haproxy process gets a restart command, and after 30 mins it's hard-stopped, dropping the connections. The default "server-slots" is set to 42. You set it explicitly to 1 on the Ingress objects via annotations in the ScyllaCluster:
exposeOptions:
  cql:
    ingress:
      annotations:
        haproxy.org/scale-server-slots: "1"
        haproxy.org/ssl-passthrough: "true"
I think it might be causing the restart, which gets stuck for some reason. If you didn't set it, haproxy wouldn't be reloaded.
seems like we introduced it in SCT based on some unfinished work in https://github.com/scylladb/scylla-operator/pull/1076
I'm removing it and trying it again: https://jenkins.scylladb.com/job/scylla-staging/job/fruch/job/longevity-scylla-operator-12h-multitenant-eks/31/
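For clarity, dropping that annotation means the relevant ScyllaCluster section would look roughly like this (a sketch based on the snippet quoted above; the controller then falls back to its own server-slots default):

```yaml
exposeOptions:
  cql:
    ingress:
      annotations:
        # haproxy.org/scale-server-slots removed; the controller default applies
        haproxy.org/ssl-passthrough: "true"
```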
@zimnx, removing only the scale-server-slots didn't seem to work
Kernel Version: 5.10.173-154.642.amzn2.x86_64
Scylla version (or git commit hash): 5.3.0~dev-20230329.6525209983d1
with build-id da8cde2a3d8c048a3a15dfd19fd14dd535fec6d1
Operator Image: scylladb/scylla-operator:latest Operator Helm Version: v1.9.0-alpha.2-8-gee48da7 Operator Helm Repository: https://storage.googleapis.com/scylla-operator-charts/latest Cluster size: 4 nodes (i4i.4xlarge)
Scylla Nodes used in this run: No resources left at the end of the run
OS / Image: `` (k8s-eks: eu-central-1)
Test: longevity-scylla-operator-12h-multitenant-eks
Test id: de543eec-e26a-47b8-9d1f-6212afce6e85
Test name: scylla-staging/fruch/longevity-scylla-operator-12h-multitenant-eks
Test config file(s):
@zimnx @tnozicka
I've added a log, so now we have the full haproxy log,
removed the scale-server-slots: "1",
and we are still seeing this issue during a rolling restart of one of the clusters (1 out of 14):
2023/05/15 00:22:52 TRACE controller.go:171 HAProxy config sync ended
[WARNING] (686) : soft-stop running for too long, performing a hard-stop.
[WARNING] (686) : Proxy ssl hard-stopped (105 remaining conns will be closed).
[WARNING] (679) : soft-stop running for too long, performing a hard-stop.
[WARNING] (679) : Proxy ssl hard-stopped (119 remaining conns will be closed).
[WARNING] (686) : Some tasks resisted to hard-stop, exiting now.
[NOTICE] (266) : haproxy version is 2.6.6-274d1a4
[WARNING] (266) : Former worker (686) exited with code 0 (Exit)
[WARNING] (679) : Some tasks resisted to hard-stop, exiting now.
[NOTICE] (268) : haproxy version is 2.6.6-274d1a4
[WARNING] (268) : Former worker (679) exited with code 0 (Exit)
[WARNING] (813) : Server scylla-4_sct-cluster-4-us-east1-b-us-east1-2_cql-ssl/SRV_1 is UP, reason: Layer4 check passed, check duration: 0ms. 1 active and 0 backup servers online. 0 sessions requeued, 0 total in queue.
[WARNING] (801) : Server scylla-4_sct-cluster-4-us-east1-b-us-east1-2_cql-ssl/SRV_1 is UP, reason: Layer4 check passed, check duration: 0ms. 1 active and 0 backup servers online. 0 sessions requeued, 0 total in queue.
Kernel Version: 5.10.178-162.673.amzn2.x86_64
Scylla version (or git commit hash): 5.3.0~dev-20230512.7fcc4031229b
with build-id d6f9b433d295cf0420d28abedc89ff756eb0b75e
Operator Image: scylladb/scylla-operator:latest Operator Helm Version: v1.9.0-alpha.3-5-g34369da Operator Helm Repository: https://storage.googleapis.com/scylla-operator-charts/latest Cluster size: 4 nodes (i4i.4xlarge)
Scylla Nodes used in this run: No resources left at the end of the run
OS / Image: `` (k8s-eks: eu-north-1)
Test: longevity-scylla-operator-12h-multitenant-eks
Test id: 75d73bb9-e15f-4955-a016-c36272dd91f1
Test name: scylla-staging/fruch/longevity-scylla-operator-12h-multitenant-eks
Test config file(s):
fyi, haproxy bump has landed https://github.com/scylladb/scylla-operator/pull/1235 - I wonder whether it changes something or not
I could try it out, but it sounds like a longshot.
I talked about our issue with one haproxy dev, and apparently haproxy doesn't support moving existing connections to a new process; they only support moving socket listeners.
The process that requires a reload is killed either when all existing connections are done or after a configurable timeout, hard-stop-after (30min by default).
Driver connections are long-lived; I'm not even sure whether idle connections are closed at all.
We could extend the timeout to some high value; then haproxy would keep adding more processes whenever a reload is required, and new connections would go to the newest one. Eventually 'stale' processes would be killed, but having lots of these processes would increase memory consumption. How much, we would need to measure.
Unfortunately a haproxy reload is required when an Ingress is added/removed, as a new backend is added to the configuration file and this requires a reload.
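For context, hard-stop-after is a global haproxy directive, so the relevant piece of the generated configuration looks roughly like the fragment below (illustrative only; the value shown is the default discussed above, not necessarily what this cluster runs with):

```
# haproxy.cfg fragment (illustrative)
global
    # After a reload the old worker keeps serving its existing connections.
    # hard-stop-after bounds how long it may linger before being hard-stopped,
    # at which point the remaining connections are dropped.
    hard-stop-after 30m
```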
I think the major issue, whichever timeout we set, is that all the connections would be lost at once, and applications using a CQL driver can't manage this situation (i.e. all of our tooling doesn't: cassandra-stress, scylla-bench, ycsb and gemini).
do we have a way to control how the reload happens? i.e. if we have 10 instances of haproxy, does the reload happen on all of them at the same time?
we need to find some way to avoid closing all of a client's connections at the same time.
do we know how Cassandra is doing it?
Long lived connections have to account for being terminated at some point, even with rolling restarts (pod for pod). So there is usually a graceful timeout of 60 seconds or so and then the connection is closed and the client needs to reestablish it. I don't think it makes sense to try to stretch that time artificially high; we don't want to wait 30 minutes to restart a single pod in a set. I'd assume when it closes the client connection a new one should be reestablished and drivers should handle new calls with a new connection.
we are talking about restarts of haproxy, and that currently affects all the nodes and all the clusters at the exact same time, because of one pod being restarted/deleted.
no current CQL driver can handle it gracefully.
In future deployments, it may make sense to have more than one SNI proxy - even per AZ, and for the driver to go through both, for load balancing and high availability.
in production there will be an HA proxy setup for each AZ, but I don't think it makes a difference, as the reload happens on all proxies at once because of ingress changes; and even without that, for rolling restarts you may be unlucky enough that all your connections hit the same proxy with round-robin or random balancing policies - with scale the likelihood is lower.

> no current CQL driver can handle it gracefully.

should it? (if you don't have an active connection, create one on demand)
I thought so too...
Reload at once kinda ruins that.
Does graceful reload ruin it as well?
Graceful reload:
1. Create a new pod/worker
2. Old pod/worker stops accepting connections, but serves the old ones
3. shutdown timeout, say 90s for the LB to notice it (slightly bigger than the LB probe cycle)
4. Old pod/worker closes connections and terminates
5. Only the new worker serves connections

In case you are reloading "in process" (same IP+port), step 3 can be skipped. But at any point, a new connection (retry or not) should always succeed.
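To make the steps above concrete, here is a minimal sketch (mine, not from the thread) of how such a graceful rollout could look for a hypothetical proxy Deployment on plain Kubernetes, using maxUnavailable: 0, a preStop sleep and terminationGracePeriodSeconds for steps 1-4; all names and the image are placeholders:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cql-sni-proxy            # placeholder name
spec:
  replicas: 2
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0          # step 1: bring a new pod up before taking an old one down
  selector:
    matchLabels:
      app: cql-sni-proxy
  template:
    metadata:
      labels:
        app: cql-sni-proxy
    spec:
      terminationGracePeriodSeconds: 120   # step 4: bound on how long the old pod may drain
      containers:
      - name: proxy
        image: haproxy:2.6       # placeholder image
        lifecycle:
          preStop:
            exec:
              # steps 2-3: keep serving existing connections while the LB
              # notices the pod left the endpoints (longer than the probe cycle)
              command: ["sleep", "90"]
```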
Still we have a time when all connections would be broken at once, so from client POV it's not so graceful.
I don't see how the situation is going to be better than the current state.
I don't get it - at any point in time a new connection will succeed and long lived connections are meant to be reestablished gracefully (for the client).
How is the client gonna know it needs to reestablish a given connection?
Clients now would keep connections until closed, so if one node stops and connections break it's fine, but severing all open connections at the same time breaks the service from client POV.
I think this is a showstopper for our current SNI approach.
> How is the client gonna know it needs to reestablish a given connection?

it's gonna get closed, usually when idle (in keep alive mode)

> but severing all open connections at the same time breaks the service from client POV.

maybe today - but you'd have to explain to me why a client can't open a new (on-demand) connection when all the pre-cached ones get closed

> I think this is a showstopper for our current SNI approach.

I suppose even if that would be an issue (I don't think it is at this point), it would be an issue with a particular Ingress controller implementation (haproxy), not the SNI approach in general. We use haproxy because it was cheap to start with, but we can switch it or introduce our own golang proxy, which we may need for cluster pausing anyways.
the driver can open new connections if a connection breaks, and move on to the next connection. But when it doesn't have any connections at all, it returns a failure to the caller.
in the case we are talking about, the caller (i.e. cassandra-stress) retries the same request x10 times and is still failing.
I'm not sure it's o.k. to tell users they'll need to retry x times more than they would with a regular scylla/cassandra deployment, just because we reload haproxy once in a while...
> But when it doesn't have any connections at all, it returns a failure to the caller.

this is where I propose it opens a new connection on-demand and gives it to the caller instead of an error

> in the case we are talking about, the caller (i.e. cassandra-stress) retries the same request x10 times and is still failing.

what is the reason?
- the pool gives back a closed connection 10 times?
- getting a connection fails because the pool is empty?
- the pool tries to establish a new connection on demand (10 times) and fails to create it? (it should work but something may be off)

> I'm not sure it's o.k. to tell users they'll need to retry x times more than they would with a regular scylla/cassandra deployment, just because we reload haproxy once in a while...

definitely not, this should never get to the user and should be handled by the library without an error, by just opening an on-demand connection if the pool is empty
I can rerun it with extra driver logs, and then you can go over it with someone from the drivers team.
After that we'll somehow need to fix/validate this across all drivers.
@zimnx @tnozicka, here's a re-run with c-s and driver logs enabled
here's one example of the failure:
java.io.IOException: Operation x10 on key(s) [34324b4f303439323930]: Error executing: (NoHostAvailableException): All host(s) tried for query failed (tried: a25ad45e9a6b741809da4bdef37b9ded-1349193356.eu-north-1.elb.amazonaws.com:9142:ad4261b0-5c29-4106-9ba4-56fe0b2d23c0.cql.sct-cluster-10.sct.scylladb.com (com.datastax.driver.core.exceptions.ConnectionException: [a25ad45e9a6b741809da4bdef37b9ded-1349193356.eu-north-1.elb.amazonaws.com:9142:ad4261b0-5c29-4106-9ba4-56fe0b2d23c0.cql.sct-cluster-10.sct.scylladb.com] Write attempt on defunct connection), a25ad45e9a6b741809da4bdef37b9ded-1349193356.eu-north-1.elb.amazonaws.com:9142:ade6bc03-a9e7-435f-ad8f-51a1fa690f02.cql.sct-cluster-10.sct.scylladb.com (com.datastax.driver.core.exceptions.ConnectionException: [a25ad45e9a6b741809da4bdef37b9ded-1349193356.eu-north-1.elb.amazonaws.com:9142:ade6bc03-a9e7-435f-ad8f-51a1fa690f02.cql.sct-cluster-10.sct.scylladb.com] Write attempt on defunct connection))
at org.apache.cassandra.stress.Operation.error(Operation.java:141)
at org.apache.cassandra.stress.Operation.timeWithRetry(Operation.java:119)
at org.apache.cassandra.stress.operations.predefined.CqlOperation.run(CqlOperation.java:101)
at org.apache.cassandra.stress.operations.predefined.CqlOperation.run(CqlOperation.java:109)
at org.apache.cassandra.stress.operations.predefined.CqlOperation.run(CqlOperation.java:264)
at org.apache.cassandra.stress.StressAction$Consumer.run(StressAction.java:473)
Kernel Version: 5.10.179-166.674.amzn2.x86_64
Scylla version (or git commit hash): 5.4.0~dev-20230602.8be69fc3a087
with build-id 824da7c9ac7baeb719819cc56991aebe48371426
Operator Image: scylladb/scylla-operator:latest Operator Helm Version: v1.9.0-alpha.4-14-gdb443d0 Operator Helm Repository: https://storage.googleapis.com/scylla-operator-charts/latest Cluster size: 4 nodes (i4i.4xlarge)
Scylla Nodes used in this run: No resources left at the end of the run
OS / Image: `` (k8s-eks: undefined_region)
Test: longevity-scylla-operator-12h-multitenant-eks
Test id: 4a28dcd1-7085-4bea-b472-11d6f2959aff
Test name: scylla-staging/fruch/longevity-scylla-operator-12h-multitenant-eks
Test config file(s):
I skimmed through both the gocql and java driver code around connection pools, and they seem to have poor support for losing all the connections.
gocql starts asynchronous pool filling when it detects that it doesn't fully cover all shards of a particular node, and returns nil - which causes an error to be returned to the user - when the pool is empty: https://github.com/scylladb/gocql/blob/v1.7.3/connectionpool.go#L313-L333
Java behaves the same, an error without reconnect is returned to the user.
Instead, drivers should at least try to reconnect and return nil only if it's not successful. Otherwise rolling restart isn't fully supported.
Operation x10 [...] All host(s) tried for query failed [...] Write attempt on defunct connection
Sounds like a closed connection is returned to the driver by the pool.
Issues reported in drivers: https://github.com/scylladb/gocql/issues/140 https://github.com/scylladb/java-driver/issues/236
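Until the drivers reconnect on demand, one application-side workaround is to retry a query with a short backoff so the driver's asynchronous pool refill has a chance to run. A minimal sketch for gocql (mine, not from the thread; execWithRetry is a hypothetical helper, and whether such app-level retries are an acceptable mitigation is exactly what's being debated above):

```go
package cqlretry

import (
	"time"

	"github.com/gocql/gocql"
)

// execWithRetry is a hypothetical application-level workaround: if the query
// fails (e.g. because the pool was emptied by an haproxy hard-stop), retry a
// few times with backoff so the driver's background pool filling can
// re-establish connections, instead of surfacing the first error to the caller.
func execWithRetry(session *gocql.Session, stmt string, retries int, values ...interface{}) error {
	var err error
	for attempt := 0; attempt <= retries; attempt++ {
		if err = session.Query(stmt, values...).Exec(); err == nil {
			return nil
		}
		// back off a little longer on each attempt
		time.Sleep(time.Duration(attempt+1) * 500 * time.Millisecond)
	}
	return err
}
```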
Closing this in favour of the specific driver issues, thanks!
@tnozicka
isn't this the same issue as in https://github.com/scylladb/scylla-operator/issues/1341
also, were those driver issues handed over to anyone on the driver team? It doesn't seem like anyone is aware of them. @roydahan, @avelanarius FYI
> @tnozicka isn't this the same issue as in https://github.com/scylladb/scylla-operator/issues/1341

I don't think so.

> also, were those driver issues handed over to anyone on the driver team?

Two comments above, @zimnx referenced https://github.com/scylladb/gocql/issues/140 and https://github.com/scylladb/java-driver/issues/236, which are the driver issues that were filed. Are you saying the team that manages those repos is not aware of them?
I was aware of the issues when they were filed and I read them. I didn't like the proposed solution (it would negatively impact non-proxy workloads) and I couldn't immediately find any better solution. Combined with our current priority of serverless, those issues therefore weren't prioritized by me.
Issue description
While running a test with 14 tenants with sni_proxy, after ~40min we run into a case where all of the CQL connections (of all tenants) get closed at the same time.
from the haproxy log we see this:
Impact
this renders all of the tenants useless; a CQL driver can handle 1-2 connections getting closed well, but none of the tools expect all the connections to be closed at the same time.
How frequently does it reproduce?
this happens on every run we are doing
Installation details
Kernel Version: 5.4.228-131.415.amzn2.x86_64
Scylla version (or git commit hash): 5.3.0~dev-20230214.2653865b34d8
with build-id eb6fb0dc2a97faec591d4020d9c3671de48b2436
Operator Image: scylladb/scylla-operator:latest Operator Helm Version: v1.9.0-alpha.1-13-gc6a6e05 Operator Helm Repository: https://storage.googleapis.com/scylla-operator-charts/latest Cluster size: 4 nodes (i4i.4xlarge)
Scylla Nodes used in this run: No resources left at the end of the run
OS / Image: `` (k8s-eks: eu-central-1)
Test:
longevity-scylla-operator-12h-multitenant-eks
Test id: 3720410a-7757-4d41-89af-320eae9656b1
Test name: scylla-staging/fruch/longevity-scylla-operator-12h-multitenant-eks
Test config file(s):
Logs and commands
- Restore Monitor Stack command: `$ hydra investigate show-monitor 3720410a-7757-4d41-89af-320eae9656b1`
- Restore monitor on AWS instance using [Jenkins job](https://jenkins.scylladb.com/view/QA/job/QA-tools/job/hydra-show-monitor/parambuild/?test_id=3720410a-7757-4d41-89af-320eae9656b1)
- Show all stored logs command: `$ hydra investigate show-logs 3720410a-7757-4d41-89af-320eae9656b1`
## Logs:
- **db-cluster-3720410a.tar.gz** - https://cloudius-jenkins-test.s3.amazonaws.com/3720410a-7757-4d41-89af-320eae9656b1/20230215_081051/db-cluster-3720410a.tar.gz
- **scylla-10_cluster_events-3720410a.log.tar.gz** - https://cloudius-jenkins-test.s3.amazonaws.com/3720410a-7757-4d41-89af-320eae9656b1/20230215_081051/scylla-10_cluster_events-3720410a.log.tar.gz
- **scylla-5_cluster_events-3720410a.log.tar.gz** - https://cloudius-jenkins-test.s3.amazonaws.com/3720410a-7757-4d41-89af-320eae9656b1/20230215_081051/scylla-5_cluster_events-3720410a.log.tar.gz
- **output-3720410a.log.tar.gz** - https://cloudius-jenkins-test.s3.amazonaws.com/3720410a-7757-4d41-89af-320eae9656b1/20230215_081051/output-3720410a.log.tar.gz
- **scylla_cluster_events-3720410a.log.tar.gz** - https://cloudius-jenkins-test.s3.amazonaws.com/3720410a-7757-4d41-89af-320eae9656b1/20230215_081051/scylla_cluster_events-3720410a.log.tar.gz
- **debug-3720410a.log.tar.gz** - https://cloudius-jenkins-test.s3.amazonaws.com/3720410a-7757-4d41-89af-320eae9656b1/20230215_081051/debug-3720410a.log.tar.gz
- **events-3720410a.log.tar.gz** - https://cloudius-jenkins-test.s3.amazonaws.com/3720410a-7757-4d41-89af-320eae9656b1/20230215_081051/events-3720410a.log.tar.gz
- **scylla-12_cluster_events-3720410a.log.tar.gz** - https://cloudius-jenkins-test.s3.amazonaws.com/3720410a-7757-4d41-89af-320eae9656b1/20230215_081051/scylla-12_cluster_events-3720410a.log.tar.gz
- **scylla-7_cluster_events-3720410a.log.tar.gz** - https://cloudius-jenkins-test.s3.amazonaws.com/3720410a-7757-4d41-89af-320eae9656b1/20230215_081051/scylla-7_cluster_events-3720410a.log.tar.gz
- **sct-3720410a.log.tar.gz** - https://cloudius-jenkins-test.s3.amazonaws.com/3720410a-7757-4d41-89af-320eae9656b1/20230215_081051/sct-3720410a.log.tar.gz
- **scylla-9_cluster_events-3720410a.log.tar.gz** - https://cloudius-jenkins-test.s3.amazonaws.com/3720410a-7757-4d41-89af-320eae9656b1/20230215_081051/scylla-9_cluster_events-3720410a.log.tar.gz
- **normal-3720410a.log.tar.gz** - https://cloudius-jenkins-test.s3.amazonaws.com/3720410a-7757-4d41-89af-320eae9656b1/20230215_081051/normal-3720410a.log.tar.gz
- **argus-3720410a.log.tar.gz** - https://cloudius-jenkins-test.s3.amazonaws.com/3720410a-7757-4d41-89af-320eae9656b1/20230215_081051/argus-3720410a.log.tar.gz
- **raw_events-3720410a.log.tar.gz** - https://cloudius-jenkins-test.s3.amazonaws.com/3720410a-7757-4d41-89af-320eae9656b1/20230215_081051/raw_events-3720410a.log.tar.gz
- **scylla-2_cluster_events-3720410a.log.tar.gz** - https://cloudius-jenkins-test.s3.amazonaws.com/3720410a-7757-4d41-89af-320eae9656b1/20230215_081051/scylla-2_cluster_events-3720410a.log.tar.gz
- **scylla-13_cluster_events-3720410a.log.tar.gz** - https://cloudius-jenkins-test.s3.amazonaws.com/3720410a-7757-4d41-89af-320eae9656b1/20230215_081051/scylla-13_cluster_events-3720410a.log.tar.gz
- **critical-3720410a.log.tar.gz** - https://cloudius-jenkins-test.s3.amazonaws.com/3720410a-7757-4d41-89af-320eae9656b1/20230215_081051/critical-3720410a.log.tar.gz
- **scylla-11_cluster_events-3720410a.log.tar.gz** - https://cloudius-jenkins-test.s3.amazonaws.com/3720410a-7757-4d41-89af-320eae9656b1/20230215_081051/scylla-11_cluster_events-3720410a.log.tar.gz
- **scylla-4_cluster_events-3720410a.log.tar.gz** - https://cloudius-jenkins-test.s3.amazonaws.com/3720410a-7757-4d41-89af-320eae9656b1/20230215_081051/scylla-4_cluster_events-3720410a.log.tar.gz
- **scylla-3_cluster_events-3720410a.log.tar.gz** - https://cloudius-jenkins-test.s3.amazonaws.com/3720410a-7757-4d41-89af-320eae9656b1/20230215_081051/scylla-3_cluster_events-3720410a.log.tar.gz
- **warning-3720410a.log.tar.gz** - https://cloudius-jenkins-test.s3.amazonaws.com/3720410a-7757-4d41-89af-320eae9656b1/20230215_081051/warning-3720410a.log.tar.gz
- **summary-3720410a.log.tar.gz** - https://cloudius-jenkins-test.s3.amazonaws.com/3720410a-7757-4d41-89af-320eae9656b1/20230215_081051/summary-3720410a.log.tar.gz
- **scylla-8_cluster_events-3720410a.log.tar.gz** - https://cloudius-jenkins-test.s3.amazonaws.com/3720410a-7757-4d41-89af-320eae9656b1/20230215_081051/scylla-8_cluster_events-3720410a.log.tar.gz
- **scylla-14_cluster_events-3720410a.log.tar.gz** - https://cloudius-jenkins-test.s3.amazonaws.com/3720410a-7757-4d41-89af-320eae9656b1/20230215_081051/scylla-14_cluster_events-3720410a.log.tar.gz
- **error-3720410a.log.tar.gz** - https://cloudius-jenkins-test.s3.amazonaws.com/3720410a-7757-4d41-89af-320eae9656b1/20230215_081051/error-3720410a.log.tar.gz
- **scylla-6_cluster_events-3720410a.log.tar.gz** - https://cloudius-jenkins-test.s3.amazonaws.com/3720410a-7757-4d41-89af-320eae9656b1/20230215_081051/scylla-6_cluster_events-3720410a.log.tar.gz
- **monitor-set-3720410a.tar.gz** - https://cloudius-jenkins-test.s3.amazonaws.com/3720410a-7757-4d41-89af-320eae9656b1/20230215_081051/monitor-set-3720410a.tar.gz
- **loader-set-3720410a.tar.gz** - https://cloudius-jenkins-test.s3.amazonaws.com/3720410a-7757-4d41-89af-320eae9656b1/20230215_081051/loader-set-3720410a.tar.gz
- **kubernetes-3720410a.tar.gz** - https://cloudius-jenkins-test.s3.amazonaws.com/3720410a-7757-4d41-89af-320eae9656b1/20230215_081051/kubernetes-3720410a.tar.gz

[Jenkins job URL](https://jenkins.scylladb.com/job/scylla-staging/job/fruch/job/longevity-scylla-operator-12h-multitenant-eks/14/)