Closed yarongilor closed 1 year ago
@fgelcer , can you advise - it looks like this event should have been filtered out by: https://github.com/scylladb/scylla-cluster-tests/pull/4220
@yarongilor the ignore_ycsb_connection_refused filter from #4220 is only used in upgrade_tests.
When YCSB is used, since it goes through DNS, there are cases where it will hit a node that is down.
Filtering it might be problematic, since we'd need to do so in every place where we take a node down.
@fruch , why not apply this filter in a generic way to all nemeses that involve a reboot? Otherwise, what's the alternative: changing this error's severity to "warning"?
Bumped into the same in:
Kernel Version: 5.13.0-1025-aws
Scylla version (or git commit hash): 2022.1~rc7-20220602.7abea3aad
with build-id 57fb7e7c94bbac6498149648f3818be3c1322ef9
Cluster size: 6 nodes (i3.4xlarge)
Scylla Nodes used in this run:
OS / Image: ami-0c0c4f759c88cd17d
(aws: us-east-1)
Test: longevity-alternator-3h-test
Test id: c0719d2a-85bb-4e6c-b228-4479afd09a0a
Test name: enterprise-2022.1/longevity/longevity-alternator-3h-test
Test config file(s):
$ hydra investigate show-monitor c0719d2a-85bb-4e6c-b228-4479afd09a0a
$ hydra investigate show-logs c0719d2a-85bb-4e6c-b228-4479afd09a0a
@fruch is there a general solution we can apply here? If not, and a filter is needed for every nemesis that may reboot the node we provide to YCSB, such as rolling-restart, let's do it.
I can't think of anything general, except always filtering it (i.e. ignoring it completely).
Since we already have a context manager for those, we can apply it to the nemeses where we encountered the issue.
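A minimal sketch of the idea of scoping such a filter with a context manager, so the error is ignored only while a node is being restarted. Everything here (ACTIVE_FILTERS, ignore_connection_refused, should_report, publish) is a hypothetical stand-in written for illustration, not the actual SCT implementation of ignore_ycsb_connection_refused.

```python
import re
from contextlib import contextmanager

# Hypothetical registry of active filters; SCT's real event system differs.
ACTIVE_FILTERS = []


@contextmanager
def ignore_connection_refused():
    """Suppress 'Connection refused' lines only while the block is running."""
    pattern = re.compile(r"Connection refused")
    ACTIVE_FILTERS.append(pattern)
    try:
        yield
    finally:
        # Always unregister, even if the restart step raised.
        ACTIVE_FILTERS.remove(pattern)


def should_report(line):
    """Return True if no active filter matches the line."""
    return not any(p.search(line) for p in ACTIVE_FILTERS)


def publish(line, sink):
    """Append the line to the sink unless an active filter matches it."""
    if should_report(line):
        sink.append(line)


ERROR_LINE = (
    "ERROR site.ycsb.db.DynamoDBClient - Connect to alternator:8080 "
    "failed: Connection refused"
)

reported = []
with ignore_connection_refused():
    # Stands in for the nemesis step that restarts a node.
    publish(ERROR_LINE, reported)  # dropped: filter is active
publish(ERROR_LINE, reported)      # reported: filter is gone
```

With this shape, the same error raised outside a restart window still surfaces, which is the trade-off compared with ignoring it everywhere.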
Yes, that’s the simple solution, but I thought maybe there is a way to let YCSB know more hosts or something like that.
We are using the DynamoDB client, which only knows one DNS name; AWS clients aren't aware of individual nodes.
This is why we are using a DNS server to do the "balancing". A proper load balancer is not implemented (neither in SCT nor in scylla-cloud).
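To illustrate why a single DNS name leads to these errors, here is a small simulation. It assumes the DNS server hands out node addresses in round-robin order, which is an assumption made for this sketch and not confirmed in the thread; the addresses and the simulate_round_robin helper are hypothetical.

```python
from itertools import cycle


def simulate_round_robin(addresses, down, attempts):
    """Count connection attempts that land on a node that is down.

    Assumes each attempt resolves the single DNS name to the next
    address in round-robin order (an assumption for this sketch).
    """
    resolver = cycle(addresses)
    refused = 0
    for _ in range(attempts):
        if next(resolver) in down:
            refused += 1
    return refused


# Hypothetical node addresses behind one DNS name.
nodes = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]

# One node is restarting: a quarter of the attempts hit it.
refused_count = simulate_round_robin(nodes, down={"10.0.0.2"}, attempts=8)
```

Because the client cannot tell which address is healthy, some requests fail for as long as a node is down, whichever nemesis took it down.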
Reproduced in restart-with-resharding nemesis:
(YcsbStressEvent Severity.ERROR) period_type=not-set event_id=566796be-31ea-47a1-85fe-030cfbf88357: type=error node=Node alternator-ttl-4-loaders-no-lwt-sis-loader-node-7da36ba4-3 [3.252.127.132 | 10.4.1.47] (seed: False)
stress_cmd=bin/ycsb load dynamodb -P workloads/workloadc -threads 13 -p recordcount=8589934401 -p fieldcount=2 -p fieldlength=16 -p insertstart=2147483600 -p insertcount=2147483600 -p table=usertable_no_lwt -p dynamodb.ttlKey=ttl -p dynamodb.ttlDuration=43200 -s -P /tmp/dynamodb.properties -p maxexecutiontime=180600
errors:
1265545 [Thread-12] ERROR site.ycsb.db.DynamoDBClient -com.amazonaws.SdkClientException: Unable to execute HTTP request: Connect to alternator:8080 [alternator/10.4.0.41] failed: Connection refused (Connection refused)
Kernel Version: 5.15.0-1019-aws
Scylla version (or git commit hash): 5.1.0~rc1-20220902.d10aee15e7e9
with build-id c127c717ecffa082ce97b94100d62b2549abe486
Relocatable Package: http://downloads.scylladb.com/unstable/scylla/branch-5.1/relocatable/2022-09-03T00:52:08Z/scylla-x86_64-package.tar.gz
Cluster size: 4 nodes (i3.4xlarge)
Scylla Nodes used in this run:
OS / Image: ami-0437de2d7a582f47e
(aws: us-east-1)
Test: longevity-alternator-1h-scan-12h-ttl-no-lwt-2h-grace-4loaders-nemesis
Test id: 1670b377-7689-4fce-9ea5-27d154c7c954
Test name: scylla-staging/yarongilor/longevity-alternator-1h-scan-12h-ttl-no-lwt-2h-grace-4loaders-nemesis
Test config file(s):
$ hydra investigate show-monitor 1670b377-7689-4fce-9ea5-27d154c7c954
$ hydra investigate show-logs 1670b377-7689-4fce-9ea5-27d154c7c954
@yarongilor instead of continuing to add to this issue, note that it's a one-line change: https://github.com/scylladb/scylla-cluster-tests/pull/5391
@fruch , what about other nemeses? Should the same be applied to them?
We added it where it was obvious there's a reboot/restart. Clearly we missed a few; if you happen to encounter it again during a node restart, you now know what to do.
Installation details
Kernel Version: 5.13.0-1021-aws
Scylla version (or git commit hash): 5.0~rc3-20220406.f92622e0d
with build-id 2b79c4744216b294fdbd2f277940044c899156ea
Cluster size: 4 nodes (i3.4xlarge)
Scylla Nodes used in this run:
OS / Image: ami-07835983d717b1ea3
(aws: eu-west-1)
Test: longevity-alternator-200gb-48h-test
Test id: d918f269-fa5e-4057-b74f-88062e9d5d0e
Test name: scylla-5.0/longevity/longevity-alternator-200gb-48h-test
Test config file(s):
Issue description
scenario: running nemesis rolling_config_change_internode_compression
reboot node-1 successfully, then node-2:
Got "Connection refused" for the rebooted node-2:
The message is not filtered out and test got:
$ hydra investigate show-monitor d918f269-fa5e-4057-b74f-88062e9d5d0e
$ hydra investigate show-logs d918f269-fa5e-4057-b74f-88062e9d5d0e
Logs:
Jenkins job URL