scylladb / java-driver

ScyllaDB Java Driver for ScyllaDB and Apache Cassandra, based on the DataStax Java Driver
Apache License 2.0

Cassandra-stress load, when running on a multi-DC cluster, sends requests only to the DC where the loader is located #211

Closed juliayakovlev closed 1 year ago

juliayakovlev commented 1 year ago

Issue description

Problem

When cassandra-stress runs against a multi-DC cluster, it sends requests only to the DC where the loader is located.

Sorry, I did not find how to get the driver version.

Description

Scylla-tools version: scylla-tools-2023.1.0~rc3-0.20230321.80de75947b7a. A cluster with 9 nodes is spread across 3 DCs:

The test runs with 3 loaders. All loaders are located in eu-west-1.

c-s commands are sent to all nodes (from all data centers):

cassandra-stress user profile=/tmp/c-s_lwt_big_data_multidc.yaml n=10000000 ops'(insert_query=1)' cl=QUORUM -mode native cql3 -rate threads=1000 -transport 'truststore=/etc/scylla/ssl_conf/client/cacerts.jks truststore-password=cassandra' 
-node 10.4.1.250,10.4.0.195,10.4.3.48,10.4.3.6,10.15.0.186,10.15.3.127,10.15.3.160,10.12.1.56,10.12.2.57 -errors skip-unsupported-columns

Keyspace definition with class NetworkTopologyStrategy:

  CREATE KEYSPACE IF NOT EXISTS cqlstress_lwt_example WITH replication = {'class': 'NetworkTopologyStrategy', 'eu-westscylla_node_west': 3, 'us-west-2scylla_node_west': 2, 'us-eastscylla_node_east': 1};

From the "Requests served per instance" graph we can see that only 4 instances received requests (screenshot attached).

I ran the same test using Scylla-tools version scylla-tools-2022.2.dev-0.20220330.eef4cbb20a51 (https://jenkins.scylladb.com/view/staging/job/scylla-staging/job/yulia/job/longevity-lwt-24h-multidc-test/3/). In that run, all nodes received requests (screenshot attached).


Installation details

Kernel Version: 5.15.0-1031-aws Scylla version (or git commit hash): 2023.1.0~rc2-20230302.726f8a090337 with build-id efcc950cda22875aa488c5295c6cc68f35d9cc6f

Cluster size: 9 nodes (i3.8xlarge)

Scylla Nodes used in this run:

OS / Image: ami-0336a5a646b08dc25 ami-017ff4adbd0789b7d ami-0bfa3b8209fe4bb00 (aws: eu-west-1)

Test: longevity-lwt-24h-multidc-test Test id: c4c3ee91-c03b-4dc9-9df9-4e1f42885db5 Test name: enterprise-2023.1/longevity/longevity-lwt-24h-multidc-test Test config file(s):

Logs and commands

- Restore Monitor Stack command: `$ hydra investigate show-monitor c4c3ee91-c03b-4dc9-9df9-4e1f42885db5`
- Restore monitor on AWS instance using [Jenkins job](https://jenkins.scylladb.com/view/QA/job/QA-tools/job/hydra-show-monitor/parambuild/?test_id=c4c3ee91-c03b-4dc9-9df9-4e1f42885db5)
- Show all stored logs command: `$ hydra investigate show-logs c4c3ee91-c03b-4dc9-9df9-4e1f42885db5`

Logs: *No logs captured during this run.*

[Jenkins job URL](https://jenkins.scylladb.com/job/enterprise-2023.1/job/longevity/job/longevity-lwt-24h-multidc-test/3/)
roydahan commented 1 year ago

@juliayakovlev please mention whether we use "NetworkTopology" for the keyspace and with what configuration.

juliayakovlev commented 1 year ago

@juliayakovlev please mention whether we use "NetworkTopology" for the keyspace and with what configuration.

Keyspace definition with class NetworkTopologyStrategy:

  CREATE KEYSPACE IF NOT EXISTS cqlstress_lwt_example WITH replication = {'class': 'NetworkTopologyStrategy', 'eu-westscylla_node_west': 3, 'us-west-2scylla_node_west': 2, 'us-eastscylla_node_east': 1};

Updated description as well, thanks

Gor027 commented 1 year ago

The default load balancing policy in scylla-tools-java is DCAwareRoundRobinPolicy, which prefers nodes in the local datacenter. Unless all local nodes have been tried, it will not send queries to nodes in a remote DC. If a local datacenter name is not provided, the policy defaults to the datacenter of the first node it connects to (10.4.1.250 in this case).
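For illustration, here is a minimal, self-contained sketch of that defaulting behavior. This is a toy model, not the driver's actual implementation; the IPs and DC names are taken from this issue for readability:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;

// Toy model of DCAwareRoundRobinPolicy's local-DC defaulting: when no local
// DC is configured, the DC of the first contact point becomes "local", and
// the query plan lists local-DC hosts before remote ones.
public class DcAwareSketch {
    static String localDc; // null until the first contact point is seen

    static List<String> queryPlan(LinkedHashMap<String, String> hostToDc) {
        if (localDc == null) {
            // Default to the datacenter of the first node connected to,
            // mirroring the warning seen in the c-s log in this issue.
            localDc = hostToDc.values().iterator().next();
        }
        List<String> local = new ArrayList<>();
        List<String> remote = new ArrayList<>();
        for (var e : hostToDc.entrySet()) {
            (e.getValue().equals(localDc) ? local : remote).add(e.getKey());
        }
        local.addAll(remote); // local-DC hosts are tried before remote ones
        return local;
    }

    public static void main(String[] args) {
        LinkedHashMap<String, String> hosts = new LinkedHashMap<>();
        hosts.put("10.4.1.250", "eu-west");   // first in -node list -> becomes local DC
        hosts.put("10.15.0.186", "us-west-2");
        hosts.put("10.12.1.56", "us-east");
        hosts.put("10.4.0.195", "eu-west");
        List<String> plan = queryPlan(hosts);
        // Prints: local DC = eu-west, plan = [10.4.1.250, 10.4.0.195, 10.15.0.186, 10.12.1.56]
        System.out.println("local DC = " + localDc + ", plan = " + plan);
    }
}
```

This is why every loader in the test ended up sending its load to eu-west nodes: each one listed 10.4.1.250 first in `-node`.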

roydahan commented 1 year ago

@juliayakovlev In this case, I think you should just provide the loaders as a list as well ('1 1 1'), and you'll have a loader per DC.

juliayakovlev commented 1 year ago

@juliayakovlev In this case, I think you should just provide the loaders as a list as well ('1 1 1'), and you'll have a loader per DC.

@roydahan Even when there is a loader in each region, the requests are sent to the first region only (screenshot attached):

https://jenkins.scylladb.com/job/scylla-staging/job/yulia/job/longevity-lwt-24h-multidc-test/4/

Maybe we need to change load balancing policy?

roydahan commented 1 year ago

@juliayakovlev

  1. What do you see here that suggests that "requests are sent to the first region only"?
  2. Did you verify that each loader is in a different DC and that it's actually DC-aware? (I have no idea how it knows.)
juliayakovlev commented 1 year ago

@juliayakovlev

  1. What do you see here that suggests that "requests are sent to the first region only"?

I see that some of the nodes show a flat line at 0 ops/s. Here you can see it with numbers (screenshot attached):

When every node receives requests, there is no line at 0 ops/s. You can see an example in the issue description (screenshot attached):

  1. Did you verify that each loader is on different DC

Yes, I verified that each DC has one loader.

Regions:

  1. eu-west-1
  2. us-west-2
  3. us-east-1

eu-west-1

< t:2023-04-09 11:16:03,058 f:cluster.py      l:3112 c:sdcm.cluster         p:INFO  > Cluster lwt-longevity-multi-dc-24h-create-l-loader-set-678b7602 (AMI: ['ami-042cf1bc21e30ce60', 'ami-09808043b7fcdb244', 'ami-0e13359ca00856b35'] Type: c5.xlarge): Init nodes
< t:2023-04-09 11:16:03,081 f:common.py       l:407  c:utils                p:DEBUG > Executing in parallel: 'get_instances' on ['eu-west-1']
< t:2023-04-09 11:16:04,394 f:aws_utils.py    l:348  c:sdcm.utils.aws_utils p:DEBUG > [ec2.Instance(id='i-05b539e4405964a89')] Got public ip: 34.240.14.56
< t:2023-04-09 11:16:04,394 f:cluster.py      l:347  c:sdcm.cluster_aws     p:DEBUG > Node lwt-longevity-multi-dc-24h-create-l-loader-node-678b7602-1 [34.240.14.56 | 10.4.2.242] (seed: False): SSH access -> 'ssh
 -i ~/.ssh/scylla-qa-ec2 centos@10.4.2.242'

us-west-2

< t:2023-04-09 11:16:23,997 f:cluster_aws.py  l:145  c:sdcm.cluster_aws     p:DEBUG > Using EC2 service with DC-index: 1, (associated with region: us-west-2)
< t:2023-04-09 11:16:23,997 f:cluster_aws.py  l:147  c:sdcm.cluster_aws     p:DEBUG > Using Availability Zone of: us-west-2a
< t:2023-04-09 11:16:42,123 f:aws_utils.py    l:348  c:sdcm.utils.aws_utils p:DEBUG > [ec2.Instance(id='i-097101ee94e51dd09')] Got public ip: 35.87.130.205
< t:2023-04-09 11:16:42,123 f:cluster.py      l:347  c:sdcm.cluster_aws     p:DEBUG > Node lwt-longevity-multi-dc-24h-create-l-loader-node-678b7602-2 [35.87.130.205 | 10.15.3.177] (seed: False): SSH access -> 'ssh -i ~/.ssh/scylla-qa-ec2 centos@10.15.3.177'

us-east-1

< t:2023-04-09 11:18:49,289 f:cluster_aws.py  l:145  c:sdcm.cluster_aws     p:DEBUG > Using EC2 service with DC-index: 2, (associated with region: us-east-1)
< t:2023-04-09 11:18:49,289 f:cluster_aws.py  l:147  c:sdcm.cluster_aws     p:DEBUG > Using Availability Zone of: us-east-1a
< t:2023-04-09 11:19:07,219 f:aws_utils.py    l:348  c:sdcm.utils.aws_utils p:DEBUG > [ec2.Instance(id='i-024454a504e531d49')] Got public ip: 3.80.162.20
< t:2023-04-09 11:19:07,219 f:cluster.py      l:347  c:sdcm.cluster_aws     p:DEBUG > Node lwt-longevity-multi-dc-24h-create-l-loader-node-678b7602-3 [3.80.162.20 | 10.12.2.218] (seed: False): SSH access -> 'ssh -i ~/.ssh/scylla-qa-ec2 centos@10.12.2.218'

I see that the c-s load is started from every loader:

2023-04-09 12:27:27.239: (CassandraStressEvent Severity.NORMAL) period_type=begin event_id=09fec0f1-78fc-4af6-921e-ff284ca1cca5: 
node=Node lwt-longevity-multi-dc-24h-create-l-loader-node-678b7602-1 [34.240.14.56 | 10.4.2.242] (seed: False)

2023-04-09 12:28:16.279: (CassandraStressEvent Severity.NORMAL) period_type=begin event_id=29ac9f4b-8059-45be-8d47-6024f5a20ad8: 
node=Node lwt-longevity-multi-dc-24h-create-l-loader-node-678b7602-3 [3.80.162.20 | 10.12.2.218] (seed: False)

2023-04-09 12:28:34.009: (CassandraStressEvent Severity.NORMAL) period_type=begin event_id=3f99671c-32d3-4e40-b540-fa47a7a98d79: 
node=Node lwt-longevity-multi-dc-24h-create-l-loader-node-678b7602-2 [35.87.130.205 | 10.15.3.177] (seed: False)

and it's actually DC-aware? (I have no idea how it knows).

As @Gor027 mentioned, DCAwareRoundRobinPolicy is default policy for the scylla-tools-driver: https://github.com/scylladb/scylla-tools-java/blob/583261fc0e838e6dcef5331d9dcc18f69f637fe4/tools/stress/src/org/apache/cassandra/stress/util/JavaDriverClient.java#L94

We do not change it, or at least I did not find anywhere that we change it.

@Gor027 I do not understand: if the load starts from every loader (i.e. from every DC), why is the local DC for every c-s load always the first region in the list (eu-west-1), rather than the DC the load runs from? Are all requests from every DC sent to eu-west-1? According to DCAwareRoundRobinPolicy, if the c-s load runs from us-west-2, the local DC in this case should be us-west-2, no? Am I wrong?

But in the cassandra-stress log of a run on the loader located in us-west-2 (second in the regions list), the local DC is still eu-west-1 (which is not expected):

Connected to cluster: lwt-longevity-multi-dc-24h-create-l-db-cluster-678b7602, max pending requests per connection null, max connections per host 8
WARN  12:28:45,944 Some contact points don't match local data center. Local DC = eu-westscylla_node_west. Non-conforming contact points: /10.15.2.12:9042 (us-west-2scylla_node_west),/10.12.3.169:9042 (us-eastscylla_node_east),/10.15.3.228:9042 (us-west-2scylla_node_west),/10.12.1.4:9042 (us-eastscylla_node_east),/10.15.1.97:9042 (us-west-2scylla_node_west)
Datatacenter: eu-westscylla_node_west; Host: /10.4.0.83; Rack: 1a
Datatacenter: us-west-2scylla_node_west; Host: /10.15.2.12; Rack: 2a
Datatacenter: eu-westscylla_node_west; Host: /10.4.3.132; Rack: 1a
Datatacenter: us-eastscylla_node_east; Host: /10.12.3.169; Rack: 1a
Datatacenter: eu-westscylla_node_west; Host: /10.4.2.42; Rack: 1a
Datatacenter: us-west-2scylla_node_west; Host: /10.15.3.228; Rack: 2a
Datatacenter: us-eastscylla_node_east; Host: /10.12.1.4; Rack: 1a
Datatacenter: us-west-2scylla_node_west; Host: /10.15.1.97; Rack: 2a
Datatacenter: eu-westscylla_node_west; Host: /10.4.0.119; Rack: 1a
fruch commented 1 year ago

@juliayakovlev

I think the region/datacenter needs to be passed in the c-s command, like the following:

/cassandra-stress write -node 127.0.0.1 datacenter=dc3
juliayakovlev commented 1 year ago

@juliayakovlev

I think the region/datacenter needs to be passed in the c-s command, like the following:

/cassandra-stress write -node 127.0.0.1 datacenter=dc3

How does the driver know about eu-west-1?

fruch commented 1 year ago

@juliayakovlev I think the region/datacenter needs to be passed in the c-s command, like the following:

/cassandra-stress write -node 127.0.0.1 datacenter=dc3

How does the driver know about eu-west-1?

If no local DC was passed, the first DC the driver identifies will be used:

https://github.com/scylladb/java-driver/blob/b3f3ebaf161b21e5c4840ec294595d4e4b39d9bf/driver-core/src/main/java/com/datastax/driver/core/policies/DCAwareRoundRobinPolicy.java#L107

juliayakovlev commented 1 year ago

@juliayakovlev I think the region/datacenter needs to be passed in the c-s command, like the following:

/cassandra-stress write -node 127.0.0.1 datacenter=dc3

How does the driver know about eu-west-1?

If no local DC was passed, the first DC the driver identifies will be used:

https://github.com/scylladb/java-driver/blob/b3f3ebaf161b21e5c4840ec294595d4e4b39d9bf/driver-core/src/main/java/com/datastax/driver/core/policies/DCAwareRoundRobinPolicy.java#L107

So it's an SCT issue.

Gor027 commented 1 year ago

@Gor027 I do not understand: if the load starts from every loader (i.e. from every DC), why is the local DC for every c-s load always the first region in the list (eu-west-1), rather than the DC the load runs from? Are all requests from every DC sent to eu-west-1? According to DCAwareRoundRobinPolicy, if the c-s load runs from us-west-2, the local DC in this case should be us-west-2, no? Am I wrong?

@juliayakovlev If the local datacenter name is not provided to the driver, the DC-aware policy defaults to the datacenter of the first node it connects to (10.4.1.250 is first in the list, so the datacenter defaults to eu-west-1). The list of nodes passed to c-s is handed to the driver in the same order, so even if a loader is located in another datacenter, the first host's datacenter is treated as the local datacenter unless one is passed explicitly.

Gor027 commented 1 year ago

As @fruch mentioned, one way is to pass the local datacenter parameter directly: https://docs.datastax.com/en/dse/5.1/dse-dev/datastax_enterprise/tools/toolsCStress.html#toolsCStress__c-stress-sub-options Another way is to rearrange the nodes in the c-s command for each loader so that the IP addresses of the nodes in the local datacenter appear first in the list.
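The second workaround's logic can be sketched as a small helper. This is illustrative only: the `localFirst` method and the host-to-DC map are hypothetical, standing in for however the test framework would build each loader's `-node` list:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative helper for the "reorder the -node list" workaround: move the
// loader's local-DC nodes to the front so that a default
// DCAwareRoundRobinPolicy picks the intended local DC.
public class NodeOrder {
    static List<String> localFirst(List<String> nodes, Map<String, String> dcOf, String loaderDc) {
        List<String> ordered = new ArrayList<>();
        for (String n : nodes) if (loaderDc.equals(dcOf.get(n))) ordered.add(n);   // local DC first
        for (String n : nodes) if (!loaderDc.equals(dcOf.get(n))) ordered.add(n);  // then the rest
        return ordered;
    }

    public static void main(String[] args) {
        // Hypothetical mapping of node IP -> datacenter, using IPs from this issue.
        Map<String, String> dcOf = Map.of(
            "10.4.1.250", "eu-west",
            "10.15.0.186", "us-west-2",
            "10.12.1.56", "us-east");
        List<String> nodes = List.of("10.4.1.250", "10.15.0.186", "10.12.1.56");
        // For the us-west-2 loader, its local node should lead the -node list:
        // Prints: 10.15.0.186,10.4.1.250,10.12.1.56
        System.out.println(String.join(",", localFirst(nodes, dcOf, "us-west-2")));
    }
}
```

With this ordering, each loader's driver connects to a local-DC node first, so the defaulted local DC matches the loader's own region.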

juliayakovlev commented 1 year ago

This is an SCT problem. Fixed with https://github.com/scylladb/scylla-cluster-tests/pull/6051. Closing.