Open dkropachev opened 3 months ago
@dkropachev I created a PR for the issues I discovered while testing the first scenario, but the second scenario is impossible to use: when I try to start a single-node cluster with join_ring=false, I get an error:
ERROR 2024-10-30 11:37:46,746 [shard 0:main] init - Startup failed: std::runtime_error (Cannot start the first node in the cluster as zero-token)
Thanks, it looks like it is impossible, so let's focus on the zero-token DC case.
In cases when a zero-token DC is targeted, queries are supposed to fail with a no host available error.
@dkropachev Is it OK if it fails with an error like this when using DCAwareRoundRobinPolicy(zero_token_database)?
2024/11/05 12:44:16 Unable to connect to cluster: gocql: unable to create session: gocql: datacenter datacenter2 in the policy was not found in the topology - probable DC aware policy misconfiguration
Except for that, I didn't find any incorrect behavior related to zero-token nodes.
@dkropachev ping
@sylwiaszunejko, it needs some context, but I am looking for the following scenarios.
Definitions:
- datacenter2 is a zero-token datacenter
- target host: the host you feed to NewCluster
- target dc: the DC name you feed to DCAwareRoundRobinPolicy
Scenarios:
1. target host = any host from datacenter1, target dc = datacenter1. It should succeed, and you should be able to execute queries.
2. target host = any host from datacenter2, target dc = datacenter1. It should succeed, and you should be able to execute queries.
3. target host = any host from datacenter1, target dc = datacenter2. It should fail with the same error you have provided.
4. target host = any host from datacenter2, target dc = datacenter2. It should fail with the same error you have provided.
Let's make sure we add a unit test for it.
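A minimal sketch of how such a unit test could enumerate the four scenarios. It is not the actual test from the PR. The names Scenario, BuildMatrix, ConnectFunc, CheckMatrix and FakeConnect are hypothetical and introduced only for illustration. In a real test, the connect hook would presumably call gocql.NewCluster with a host from the contact DC, set PoolConfig.HostSelectionPolicy to gocql.DCAwareRoundRobinPolicy with the policy DC, and call CreateSession. Here a fake stands in for it so the sketch has no external dependencies.

```go
package main

import (
	"errors"
	"fmt"
)

// Scenario is one row of the matrix: the DC of the contact point (target host)
// and the DC name handed to the DC-aware policy (target dc).
type Scenario struct {
	ContactDC string
	PolicyDC  string
	WantFail  bool
}

// BuildMatrix enumerates the four contact-DC x policy-DC combinations.
// A scenario is expected to fail exactly when the policy targets the
// zero-token DC; the contact point's DC does not change the outcome.
func BuildMatrix(regularDC, zeroTokenDC string) []Scenario {
	var out []Scenario
	for _, policyDC := range []string{regularDC, zeroTokenDC} {
		for _, contactDC := range []string{regularDC, zeroTokenDC} {
			out = append(out, Scenario{
				ContactDC: contactDC,
				PolicyDC:  policyDC,
				WantFail:  policyDC == zeroTokenDC,
			})
		}
	}
	return out
}

// ConnectFunc is a hypothetical hook that creates a session for the given
// contact DC and policy DC and returns the resulting error, if any.
type ConnectFunc func(contactDC, policyDC string) error

// CheckMatrix runs every scenario through connect and returns one message
// per scenario whose outcome differs from the expectation.
func CheckMatrix(connect ConnectFunc, regularDC, zeroTokenDC string) []string {
	var mismatches []string
	for _, s := range BuildMatrix(regularDC, zeroTokenDC) {
		err := connect(s.ContactDC, s.PolicyDC)
		if (err != nil) != s.WantFail {
			mismatches = append(mismatches, fmt.Sprintf(
				"contact=%s policy=%s: wantFail=%v, got err=%v",
				s.ContactDC, s.PolicyDC, s.WantFail, err))
		}
	}
	return mismatches
}

// FakeConnect is a stand-in for a real driver call: it fails only when the
// policy targets the zero-token DC.
func FakeConnect(zeroTokenDC string) ConnectFunc {
	return func(_, policyDC string) error {
		if policyDC == zeroTokenDC {
			return errors.New("datacenter in the policy was not found in the topology")
		}
		return nil
	}
}
```

Calling CheckMatrix(FakeConnect("datacenter2"), "datacenter1", "datacenter2") returns no mismatches. Swapping the fake for a hook backed by a real cluster would turn the same table into an integration test.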
PR#19684 brings the possibility of having coordinator-only nodes (or zero-token nodes). These types of nodes are going to be supported only in RAFT.
Such nodes, despite being registered in the cluster, do not handle any queries and should be excluded from query routing. This feature is already present in cassandra, but not merged into scylla yet, so we might want to start testing it on our drivers with cassandra first.
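As a rough illustration of excluding such nodes from query routing, the sketch below filters a host list down to nodes that own tokens. It assumes that a zero-token node can be recognised by an empty token list in its topology row, which this issue does not state. The Host type and the function names are hypothetical and do not correspond to any driver's actual API.

```go
package main

// Host is a hypothetical, simplified view of one node as a driver might
// read it from the cluster topology tables.
type Host struct {
	Address    string
	Datacenter string
	Tokens     []string
}

// IsZeroToken reports whether the node owns no tokens.
// Assumption: zero-token nodes are listed with an empty token set.
func (h Host) IsZeroToken() bool {
	return len(h.Tokens) == 0
}

// RoutableHosts returns the hosts that own tokens, preserving input order.
func RoutableHosts(all []Host) []Host {
	routable := make([]Host, 0, len(all))
	for _, h := range all {
		if !h.IsZeroToken() {
			routable = append(routable, h)
		}
	}
	return routable
}

// RoutableByDC groups routable hosts by datacenter. A datacenter made up
// only of zero-token nodes has no entry in the result, which is the
// situation where a DC-aware policy has no host to pick.
func RoutableByDC(all []Host) map[string][]Host {
	byDC := make(map[string][]Host)
	for _, h := range RoutableHosts(all) {
		byDC[h.Datacenter] = append(byDC[h.Datacenter], h)
	}
	return byDC
}
```

With one zero-token node in datacenter2 and two regular nodes in datacenter1, RoutableByDC yields a single entry for datacenter1 and none for datacenter2.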
Difference between cassandra and scylla implementation
The major difference is that these nodes are absent from system.peers and system.peers_v2 in cassandra, while in the scylla implementation these nodes are going to be present there.
Due to this fact we will need to test Apache and datastax drivers against scylla as well.
Approx. Testing plan
Regular cluster
- ... join_ring to false in its configuration, or adding -Dcassandra.join_ring=false to cli (cassandra only).
- ... zero-token node does not participate in the routing
- ... zero-token node
- ... zero-token node presence.
Cluster that starts with zero-token node (DROPPED)
- ... no host available error.
Zero-token Datacenter
Repeat this scenario for the following policies:
- DCAwareRoundRobinPolicy
- TokenAwareHostPolicy(DCAwareRoundRobinPolicy())
- TokenAwareHostPolicy(RoundRobinHostPolicy())
For DCAwareRoundRobinPolicy use three variants:
Steps:
- ... join_ring=false mode
- ... policy to make sure that the driver session is created and every query is scheduled to regular nodes and executed successfully. In cases when a zero-token DC is targeted, queries are supposed to fail with a no host available error.
Links
- Original umbrella issue in scylladb/scylladb repo: https://github.com/scylladb/scylladb/issues/19693
- Core issue to bring join_ring option into scylla: https://github.com/scylladb/scylladb/issues/6527
- PR that brings this feature in: https://github.com/scylladb/scylladb/pull/19684