Previously, using `coordinator.host` to add the contact point to the
LB policy meant that if the user specified a hostname, that hostname would be
used to index this node instead of its IP address. Nothing harmful in
that, except some inconsistent log messages (sometimes an IP address
shows up, other times a hostname).
Problem
An issue arises, however, when:
- several Cluster instances call `:refresh()` on the same C* cluster, and
- DNS round-robin is in effect for the contact point hostnames.
Let's consider clusterA and clusterB, both instances of the Cluster
module. Let's also consider the following C* cluster:
10.16.0.1 node1
10.16.0.2 node2
And the following DNS record:
cassandra.default.svc.cluster.local. 30 IN A 10.16.0.1
cassandra.default.svc.cluster.local. 30 IN A 10.16.0.2
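For illustration, a minimal sketch of that setup with the usual Cluster API (the `shm` name and the fact that both instances live in the same process are illustrative; in practice they may also be separate workers or services):

```lua
local Cluster = require "resty.cassandra.cluster"

-- both instances use the same contact point hostname, which
-- round-robins between 10.16.0.1 and 10.16.0.2 via DNS
local clusterA = assert(Cluster.new {
  shm = "cassandra",                -- lua_shared_dict name (illustrative)
  contact_points = { "cassandra" },
})
local clusterB = assert(Cluster.new {
  shm = "cassandra",
  contact_points = { "cassandra" },
})

assert(clusterA:refresh())  -- "cassandra" resolves to 10.16.0.1 this time
assert(clusterB:refresh())  -- "cassandra" resolves to 10.16.0.2 this time
```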
First, clusterA calls `refresh()`, with `contact_points = { "cassandra" }`, and as a result inserts the following topology in the cluster's shm:
cassandra:[peer info]
10.16.0.2:[peer info]
Its LB policy now has 2 entries: `cassandra` and `10.16.0.2`.
Then, clusterB calls `refresh()` as well, with the same `contact_points`
option, and as a result first purges the cluster's shm content, before
inserting the following:
10.16.0.1:[peer info]
cassandra:[peer info]
Note that because of the round-robin DNS resolution, `cassandra` pointed
to `10.16.0.2` this time.
Now, when clusterA invokes its LB policy to elect a peer for a given
query, it will eventually look for `10.16.0.2`. However, such an entry
no longer exists in the cluster's shm. Therefore, the following
error is returned:
no host details for 10.16.0.2
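From clusterA's perspective, this surfaces when it tries to execute a query, along these lines (a sketch; the keyspace and table names are illustrative):

```lua
local rows, err = clusterA:execute("SELECT * FROM my_keyspace.my_table")
if not rows then
  -- err is "no host details for 10.16.0.2"
  ngx.log(ngx.ERR, "could not execute query: ", err)
end
```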
Proposed solution
Replace the cache key of the peer's info in the shm: instead of the
specified `contact_point` value (the user's input), use the
`listen_address` column of the `system.local` table, so that host
details are no longer stored by hostname.
This has the added benefit of ensuring that all logs and other operations
done by the Cluster module always use the IP address of the node.
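A rough sketch of the idea, with placeholder names (`cache_peer`, the direct shm writes, and the JSON serialization are illustrative, not the module's actual internals):

```lua
local cjson = require "cjson.safe"

-- illustrative sketch of the proposed refresh() behavior: key the cached
-- peer info by the node's listen_address rather than by the contact point
local function cache_peer(shm, coordinator, peer_info)
  -- ask the node for its own address instead of trusting the
  -- user-supplied contact point (which may be a hostname)
  local rows, err = coordinator:execute("SELECT listen_address FROM system.local")
  if not rows then
    return nil, err
  end

  local listen_address = rows[1].listen_address

  -- the shm entry (and hence the LB policy) is now keyed by the IP address
  return shm:set(listen_address, cjson.encode(peer_info))
end
```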