thibaultcha / lua-cassandra

Pure Lua driver for Apache Cassandra
https://thibaultcha.github.io/lua-cassandra

fix(cluster) use 'listen_address' for contact point in refresh() #118

Closed thibaultcha closed 6 years ago

thibaultcha commented 6 years ago

Previously, refresh() used coordinator.host to add the contact point to the LB policy. If the user specified a hostname, that hostname was used to index the node instead of its IP address. On its own this is harmless, apart from some inconsistent log messages (sometimes an IP address shows up, other times a hostname).

Problem

An issue arises, however, when both of the following are true:

  1. Several Cluster instances call :refresh() on the same C* cluster
  2. DNS round-robin is in effect for the contact point hostnames

Let's consider clusterA and clusterB, both instances of the Cluster module. Let's also consider the following C* cluster:

10.16.0.1 node1
10.16.0.2 node2

And the following DNS record:

cassandra.default.svc.cluster.local. 30    IN A    10.16.0.1
cassandra.default.svc.cluster.local. 30    IN A    10.16.0.2

First, clusterA calls refresh() with contact_points = { "cassandra" }. The hostname cassandra resolves to 10.16.0.1 this time, so clusterA inserts the following topology in the cluster's shm:

cassandra:[peer info]
10.16.0.2:[peer info]

Its LB policy now has 2 entries: cassandra and 10.16.0.2.

Then, clusterB calls refresh() as well, with the same contact_points option. It first purges the cluster's shm content, then inserts the following:

10.16.0.1:[peer info]
cassandra:[peer info]

Note that because of the round-robin DNS resolution, cassandra pointed to 10.16.0.2 this time.

Now, when clusterA invokes its LB policy to elect a peer for a given query, it will eventually look up 10.16.0.2. However, that entry no longer exists in the cluster's shm, so the following error is returned:

no host details for 10.16.0.2
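
The sequence above can be reproduced with a small simulation. This is only a sketch in Python, not the driver's Lua code: the Cluster class, its resolve_to argument, and get_peer_info() are hypothetical stand-ins, and a plain dict plays the role of the shared shm.

```python
# Hypothetical simulation of the scenario above; not lua-cassandra code.
NODES = ["10.16.0.1", "10.16.0.2"]


class Cluster:
    def __init__(self, shm, resolve_to):
        self.shm = shm                # dict shared by all instances (stands in for the shm)
        self.resolve_to = resolve_to  # IP that "cassandra" resolves to for this instance
        self.lb_policy = []           # hosts this instance believes exist

    def refresh(self, contact_point):
        # Purge the shared topology, then rebuild it.
        self.shm.clear()
        # Old behavior: the coordinator is keyed by the user-supplied
        # contact point, the other nodes by their IP address.
        keys = [contact_point] + [ip for ip in NODES if ip != self.resolve_to]
        for key in keys:
            self.shm[key] = "peer info"
        self.lb_policy = keys

    def get_peer_info(self, host):
        if host not in self.shm:
            return None, "no host details for " + host
        return self.shm[host], None


shm = {}
cluster_a = Cluster(shm, resolve_to="10.16.0.1")
cluster_b = Cluster(shm, resolve_to="10.16.0.2")  # DNS round-robin: other IP

cluster_a.refresh("cassandra")  # shm keys: cassandra, 10.16.0.2
cluster_b.refresh("cassandra")  # shm keys: cassandra, 10.16.0.1

# clusterA walks its (now stale) LB policy entries.
errors = [cluster_a.get_peer_info(host)[1] for host in cluster_a.lb_policy]
print(errors)
```

The second element of errors is the "no host details for 10.16.0.2" message: clusterA's policy still lists 10.16.0.2, but clusterB's refresh stored that node under the key cassandra.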

Proposed solution

Change the cache key of the peer's info in the shm from the specified contact_point value (which is the user's input) to the listen_address column of the system.local table, so that host details are no longer stored by hostname.

This has the added benefit of ensuring that all logs and other operations of the Cluster module always use the node's IP address.
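
To illustrate why keying by listen_address avoids the error, here is the same scenario with every node stored under its IP address. Again this is a Python sketch under assumptions, not the driver's implementation: the refresh() helper and its coordinator_listen_address parameter are hypothetical, and the value is assumed to come from the listen_address column of system.local on whichever node the hostname resolved to.

```python
# Hypothetical simulation of the proposed keying; not lua-cassandra code.
NODES = ["10.16.0.1", "10.16.0.2"]


def refresh(shm, coordinator_listen_address):
    """Rebuild the shared topology, keying every node by IP address."""
    shm.clear()
    # The coordinator is keyed by its listen_address rather than by the
    # user-supplied contact point, so no hostname ever becomes a key.
    keys = [coordinator_listen_address] + [
        ip for ip in NODES if ip != coordinator_listen_address
    ]
    for key in keys:
        shm[key] = "peer info"
    return keys  # what the caller's LB policy would hold


shm = {}
policy_a = refresh(shm, "10.16.0.1")  # "cassandra" resolved to node1
policy_b = refresh(shm, "10.16.0.2")  # "cassandra" resolved to node2

# Every host in clusterA's policy is still present after clusterB's refresh.
missing = [host for host in policy_a if host not in shm]
print(missing)
```

Both instances now write the same set of keys regardless of which node the DNS lookup returned, so one instance's refresh no longer invalidates entries held by another instance's LB policy.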