Open viossat opened 6 years ago
We use `rpc_address`, which is for client connections. In comparison, `broadcast_address` is used for inter-node communication. This driver aims at a similar behavior as other Datastax-maintained drivers, which use `rpc_address` as well.
Hello,

I have an issue that seems related to this one: since I upgraded to 0.14.1, Kong does not start; I get the following error: `[cassandra error] could not retrieve current migrations: [cassandra error] all hosts tried for query failed. 172.17.0.4: host seems unhealthy, considering it down (timeout)`. I only have a single Cassandra node at the moment.

`172.17.0.4` is the internal IP of my Cassandra container, so my Kong container can't reach Cassandra.

In addition, `kong migrations up` works, certainly because the Cassandra driver is used in a different way for migrations (maybe this will help you).
@nmeylan What is the value configured for `rpc_address` on your cluster's nodes?

`rpc_address` is set to `0.0.0.0`.
I tried with Kong 1.0, and it "works", as I now fall into this condition: https://github.com/thibaultcha/lua-cassandra/blob/master/lib/resty/cassandra/cluster.lua#L501

However, I still don't understand why log level `warn` is used when I fall into this condition. It does not seem wrong to use `0.0.0.0` as `rpc_address`. Per the Datastax documentation:

> rpc_address:
> - 0.0.0.0: Listens on all configured interfaces. You must set the broadcast_rpc_address to a value other than 0.0.0.0

BTW, thanks for your answer.
My case was about Cassandra in containers as well. `rpc_address` is set to `0.0.0.0`. The Lua driver fetches the local Docker IP of the local peer (coordinator), which is a private and usually unreachable IP. The JS driver doesn't follow this behavior.

Please note that this changed in 1.3.3: c176a9ccc3c84786088c1ec157a275b924e7e1b1
> The JS driver doesn't follow this behavior.

How so? The JS driver uses `rpc_address` unless it is set to the bind-all address (`0.0.0.0`), in which case it falls back to `peer` (`listen_address`) (source: https://github.com/datastax/nodejs-driver/blob/580312937e07368d328f0a9aae2029609b307cdd/lib/control-connection.js#L740-L753), just like we do in lua-cassandra (source: https://github.com/thibaultcha/lua-cassandra/commit/c176a9ccc3c84786088c1ec157a275b924e7e1b1#diff-745de875635774250552720721ac4fd5R524). The default address translator in the JS driver does not do anything to the received address: https://github.com/datastax/nodejs-driver/blob/707953f52709fc507f07edb18659f2e2d9f4d22e/lib/policies/address-resolution.js#L47.
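The fallback both drivers apply to a `system.peers` row can be sketched as follows (a minimal sketch; the helper name and row shape are illustrative, not either driver's actual API — the real logic lives in the linked `control-connection.js` and `cluster.lua` sources):

```python
# Sketch of the address-selection fallback described above, applied to a
# row from system.peers. Illustrative only, not the drivers' real code.
BIND_ALL = "0.0.0.0"

def address_for_peer(row):
    """Pick the address clients should dial for a peers-table row."""
    rpc_address = row.get("rpc_address")
    if rpc_address and rpc_address != BIND_ALL:
        return rpc_address  # normal case: the advertised client address
    # rpc_address is the bind-all address: fall back to the `peer`
    # column (the node's listen_address), as both drivers do
    return row["peer"]

print(address_for_peer({"rpc_address": "10.0.0.5", "peer": "192.168.1.5"}))  # 10.0.0.5
print(address_for_peer({"rpc_address": "0.0.0.0", "peer": "192.168.1.5"}))   # 192.168.1.5
```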
@viossat How are you connecting your containers (Kong <-> Cassandra) together? Are you linking them, or using a Docker network?
`getAddressForPeerHost` is used to get the addresses of the remote peers only; the JS driver uses the contact point as the local peer (coordinator) IP. It uses the `system.peers` table to find extra contact points.

https://github.com/datastax/nodejs-driver/blob/580312937e07368d328f0a9aae2029609b307cdd/lib/control-connection.js#L120-L154
My Cassandra instances are located in multiple datacenters and sub-networks, same for the clients.
@viossat So, if I understand you correctly, your complaint is that the Lua driver overrides the specified `contact_point` argument with said contact point's `rpc_address`, instead of sticking to the given `contact_point` argument? (source: https://github.com/thibaultcha/lua-cassandra/blob/master/lib/resty/cassandra/cluster.lua#L494-L502). I was under the impression that your complaint was related to how the driver contacted both the local contact point and the peers.

Help me understand: why would only your contact point be reachable by the driver (or Kong, for that matter)? Since the driver load balances across all peers (including the contact point), it treats both the local contact point and the peers the same way, and assumes that all of them can be contacted via their `rpc_address` (falling back to the `listen_address` if the former is set to "bind all"). Can you clarify this point for me?
Sorry, my issue was not clear and didn't take the 1.3.3 changes into account.
The JS driver from DataStax works as follows:
- the first responding contact point is used as-is (its address is never replaced by `system.local.*_address`)
- `system.peers` is queried to discover more peers

With the Lua driver:
- (before 1.3.3) `system.local.listen_address` was used as the IP of the first responding contact point; that was my original issue
- (since 1.3.3) `system.local.rpc_address` is used as the IP of the first responding contact point; the driver raises a warning and falls back to the contact point host if it equals `0.0.0.0`

When working with Cassandra in containers:
- `system.local.rpc_address` equals `0.0.0.0` (so the warning is always raised and a domain may be used as host)
- the `listen_address` of the Cassandra instances is usually unreachable from the client (no effect since 1.3.3)
- (before 1.3.3) while most Cassandra clusters have more than one node, someone may not notice the issue, as the first responding contact point returns the other peers before its (unreachable) `listen_address` is saved in the peer list
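The two behaviors listed above can be contrasted in a short sketch (queries replaced by hard-coded rows; function names and row shapes are illustrative, not either driver's API):

```python
# Sketch contrasting the JS-driver and lua-cassandra (>= 1.3.3) behaviors
# described above. Illustrative only.
BIND_ALL = "0.0.0.0"

def peer_addresses(peer_rows):
    """Resolve addresses for system.peers rows (same in both drivers)."""
    return [r["rpc_address"] if r["rpc_address"] != BIND_ALL else r["peer"]
            for r in peer_rows]

def js_style_cluster(contact_point, peer_rows):
    """JS driver: the contact point is kept as-is for the coordinator."""
    return [contact_point] + peer_addresses(peer_rows)

def lua_style_cluster(contact_point, local_row, peer_rows):
    """lua-cassandra >= 1.3.3: prefer system.local.rpc_address for the
    coordinator; warn and fall back to the contact point host when it is
    the bind-all address."""
    rpc = local_row["rpc_address"]
    coordinator = contact_point if rpc == BIND_ALL else rpc  # warn on fallback
    return [coordinator] + peer_addresses(peer_rows)

local_row = {"rpc_address": "0.0.0.0"}  # typical containerized node
peers = [{"rpc_address": "10.0.0.6", "peer": "172.17.0.6"}]
print(js_style_cluster("cassandra.example.com", peers))
print(lua_style_cluster("cassandra.example.com", local_row, peers))
# both yield the same list here, but the Lua driver also logs a warning
# because rpc_address is 0.0.0.0
```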
Reopening this since the discussion is active.
> domain contact points are not resolved manually, so extra A/AAAA DNS records are ignored
This library does not embed a DNS resolver, so contact points must be specified as IP addresses. Kong uses a clever hack that makes it possible to pass contact points as hostnames.
> raises a warning and falls back to the contact point host if it equals 0.0.0.0

Similarly to the JS driver, iirc.
> when it falls back to the contact point host, that may be a domain with multiple A/AAAA records, which may lead to another peer from time to time

Since we only refer to contact points by IP address (from `rpc_address` or `listen_address`, as you pointed out), that shouldn't be the case? This is especially critical for the reasons that initially triggered these recent changes, described in great detail in this commit (which was wrongly using `listen_address`, as fixed in a subsequent commit).
> so the warning is always raised and a domain may be used as host

I have considered making this warning silent when `opts.silent` is enabled. A PR here is welcome.
> listen_address of the Cassandra instances are usually unreachable from the client

Which property do you suggest we fall back to, then?
1.3.3 works in my case. IMO the saved IP of the local peer should always be the contact point, not `rpc_address`, when available.

A DNS resolver would be awesome.
Absent DNS capabilities in the driver, `broadcast_address` is probably the best option. That appears to be what the DataStax drivers effectively do, as the listen address they refer to (the `peer` column from `system.peers`) corresponds to `broadcast_address` in `system.local`.
`rpc_address` in `system.peers` appears to actually be `broadcast_rpc_address` as of Cassandra 2.1; it looks like the DataStax driver's fallback mechanism was designed for and only applies to older versions, as `broadcast_rpc_address` is never `0.0.0.0`. While it's the value we actually want, it doesn't appear to be exposed anywhere for the local node (`system.peers` only includes other nodes' info). `broadcast_address` is the next best guess, even though it won't be correct if you use separate listen addresses for CQL and gossip (hopefully uncommon).
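The fallback order being discussed (prefer `broadcast_address` over `listen_address` for the local node when `rpc_address` is bind-all) could be sketched like this; the field names match `system.local` columns, but the helper itself is hypothetical:

```python
# Sketch of the proposed coordinator-address fallback: rpc_address when
# usable, else broadcast_address (broadcast_rpc_address is not exposed
# in system.local), else the contact point. Illustrative only.
BIND_ALL = "0.0.0.0"

def coordinator_address(local_row, contact_point):
    """Pick a dialable address for the local node (coordinator)."""
    rpc = local_row.get("rpc_address")
    if rpc and rpc != BIND_ALL:
        return rpc
    # rpc_address is bind-all: prefer broadcast_address over
    # listen_address, since listen_address is often unreachable from
    # clients (e.g. a container-internal IP)
    return local_row.get("broadcast_address") or contact_point

row = {"rpc_address": "0.0.0.0", "broadcast_address": "203.0.113.7"}
print(coordinator_address(row, "cassandra.example.com"))  # 203.0.113.7
```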
https://github.com/thibaultcha/lua-cassandra/blob/1.3.4/lib/resty/cassandra/cluster.lua#L477 will retrieve the coordinator hostname, so it can wind up in the state described in https://github.com/thibaultcha/lua-cassandra/commit/8b9fcd967bb23469026b91b5f8a6447a096fe75c#diff-745de875635774250552720721ac4fd5L496
@rainest Would you be willing to send a PR to this repository? In the same vein, we could also PR Kong to resolve its provided contact points before instantiating the cluster instance.
On second thought, I wonder if we shouldn't just follow my second suggestion, but keep this driver as-is for now (so as to stay aligned with the Datastax implementations)?
Here is the Kong PR: https://github.com/Kong/kong/pull/4296
Note that this fix is only relevant in versions of Kong that create several `resty.cassandra.cluster` instances (as described in the aforementioned commit), so that includes Kong up to 0.14, and Kong Enterprise up to 0.34 (today's latest release).
I think `broadcast_address` is more suitable, as `rpc_address` is usually set to `0.0.0.0` or a local sub-network IP.