thibaultcha / lua-cassandra

Pure Lua driver for Apache Cassandra
https://thibaultcha.github.io/lua-cassandra

Cluster connecting to a node out of cluster #101

Closed kushalkh closed 5 years ago

kushalkh commented 7 years ago

Hi,

I'm experiencing a really strange issue with the cluster module. It works fine more than 90% of the time, but once every 15-20 seconds it throws a lot of errors for 1-2 seconds.

The error says: all hosts tried for query failed. aaa.bbb.ccc.ddd: host still considered down

"aaa.bbb.ccc.ddd" is an IP that is not part of the cluster, nor of the contact_points provided when connecting. I also ran nodetool status, and it lists only the correct IPs.

To give you a little info on "aaa.bbb.ccc.ddd": this IP is not in the same datacenter and is actually used to resolve wildcard CNAMEs for our domain. I can't find this IP, or any localhost IP, anywhere in the Cassandra setup. I only provide a list of 6 IPs in contact_points, all from the same datacenter.

Here is what my setup looks like:

nginx.conf

    local my_module = require "cassandra_module"
    my_module.init_cluster {
      shm = "cassandra", -- defined in http block
      contact_points = {"a", "b", "c", "d", "e", "f"},
      keyspace = "cookie"
    }

a, b, c, d, e, f obviously being the IPs. cassandra_module is an exact copy of the my_module.lua specified in the cluster example (http://thibaultcha.github.io/lua-cassandra/examples/intro.lua.html).

myfile.lua

    local my_module = require 'cassandra_module'
    if my_module then
      local cookieDataRows, err = my_module.execute("SELECT id FROM my_table where aid = ? limit 1", {aid})
      if err ~= nil then
        ngx.say(err) -- Error specified above occurs here whenever it happens
      else
        ngx.say("success")
      end
    end

Please let me know if any other info is required.

Thanks much

thibaultcha commented 7 years ago

Hi,

The contact points are not the final list of nodes that the cluster module will use. The usual use case is to put only a few nodes in this list (they are, after all, only "contact points") and let the cluster module discover the other nodes.

It does so by performing this query:

SELECT * FROM system.peers;

If you start a cqlsh session and run this query, you will very probably see this "aaa.bbb.ccc.ddd" peer in there. If it is not part of your cluster, then maybe something went wrong with a decommissioned node that ended up being left in there?
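To make that check less error-prone than reading the output by eye, the rows can be compared against the list of node IPs you expect. The following is only a minimal sketch in plain Lua: `find_unexpected_peers` is a hypothetical helper (not part of lua-cassandra), the sample rows use made-up addresses and stand in for a real query result, and it assumes each row exposes the `peer` and `rpc_address` columns of system.peers as string fields.

```lua
-- Hypothetical helper (not part of lua-cassandra): returns the addresses
-- found in `rows` that are absent from the `expected` list of node IPs.
-- `rows` is assumed to be shaped like the result of
-- `SELECT peer, rpc_address FROM system.peers`.
local function find_unexpected_peers(rows, expected)
  -- build a set of known IPs for constant-time lookups
  local known = {}
  for _, ip in ipairs(expected) do
    known[ip] = true
  end

  local unexpected = {}
  for _, row in ipairs(rows) do
    -- check both columns, since either one may be used to reach the node
    for _, column in ipairs({ "peer", "rpc_address" }) do
      local addr = row[column]
      if addr and not known[addr] then
        unexpected[#unexpected + 1] = column .. "=" .. addr
      end
    end
  end
  return unexpected
end

-- Sample rows standing in for a real query result (made-up addresses).
local rows = {
  { peer = "10.0.0.2", rpc_address = "10.0.0.2" },
  { peer = "10.0.0.3", rpc_address = "203.0.113.7" },
}
local expected = { "10.0.0.1", "10.0.0.2", "10.0.0.3" }

for _, entry in ipairs(find_unexpected_peers(rows, expected)) do
  print("unexpected address: " .. entry)
end
```

Checking both columns matters because a node can advertise one address in `peer` and a different one in `rpc_address`, so a stray IP may hide in either.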

kushalkh commented 7 years ago

Hi @thibaultcha ,

This is exactly what I thought as well. On running that query I see all the nodes and they all look good; I don't see "aaa.bbb.ccc.ddd" anywhere. Also, this is not a decommissioned IP. Instead, our domain resolves to this IP in our Geo DNS setup, and it has always been set up this way.

Are there any other places you would recommend I check? I'm not even using localhost as one of the nodes; they are all specific IPs.

thibaultcha commented 5 years ago

Unsure what could cause this, but since this was opened, we switched to using the rpc_address column to contact peers. Closing this and considering it stale; feel free to open another issue in the future.
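For readers wondering what choosing between the two columns can look like, here is a rough sketch in plain Lua. It is not the driver's actual code: `contact_address` is a hypothetical helper, and the fallback to `peer` when `rpc_address` is missing or set to the wildcard 0.0.0.0 is an assumption made for illustration.

```lua
-- Hypothetical helper (not lua-cassandra's implementation): pick the
-- address a client would connect to for one row of system.peers.
-- Assumption: prefer `rpc_address`, and fall back to `peer` when
-- `rpc_address` is missing or is the wildcard address 0.0.0.0.
local function contact_address(row)
  local addr = row.rpc_address
  if addr == nil or addr == "0.0.0.0" then
    return row.peer
  end
  return addr
end

-- Made-up rows illustrating the three cases.
print(contact_address({ peer = "10.0.0.2", rpc_address = "10.0.1.2" }))
print(contact_address({ peer = "10.0.0.3", rpc_address = "0.0.0.0" }))
print(contact_address({ peer = "10.0.0.4" }))
```

The point of the sketch is that the address a client dials can differ from the `peer` value, which is one more place to look when an unexpected IP shows up in errors.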