cql-rb discovers a removed node and fails to connect to cluster!

thedebugger commented 10 years ago

I recently removed a node from our cassandra cluster, and after that when I'm trying to connect my app, which uses cql-rb, to cassandra cluster it fails to connect. It discovers the removed node (not sure how) and throws an un-handled exception "connection refused" on the removed node however it is able to connect to other nodes just logs.

I've checked on every cassandra node in the cluster that removed node is not present by running "nodetool status". I've also checked that I'm not passing this host in the host array to cql-rb at the time of connecting. Another app which uses datastax java driver is able to connect and seems to be working fine.

Environment:

Cassandra 2.0.1 running on 5 nodes
cql-rb 1.2.0 on jRuby 1.7.8

I'm planning on upgrading to 2.0.6 as 2.0.1 has quite lot of bugs. Neither, our cassandra cluster is in happy state - couple of nodes see each other DOWN. i'm hoping that will get fixed once i upgrade.

So i've couple of question

How does discovery works?
Isn't it okay to handle the exception if you can connect to one node or more?
Suggestions on how can I fix it?

Let me know if you need more details.

thedebugger commented 10 years ago

Sorry, I was incorrect; cql-rb is able to connect just fine to the cluster. The error is logged as WARN and is able to connect to cluster. The app was failing for a different reason. Anyways, I you have spare time then I'd like to know how discovery works.

iconara commented 10 years ago

Sounds like you're running into CASSANDRA-6053. Upgrade to 2.0.6 and it will go away. You can also scrub the system.peers table manually if you're feeling daring.

iconara commented 10 years ago

The connection and peer discovery flow looks like this:

In parallel open a connection to each of the seed nodes (not cluster seeds, but the host names given in the :hosts option) – and if the :connections_per_node is specified this is done multiple times per node.
For each connection (still in parallel) run through the connection setup, which means asking the server for what features it supports, going through authentication, sending a startup message with the features we want to use, loading metadata about the node, and changing keyspace (if specified).
If at least one connection succeeded at this point, go ahead, otherwise fail.
Pick a connection at random and run SELECT * FROM system.peers (not actually *, but I can't remember the names of the columns off the top of my head), remove the nodes we're already connected to (i.e. that were seed nodes).
Continue to filter the list of additional nodes by data center, so that we only connect to nodes in the same data center(s) as the seed nodes.
If there are any nodes left, start a new connection round going through most of the steps above.
If any of the additional connection attempts fail, ignore that.
When all additional nodes have either successfully been connected to or failed we return from the Client.connect call.

It's the most complicated part of the whole driver, by far. There's lots of things going on, all in parallel and asychronously.

In addition, this is how the driver manages to stay up when nodes go down:

The first connection (this can be any connection, it's just which one happened to be added to the list of connections first) is used to register for notifications of nodes being added to the cluster or coming back from being down. This is a very handy feature of Cassandra and the CQL protocol.
If that connection closes for some reason, the new first node is used to register again, so that there is always one connection that is listening for notifications.
When a notification is received we run a new peer discovery round and connect to any nodes we're not already connected to.
If a new node doesn't respond we schedule another peer discovery a few seconds into the future, sometimes the notification comes before the node accepts new connections.
If no new nodes were found in the peer discovery we don't do anything.

This mechanism makes it possible (with some application error handling) to have the application stay up during a rolling cluster restart. I've upgraded a four node C* cluster while a distributed application was sending tens of thousands of operations per second to it. It's like changing the engines on a plane, in flight.

thedebugger commented 10 years ago

awesome. thanks for writing it down. I think I'll do the upgrade rather then changing that table.

iconara / cql-rb

cql-rb discovers a removed node and fails to connect to cluster! #88