adamdougal closed this issue 6 years ago
After adding extra logs it looks like there is a race condition causing this to fail:
10:51:39.299 [Test worker] DEBUG uk.sky.cqlmigrate.ClusterHealthTest - Stopping Scassandra
10:51:39.325 [cluster20-reconnection-0] ERROR c.d.driver.core.ControlConnection - [Control connection] Cannot connect to any host, scheduling retry in 1000 milliseconds
10:51:39.325 [cluster20-reconnection-0] DEBUG com.datastax.driver.core.Host.STATES - [Control connection] next reconnection attempt in 1000 ms
10:51:39.330 [cluster20-nio-worker-3] DEBUG com.datastax.driver.core.Connection - Connection[localhost/127.0.0.1:37299-4, inFlight=0, closed=true] closing connection
10:51:39.334 [Test worker] INFO o.scassandra.server.ServerStubRunner - Server is shut down
10:51:39.334 [Test worker] DEBUG uk.sky.cqlmigrate.ClusterHealthTest - Stopped Scassandra
10:51:39.334 [Test worker] DEBUG uk.sky.cqlmigrate.ClusterHealthTest - Checking Cassandra
10:51:39.336 [Test worker] DEBUG uk.sky.cqlmigrate.ClusterHealth - Cassandra hosts: [localhost/127.0.0.1:37299]
10:51:39.352 [Test worker] DEBUG uk.sky.cqlmigrate.ClusterHealth - All Cassandra hosts healthy
10:51:39.352 [Test worker] DEBUG uk.sky.cqlmigrate.ClusterHealthTest - Checked Cassandra
10:51:39.352 [cluster20-worker-0] DEBUG com.datastax.driver.core.Host.STATES - [localhost/127.0.0.1:37299] marking host DOWN
Even though Scassandra has been stopped and the Cassandra driver is attempting to reconnect, the driver's host state isn't updated until after we've run our check. As there's no other way of knowing the state of a node, my proposed fix is to retry our check multiple times, or to wait before checking.
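A minimal sketch of the retry approach, assuming a generic helper rather than anything in cqlmigrate itself (the `retryWithDelay` name, the attempt count, and the delay are all hypothetical): poll the check until it passes or the attempts run out, which gives the driver time to mark the host DOWN before the test asserts on it.

```java
import java.util.function.BooleanSupplier;

public class RetryCheck {

    // Hypothetical helper: poll a boolean check up to maxAttempts times,
    // sleeping delayMillis between attempts, and report whether it ever passed.
    static boolean retryWithDelay(BooleanSupplier check, int maxAttempts, long delayMillis)
            throws InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (check.getAsBoolean()) {
                return true;
            }
            if (attempt < maxAttempts) {
                Thread.sleep(delayMillis);
            }
        }
        return false;
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulated driver state: the host only reads as DOWN on the third poll,
        // mimicking the delayed "marking host DOWN" seen in the logs above.
        int[] polls = {0};
        boolean hostDown = retryWithDelay(() -> ++polls[0] >= 3, 5, 10);
        System.out.println(hostDown ? "host marked DOWN" : "host still UP");
    }
}
```

A single fixed sleep would also work, but retrying with a short delay keeps the happy path fast and only waits when the driver's state is actually lagging.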
Builds are currently intermittently failing due to:
https://travis-ci.org/sky-uk/cqlmigrate/builds/279515435#L1065