watch_log_for_death scans node logs looking for a message that says "<nodeaddress> is now DOWN". When this message appears in the logs of the other nodes, we can be certain that this particular node is now DOWN.
In Scylla the messages look like this:
127.0.0.1 is now DOWN
127.0.0.1 is now UP
But in Cassandra 4.1.3 the messages are a bit different:
127.0.0.1:7000 is now DOWN
127.0.0.1:7000 is now UP
In Cassandra the node's address also includes the port. watch_log_for_death didn't handle this properly: the regex expected an IP address immediately followed by " is now DOWN".
Because of this it wasn't able to detect the message, and node.stop() kept timing out for Cassandra nodes.
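To make the failure concrete, here is a minimal Python sketch. It assumes the old pattern was simply the escaped address followed by " is now DOWN"; this is an approximation for illustration, not the project's actual code.

```python
import re

# The Scylla-style and Cassandra 4.1.3-style lines quoted above.
SCYLLA_LINE = "127.0.0.1 is now DOWN"
CASSANDRA_LINE = "127.0.0.1:7000 is now DOWN"

# Assumed shape of the old pattern: the address immediately followed by " is now DOWN".
old_pattern = re.compile(re.escape("127.0.0.1") + " is now DOWN")

# Matches the Scylla line.
scylla_matched = old_pattern.search(SCYLLA_LINE) is not None
# Does not match the Cassandra line: ":7000" sits between the address and " is now DOWN".
cassandra_matched = old_pattern.search(CASSANDRA_LINE) is not None

print(scylla_matched, cassandra_matched)
```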
To fix it, let's generalize the regex so that it handles both messages properly.
The regex is now essentially the same as the one in watch_log_for_alive, which looks for "is now UP" messages. It's located a few lines below watch_log_for_death.
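A minimal sketch of a generalized pattern, assuming IPv4 addresses and log lines already read into a list. The names state_pattern and addresses_marked are hypothetical, not the project's functions. One builder serves both "is now DOWN" and "is now UP", which mirrors why the two watchers can share the same regex shape.

```python
import re


def state_pattern(address, state):
    """Hypothetical helper: match '<address> is now <state>', with an optional ':<port>'."""
    # (?<![\d.]) keeps e.g. '27.0.0.1' from matching inside '127.0.0.1'.
    # re.escape makes the dots in the IP literal match only literal dots.
    # (?::\d+)? accepts Cassandra's ':7000'-style port suffix, or nothing (Scylla).
    return re.compile(
        r"(?<![\d.])" + re.escape(address) + r"(?::\d+)? is now " + re.escape(state)
    )


def addresses_marked(lines, addresses, state):
    """Return the subset of `addresses` that some line in `lines` reports as `state`.

    `lines` is scanned once per address, so it must be a list, not a one-shot iterator.
    """
    patterns = {addr: state_pattern(addr, state) for addr in addresses}
    return {addr for addr, pat in patterns.items() if any(pat.search(line) for line in lines)}


log = [
    "127.0.0.1 is now DOWN",       # Scylla format
    "127.0.0.2:7000 is now DOWN",  # Cassandra 4.1.3 format
    "127.0.0.3:7000 is now UP",
]
print(addresses_marked(log, ["127.0.0.1", "127.0.0.2", "127.0.0.3"], "DOWN"))
```

Here the DOWN scan picks up 127.0.0.1 and 127.0.0.2 regardless of whether a port is present, while 127.0.0.3 is only reported by a scan for "UP".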