jetstack / navigator

Managed Database-as-a-Service (DBaaS) on Kubernetes
Apache License 2.0

Investigate whether Cassandra nodes need stable IP addresses #319

Closed wallrj closed 6 years ago

wallrj commented 6 years ago

It's unclear how Cassandra v3 and later handle node IP address changes.

There are discussions in Sticky IPs for StatefulSet which suggest that Cassandra does need "sticky IP addresses". But it's not clear if that applies to v2 or v3 Cassandra (or both).

On the other hand, there is cassandra-kubernetes-hostid, written as part of Improve Cassandra Example, which suggests that you can supply the "hostid" to -Dcassandra.replace_address when starting the Cassandra node. The "hostid" seems to be a UUID that Cassandra chooses when it first creates its data store, and it is this UUID, rather than the IP address, that identifies the node in the cluster.

The best practices for Cassandra state that you should use local storage physically attached to the machine. However, if this machine fails and data is lost then you'll need to provision a replacement node. The method for doing this is to pass the IP or host UUID via the JVM flag: -Dcassandra.replace_address=e2a79390-c458-4387-8034-c14da6d38a22. Because IPs in Kubernetes are not stable the host ID is the only way to retain identity after data loss. This tool will persist or read the host ID using the annotations of the Kubernetes API.
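To make the quoted approach concrete, here is a minimal sketch of how a startup wrapper might build that JVM flag from a saved host ID. It assumes, as the quoted text claims, that the flag accepts a host UUID; whether Cassandra v3 really does is part of what this issue is trying to find out. The function name `replace_address_flag` and the `first_boot` switch are hypothetical, not part of Navigator or the cassandra-kubernetes-hostid tool.

```python
import uuid


def replace_address_flag(host_id: str, first_boot: bool = False) -> str:
    """Build the JVM flag used when starting a replacement node.

    Hypothetical helper: assumes the flag accepts a host UUID, which is
    what the quoted text claims but this issue has not yet confirmed.
    """
    # uuid.UUID raises ValueError if host_id is not a well-formed UUID,
    # so a corrupted or empty saved value fails early.
    parsed = uuid.UUID(host_id)
    option = "replace_address_first_boot" if first_boot else "replace_address"
    return f"-Dcassandra.{option}={parsed}"


if __name__ == "__main__":
    print(replace_address_flag("e2a79390-c458-4387-8034-c14da6d38a22"))
```

The UUID check is only a guard against passing a malformed saved value to the JVM; it says nothing about whether the node with that ID is actually dead.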

There is a replace_address_first_boot option described in Bootstrapping Apache Cassandra Nodes which sounds like it is probably the correct option to use.

Some documentation talks about supplying an IP address to replace_address:

Other documentation talks about supplying the listen_address to replace_address:

And the documentation for listen_address talks about an IP address or hostname:

So perhaps we can simply supply the StatefulSet stable hostname to listen_address and that will be sufficient?
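As a rough illustration of that idea, the sketch below derives the stable DNS name that Kubernetes gives a StatefulSet pod through its governing headless service, and renders the matching listen_address line. The helper names, the example StatefulSet, service and namespace values, and the default `cluster.local` cluster domain are assumptions for illustration; whether Cassandra copes with the IP behind that hostname changing is still the open question here.

```python
def stable_hostname(statefulset: str, ordinal: int, service: str,
                    namespace: str = "default",
                    cluster_domain: str = "cluster.local") -> str:
    """Return the stable DNS name of a StatefulSet pod.

    Pods are named <statefulset>-<ordinal> and are resolvable at
    <pod>.<service>.<namespace>.svc.<cluster_domain> through the
    StatefulSet's governing headless service.
    """
    if ordinal < 0:
        raise ValueError("ordinal must be non-negative")
    return f"{statefulset}-{ordinal}.{service}.{namespace}.svc.{cluster_domain}"


def listen_address_line(hostname: str) -> str:
    """Render the cassandra.yaml line (hypothetical helper)."""
    return f"listen_address: {hostname}"


if __name__ == "__main__":
    # Example values only; not taken from Navigator's manifests.
    host = stable_hostname("cassandra", 0, "cassandra", namespace="db")
    print(listen_address_line(host))
```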

Other useful links:

/kind bug