Closed adejanovski closed 6 days ago
Currently:
Conditions to allow fast lane startups:
Conditions to disallow fast lane startups:
Things to verify:
Technical aspects
How do we ensure a node was part of the ring before?
The pod name appears in the cassdc .status.nodeStatuses struct with a host ID. cass-operator needs to ensure the entry is added only for nodes that have successfully completed bootstrap (their state is UN). nodeStatuses has to be an exact representation of the DC topology: any node removal should be reflected there after a scale in/down operation. Removals can be detected through the LEAVING/REMOVED states in the endpoint state.
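The membership check described above can be sketched as follows. This is only an illustration in Python, not cass-operator code: the status is modeled as a plain dict, the `hostID` key name is an assumption about how the host ID is stored in each entry, and `was_in_ring` is a hypothetical helper name.

```python
def was_in_ring(cassdc_status: dict, pod_name: str) -> bool:
    """Return True if the pod has a nodeStatuses entry with a non-empty host ID.

    Assumes (hypothetically) that the status looks like:
        {"nodeStatuses": {"<pod name>": {"hostID": "<uuid>"}}}
    and that entries are only written once a node has reached UN.
    """
    node_statuses = cassdc_status.get("nodeStatuses") or {}
    entry = node_statuses.get(pod_name) or {}
    return bool(entry.get("hostID"))


# Example status with one bootstrapped node and one entry lacking a host ID.
example_status = {
    "nodeStatuses": {
        "dc1-rack1-sts-0": {"hostID": "11111111-2222-3333-4444-555555555555"},
        "dc1-rack1-sts-1": {"hostID": ""},
    }
}
```

The check is only as reliable as nodeStatuses itself, which is why entries must be added after a successful bootstrap and removed after a scale down.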
There can be cases where a scale up operation is blocked by another crashlooping pod. In this case the new pod will have the "Starting" label, which prevents the pre-existing pod from coming back up after a fix is applied (for example, when it is rescheduled onto a new worker).
One way to solve this would be to let pods start right away if they host Cassandra nodes that have already bootstrapped in the past and are not part of a replacement. This way we could have faster startups overall while still protecting ourselves from concurrent bootstraps.
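A minimal sketch of that rule, in Python for illustration only. The `PodInfo` type, its fields, and both function names are hypothetical and do not correspond to actual cass-operator code; how each flag would be derived is left out.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class PodInfo:
    name: str
    previously_bootstrapped: bool  # assumed: node already joined the ring once
    being_replaced: bool           # assumed: pod is part of a node replacement


def can_fast_lane(pod: PodInfo) -> bool:
    """A pod may start right away only if it bootstrapped before
    and is not part of a replacement."""
    return pod.previously_bootstrapped and not pod.being_replaced


def split_startup_queue(pods: List[PodInfo]) -> Tuple[List[str], List[str]]:
    """Split pods into those that can start immediately and those that
    must still be started one at a time (new or replaced nodes)."""
    fast = [p.name for p in pods if can_fast_lane(p)]
    serial = [p.name for p in pods if not can_fast_lane(p)]
    return fast, serial
```

For example, a restarted pre-existing pod would go to the fast list, while a freshly scaled-up pod or a replacement would stay in the serial list, so concurrent bootstraps are still avoided.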