spotahome / redis-operator

Redis Operator creates/configures/manages high availability redis with sentinel automatic failover atop Kubernetes.
Apache License 2.0

upgrade redisfailover cluster #637

Closed matzik12 closed 11 months ago

matzik12 commented 11 months ago

I want to upgrade the Redis image from 6.2.12 to 7.0.12 without affecting the data in it. What do I need to do? Can I just switch the image in the CR and let the operator do its thing?

The question applies to every image, not only the Redis image: I would also like to upgrade my operator image and Sentinel image.

hashemi-soroush commented 11 months ago

The short answer is no, although it is not very hard to implement. If you're interested in knowing how, read on. Since it's a bit long, I've divided it into sections to make it more approachable.

Redis recommended upgrade process

There are two ways you can repopulate a Redis instance:

  1. the files Redis writes on storage (whether it's a dump or an AOF)
  2. the replication protocol which replicas use to sync with the master

According to redis.io, as a general rule, the persistence format and the replication protocol might have breaking changes in minor and major version upgrades. Nevertheless, the safest upgrade process recommended by redis.io relies on the replication protocol. The recommended process is to upgrade the replicas one by one, waiting for each to finish its initial synchronization with the master. When all the replicas are upgraded, perform a manual failover to promote one of the upgraded replicas to master, then upgrade the old master.
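The ordering described above can be sketched as a small planning function. This is only an illustration of the replicas-first sequence, not a tool that talks to Redis; the `Node` type, the instance names, and the choice of the first replica as the failover target are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str  # hypothetical instance name
    role: str  # "master" or "replica"


def plan_upgrade(nodes: list[Node]) -> list[str]:
    """Return the ordered steps of a replicas-first upgrade."""
    masters = [n for n in nodes if n.role == "master"]
    replicas = [n for n in nodes if n.role == "replica"]
    if len(masters) != 1:
        raise ValueError("expected exactly one master")
    if not replicas:
        raise ValueError("need at least one replica to fail over to")

    steps = []
    # Upgrade replicas one at a time, waiting for each initial sync.
    for replica in replicas:
        steps.append(f"upgrade {replica.name}")
        steps.append(f"wait for {replica.name} to finish initial sync")
    # Assumption: any upgraded replica may be promoted; pick the first.
    steps.append(f"fail over: promote {replicas[0].name} to master")
    # Only now is it safe to touch the old master.
    steps.append(f"upgrade {masters[0].name} (old master)")
    return steps


if __name__ == "__main__":
    cluster = [
        Node("redis-0", "master"),
        Node("redis-1", "replica"),
        Node("redis-2", "replica"),
    ]
    for step in plan_upgrade(cluster):
        print(step)
```

The point of the ordering is that the master is upgraded last, after a replica already running the new version has taken over.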

Running the process manually in Kubernetes

In Kubernetes, you can change the image of the StatefulSet, which in turn updates the pods with the update strategy you choose. StatefulSet supports a rolling update strategy that updates the pods one by one automatically, waiting for each new pod to become "ready" before updating the next one. It uses the pod's readiness probe to check whether it's ready. So, you can simply write a custom readiness probe for your Redis pods that checks whether the pod is synced with the master. This way, you can delegate the upgrade process entirely to Kubernetes.
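As a rough sketch of what such a readiness check could look at, the function below inspects the text returned by Redis's `INFO replication` command. It is not the operator's actual probe; it assumes that a replica counts as ready when `master_link_status` is `up` and `master_sync_in_progress` is `0`, and that a master is always ready.

```python
def parse_info(text: str) -> dict[str, str]:
    """Parse the 'key:value' lines of an INFO section into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and section headers such as "# Replication".
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = line.split(":", 1)
        fields[key] = value
    return fields


def is_ready(info_text: str) -> bool:
    """Decide readiness from the output of INFO replication."""
    info = parse_info(info_text)
    if info.get("role") == "master":
        return True
    # INFO reports replicas with role "slave".
    return (
        info.get("role") == "slave"
        and info.get("master_link_status") == "up"
        and info.get("master_sync_in_progress") == "0"
    )


if __name__ == "__main__":
    sample = "# Replication\r\nrole:slave\r\nmaster_link_status:up\r\nmaster_sync_in_progress:0\r\n"
    print(is_ready(sample))
```

In a real probe the input would come from something like `redis-cli info replication`, and the script's exit code would carry the result back to the kubelet.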

The only problem is that Kubernetes doesn't care which pod is the master. It might update the master first. Since we are using Sentinel, and since StatefulSets terminate pods gracefully, the master will have time to alert the Sentinels and, hopefully, they will have enough time to elect a new master before the master's grace period ends, so we could have an update without downtime. So, it would be fine as long as the master has a reasonable grace period, which can be set through the terminationGracePeriodSeconds field of the pod spec.
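To make the ordering problem concrete: a StatefulSet rolling update walks the pods from the highest ordinal down to the lowest, regardless of their Redis role. The sketch below reproduces that order and reports how many pods would be updated before the master; the StatefulSet name `redis` and the position of the master are hypothetical.

```python
def rolling_update_order(statefulset: str, replicas: int) -> list[str]:
    """Pod names in the order a RollingUpdate replaces them (highest ordinal first)."""
    return [f"{statefulset}-{i}" for i in range(replicas - 1, -1, -1)]


def pods_updated_before(master_pod: str, order: list[str]) -> int:
    """Number of pods replaced before the master's turn comes."""
    return order.index(master_pod)


if __name__ == "__main__":
    order = rolling_update_order("redis", 3)
    print(order)
    # If the master happens to be the highest ordinal, it is replaced first.
    print(pods_updated_before("redis-2", order))
    # If it is the lowest ordinal, every replica is replaced before it.
    print(pods_updated_before("redis-0", order))
```

Because the master can sit at any ordinal after earlier failovers, the rolling update alone cannot guarantee the replicas-first order; that is why the grace period matters.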

What does the operator do?

Currently, the operator has exactly the readiness probe this feature needs. The termination grace period is also set to 30 seconds by default and can be configured through the RedisFailover CR. The only mismatch from our list of requirements is the update strategy, which is set to OnDelete and is not configurable. I'm not sure why, or what would break if we changed it.

samof76 commented 11 months ago

> The only mismatch from our list of requirements is the update strategy which is set to OnDelete and is not configurable. I'm not sure why and what will break if we change it.

Continuing from where @hashemi-soroush left off: the operator does the right thing by deleting the replicas first and, once they have finished syncing with the current master, proceeding to delete the master, which gives a seamless rollout of Redis.

matzik12 commented 11 months ago

Thank you for commenting and helping :) @samof76 @hashemi-soroush. One thing is still missing for me: what about upgrading the Sentinel image? Is it the same? Also, I think detailed documentation for these kinds of operations would be a huge step forward for the operator and its community.

hashemi-soroush commented 11 months ago

The Sentinel upgrade is much simpler. Since Sentinels don't have persistent data and communicate with each other through the Redis master, you can simply change the Sentinel image in the RedisFailover CR, and the operator takes care of upgrading the Sentinels and connecting them to the master.
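For illustration, an image change like this can be expressed as a JSON merge patch against the CR. The sketch below only builds the patch body; the field paths `spec.redis.image` and `spec.sentinel.image` are my assumption about the CR layout and should be checked against the CRD installed in your cluster, and the image tag is just an example.

```python
import json
from typing import Optional


def image_patch(redis_image: Optional[str] = None,
                sentinel_image: Optional[str] = None) -> dict:
    """Build a merge-patch body that changes only the given images."""
    spec = {}
    if redis_image:
        # Assumed field path: spec.redis.image
        spec["redis"] = {"image": redis_image}
    if sentinel_image:
        # Assumed field path: spec.sentinel.image
        spec["sentinel"] = {"image": sentinel_image}
    if not spec:
        raise ValueError("nothing to patch")
    return {"spec": spec}


if __name__ == "__main__":
    # Example tag only; use the image you actually intend to run.
    print(json.dumps(image_patch(sentinel_image="redis:7.0.12")))
```

The printed JSON could then be applied with a merge patch (for example via `kubectl patch --type merge`), or the same fields could simply be edited in the CR manifest.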

As for documentation, I agree. Such a useful project could really help its user base with more comprehensive documentation, though the code is very clean and short, so in the absence of documentation, people can find the answers to their questions fairly easily. That said, I'm sure the maintainers and the community would appreciate it if each of us spent just a little time documenting the whole thing.

P.S.: if your issue is resolved, please close it.