pingcap / tidb-operator

TiDB operator creates and manages TiDB clusters running in Kubernetes.
https://docs.pingcap.com/tidb-in-kubernetes/
Apache License 2.0

Add pod in-place feature. #3287

Open mikechengwei opened 4 years ago

mikechengwei commented 4 years ago

Feature Request

Describe the feature you'd like: When the user changes only the image field, the operator only needs to recreate the container instead of recreating the pod.

Describe alternatives you've considered: When the image field is updated, the kubelet detects that the hash of the container has changed, kills the current container, and starts a new container with the new image.

So Kubernetes currently supports in-place updates only for the image field, and I see that 'Pod resource requests & limits to be updated in-place' is still under development. We can support in-place image updates first.

I considered the following:

  1. Support the InPlaceIfPossible strategy in the advanced statefulset.
  2. When the InPlaceIfPossible strategy is enabled, the operator calculates the in-place update spec and decides whether to delete the pod or patch it with the new image value.
  3. We need to make sure the pod status is correct and the upgrade is smooth. When the update starts, we need to take the node offline in the TiDB cluster and change the pod status to NotReady. Once the container image equals the new image value and the node has rejoined the cluster, we need to change the pod status back to Ready. We can use the 'Readiness-gate Feature' for this.
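
To make step 2 concrete, here is a minimal sketch of how the delete-or-patch decision could look. It is not operator code: Container and PodSpec are simplified stand-ins for the real Kubernetes types, and decideUpdate and the UpdateAction values are hypothetical names used only for illustration. It assumes containers keep the same order between the current and desired spec.

```go
package main

import "reflect"

// Container is a simplified stand-in for the Kubernetes container type (assumption).
type Container struct {
	Name  string
	Image string
	Args  []string
}

// PodSpec is a simplified stand-in for the Kubernetes pod spec (assumption).
type PodSpec struct {
	Containers   []Container
	NodeSelector map[string]string
}

// UpdateAction is a hypothetical result type for this sketch.
type UpdateAction string

const (
	ActionNone       UpdateAction = "None"
	ActionPatchImage UpdateAction = "PatchImage"
	ActionRecreate   UpdateAction = "Recreate"
)

// decideUpdate returns ActionPatchImage only when the in-place strategy is
// enabled and the container images are the sole difference between the specs.
func decideUpdate(current, desired PodSpec, inPlaceIfPossible bool) UpdateAction {
	if reflect.DeepEqual(current, desired) {
		return ActionNone
	}
	if !inPlaceIfPossible || len(current.Containers) != len(desired.Containers) {
		return ActionRecreate
	}
	// Copy the desired spec and overwrite each image with the current one.
	// If the result equals the current spec, only images changed.
	normalized := desired
	normalized.Containers = make([]Container, len(desired.Containers))
	copy(normalized.Containers, desired.Containers)
	for i := range normalized.Containers {
		normalized.Containers[i].Image = current.Containers[i].Image
	}
	if reflect.DeepEqual(current, normalized) {
		return ActionPatchImage
	}
	return ActionRecreate
}
```

Any other difference (args, node selector, added or removed containers) falls back to recreating the pod, which matches the "if possible" part of the strategy name.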

That's all.

reference:

DanielZhangQD commented 4 years ago

cc @cvvz

cvvz commented 4 years ago

/assign

cvvz commented 4 years ago

@mikechengwei @DanielZhangQD As @mikechengwei mentioned, we should use readinessGates to make sure the Pod won't be Ready before the in-place update has finished. I think we need to add readinessGates to the TidbCluster spec: when InPlaceIfPossible or InPlaceOnly is set in TidbCluster, InPlaceUpdateReady must be set in readinessGates at the same time. There is no need to delete InPlaceUpdateReady from readinessGates when InPlaceIfPossible or InPlaceOnly is changed to Recreate. However, changing the definition of readinessGates will immediately trigger a rolling update. WDYT?
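
For reference, a rough sketch of how the gate would behave. Kubernetes reports a Pod as Ready only when its containers are ready and every condition listed in readinessGates is True (a missing condition counts as not True). The types below are simplified stand-ins rather than the real Kubernetes API types, and setCondition, podReady and inPlaceUpdateStatus are hypothetical helper names. Only the InPlaceUpdateReady condition type comes from this thread.

```go
package main

// ConditionStatus mirrors the True/False values used by pod conditions.
type ConditionStatus string

const (
	ConditionTrue  ConditionStatus = "True"
	ConditionFalse ConditionStatus = "False"
)

// InPlaceUpdateReady is the condition type proposed in this thread.
const InPlaceUpdateReady = "InPlaceUpdateReady"

// PodCondition and Pod are simplified stand-ins for the Kubernetes types (assumption).
type PodCondition struct {
	Type   string
	Status ConditionStatus
}

type Pod struct {
	ReadinessGates  []string
	Conditions      []PodCondition
	ContainersReady bool
}

// setCondition updates an existing condition or appends a new one.
func setCondition(p *Pod, condType string, status ConditionStatus) {
	for i := range p.Conditions {
		if p.Conditions[i].Type == condType {
			p.Conditions[i].Status = status
			return
		}
	}
	p.Conditions = append(p.Conditions, PodCondition{Type: condType, Status: status})
}

// podReady applies the readiness-gate rule: containers must be ready and
// every gate must have a matching condition with status True.
func podReady(p Pod) bool {
	if !p.ContainersReady {
		return false
	}
	for _, gate := range p.ReadinessGates {
		satisfied := false
		for _, c := range p.Conditions {
			if c.Type == gate && c.Status == ConditionTrue {
				satisfied = true
				break
			}
		}
		if !satisfied {
			return false
		}
	}
	return true
}

// inPlaceUpdateStatus is True only once the running image matches the
// desired image and the node has rejoined the cluster.
func inPlaceUpdateStatus(runningImage, desiredImage string, joinedCluster bool) ConditionStatus {
	if runningImage == desiredImage && joinedCluster {
		return ConditionTrue
	}
	return ConditionFalse
}
```

In this sketch the operator would set the condition to False before patching the image and back to True after the checks pass, so the Pod stays NotReady for the whole in-place update even if its containers report ready.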

mikechengwei commented 4 years ago

After the strategy is changed, I think the pods should be restarted to update the spec and remove the readinessGates field, because readinessGates only takes effect under the in-place strategies. When switching to an in-place strategy, the pods should likewise be updated on a rolling basis.