kubernetes / kubernetes

Production-Grade Container Scheduling and Management
https://kubernetes.io
Apache License 2.0

StatefulSet with podManagementPolicy=OrderedReady and minReadySeconds does not scale down correctly #123918

Open atiratree opened 7 months ago

atiratree commented 7 months ago

What happened?

1. Create the OrderedReady StatefulSet:

   ```yaml
   apiVersion: apps/v1
   kind: StatefulSet
   metadata:
     name: nginx-roll
   spec:
     replicas: 2
     minReadySeconds: 30
     podManagementPolicy: OrderedReady
     updateStrategy:
       type: RollingUpdate
     selector:
       matchLabels:
         app: nginx-roll
     template:
       metadata:
         labels:
           app: nginx-roll
       spec:
         containers:
         - name: nginx
           image: ghcr.io/nginxinc/nginx-unprivileged:latest
           ports:
           - containerPort: 80
             name: web
   ```

2. Wait for the 2nd pod to become Ready, but not yet Available (this can also happen when the pod has lost availability).
3. Scale the StatefulSet to 1 replica:

   ```console
   $ kubectl scale statefulset nginx-roll --replicas=1
   ```

4. The 2nd pod should start terminating immediately, but instead it hangs until KCM (kube-controller-manager) performs a full resync.

What did you expect to happen?

The 2nd pod should start terminating immediately.

How can we reproduce it (as minimally and precisely as possible)?

See "What happened?" above.

Anything else we need to know?

We can see in the logs that syncs are happening, but the firstUnhealthyPod variable is not resolved (the log reports `pod=""`), so progress stalls:

```
stateful_set_control.go:509] "StatefulSet is waiting for Pod to be Available prior to scale down" statefulSet="test/nginx-roll" pod=""
```
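A simplified Go model of how this state can arise, written from the behavior described here rather than copied from `stateful_set_control.go`. It assumes (as the empty `pod=""` suggests) that "healthy", which is used to pick firstUnhealthyPod, only means Ready and not terminating, while the scale-down gate additionally requires availability. A condemned Pod that is Ready but not yet Available is then never chosen as firstUnhealthyPod, yet still fails the availability check. The types and functions (`pod`, `isHealthy`, `firstUnhealthy`, `scaleDownDecision`) are hypothetical.

```go
package main

import "fmt"

// pod is a hypothetical, simplified Pod model. Available is assumed to
// already include the minReadySeconds check.
type pod struct {
	Name        string
	Ready       bool
	Available   bool
	Terminating bool
}

// isHealthy assumes "healthy" means Ready and not terminating; availability
// is deliberately not part of it.
func isHealthy(p *pod) bool {
	return p.Ready && !p.Terminating
}

// firstUnhealthy returns the first unhealthy Pod among the kept replicas,
// then among the condemned Pods (both in ordinal order), or nil.
func firstUnhealthy(replicas, condemned []*pod) *pod {
	for _, group := range [][]*pod{replicas, condemned} {
		for _, p := range group {
			if !isHealthy(p) {
				return p
			}
		}
	}
	return nil
}

// podName mimics a log helper that prints "" for a nil Pod.
func podName(p *pod) string {
	if p == nil {
		return ""
	}
	return p.Name
}

// scaleDownDecision models the OrderedReady gate for the highest-ordinal
// condemned Pod; only the first unhealthy Pod may bypass the checks.
func scaleDownDecision(replicas, condemned []*pod) string {
	if len(condemned) == 0 {
		return "nothing to scale down"
	}
	unhealthy := firstUnhealthy(replicas, condemned)
	target := condemned[len(condemned)-1]
	if !target.Ready && target != unhealthy {
		return fmt.Sprintf("waiting for Pod to be Running and Ready (pod=%q)", podName(unhealthy))
	}
	if !target.Available && target != unhealthy {
		return fmt.Sprintf("waiting for Pod to be Available (pod=%q)", podName(unhealthy))
	}
	return "delete " + target.Name
}

func main() {
	replicas := []*pod{{Name: "nginx-roll-0", Ready: true, Available: true}}
	// nginx-roll-1 is Ready but has not been Ready for minReadySeconds yet.
	condemned := []*pod{{Name: "nginx-roll-1", Ready: true, Available: false}}

	// The condemned Pod counts as healthy, so firstUnhealthy is nil and the
	// availability gate blocks with an empty pod name.
	fmt.Println(scaleDownDecision(replicas, condemned)) // waiting for Pod to be Available (pod="")
}
```

In this model, a condemned Pod that is not Ready at all becomes firstUnhealthyPod and is deleted, so only the "Ready but not Available" state gets stuck.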

We even get the availability check scheduled and called, but because an earlier enqueue for the same StatefulSet was already scheduled, the new check is thrown away. This should not happen; a similar issue was reported in https://github.com/kubernetes/kubernetes/issues/119352, and a fix is blocked on https://github.com/kubernetes/kubernetes/pull/112328.

```
stateful_set.go:243] "StatefulSet will be enqueued after minReadySeconds for availability check" statefulSet="test/nginx-roll" minReadySeconds=30
```

As a result, the sync is called too soon and ends up at the same check again:

```
stateful_set_control.go:509] "StatefulSet is waiting for Pod to be Available prior to scale down" statefulSet="test/nginx-roll" pod=""
```

The next sync will only occur when KCM does a full resync, which can take a long time (depending on the KCM resync period).

The StatefulSet controller should scale down the first condemned pod as soon as it can, while keeping the predecessor pods running and available.

Kubernetes version

```console
$ kubectl version
Client Version: v1.29.2
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.29.0-rc.1.3813+a0beecc776d492-dirty
```

Cloud provider

NA

OS version

```console
# On Linux:
$ cat /etc/os-release
# paste output here
$ uname -a
# paste output here

# On Windows:
C:\> wmic os get Caption, Version, BuildNumber, OSArchitecture
# paste output here
```

Install tools

Container runtime (CRI) and version (if applicable)

Related plugins (CNI, CSI, ...) and versions (if applicable)

atiratree commented 7 months ago

/sig apps
/triage accepted
/priority important-longterm