minReadySeconds with Prometheus & Alertmanager Custom Resource

jverhounik commented 5 months ago

What happened?

Description

I have configured the minReadySeconds parameter for the Prometheus and Alertmanager CRs deployed by prometheus-operator. I deploy prometheus-operator with the kube-prometheus-stack Helm chart (version 59.1.0).

I have adjusted the value like so: minReadySeconds: 300
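For reference, a minimal sketch of how this setting might look in the Helm values. The key paths (prometheus.prometheusSpec and alertmanager.alertmanagerSpec) are my assumption about the kube-prometheus-stack values layout and are not copied from my actual values file; only the value 300 comes from the setup described here.

```yaml
# Hypothetical values.yaml excerpt for kube-prometheus-stack.
# Key paths are assumed; check them against the chart's own values.yaml.
prometheus:
  prometheusSpec:
    # Passed through to spec.minReadySeconds of the Prometheus CR
    minReadySeconds: 300
alertmanager:
  alertmanagerSpec:
    # Passed through to spec.minReadySeconds of the Alertmanager CR
    minReadySeconds: 300
```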

Steps to Reproduce

Deploy the prometheus-operator via the kube-prometheus-stack and configure the minReadySeconds parameter.

Expected Result

After running kubectl rollout restart sts prometheus, I expect the rolling restart to pause for 5 minutes (300 seconds) after a replica becomes ready before restarting the next Prometheus replica.

Actual Result

The minReadySeconds parameter is not respected. As soon as the first replica is ready, the second replica is restarted.

prometheus-mmop-kube-prometheus-stack-prometheus-0   2/3   Running   0   50s
prometheus-mmop-kube-prometheus-stack-prometheus-1   3/3   Running   0   116s

I figure this could be due to the prometheus / alertmanager statefulsets being configured with the podManagementPolicy "Parallel".

I came to this conclusion after rendering the StatefulSet manifests and finding that podManagementPolicy is set to "Parallel" rather than the default "OrderedReady".

spec:
  persistentVolumeClaimRetentionPolicy:
    whenDeleted: Retain
    whenScaled: Retain
  podManagementPolicy: Parallel
  replicas: 2
  revisionHistoryLimit: 10

I tested the behaviour by creating a dummy StatefulSet, which I deployed first without podManagementPolicy set and then with podManagementPolicy set to "Parallel".
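To make the comparison explicit, here is a sketch of the only part of the spec that differs between the two test runs. These are fragments rather than complete manifests, and the comment about the default restates the "OrderedReady" default mentioned above; everything else in the StatefulSet was identical.

```yaml
# Variant A (control): podManagementPolicy omitted, so the default OrderedReady applies
spec:
  replicas: 2
  minReadySeconds: 300
---
# Variant B: same spec, with podManagementPolicy set explicitly
spec:
  replicas: 2
  podManagementPolicy: Parallel
  minReadySeconds: 300
```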

I used the StatefulSet manifest from the official Kubernetes documentation (see the Manifests section below).

Here is the result of the rollout restart performed on the dummy-statefulset without the configured podManagementPolicy:

web-0   1/1   Running   0   9s
web-1   1/1   Running   0   10m

Here is the result with podManagementPolicy set to "Parallel":

web-0   1/1   Running   0   45s
web-1   1/1   Running   0   53s

The first output shows a clear difference in pod age between the two replicas (9s versus 10m). In the second output, however, the pod ages (45s and 53s) show that the minReadySeconds wait time was not enforced.

Prometheus Operator Version

0.74.0

Kubernetes Version

Client Version: v1.28.2
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.28.8

Kubernetes Cluster Type

Other (please comment)

How did you deploy Prometheus-Operator?

helm chart: prometheus-community/kube-prometheus-stack

Manifests

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  serviceName: "nginx"
  replicas: 2
  podManagementPolicy: Parallel
  minReadySeconds: 300
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: registry.k8s.io/nginx-slim:0.21
        ports:
        - containerPort: 80
          name: web
        volumeMounts:
        - name: www
          mountPath: /usr/share/nginx/html
        resources:
          limits:
            cpu: 100m
            memory: 150Mi
          requests:
            cpu: 100m
            memory: 150Mi
  volumeClaimTemplates:
  - metadata:
      name: www
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: "my-storage-class"
      resources:
        requests:
          storage: 1Gi

prometheus-operator log output

Anything else?

I am aware that a strategic merge patch can be used to apply probe configuration (readiness, liveness); however, this solution is not suitable for our prometheus-operator / kube-prometheus-stack setup.

jverhounik commented 5 months ago

There appears to be a similar issue open in the Kubernetes repository: https://github.com/kubernetes/kubernetes/issues/119234

jverhounik commented 5 months ago

Correction to the issue description: the Kubernetes Cluster Type is Vanilla.
