spotify / flink-on-k8s-operator

Kubernetes operator for managing the lifecycle of Apache Flink and Beam applications.
Apache License 2.0

poddisruptionbudget is not allowing any disruptions #675

Open balonik opened 1 year ago

balonik commented 1 year ago

I don't know the intention behind https://github.com/spotify/flink-on-k8s-operator/pull/353, and whether it is supposed to apply only to jobs or also to the TaskManager and JobManager. In the current setup there is only one PodDisruptionBudget per cluster, and it covers all pods (job submitter, TaskManager, JobManager, ...) because the selector labels are not specific enough. Either that, or the logic that calculates the desired number of pods is faulty.

Spec of the existing PodDisruptionBudget:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  creationTimestamp: "2023-04-24T13:58:26Z"
  generation: 1
  labels:
    app: flink
    cluster: flink-cluster-new
  name: flink-flink-cluster-new
  namespace: gorr
  ownerReferences:
  - apiVersion: flinkoperator.k8s.io/v1beta1
    blockOwnerDeletion: false
    controller: true
    kind: FlinkCluster
    name: flink-cluster-new
    uid: 91f1c563-ca72-4539-aeaa-586d57942cd5
  resourceVersion: "2016776591"
  uid: aa825413-a554-4fa0-a154-67120afdc135
spec:
  maxUnavailable: 0%
  selector:
    matchLabels:
      app: flink
      cluster: flink-cluster-new
status:
  conditions:
  - lastTransitionTime: "2023-04-24T13:58:26Z"
    message: jobs.batch does not implement the scale subresource
    observedGeneration: 1
    reason: SyncFailed
    status: "False"
    type: DisruptionAllowed
  currentHealthy: 0
  desiredHealthy: 4
  disruptionsAllowed: 0
  expectedPods: 4
  observedGeneration: 1

And the running pods:

$ kubectl get pods -l app=flink,cluster=flink-cluster-new
NAME                                        READY   STATUS    RESTARTS   AGE
flink-cluster-new-job-submitter-8nh27   1/1     Running   0          25m
flink-cluster-new-jobmanager-0          1/1     Running   0          25m
flink-cluster-new-taskmanager-0         1/1     Running   0          25m
flink-cluster-new-taskmanager-1         1/1     Running   0          25m
flink-cluster-new-taskmanager-2         1/1     Running   0          25m

With `maxUnavailable: 0%`, no voluntary disruptions are ever allowed, so a pod can never be safely evicted to another node; it just dies when the node is removed from the cluster or shut down. I would prefer to have a PDB per pod type.
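For illustration, per-type PDBs could look roughly like the sketch below. It assumes the operator adds a per-component label to the pods (the `component` key here is hypothetical, and the budgets are just examples) so the selectors no longer overlap:

```yaml
# Illustrative sketch only: assumes pods carry a per-component label
# (e.g. "component: taskmanager"), which the operator would need to set.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: flink-flink-cluster-new-taskmanager
  namespace: gorr
spec:
  maxUnavailable: 1   # tolerate one TaskManager eviction at a time
  selector:
    matchLabels:
      app: flink
      cluster: flink-cluster-new
      component: taskmanager   # hypothetical label
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: flink-flink-cluster-new-jobmanager
  namespace: gorr
spec:
  maxUnavailable: 0   # JobManager is a singleton; block voluntary eviction
  selector:
    matchLabels:
      app: flink
      cluster: flink-cluster-new
      component: jobmanager   # hypothetical label
```

With an absolute `maxUnavailable` and selectors that only match pods owned by a workload implementing the scale subresource (StatefulSet/Deployment), the `jobs.batch does not implement the scale subresource` sync failure above would also go away.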

I have version 0.4.0; I will try 0.5.0 to see if anything changed around this.

regadas commented 1 year ago

Hi @balonik, yeah, with 0.5.0 you will be able to customize the PDB further. Also note that the PDB is opt-in now and is not created by default.

That said, I think it would be great if we supported a PDB per type (JobManager / TaskManager) instead of a global one. A PR around this would be very welcome if you are interested.
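As a rough sketch of what such a per-type change might compute, the snippet below uses plain Go with stand-in types rather than the real `k8s.io/api/policy/v1` objects; all names and budget values are illustrative, not the operator's actual code:

```go
package main

import "fmt"

// pdbSpec is a stand-in for policyv1.PodDisruptionBudget; a real
// implementation would build the actual API objects. Illustrative only.
type pdbSpec struct {
	name           string
	maxUnavailable string
	matchLabels    map[string]string
}

// perComponentPDBs returns one budget per pod type instead of a single
// cluster-wide one, narrowing each selector with a component label.
func perComponentPDBs(cluster string) []pdbSpec {
	components := []struct {
		name           string
		maxUnavailable string
	}{
		{"jobmanager", "0"},  // singleton: never evict voluntarily
		{"taskmanager", "1"}, // tolerate one TaskManager disruption at a time
	}
	pdbs := make([]pdbSpec, 0, len(components))
	for _, c := range components {
		pdbs = append(pdbs, pdbSpec{
			name:           fmt.Sprintf("flink-%s-%s", cluster, c.name),
			maxUnavailable: c.maxUnavailable,
			matchLabels: map[string]string{
				"app":       "flink",
				"cluster":   cluster,
				"component": c.name, // hypothetical label set by the operator
			},
		})
	}
	return pdbs
}

func main() {
	for _, p := range perComponentPDBs("flink-cluster-new") {
		fmt.Printf("%s maxUnavailable=%s component=%s\n",
			p.name, p.maxUnavailable, p.matchLabels["component"])
	}
}
```

Because each selector now matches only one pod type, the job submitter pod (owned by a Job, which has no scale subresource) can simply be left without a budget.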