spotify / flink-on-k8s-operator

Kubernetes operator for managing the lifecycle of Apache Flink and Beam applications.
Apache License 2.0

poddisruptionbudget is not allowing any disruptions #675

Open balonik opened 1 year ago

balonik commented 1 year ago

I don't know the intention behind https://github.com/spotify/flink-on-k8s-operator/pull/353, and whether it is supposed to apply only to jobs or also to the TaskManager and JobManager. In the current setup there is only one PodDisruptionBudget per cluster, and it covers all pods (job submitter, TaskManager, JobManager, ...) because the selector labels are not specific enough. Either that, or the logic that calculates the desired number of pods is faulty.

Spec of the existing PodDisruptionBudget:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  creationTimestamp: "2023-04-24T13:58:26Z"
  generation: 1
  labels:
    app: flink
    cluster: flink-cluster-new
  name: flink-flink-cluster-new
  namespace: gorr
  ownerReferences:
  - apiVersion: flinkoperator.k8s.io/v1beta1
    blockOwnerDeletion: false
    controller: true
    kind: FlinkCluster
    name: flink-cluster-new
    uid: 91f1c563-ca72-4539-aeaa-586d57942cd5
  resourceVersion: "2016776591"
  uid: aa825413-a554-4fa0-a154-67120afdc135
spec:
  maxUnavailable: 0%
  selector:
    matchLabels:
      app: flink
      cluster: flink-cluster-new
status:
  conditions:
  - lastTransitionTime: "2023-04-24T13:58:26Z"
    message: jobs.batch does not implement the scale subresource
    observedGeneration: 1
    reason: SyncFailed
    status: "False"
    type: DisruptionAllowed
  currentHealthy: 0
  desiredHealthy: 4
  disruptionsAllowed: 0
  expectedPods: 4
  observedGeneration: 1

And the running pods:

$ kubectl get pods -l app=flink,cluster=flink-cluster-new
NAME                                        READY   STATUS    RESTARTS   AGE
flink-cluster-new-job-submitter-8nh27   1/1     Running   0          25m
flink-cluster-new-jobmanager-0          1/1     Running   0          25m
flink-cluster-new-taskmanager-0         1/1     Running   0          25m
flink-cluster-new-taskmanager-1         1/1     Running   0          25m
flink-cluster-new-taskmanager-2         1/1     Running   0          25m

With `maxUnavailable: 0%`, no voluntary disruptions are ever allowed, so a pod can never be safely evicted to another node; it just dies when the node is removed from the cluster or shut down. I would prefer to have a PDB per pod type.
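For illustration, per-type PDBs could look roughly like the sketch below. It assumes the operator adds a per-component label to the pods (the `component` key here is hypothetical, and the budgets are just examples) so the selectors no longer overlap:

```yaml
# Illustrative sketch only: assumes pods carry a per-component label
# (e.g. "component: taskmanager"), which the operator would need to set.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: flink-flink-cluster-new-taskmanager
  namespace: gorr
spec:
  maxUnavailable: 1   # tolerate one TaskManager eviction at a time
  selector:
    matchLabels:
      app: flink
      cluster: flink-cluster-new
      component: taskmanager   # hypothetical label
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: flink-flink-cluster-new-jobmanager
  namespace: gorr
spec:
  maxUnavailable: 0   # JobManager is a singleton; block voluntary eviction
  selector:
    matchLabels:
      app: flink
      cluster: flink-cluster-new
      component: jobmanager   # hypothetical label
```

With an absolute `maxUnavailable` and selectors that only match pods owned by a workload implementing the scale subresource (StatefulSet/Deployment), the `jobs.batch does not implement the scale subresource` sync failure above would also go away.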

I have version 0.4.0; I will try 0.5.0 to see if anything changed around this.

regadas commented 1 year ago

Hi @balonik, yeah, with 0.5.0 you will be able to customize the PDB further. Also note that the PDB is opt-in now and is not created by default.

That said, I think it would be great if we supported a PDB per type (JobManager / TaskManager) instead of a global one. A PR around this would be very welcome if you are interested.
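As a rough sketch of what such a per-type change might compute, the snippet below uses plain Go with stand-in types rather than the real `k8s.io/api/policy/v1` objects; all names and budget values are illustrative, not the operator's actual code:

```go
package main

import "fmt"

// pdbSpec is a stand-in for policyv1.PodDisruptionBudget; a real
// implementation would build the actual API objects. Illustrative only.
type pdbSpec struct {
	name           string
	maxUnavailable string
	matchLabels    map[string]string
}

// perComponentPDBs returns one budget per pod type instead of a single
// cluster-wide one, narrowing each selector with a component label.
func perComponentPDBs(cluster string) []pdbSpec {
	components := []struct {
		name           string
		maxUnavailable string
	}{
		{"jobmanager", "0"},  // singleton: never evict voluntarily
		{"taskmanager", "1"}, // tolerate one TaskManager disruption at a time
	}
	pdbs := make([]pdbSpec, 0, len(components))
	for _, c := range components {
		pdbs = append(pdbs, pdbSpec{
			name:           fmt.Sprintf("flink-%s-%s", cluster, c.name),
			maxUnavailable: c.maxUnavailable,
			matchLabels: map[string]string{
				"app":       "flink",
				"cluster":   cluster,
				"component": c.name, // hypothetical label set by the operator
			},
		})
	}
	return pdbs
}

func main() {
	for _, p := range perComponentPDBs("flink-cluster-new") {
		fmt.Printf("%s maxUnavailable=%s component=%s\n",
			p.name, p.maxUnavailable, p.matchLabels["component"])
	}
}
```

Because each selector now matches only one pod type, the job submitter pod (owned by a Job, which has no scale subresource) can simply be left without a budget.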