Pod Affinity Feature Causing Flink Pipeline Redeployment to Fail

guruguha commented 1 year ago

We have Flink pipelines running in our production environment using the Spotify Flink operator v0.4.2 release.

We wanted to upgrade to the latest release, v0.5.0, which adds pod affinity support. When we tried this in lower environments, redeploying a Flink pipeline failed with an error about HorizontalPodAutoscaler. Below is the error from the Flink Operator logs:

{"level":"error","ts":"2023-04-18T17:51:04Z","logger":"controllers.FlinkCluster","msg":"Failed to observe the current state","controller":"flinkcluster","controllerGroup":"flinkoperator.k8s.io","controllerKind":"FlinkCluster","FlinkCluster":{"name":"dataprep-v1","namespace":"flink-dataprep"},"namespace":"flink-dataprep","name":"dataprep-v1","reconcileID":"bfffad73-7557-4d9f-bc97-320fd42cc598","error":"no matches for kind \"HorizontalPodAutoscaler\" in version \"autoscaling/v2\"","stacktrace":"github.com/spotify/flink-on-k8s-operator/controllers/flinkcluster.(*FlinkClusterHandler).reconcile\n\t/workspace/controllers/flinkcluster/flinkcluster_controller.go:153\ngithub.com/spotify/flink-on-k8s-operator/controllers/flinkcluster.(*FlinkClusterReconciler).Reconcile\n\t/workspace/controllers/flinkcluster/flinkcluster_controller.go:97\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.5/pkg/internal/controller/controller.go:122\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.5/pkg/internal/controller/controller.go:323\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.5/pkg/internal/controller/controller.go:274\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.5/pkg/internal/controller/controller.go:235"}
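For context, a "no matches for kind ... in version ..." error generally means the API server does not serve that group-version, so the controller cannot map the HorizontalPodAutoscaler kind to autoscaling/v2. As a minimal sketch (not the operator's actual code; pickHPAVersion and the preference list are hypothetical), a controller could choose the HPA API version from the group-versions the cluster reports, e.g. via `kubectl api-versions`, instead of hard-coding autoscaling/v2:

```go
package main

import "fmt"

// preferredHPAVersions lists HorizontalPodAutoscaler group-versions, newest first.
// Hypothetical policy: autoscaling/v2 is served from Kubernetes 1.23, and
// autoscaling/v2beta2 was removed in 1.26, so this covers both older and newer clusters.
var preferredHPAVersions = []string{"autoscaling/v2", "autoscaling/v2beta2"}

// pickHPAVersion returns the newest preferred HPA group-version the server serves.
// served is assumed to come from API discovery (for example `kubectl api-versions`).
func pickHPAVersion(served []string) (string, error) {
	set := make(map[string]bool, len(served))
	for _, gv := range served {
		set[gv] = true
	}
	for _, gv := range preferredHPAVersions {
		if set[gv] {
			return gv, nil
		}
	}
	return "", fmt.Errorf("no supported HorizontalPodAutoscaler version among %v", served)
}

func main() {
	// Illustrative subset of the group-versions a 1.21 API server would report.
	cluster121 := []string{"apps/v1", "autoscaling/v1", "autoscaling/v2beta1", "autoscaling/v2beta2"}
	gv, err := pickHPAVersion(cluster121)
	fmt.Println(gv, err) // prints "autoscaling/v2beta2 <nil>"
}
```

Falling back this way keeps older clusters working, at the cost of the controller having to build HPA objects for more than one API version.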

We added pod affinity to our cluster spec and started seeing this failure. We did not see it with the previous operator version.

It looks like the autoscaling/v2 HorizontalPodAutoscaler API requires an EKS cluster on 1.22/1.23 or later. Can someone confirm this behavior? Our EKS cluster is on 1.21, which the release notes say is the minimum required version.
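The version requirement can be expressed as a simple check. The following is a rough sketch only: autoscaling/v2 became available in Kubernetes 1.23, so a 1.21 cluster serves only the older HPA APIs (autoscaling/v1, v2beta1, v2beta2). The function supportsHPAv2 and the sample EKS-style version strings are illustrative assumptions, and asking the API server which group-versions it serves is generally more reliable than comparing version numbers.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// supportsHPAv2 reports whether a Kubernetes server gitVersion (hypothetical
// EKS-style example: "v1.21.14-eks-fb459a0") is at least 1.23, the release
// in which the autoscaling/v2 HorizontalPodAutoscaler API became available.
func supportsHPAv2(gitVersion string) (bool, error) {
	v := strings.TrimPrefix(gitVersion, "v")
	// Drop any pre-release or build suffix such as "-eks-fb459a0".
	if i := strings.IndexAny(v, "-+"); i >= 0 {
		v = v[:i]
	}
	parts := strings.Split(v, ".")
	if len(parts) < 2 {
		return false, fmt.Errorf("unexpected version %q", gitVersion)
	}
	major, err := strconv.Atoi(parts[0])
	if err != nil {
		return false, err
	}
	minor, err := strconv.Atoi(parts[1])
	if err != nil {
		return false, err
	}
	return major > 1 || (major == 1 && minor >= 23), nil
}

func main() {
	// Illustrative version strings, not taken from the thread.
	for _, gv := range []string{"v1.21.14-eks-fb459a0", "v1.23.17-eks-a59e1f0"} {
		ok, err := supportsHPAv2(gv)
		fmt.Println(gv, ok, err)
	}
}
```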

regadas commented 1 year ago

Hi @guruguha!

IIRC, autoscaling/v2 is only available from 1.23, and indeed the README lists the wrong prerequisites.

guruguha commented 1 year ago

@regadas Thanks for confirming, and thanks for creating the PR as well.

spotify/flink-on-k8s-operator, issue #674