opendatahub-io / opendatahub-operator

Open Data Hub operator to manage ODH component integrations
https://opendatahub.io
Apache License 2.0

fix: added a filter to evicted pods when checking for ready status #1334

Open Al-Pragliola opened 3 days ago

Al-Pragliola commented 3 days ago

Description

I ran into a problem in a cluster: in the namespace opendatahub-auth-provider there was one pod in the Evicted state and another that was Ready and working correctly. When the DSCI tried to reconcile CapabilityServiceMeshAuthorization, it failed with the following error:

{"level":"info","ts":"2024-10-31T15:23:27Z","logger":"features","msg":"waiting for pods to become ready","feature":"mesh-control-plane-external-authz","namespace":"istio-system","duration (s)":300}
{"level":"info","ts":"2024-10-31T15:23:29Z","logger":"features","msg":"done waiting for pods to become ready","feature":"mesh-control-plane-external-authz","namespace":"istio-system"}
{"level":"info","ts":"2024-10-31T15:23:29Z","logger":"features","msg":"waiting for pods to become ready","feature":"enable-proxy-injection-in-authorino-deployment","namespace":"opendatahub-auth-provider","duration (s)":300}
{"level":"error","ts":"2024-10-31T15:28:29Z","msg":"failed applying service mesh resources","controller":"dscinitialization","controllerGroup":"dscinitialization.opendatahub.io","controllerKind":"DSCInitialization","DSCInitialization":{"name":"default-dsci"},"namespace":"","name":"default-dsci","reconcileID":"a003af8b-de11-4fe0-bd6a-b036070ac1be","error":"1 error occurred:\n\t* failed applying FeatureHandler features. cause: 1 error occurred:\n\t* 1 error occurred:\n\t* context deadline exceeded\n\n\n\n\n\n","stacktrace":"github.com/opendatahub-io/opendatahub-operator/v2/controllers/dscinitialization.(*DSCInitializationReconciler).configureServiceMesh\n\t/workspace/controllers/dscinitialization/servicemesh_setup.go:47\ngithub.com/opendatahub-io/opendatahub-operator/v2/controllers/dscinitialization.(*DSCInitializationReconciler).Reconcile\n\t/workspace/controllers/dscinitialization/dscinitialization_controller.go:285\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/opt/app-root/src/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.5/pkg/internal/controller/controller.go:119\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/opt/app-root/src/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.5/pkg/internal/controller/controller.go:316\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/opt/app-root/src/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.5/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/opt/app-root/src/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.5/pkg/internal/controller/controller.go:227"}

The reason is that WaitForPodsToBeReady expects every pod in the namespace to be ready after we apply patches to the Authorino deployment. With a pod stuck in the Evicted state, unless someone deletes it manually or the garbage collector removes it, the status of the Capability will never become True, even though Authorino is actually working.
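To illustrate the idea, here is a minimal sketch of the filtering, not the actual diff in this PR. It assumes that evicted pods are reported with phase `Failed` and reason `Evicted`. The `pod` struct is a hypothetical, trimmed-down stand-in for `corev1.Pod` so the example runs without Kubernetes dependencies, and the helper names and pod names are illustrative only.

```go
package main

import "fmt"

// pod is a hypothetical stand-in for corev1.Pod that keeps only the
// fields relevant to the readiness check.
type pod struct {
	Name   string
	Phase  string // mirrors status.phase
	Reason string // mirrors status.reason
	Ready  bool   // true when the pod's Ready condition is "True"
}

// isEvicted reports whether the pod was evicted. Evicted pods stay in
// the namespace with phase "Failed" and reason "Evicted".
func isEvicted(p pod) bool {
	return p.Phase == "Failed" && p.Reason == "Evicted"
}

// filterEvicted returns the pods that are not evicted.
func filterEvicted(pods []pod) []pod {
	kept := make([]pod, 0, len(pods))
	for _, p := range pods {
		if !isEvicted(p) {
			kept = append(kept, p)
		}
	}
	return kept
}

// allReady reports whether there is at least one pod and every pod is ready.
func allReady(pods []pod) bool {
	if len(pods) == 0 {
		return false
	}
	for _, p := range pods {
		if !p.Ready {
			return false
		}
	}
	return true
}

func main() {
	// One leftover evicted pod next to a healthy replacement.
	pods := []pod{
		{Name: "authorino-old", Phase: "Failed", Reason: "Evicted"},
		{Name: "authorino-new", Phase: "Running", Ready: true},
	}

	fmt.Println("ready without filter:", allReady(pods))
	fmt.Println("ready with filter:", allReady(filterEvicted(pods)))
}
```

Without the filter, the leftover evicted pod keeps the check failing until the timeout expires; with the filter, only the running pod is considered.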

Let me know what you think about it 🙏🏼

How Has This Been Tested?

make test

Screenshot or short clip

Merge criteria

openshift-ci[bot] commented 3 days ago

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:

Once this PR has been reviewed and has the lgtm label, please assign grdryn for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

- **[OWNERS](https://github.com/opendatahub-io/opendatahub-operator/blob/incubation/OWNERS)**

Approvers can indicate their approval by writing `/approve` in a comment

Approvers can cancel approval by writing `/approve cancel` in a comment

codecov[bot] commented 3 days ago

Codecov Report

Attention: Patch coverage is 0% with 12 lines in your changes missing coverage. Please review.

Please upload report for BASE (incubation@87c87ab). Learn more about missing BASE report.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| pkg/feature/conditions.go | 0.00% | 12 Missing :warning: |

Additional details and impacted files

```diff
@@             Coverage Diff             @@
##             incubation    #1334   +/-   ##
=============================================
  Coverage              ?   19.02%
=============================================
  Files                 ?       30
  Lines                 ?     3379
  Branches              ?        0
=============================================
  Hits                  ?      643
  Misses                ?     2667
  Partials              ?       69
```
