opendatahub-io / model-registry-operator

Apache License 2.0

Following OCP Route feature, Controller crash #30

Closed tarilabs closed 11 months ago

tarilabs commented 11 months ago

Describe the bug

Following https://github.com/opendatahub-io/model-registry-operator/issues/19, the MR Operator on OCP goes into CrashLoopBackOff.

To Reproduce

Steps to reproduce the behavior:

  1. Using the :latest image (:main-21f99da), which contains commit 21f99da02, results in:
W1118 19:07:33.394044       1 reflector.go:535] pkg/mod/k8s.io/client-go@v0.28.3/tools/cache/reflector.go:229: failed to list *v1.Route: routes.route.openshift.io is forbidden: User "system:serviceaccount:model-registry-operator-system:model-registry-operator-controller-manager" cannot list resource "routes" in API group "route.openshift.io" at the cluster scope
E1118 19:07:33.394073       1 reflector.go:147] pkg/mod/k8s.io/client-go@v0.28.3/tools/cache/reflector.go:229: Failed to watch *v1.Route: failed to list *v1.Route: routes.route.openshift.io is forbidden: User "system:serviceaccount:model-registry-operator-system:model-registry-operator-controller-manager" cannot list resource "routes" in API group "route.openshift.io" at the cluster scope
2023-11-18T19:08:13Z    ERROR   Could not wait for Cache to sync    {"controller": "modelregistry", "controllerGroup": "modelregistry.opendatahub.io", "controllerKind": "ModelRegistry", "error": "failed to wait for modelregistry caches to sync: timed out waiting for cache to be synced for Kind *v1alpha1.ModelRegistry"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.1
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:203
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:208
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:234
sigs.k8s.io/controller-runtime/pkg/manager.(*runnableGroup).reconcile.func1
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/manager/runnable_group.go:223
2023-11-18T19:08:13Z    INFO    Stopping and waiting for non leader election runnables
2023-11-18T19:08:13Z    INFO    Stopping and waiting for leader election runnables
2023-11-18T19:08:13Z    INFO    Stopping and waiting for caches
2023-11-18T19:08:13Z    ERROR   controller-runtime.source.EventHandler  failed to get informer from cache   {"error": "Timeout: failed waiting for *v1.Route Informer to sync"}
sigs.k8s.io/controller-runtime/pkg/internal/source.(*Kind).Start.func1.1
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/source/kind.go:68
k8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext.func1
    /go/pkg/mod/k8s.io/apimachinery@v0.28.3/pkg/util/wait/loop.go:49
k8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext
    /go/pkg/mod/k8s.io/apimachinery@v0.28.3/pkg/util/wait/loop.go:50
k8s.io/apimachinery/pkg/util/wait.PollUntilContextCancel
    /go/pkg/mod/k8s.io/apimachinery@v0.28.3/pkg/util/wait/poll.go:33
sigs.k8s.io/controller-runtime/pkg/internal/source.(*Kind).Start.func1
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/source/kind.go:56
2023-11-18T19:08:13Z    INFO    Stopping and waiting for webhooks
2023-11-18T19:08:13Z    INFO    Stopping and waiting for HTTP servers
2023-11-18T19:08:13Z    INFO    shutting down server    {"kind": "health probe", "addr": "[::]:8081"}
2023-11-18T19:08:13Z    INFO    controller-runtime.metrics  Shutting down metrics server with timeout of 1 minute
2023-11-18T19:08:13Z    INFO    Wait completed, proceeding to shutdown the manager
2023-11-18T19:08:13Z    ERROR   setup   problem running manager {"error": "failed to wait for modelregistry caches to sync: timed out waiting for cache to be synced for Kind *v1alpha1.ModelRegistry"}
main.main
    /workspace/cmd/main.go:151
runtime.main
    /usr/local/go/src/runtime/proc.go:267

With CrashLoopBackOff behaviour:

image

As a result, creating a ModelRegistry CR has no effect in the destination ODH Project namespace.

Expected behavior

Using the previous container image main-5c1b0db, which did not contain the OCP Route feature, works as expected:

diff --git a/config/manager/manager.yaml b/config/manager/manager.yaml
index 9154cef..663784b 100644
--- a/config/manager/manager.yaml
+++ b/config/manager/manager.yaml
@@ -70,7 +70,7 @@ spec:
         - /manager
         args:
         - --leader-elect
-        image: quay.io/opendatahub/model-registry-operator:latest
+        image: quay.io/opendatahub/model-registry-operator:main-5c1b0db
         name: manager
         env:
           - name: GRPC_IMAGE

Now creating a ModelRegistry CR creates the MR deployment in the destination ODH Project namespace:

Screenshot 2023-11-18 at 17 09 47

Additional context

I hope this is of service :) Glad to provide more info as needed.

tarilabs commented 11 months ago

Full log: model-registry-operator-controller-manager-9c78b7475-xh2nh-manager.log

dhirajsb commented 11 months ago

Saw this error earlier, forgot to push the PR to fix it.
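
For reference, the forbidden messages in the log above name the missing permission directly: the controller's service account cannot list "routes" in the "route.openshift.io" API group at the cluster scope. The fix PR itself is not shown in this thread, so the following is only a minimal sketch, assuming the operator's RBAC is generated from kubebuilder markers. The helper name `rbacMarker` and the regular expression are hypothetical and written for illustration; they only handle messages with a non-empty API group, as in this log.

```go
package main

import (
	"fmt"
	"regexp"
)

// forbiddenRe matches the verb, resource and API group in a Kubernetes
// "forbidden" message like the ones in the operator log above.
// Messages for the core group (API group "") are intentionally not matched.
var forbiddenRe = regexp.MustCompile(`cannot (\w+) resource "([^"]+)" in API group "([^"]+)"`)

// rbacMarker (hypothetical helper) turns one forbidden message into a
// kubebuilder RBAC marker covering exactly the denied verb.
func rbacMarker(logLine string) (string, bool) {
	m := forbiddenRe.FindStringSubmatch(logLine)
	if m == nil {
		return "", false
	}
	verb, resource, group := m[1], m[2], m[3]
	marker := fmt.Sprintf("//+kubebuilder:rbac:groups=%s,resources=%s,verbs=%s", group, resource, verb)
	return marker, true
}

func main() {
	// Message text copied from the reflector warning in the log above.
	line := `failed to list *v1.Route: routes.route.openshift.io is forbidden: ` +
		`User "system:serviceaccount:model-registry-operator-system:model-registry-operator-controller-manager" ` +
		`cannot list resource "routes" in API group "route.openshift.io" at the cluster scope`

	if marker, ok := rbacMarker(line); ok {
		fmt.Println(marker)
	}
}
```

The sketch only reports the verb that was denied. A controller that watches Routes through an informer generally needs at least list and watch on the resource, so the marker that ends up above the reconciler would likely carry more verbs than this single one; the exact set depends on what the controller does with Routes.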