operator-framework / operator-controller

A new and improved management framework for extending Kubernetes with Operators
https://operator-framework.github.io/operator-controller/
Apache License 2.0

No status returned for non FBC catalog deployment #1378

Open rashmi43 opened 5 days ago

rashmi43 commented 5 days ago

No status is returned for a non-FBC catalog deployment, and the ClusterCatalog (CC) stays in the Progressing state.

The catalogd-controller logs an error about a missing label that is required for a successful FBC catalog deployment.

This is the message seen in the controller logs:

I1014 17:21:10.120579       1 containers_image.go:237] "no default policy found, using insecure policy" logger="catalogd-controller" controller="clustercatalog" controllerGroup="olm.operatorframework.io" controllerKind="ClusterCatalog" ClusterCatalog="deploy-operator-catalog" namespace="" name="deploy-operator-catalog" reconcileID="04baf908-4f8b-4c59-950f-8227cef5c9fb"
I1014 17:21:16.337809       1 containers_image.go:133] "pulled image" logger="catalogd-controller" controller="clustercatalog" controllerGroup="olm.operatorframework.io" controllerKind="ClusterCatalog" ClusterCatalog="deploy-operator-catalog" namespace="" name="deploy-operator-catalog" reconcileID="04baf908-4f8b-4c59-950f-8227cef5c9fb" ref="quay.io/xxx/deployer-operator-catalog:v0.0.2" digest="sha256:411004441dc63b479cb90209cd23295f773bd03913657b486b3578b503eac63b"
I1014 17:21:16.367376       1 clustercatalog_controller.go:107] "reconcile ending" logger="catalogd-controller" controller="clustercatalog" controllerGroup="olm.operatorframework.io" controllerKind="ClusterCatalog" ClusterCatalog="deploy-operator-catalog" namespace="" name="deploy-operator-catalog" reconcileID="04baf908-4f8b-4c59-950f-8227cef5c9fb"
E1014 17:21:16.367449       1 controller.go:316] "Reconciler error" err="source catalog content: error unpacking image: catalog image is missing the required label \"operators.operatorframework.io.index.configs.v1\"" controller="clustercatalog" controllerGroup="olm.operatorframework.io" controllerKind="ClusterCatalog" ClusterCatalog="deploy-operator-catalog" namespace="" name="deploy-operator-catalog" reconcileID="04baf908-4f8b-4c59-950f-8227cef5c9fb"

@anik120 fyi

rashmi43 commented 5 days ago

oc get clustercatalogs stays in the state below even after several minutes, even though the error is reported in the catalogd-controller logs:


NAME                                 LASTUNPACKED   SERVING   AGE
deploy-operator-catalog-nonfbc                            2m25s
operatorhubio                        12h            True      23h

joelanford commented 5 days ago

@rashmi43 are there more details via oc get clustercatalog deploy-operator-catalog-nonfbc -o yaml?

joelanford commented 5 days ago

I'm trying to figure out whether this is a problem only in our printer column output, or whether the actual status of the object is never updated to reflect the problem. I would expect the Progressing condition to have a message that includes a similar error.

rashmi43 commented 4 days ago

Looks like it's just a status column problem:


oc get clustercatalogs  deploy-operator-catalog-nonfbc -o yaml
apiVersion: olm.operatorframework.io/v1alpha1
kind: ClusterCatalog
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"olm.operatorframework.io/v1alpha1","kind":"ClusterCatalog","metadata":{"annotations":{},"name":"deploy-operator-catalog-nonfbc"},"spec":{"source":{"image":{"pollInterval":"24h","ref":"quay.io/rashmi.khanna/deployer-operator-catalog:v0.0.2"},"type":"Image"}}}
  creationTimestamp: "2024-10-15T18:24:39Z"
  finalizers:
  - olm.operatorframework.io/delete-server-cache
  generation: 1
  labels:
    olm.operatorframework.io/metadata.name: deploy-operator-catalog-nonfbc
  name: deploy-operator-catalog-nonfbc
  resourceVersion: "22572607"
  uid: 7e3ac989-88cf-4527-9b09-0d0f1720a7f9
spec:
  priority: 0
  source:
    image:
      pollInterval: 24h0m0s
      ref: quay.io/xxx/deployer-operator-catalog:v0.0.2
    type: Image
status:
  conditions:
  - lastTransitionTime: "2024-10-15T18:24:42Z"
    message: 'source catalog content: error unpacking image: catalog image is missing
      the required label "operators.operatorframework.io.index.configs.v1"'
    observedGeneration: 1
    reason: Retrying
    status: "True"
    type: Progressing

rashmi43 commented 4 days ago

After 10 hours:


NAME                                 LASTUNPACKED   SERVING   AGE
deploy-operator-catalog-nonfbc                            10h

anik120 commented 4 days ago

~Maybe we can supplement the code that sets the Progressing condition to True (reason Retrying) with an additional action that sets Serving to False. That should signal the user to check the YAML output of the CR.~

Looks like we're just missing an additional printer column for the Progressing condition here https://github.com/operator-framework/catalogd/blob/main/api/core/v1alpha1/clustercatalog_types.go#L49

trgeiger commented 4 days ago

My first thought was something like this:

//+kubebuilder:printcolumn:name="Status",type=string,JSONPath=`.status.conditions[?(@.status=="True")].type`
//+kubebuilder:printcolumn:name="Reason",type=string,JSONPath=`.status.conditions[?(@.status=="True")].reason`

Which would show up like:

NAME            LASTUNPACKED   STATUS    REASON      AGE
operatorhubio   8m30s          Serving   Available   8m35s

But Serving and Progressing can both be True, for example when the catalog has already been unpacked and the URL for the catalog then changes. In that case the filter on status=="True" matches both conditions, so a single Status column is ambiguous. If Serving and Progressing were mutually exclusive, I think the above columns would be pretty good UX.
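The overlap can be illustrated with a minimal Go sketch. It does not call the real JSONPath engine; the condition struct and the trueConditionTypes helper are hypothetical stand-ins that only mimic the filter [?(@.status=="True")].type, and the sample conditions are assumed from the values shown in this thread.

```go
package main

import (
	"fmt"
	"strings"
)

// condition is a hypothetical, trimmed-down stand-in for a status condition.
type condition struct {
	Type   string
	Status string
	Reason string
}

// trueConditionTypes mimics the filter [?(@.status=="True")].type by
// collecting the Type of every condition whose Status is "True".
func trueConditionTypes(conds []condition) []string {
	var out []string
	for _, c := range conds {
		if c.Status == "True" {
			out = append(out, c.Type)
		}
	}
	return out
}

func main() {
	// Case from this issue: only Progressing is True (reason Retrying).
	failing := []condition{
		{Type: "Progressing", Status: "True", Reason: "Retrying"},
	}
	fmt.Println(strings.Join(trueConditionTypes(failing), ","))

	// Overlap case (assumed): an already-served catalog that is retrying
	// after its source changed. The filter now matches two conditions.
	both := []condition{
		{Type: "Serving", Status: "True", Reason: "Available"},
		{Type: "Progressing", Status: "True", Reason: "Retrying"},
	}
	fmt.Println(strings.Join(trueConditionTypes(both), ","))
}
```

With one True condition the filter yields a single value, but in the overlap case it yields two, so one Status cell would have to represent both.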

anik120 commented 4 days ago

Is it too many columns if we do

NAME            LASTUNPACKED   SERVING   REASON      PROGRESSING   REASON      AGE
operatorhubio   8m30s          True      Available   False         Succeeded   8m35s

trgeiger commented 4 days ago

I don't think having two columns named "Reason" would be great. We'd need to differentiate them somehow, which becomes quite cumbersome with names like "Serving-Reason". Are the reasons something we really need to show in the oc get output? We could just have Progressing and Serving columns, but my concern there is that the vast majority of the time Progressing is just going to say False.
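As a rough sketch of the two-column alternative, the Go program below mimics a per-type filter such as [?(@.type=="Progressing")].status and prints one row per catalog. The condition struct, the statusOf helper, and the column widths are hypothetical. The sample conditions are assumptions taken from the outputs quoted in this thread, not from a live cluster.

```go
package main

import "fmt"

// condition is a hypothetical, trimmed-down stand-in for a status condition.
type condition struct {
	Type   string
	Status string
}

// statusOf mimics a per-type filter such as
// [?(@.type=="Progressing")].status: it returns the Status of the named
// condition, or an empty string when that condition is absent.
func statusOf(conds []condition, condType string) string {
	for _, c := range conds {
		if c.Type == condType {
			return c.Status
		}
	}
	return ""
}

func main() {
	rows := []struct {
		name  string
		conds []condition
	}{
		// The failing catalog in this issue only carries a Progressing condition.
		{"deploy-operator-catalog-nonfbc", []condition{{"Progressing", "True"}}},
		// Assumed healthy state, based on the proposed output above.
		{"operatorhubio", []condition{{"Serving", "True"}, {"Progressing", "False"}}},
	}

	fmt.Printf("%-32s %-9s %s\n", "NAME", "SERVING", "PROGRESSING")
	for _, r := range rows {
		fmt.Printf("%-32s %-9s %s\n",
			r.name, statusOf(r.conds, "Serving"), statusOf(r.conds, "Progressing"))
	}
}
```

Under these assumptions the failing catalog would show an empty Serving cell next to a populated Progressing cell, which at least hints that the full YAML is worth checking, while healthy catalogs would show Progressing as False.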