operator-framework / operator-lifecycle-manager

A management framework for extending Kubernetes with Operators
https://olm.operatorframework.io
Apache License 2.0
1.72k stars 545 forks

none of the deployment works with okd 3.11 #715

Closed gacopl closed 5 years ago

gacopl commented 5 years ago

The latest deployment manifests for OKD make the packageserver restart repeatedly, and the 0.7.4 deployment puts the olm-operator into CrashLoopBackOff.

Catalogs don't work, and applying a CSV alone to openshift-operators fails: the cluster console shows a Failed status and no Deployments are created.

Any ideas how to install OLM on OKD 3.11?

njhale commented 5 years ago

@gacopl could you try using the manifests from the 0.8.1 release and remove all container arguments from 0000_50_olm_06-olm-operator.deployment.yaml before you apply them to the cluster?
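
In other words, the olm-operator container in 0000_50_olm_06-olm-operator.deployment.yaml would end up looking roughly like this (an abbreviated sketch, not the full manifest; the image digest is a placeholder and the only change is that the args list is deleted):

```yaml
spec:
  template:
    spec:
      containers:
      - name: olm-operator
        # image left exactly as shipped in the 0.8.1 release manifests
        image: quay.io/operator-framework/olm@sha256:...
        # the `args:` list that was here has been removed entirely
```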

ron1 commented 5 years ago

I would have expected CI to catch this type of bug. Is that not the case?

njhale commented 5 years ago

Our CI is geared towards OpenShift 4.0 and we do not re-test older releases.

gacopl commented 5 years ago

@njhale OK, that seemed to work; at least OLM is not dying now. I need to figure out what's behind the RequirementsNotMet status when installing a CSV.

njhale commented 5 years ago

@gacopl The requirement status section of the CSV's status should tell you exactly what's missing on the cluster for the CSV to run. You can either create the missing resources manually, or, if you have an OLM CatalogSource that contains your CSV, you can create a Subscription, which will attempt to resolve and apply them for you.

All resource generation besides APIService and Deployment is now handled by the catalog-operator and requires a Subscription.
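
For reference, the relevant section looks something like the following when you run oc get csv <name> -o yaml (a sketch; field names follow the v1alpha1 CSV status schema as I understand it, and the etcd CRD entry is purely illustrative):

```yaml
status:
  phase: Pending
  reason: RequirementsNotMet
  requirementStatus:
  # one entry per required resource, with status Present or NotPresent
  - group: apiextensions.k8s.io
    version: v1beta1
    kind: CustomResourceDefinition
    name: etcdclusters.etcd.database.coreos.com
    status: NotPresent
```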

gacopl commented 5 years ago

Thanks @njhale, I'm trying to wrap my head around catalogs. After installation only the packageserver catalog is present. I want to try out community operators, specifically Couchbase from the certified operators; how can I add catalogs? I see the packageserver is served over gRPC from some pod. In enterprise OCP 3.11 there were special ConfigMaps, but that is more than half a year old.

gacopl commented 5 years ago

I understand that the CSV is parsed and an InstallPlan is created, but this happens only when you subscribe to something from a catalog. How can I add more content, or a new catalog, to test out patches I made to a CSV?

ron1 commented 5 years ago

Our CI is geared towards OpenShift 4.0 and we do not re-test older releases.

@njhale Thanks for the feedback. Given that OLM is in Tech Preview for OCP 3.11, is there intention to keep the latest version of OLM working on OCP 3.11 or are all efforts focused exclusively on deployments to OKD/OCP 4.0 pre-releases?

njhale commented 5 years ago

@ron1 We are really just restricted by whether we depend on any backwards-incompatible Kubernetes changes, so we can't guarantee the latest OLM image and manifests will work with OpenShift installations based on older Kubernetes versions. Our previous release manifests are tied to specific OLM image digests, so if a version of these manifests works on 3.11 it should continue to work, unless the manifests were changed somewhere along the way (I need to double-check that this hasn't already happened).

njhale commented 5 years ago

Thanks @njhale, I'm trying to wrap my head around catalogs. After installation only the packageserver catalog is present. I want to try out community operators, specifically Couchbase from the certified operators; how can I add catalogs? I see the packageserver is served over gRPC from some pod. In enterprise OCP 3.11 there were special ConfigMaps, but that is more than half a year old.

@gacopl If you just want to try out community-operators with a newer version of OLM (0.8.1), you can create the following CatalogSource in the namespace you have OLM running in:

apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: community-operators
spec:
  displayName: Community Operators
  image: quay.io/njhale/community-operators@sha256:37f1dd6ab4f1082af9d8f9ef028a2be4fb2837c5a75ba59bd127ebc723bfee8d
  publisher: community-operators
  sourceType: grpc

When you create a new subscription to the operator you want, be sure to specify the correct sourceNamespace field.
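
A Subscription against that CatalogSource might look like this (a sketch; the package name couchbase-enterprise and channel preview are assumptions, and the namespaces should be wherever OLM and the CatalogSource actually live):

```yaml
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: couchbase
  namespace: operator-lifecycle-manager
spec:
  source: community-operators
  # must match the namespace the CatalogSource was created in
  sourceNamespace: operator-lifecycle-manager
  name: couchbase-enterprise
  channel: preview
```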

ron1 commented 5 years ago

@njhale Does it make sense that 0000_50_olm_14-operatorstatus.yaml fails to apply on OCP 3.11? I see that this file exists for OCP but not OKD. Am I correct this file exclusively targets OCP 4.0?

Also would you mind describing the process you used to create the community-operators image referenced above? Am I correct you used a variation of https://github.com/operator-framework/operator-registry/blob/master/upstream.Dockerfile with some additional steps?

Finally, when I created the community-operators CatalogSource you provided above, all my packageserver pods immediately started panicking with the following stack trace. Any thoughts?

$ oc logs packageserver-8df5d696c-kphcx
time="2019-02-22T18:21:44Z" level=info msg="Using in-cluster kube client config"
time="2019-02-22T18:21:44Z" level=info msg="package-server configured to watch namespaces []"
time="2019-02-22T18:21:44Z" level=info msg="Using in-cluster kube client config"
time="2019-02-22T18:21:44Z" level=info msg="connection established. cluster-version: v1.11.0+d4cacc0"
time="2019-02-22T18:21:44Z" level=info msg="operator ready"
time="2019-02-22T18:21:44Z" level=info msg="starting informers..."
time="2019-02-22T18:21:44Z" level=info msg="waiting for caches to sync..."
I0222 18:21:44.273402       1 reflector.go:202] Starting reflector *v1alpha1.CatalogSource (5m0s) from github.com/operator-framework/operator-lifecycle-manager/pkg/lib/queueinformer/queueinformer_operator.go:112
I0222 18:21:44.273439       1 reflector.go:240] Listing and watching *v1alpha1.CatalogSource from github.com/operator-framework/operator-lifecycle-manager/pkg/lib/queueinformer/queueinformer_operator.go:112
E0222 18:21:44.278692       1 runtime.go:69] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
/go/src/github.com/operator-framework/operator-lifecycle-manager/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:76
/go/src/github.com/operator-framework/operator-lifecycle-manager/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:65
/go/src/github.com/operator-framework/operator-lifecycle-manager/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:51
/usr/local/go/src/runtime/asm_amd64.s:573
/usr/local/go/src/runtime/panic.go:502
/usr/local/go/src/runtime/panic.go:63
/usr/local/go/src/runtime/signal_unix.go:388
/go/src/github.com/operator-framework/operator-lifecycle-manager/pkg/api/apis/operators/v1alpha1/catalogsource_types.go:46
/go/src/github.com/operator-framework/operator-lifecycle-manager/pkg/package-server/provider/registry.go:166
/go/src/github.com/operator-framework/operator-lifecycle-manager/pkg/package-server/provider/registry.go:85
/go/src/github.com/operator-framework/operator-lifecycle-manager/vendor/k8s.io/client-go/tools/cache/controller.go:195
/go/src/github.com/operator-framework/operator-lifecycle-manager/vendor/k8s.io/client-go/tools/cache/shared_informer.go:554
/go/src/github.com/operator-framework/operator-lifecycle-manager/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:203
/go/src/github.com/operator-framework/operator-lifecycle-manager/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:203
/go/src/github.com/operator-framework/operator-lifecycle-manager/vendor/k8s.io/client-go/tools/cache/shared_informer.go:548
/go/src/github.com/operator-framework/operator-lifecycle-manager/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133
/go/src/github.com/operator-framework/operator-lifecycle-manager/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134
/go/src/github.com/operator-framework/operator-lifecycle-manager/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88
/go/src/github.com/operator-framework/operator-lifecycle-manager/vendor/k8s.io/client-go/tools/cache/shared_informer.go:546
/go/src/github.com/operator-framework/operator-lifecycle-manager/vendor/k8s.io/client-go/tools/cache/shared_informer.go:390
/go/src/github.com/operator-framework/operator-lifecycle-manager/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:71
/usr/local/go/src/runtime/asm_amd64.s:2361
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
    panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x13a93e8]

goroutine 27 [running]:
github.com/operator-framework/operator-lifecycle-manager/vendor/k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
    /go/src/github.com/operator-framework/operator-lifecycle-manager/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:58 +0x107
panic(0x161a300, 0x26270a0)
    /usr/local/go/src/runtime/panic.go:502 +0x229
github.com/operator-framework/operator-lifecycle-manager/pkg/api/apis/operators/v1alpha1.(*RegistryServiceStatus).Address(0x0, 0xc42024c5a0, 0x1853ec6)
    /go/src/github.com/operator-framework/operator-lifecycle-manager/pkg/api/apis/operators/v1alpha1/catalogsource_types.go:46 +0x38
github.com/operator-framework/operator-lifecycle-manager/pkg/package-server/provider.(*RegistryProvider).catalogSourceAdded(0xc420233260, 0x181d4a0, 0xc420376e00)
    /go/src/github.com/operator-framework/operator-lifecycle-manager/pkg/package-server/provider/registry.go:166 +0x2a1
github.com/operator-framework/operator-lifecycle-manager/pkg/package-server/provider.(*RegistryProvider).(github.com/operator-framework/operator-lifecycle-manager/pkg/package-server/provider.catalogSourceAdded)-fm(0x181d4a0, 0xc420376e00)
    /go/src/github.com/operator-framework/operator-lifecycle-manager/pkg/package-server/provider/registry.go:85 +0x3e
github.com/operator-framework/operator-lifecycle-manager/vendor/k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnAdd(0xc420252600, 0xc420252610, 0xc420252620, 0x181d4a0, 0xc420376e00)
    /go/src/github.com/operator-framework/operator-lifecycle-manager/vendor/k8s.io/client-go/tools/cache/controller.go:195 +0x49
github.com/operator-framework/operator-lifecycle-manager/vendor/k8s.io/client-go/tools/cache.(*processorListener).run.func1.1(0x0, 0x0, 0x0)
    /go/src/github.com/operator-framework/operator-lifecycle-manager/vendor/k8s.io/client-go/tools/cache/shared_informer.go:554 +0x21a
github.com/operator-framework/operator-lifecycle-manager/vendor/k8s.io/apimachinery/pkg/util/wait.ExponentialBackoff(0x989680, 0x3ff0000000000000, 0x3fb999999999999a, 0x5, 0xc420715df0, 0x429b19, 0xc4204f60d0)
    /go/src/github.com/operator-framework/operator-lifecycle-manager/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:203 +0x9c
github.com/operator-framework/operator-lifecycle-manager/vendor/k8s.io/client-go/tools/cache.(*processorListener).run.func1()
    /go/src/github.com/operator-framework/operator-lifecycle-manager/vendor/k8s.io/client-go/tools/cache/shared_informer.go:548 +0x81
github.com/operator-framework/operator-lifecycle-manager/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1(0xc4205e6f68)
    /go/src/github.com/operator-framework/operator-lifecycle-manager/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133 +0x54
github.com/operator-framework/operator-lifecycle-manager/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc420715f68, 0xdf8475800, 0x0, 0x15e1801, 0xc4200be240)
    /go/src/github.com/operator-framework/operator-lifecycle-manager/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134 +0xbd
github.com/operator-framework/operator-lifecycle-manager/vendor/k8s.io/apimachinery/pkg/util/wait.Until(0xc4205e6f68, 0xdf8475800, 0xc4200be240)
    /go/src/github.com/operator-framework/operator-lifecycle-manager/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88 +0x4d
github.com/operator-framework/operator-lifecycle-manager/vendor/k8s.io/client-go/tools/cache.(*processorListener).run(0xc4207f6980)
    /go/src/github.com/operator-framework/operator-lifecycle-manager/vendor/k8s.io/client-go/tools/cache/shared_informer.go:546 +0x78
github.com/operator-framework/operator-lifecycle-manager/vendor/k8s.io/client-go/tools/cache.(*processorListener).(github.com/operator-framework/operator-lifecycle-manager/vendor/k8s.io/client-go/tools/cache.run)-fm()
    /go/src/github.com/operator-framework/operator-lifecycle-manager/vendor/k8s.io/client-go/tools/cache/shared_informer.go:390 +0x2a
github.com/operator-framework/operator-lifecycle-manager/vendor/k8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1(0xc4201647d0, 0xc420504100)
    /go/src/github.com/operator-framework/operator-lifecycle-manager/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:71 +0x4f
created by github.com/operator-framework/operator-lifecycle-manager/vendor/k8s.io/apimachinery/pkg/util/wait.(*Group).Start
    /go/src/github.com/operator-framework/operator-lifecycle-manager/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:69 +0x62
njhale commented 5 years ago

@ron1 Your first point is correct. 0000_50_olm_14-operatorstatus.yaml is a custom resource for reporting second-level operator (SLO) status to OCP 4.0's cluster-version-operator (no relation to OLM's CSVs), which is not present in any version < 4.0, in OKD, or upstream.

You are also correct that https://github.com/operator-framework/operator-registry/blob/master/upstream.Dockerfile is the "basis" for how I packaged community-operators as an OLM catalog. It provides an example of how to build an OLM operator-registry image, which is OLM's preferred way to package operator catalog content. We have a PR in flight that should merge soon and update the docs in that repo to better reflect this.
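
The pattern in that Dockerfile is roughly: build the operator-registry tools, run the initializer over a directory of package manifests to produce a sqlite database of bundles, then serve it over gRPC with registry-server. A compressed sketch (the image tag, paths, make target, and flags are assumptions from memory of the upstream file, not an exact copy):

```dockerfile
FROM golang:1.11-alpine AS builder
RUN apk add --no-cache build-base git sqlite
WORKDIR /build
# operator-registry sources plus the catalog's package manifests
COPY . .
RUN make static
# load the manifests (CSVs, CRDs, package files) into a sqlite database
RUN ./bin/initializer --manifests manifests --output bundles.db

FROM scratch
COPY --from=builder /build/bundles.db /bundles.db
COPY --from=builder /build/bin/registry-server /registry-server
# registry-server answers the gRPC catalog API OLM's CatalogSource expects
EXPOSE 50051
ENTRYPOINT ["/registry-server"]
CMD ["--database", "/bundles.db"]
```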

As for the third issue: it seems the version of OLM being used is older than what's currently in master. From the provided panic log, registry.go:166 is dereferencing a nil pointer, but in master that line is a log call. We also have a check early in that function that bails out if the RegistryServiceStatus is nil.

njhale commented 5 years ago

@gacopl I just tested the 0.7.4 OKD manifests against a Kubernetes 1.11 cluster (the version used in OKD 3.11) and everything worked fine. I even created a Subscription to etcd, which resolved correctly and installed it:

apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: etcd
  namespace: openshift-operator-lifecycle-manager
spec:
  source: rh-operators
  sourceNamespace: openshift-operator-lifecycle-manager
  name: etcd
  channel: alpha

How did you install OKD? At this point, I'm reasonably sure OLM's manifests and images are good for 0.7.4.

gacopl commented 5 years ago

Basic oc cluster up of 3.11. The 0.8.x branch worked for me after the args fix.

ron1 commented 5 years ago

@njhale Does OLM 0.7.4 support the latest CSV schemas used by operators currently in the community-operators repository? Also, does it support the operator-registry-based CatalogSources with sourceType grpc you described in your prior comment?

njhale commented 5 years ago

@ron1

  1. CSVs are somewhat forward compatible; newer fields like InstallModes won't be respected.
  2. 0.7.4 does not support operator-registry-based CatalogSources.

ron1 commented 5 years ago

@njhale Given that OLM has changed significantly between 0.7.4 and 0.8.1+ including grpc CatalogSources, OperatorGroups, InstallModes, etc., and given that Operators currently in the community-operators repo are likely being tested only against OLM 0.8.1+, would you expect OLM 0.7.4 to reliably manage the current set of Operators in the community-operators repo? If so, what is the best way to assemble/deploy community-operators/upstream-community-operators into a ConfigMap-based CatalogSource for use by OLM 0.7.4?
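
For the last question, the ConfigMap-based flow that 0.7.x supports looked roughly like this (a sketch from memory of the 0.7.x schema; the data values are elided, and in practice they are the concatenated YAML of your CRDs, CSVs, and package definitions, as in the rh-operators ConfigMap shipped with the 0.7.4 manifests):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: my-operators
  namespace: operator-lifecycle-manager
data:
  customResourceDefinitions: |-
    ...
  clusterServiceVersions: |-
    ...
  packages: |-
    ...
---
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: my-operators
  namespace: operator-lifecycle-manager
spec:
  name: my-operators
  sourceType: internal
  configMap: my-operators
  displayName: My Operators
```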

ecordell commented 5 years ago

Current status for 3.11:

If you need to play with OLM and are okay with those caveats on 3.11, you might try the upstream installation instructions: https://github.com/operator-framework/operator-lifecycle-manager/releases/tag/0.10.0