Open timflannagan opened 3 years ago
Note: failures in the "Garbage collection for dependent resources when a bundle with configmap and secret objects is installed when the CSV is deleted OLM ..." test blocks are increasingly reproducible. When poking around the "should have removed the old configmap and put the new configmap in place" test, it appears the catalog operator hot-loops when processing a Subscription that previously failed resolution, and there's contention from always attempting to remove that status condition via blind Update calls.
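To make the blind-Update concern concrete, here is a minimal sketch of the alternative: only write when the condition is actually present, and re-read on a conflict instead of re-queuing blindly. It is not the catalog operator's code. The `Subscription`, `Condition`, and `store` types are hypothetical stand-ins that mimic optimistic concurrency with a resource version, and the "ResolutionFailed" condition name is assumed for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Condition and Subscription are simplified, hypothetical stand-ins for the
// real API types; only the fields needed for this sketch are kept.
type Condition struct {
	Type string
}

type Subscription struct {
	ResourceVersion int
	Conditions      []Condition
}

var errConflict = errors.New("conflict: resource version is stale")

// store mimics an API server with optimistic concurrency: an update is
// rejected when it was computed from a stale resource version.
type store struct {
	mu     sync.Mutex
	sub    Subscription
	writes int // number of accepted updates
}

// get returns a copy of the stored Subscription.
func (s *store) get() Subscription {
	s.mu.Lock()
	defer s.mu.Unlock()
	out := s.sub
	out.Conditions = append([]Condition(nil), s.sub.Conditions...)
	return out
}

// update accepts sub only if it is based on the current resource version.
func (s *store) update(sub Subscription) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if sub.ResourceVersion != s.sub.ResourceVersion {
		return errConflict
	}
	sub.ResourceVersion++
	s.sub = sub
	s.writes++
	return nil
}

// removeCondition drops condType only when it is present, and re-reads the
// object on conflict. It reports whether a write actually happened.
func removeCondition(s *store, condType string, maxAttempts int) (bool, error) {
	for attempt := 0; attempt < maxAttempts; attempt++ {
		sub := s.get()
		kept := make([]Condition, 0, len(sub.Conditions))
		for _, c := range sub.Conditions {
			if c.Type != condType {
				kept = append(kept, c)
			}
		}
		if len(kept) == len(sub.Conditions) {
			return false, nil // condition absent: skip the write entirely
		}
		sub.Conditions = kept
		err := s.update(sub)
		if err == nil {
			return true, nil
		}
		if !errors.Is(err, errConflict) {
			return false, err
		}
		// Conflict: loop around and recompute from a fresh read.
	}
	return false, fmt.Errorf("gave up after %d attempts: %w", maxAttempts, errConflict)
}

func main() {
	s := &store{sub: Subscription{Conditions: []Condition{{Type: "ResolutionFailed"}}}}
	changed, err := removeCondition(s, "ResolutionFailed", 3)
	fmt.Println(changed, err, s.writes)
	// The second call finds nothing to remove, so no write is issued.
	changed, err = removeCondition(s, "ResolutionFailed", 3)
	fmt.Println(changed, err, s.writes)
}
```

In a real controller the same shape would typically be built on the client library's conflict-retry helper rather than a hand-rolled loop. The point here is only that a no-op sync should not produce a write.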
Misc: the need for an automatic rebasing mechanism for open PRs once a new PR has been merged into master.
Misc: the need for updating the test provisioner to also attempt to gather testing artifacts before deleting the cluster.
Misc: seeing quite a bit of connection-refused logs in the catalog-operator when firing off ListBundles calls:
E1006 17:13:09.466730 1 queueinformer_operator.go:290] sync "operators" failed: [error using catalog test-catalog (in namespace operators): failed to list bundles: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 10.96.133.46:50051: connect: connection refused", error using catalog operatorhubio-catalog (in namespace operator-lifecycle-manager): failed to list bundles: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 10.96.33.109:50051: connect: connection refused"]
https://github.com/operator-framework/operator-lifecycle-manager/issues/2420 - another quality of life issue when running e2e locally.
There's occasionally a panic in the TestConnectionEvents series of unit tests where a 10 minute timeout occurs. This is seen in https://github.com/operator-framework/operator-lifecycle-manager/pull/2425/checks?check_run_id=3899261291
As of today (01/21/2022), I see the following e2e failures.
In addition to these, I see some failures that are caused by the installplan creation wait timeout. They have the following in the test log.
waiting for catalog pod scoped-catsrc-hzt42 to be available (for sync) - TRANSIENT_FAILURE
catalog scoped-catsrc-hzt42 pod with address scoped-catsrc-hzt42.scoped-ns-cfw9r.svc:50051
03:47:22.1316: (): nil
waiting for scoped-sub-wz8bw to have installplan ref
03:47:23.131: (): nil
waiting for scoped-sub-wz8bw to have installplan ref
03:47:24.1319: (): nil
waiting for scoped-sub-wz8bw to have installplan ref
03:47:25.1315: (): nil
waiting for scoped-sub-wz8bw to have installplan ref
.........
waiting for scoped-sub-wz8bw to have installplan ref
03:52:21.1343: never got correct status: v1alpha1.SubscriptionStatus{CurrentCSV:"", InstalledCSV:"", Install:(*v1alpha1.InstallPlanReference)(nil), State:"", Reason:"", InstallPlanGeneration:0, InstallPlanRef:(*v1.ObjectReference)(nil), CatalogHealth:
I'll open issues for them later.
CI Improvements
Controller Improvements
Flakes
make e2e-local E2E_SEED=1633621246 and focusing on the Describe("Subscription") top-level spec.
Got source event: grpc.SourceState{Key:registry.CatalogKey{Name:\"test-catalog-fwgtr\", Namespace:\"operators\"}, State:2}
test-catalog-fwgtr CatalogSource locally, it's reporting a Ready .Status.LastObservedState, and no InstallPlan resource was able to be generated.
Misc/Needs Home/Triage/etc.