operator-framework / operator-lifecycle-manager

A management framework for extending Kubernetes with Operators
https://olm.operatorframework.io
Apache License 2.0

[FLAKE e2e (2)] Operator Group [It] insufficient permissions resolve via RBAC #3091

Open tmshort opened 8 months ago

tmshort commented 8 months ago

Flaky Test Report

Also, the output contains too many similar or identical log lines.

Failure Log Link

Failure Log

Relevant Failure Log

------------------------------
• [FAILED] [302.403 seconds]
Operator Group [It] insufficient permissions resolve via RBAC
/home/runner/work/operator-lifecycle-manager/operator-lifecycle-manager/test/e2e/operator_groups_e2e_test.go:1612

  Timeline >>
  created the operator-group-e2e-m679s testing namespace
  14:59:21.6505: Creating CRD
  14:59:21.6708: Creating operator group
  14:59:21.7011: Creating CSV
  14:59:21.8668: wait for CSV to fail
  14:59:21.9798:  (): 
  waited 9.38511127s for CSV operator-group-e2e-m679s-zr5ms/another-csv-jwr88: to be in phases [Failed], in phase  (): 
  14:59:22.0721:  (): 
  14:59:22.1724:  (): 
  14:59:22.282:  (): 
  14:59:22.3772:  (): 
  14:59:22.473:  (): 
  14:59:22.5726:  (): 
  14:59:22.7073:  (): 
  14:59:22.9107:  (): 
  14:59:23.1086: InstallReady (AllRequirementsMet): all requirements found, attempting install
  waited 1.128787182s for CSV operator-group-e2e-m679s-zr5ms/another-csv-jwr88: to be in phases [Failed], in phase InstallReady (AllRequirementsMet): all requirements found, attempting install
  14:59:23.3095: Failed (InstallComponentFailedNoRetry): install strategy failed: install strategy failed: deployments.apps "operator-deploymentzkkxr" is forbidden: User "system:serviceaccount:operator-group-e2e-m679s-zr5ms:nginx-sansp5b" cannot get resource "deployments" in API group "apps" in the namespace "operator-group-e2e-m679s-zr5ms"
  waited 200.854629ms for CSV operator-group-e2e-m679s-zr5ms/another-csv-jwr88: to be in phases [Failed], in phase Failed (InstallComponentFailedNoRetry): install strategy failed: install strategy failed: deployments.apps "operator-deploymentzkkxr" is forbidden: User "system:serviceaccount:operator-group-e2e-m679s-zr5ms:nginx-sansp5b" cannot get resource "deployments" in API group "apps" in the namespace "operator-group-e2e-m679s-zr5ms"
  14:59:23.3211: wait for CSV to succeeed
  14:59:23.5102: Failed (InstallComponentFailedNoRetry): install strategy failed: install strategy failed: deployments.apps "operator-deploymentzkkxr" is forbidden: User "system:serviceaccount:operator-group-e2e-m679s-zr5ms:nginx-sansp5b" cannot get resource "deployments" in API group "apps" in the namespace "operator-group-e2e-m679s-zr5ms"
  waited 5.402819873s for CSV operator-group-e2e-m679s-zr5ms/another-csv-jwr88: to be in phases [Succeeded], in phase Failed (InstallComponentFailedNoRetry): install strategy failed: install strategy failed: deployments.apps "operator-deploymentzkkxr" is forbidden: User "system:serviceaccount:operator-group-e2e-m679s-zr5ms:nginx-sansp5b" cannot get resource "deployments" in API group "apps" in the namespace "operator-group-e2e-m679s-zr5ms"
  14:59:23.7102: Failed (InstallComponentFailedNoRetry): install strategy failed: install strategy failed: deployments.apps "operator-deploymentzkkxr" is forbidden: User "system:serviceaccount:operator-group-e2e-m679s-zr5ms:nginx-sansp5b" cannot get resource "deployments" in API group "apps" in the namespace "operator-group-e2e-m679s-zr5ms"
  14:59:23.9075: Failed (InstallComponentFailedNoRetry): install strategy failed: install strategy failed: deployments.apps "operator-deploymentzkkxr" is forbidden: User "system:serviceaccount:operator-group-e2e-m679s-zr5ms:nginx-sansp5b" cannot get resource "deployments" in API group "apps" in the namespace "operator-group-e2e-m679s-zr5ms"
  14:59:24.1069: Failed (InstallComponentFailedNoRetry): install strategy failed: install strategy failed: deployments.apps "operator-deploymentzkkxr" is forbidden: User "system:serviceaccount:operator-group-e2e-m679s-zr5ms:nginx-sansp5b" cannot get resource "deployments" in API group "apps" in the namespace "operator-group-e2e-m679s-zr5ms"

... (lots of repeated lines)

  15:04:23.3076: Failed (InstallComponentFailedNoRetry): install strategy failed: install strategy failed: deployments.apps "operator-deploymentzkkxr" is forbidden: User "system:serviceaccount:operator-group-e2e-m679s-zr5ms:nginx-sansp5b" cannot get resource "deployments" in API group "apps" in the namespace "operator-group-e2e-m679s-zr5ms"
  15:04:23.5087: Failed (InstallComponentFailedNoRetry): install strategy failed: install strategy failed: deployments.apps "operator-deploymentzkkxr" is forbidden: User "system:serviceaccount:operator-group-e2e-m679s-zr5ms:nginx-sansp5b" cannot get resource "deployments" in API group "apps" in the namespace "operator-group-e2e-m679s-zr5ms"
  15:04:23.708: Failed (InstallComponentFailedNoRetry): install strategy failed: install strategy failed: deployments.apps "operator-deploymentzkkxr" is forbidden: User "system:serviceaccount:operator-group-e2e-m679s-zr5ms:nginx-sansp5b" cannot get resource "deployments" in API group "apps" in the namespace "operator-group-e2e-m679s-zr5ms"
  [FAILED] in [It] - /home/runner/work/operator-lifecycle-manager/operator-lifecycle-manager/test/e2e/operator_groups_e2e_test.go:1746 @ 11/06/23 15:04:23.858
  cleaning up ephemeral test resources...
  deleting test subscriptions...
  deleting test installplans...
  deleting test catalogsources...
  deleting test crds...
  deleting test csvs...
  test resources deleted
  << Timeline

  [FAILED] 
    Error Trace:    /home/runner/work/operator-lifecycle-manager/operator-lifecycle-manager/test/e2e/operator_groups_e2e_test.go:1746
                                /home/runner/work/operator-lifecycle-manager/operator-lifecycle-manager/vendor/github.com/onsi/ginkgo/v2/internal/node.go:463
                                /home/runner/work/operator-lifecycle-manager/operator-lifecycle-manager/vendor/github.com/onsi/ginkgo/v2/internal/suite.go:863
                                /opt/hostedtoolcache/go/1.20.10/x64/src/runtime/asm_amd64.s:1598
    Error:          Received unexpected error:
                    timed out waiting for the condition
    Test:           Operator Group insufficient permissions resolve via RBAC

  In [It] at: /home/runner/work/operator-lifecycle-manager/operator-lifecycle-manager/test/e2e/operator_groups_e2e_test.go:1746 @ 11/06/23 15:04:23.858

  Full Stack Trace
    github.com/stretchr/testify/require.NoError({0x7fa3ac8bddb0, 0xc000cf5440}, {0x40901c0, 0xc0001ba160}, {0x0, 0x0, 0x0})
        /home/runner/work/operator-lifecycle-manager/operator-lifecycle-manager/vendor/github.com/stretchr/testify/require/require.go:1357 +0x96
    github.com/operator-framework/operator-lifecycle-manager/test/e2e.glob..func20.9()
        /home/runner/work/operator-lifecycle-manager/operator-lifecycle-manager/test/e2e/operator_groups_e2e_test.go:1746 +0x15b5
------------------------------
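On the log-noise point, one possible way to cut the repeated lines is to collapse consecutive identical status messages before printing them. The following is only a sketch: `collapseRepeats` is a hypothetical helper, not something in the OLM e2e utilities, and it assumes the timestamp has already been stripped from each status string.

```go
package main

import "fmt"

// collapseRepeats merges runs of consecutive identical status strings into a
// single entry with a repeat count. Hypothetical helper for illustration only.
func collapseRepeats(statuses []string) []string {
	var out []string
	for i := 0; i < len(statuses); {
		// Find the end of the current run of identical statuses.
		j := i
		for j < len(statuses) && statuses[j] == statuses[i] {
			j++
		}
		if n := j - i; n > 1 {
			out = append(out, fmt.Sprintf("%s (x%d)", statuses[i], n))
		} else {
			out = append(out, statuses[i])
		}
		i = j
	}
	return out
}

func main() {
	statuses := []string{
		"InstallReady (AllRequirementsMet)",
		"Failed (InstallComponentFailedNoRetry)",
		"Failed (InstallComponentFailedNoRetry)",
		"Failed (InstallComponentFailedNoRetry)",
	}
	for _, s := range collapseRepeats(statuses) {
		fmt.Println(s)
	}
}
```

With something like this, the five minutes of identical `Failed (InstallComponentFailedNoRetry)` lines would print as one line with a count.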
tmshort commented 8 months ago

Investigating. I'm not sure this is a testing issue; the test is fairly simple and seems OK.

In the success case, the CSV gets updated due to a change in the service account (operatorgroup.go):

Requeuing CSV due to detected service account change

not due to a change in RBAC (operator.go). In the failure case (about 25% of the time or so), the log does not contain the above message.
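To check this pattern across several runs, the logs can be split by whether they contain that message. This is a rough sketch, assuming the operator logs for each run have already been collected as strings; the run names and the `splitByRequeue` helper are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// requeueMsg is the log message quoted above.
const requeueMsg = "Requeuing CSV due to detected service account change"

// splitByRequeue partitions run names by whether their log contains
// requeueMsg. The map layout (run name -> log text) is an assumption.
func splitByRequeue(logs map[string]string) (with, without []string) {
	for name, log := range logs {
		if strings.Contains(log, requeueMsg) {
			with = append(with, name)
		} else {
			without = append(without, name)
		}
	}
	// Map iteration order is random, so sort for stable output.
	sort.Strings(with)
	sort.Strings(without)
	return with, without
}

func main() {
	logs := map[string]string{
		"run-1": "... Requeuing CSV due to detected service account change ...",
		"run-2": "... (no requeue message) ...",
	}
	with, without := splitByRequeue(logs)
	fmt.Println("requeued on service account change:", with)
	fmt.Println("no requeue message:", without)
}
```

If the observation holds, the runs in the second list should be the failing ones.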

tmshort commented 8 months ago

The occurrence rate is about 25% or less. EDIT: It succeeded 10/10 times, so the occurrence rate looks to be less than 10%.
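As a rough sanity check on how much ten clean runs can say, the chance of seeing 10/10 passes can be computed for a given per-run failure rate. This sketch assumes the runs are independent and fail with a fixed probability, which may not hold for a timing-dependent flake; `probAllPass` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"math"
)

// probAllPass returns the chance that n independent runs all pass when each
// run fails with probability failRate.
func probAllPass(failRate float64, n int) float64 {
	return math.Pow(1-failRate, float64(n))
}

func main() {
	for _, p := range []float64{0.25, 0.10} {
		fmt.Printf("fail rate %.0f%%: P(10/10 pass) = %.3f\n", p*100, probAllPass(p, 10))
	}
}
```

Under those assumptions, ten straight passes would happen roughly 6% of the time at a 25% failure rate and roughly 35% of the time at a 10% failure rate, so a longer run would be needed to pin the rate down.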