kcp-dev / kcp

Kubernetes-like control planes for form-factors and use-cases beyond Kubernetes and container workloads.
https://kcp.io
Apache License 2.0

bug/flake: TestAPIExportAuthorizers - APIExport view missing queueing for related APIExports (via claims) #2713

Closed ncdc closed 1 year ago

ncdc commented 1 year ago

From https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/kcp-dev_kcp/2707/pull-ci-kcp-dev-kcp-main-e2e-sharded/1620426348143054848 for PR #2707

=== RUN   TestAPIExportAuthorizers
=== PAUSE TestAPIExportAuthorizers
=== CONT  TestAPIExportAuthorizers
    authorizer_test.go:69: shared kcp server will target configuration "/go/src/github.com/kcp-dev/kcp/.kcp/admin.kubeconfig"
=== CONT  TestAPIExportAuthorizers
    kcp.go:980: waiting for readiness for server at https://10.128.38.8:6444
=== CONT  TestAPIExportAuthorizers
    kcp.go:1023: success contacting https://10.128.38.8:6444/livez
=== CONT  TestAPIExportAuthorizers
    kcp.go:1023: success contacting https://10.128.38.8:6444/readyz
    kcp.go:1001: server at https://10.128.38.8:6444 is ready
=== CONT  TestAPIExportAuthorizers
    assertions.go:1691: Waiting for condition, but got: workspace phase is Initializing, not Ready
=== CONT  TestAPIExportAuthorizers
    authorizer_test.go:74: Created root:organization workspace root:e2e-workspace-ccdmr as /clusters/2j1clh7hrfj2r5l4 on shard "shard-1"
=== CONT  TestAPIExportAuthorizers
    authorizer_test.go:77: Created root:universal workspace root:e2e-workspace-ccdmr:service-provider-1 as /clusters/1huhzk2qxbm3y4si on shard "shard-1"
    authorizer_test.go:78: Created root:universal workspace root:e2e-workspace-ccdmr:service-provider-2 as /clusters/2hf0bx9k2m0adhb5 on shard "shard-1"
    authorizer_test.go:79: Created root:universal workspace root:e2e-workspace-ccdmr:tenant as /clusters/1nyczkrpz8rtgduv on shard "shard-1"
    authorizer_test.go:80: Created root:universal workspace root:e2e-workspace-ccdmr:tenant-shadowed-crd as /clusters/4x24bik73ykm94us on shard "shard-1"
=== CONT  TestAPIExportAuthorizers
    authorizer_test.go:93: Giving users [service-provider-1-admin service-provider-2-admin tenant-user] member access to workspace "root:e2e-workspace-ccdmr"
    authorizer_test.go:94: Giving users [service-provider-1-admin] member access to workspace "root:e2e-workspace-ccdmr:service-provider-1"
    authorizer_test.go:95: Giving users [service-provider-2-admin] member access to workspace "root:e2e-workspace-ccdmr:service-provider-2"
    authorizer_test.go:96: Giving users [tenant-user] member access to workspace "root:e2e-workspace-ccdmr:tenant"
    authorizer_test.go:97: Giving users [tenant-user] member access to workspace "root:e2e-workspace-ccdmr:tenant-shadowed-crd"
    authorizer_test.go:99: install sherriffs API resource schema, API export, permissions for tenant-user to be able to bind to the export in service provider workspace "root:e2e-workspace-ccdmr:service-provider-1"
    authorizer_test.go:100: applying "apis.kcp.io/v1alpha1, Kind=APIResourceSchema" workspace "root:e2e-workspace-ccdmr:service-provider-1" name "today.sheriffs.wild.wild.west"
    authorizer_test.go:100: applying "apis.kcp.io/v1alpha1, Kind=APIExport" workspace "root:e2e-workspace-ccdmr:service-provider-1" name "wild.wild.west"
    authorizer_test.go:100: applying "rbac.authorization.k8s.io/v1, Kind=ClusterRole" workspace "root:e2e-workspace-ccdmr:service-provider-1" name "tenant-user-bind-apiexport"
    authorizer_test.go:100: applying "rbac.authorization.k8s.io/v1, Kind=ClusterRoleBinding" workspace "root:e2e-workspace-ccdmr:service-provider-1" name "tenant-user-bind-apiexport"
    authorizer_test.go:133: get the sheriffs apiexport's generated identity hash
    authorizer_test.go:148: Found identity hash: 09cc3909865b33171cc220d31d5d3e8c2681848f2aabe2b76e047b3e32704be7
    authorizer_test.go:150: install cowboys API resource schema, API export, and permissions for tenant-user to be able to bind to the export in second service provider workspace "root:e2e-workspace-ccdmr:service-provider-2"
    authorizer_test.go:151: applying "apis.kcp.io/v1alpha1, Kind=APIResourceSchema" workspace "root:e2e-workspace-ccdmr:service-provider-2" name "today.cowboys.wildwest.dev"
    authorizer_test.go:151: applying "apis.kcp.io/v1alpha1, Kind=APIExport" workspace "root:e2e-workspace-ccdmr:service-provider-2" name "today-cowboys"
    authorizer_test.go:151: applying "rbac.authorization.k8s.io/v1, Kind=ClusterRole" workspace "root:e2e-workspace-ccdmr:service-provider-2" name "tenant-user-bind"
    authorizer_test.go:151: applying "rbac.authorization.k8s.io/v1, Kind=ClusterRoleBinding" workspace "root:e2e-workspace-ccdmr:service-provider-2" name "tenant-user-bind"
    authorizer_test.go:151: applying "rbac.authorization.k8s.io/v1, Kind=ClusterRole" workspace "root:e2e-workspace-ccdmr:service-provider-2" name "tenant-user-maximum-permission-policy"
    authorizer_test.go:151: applying "rbac.authorization.k8s.io/v1, Kind=ClusterRoleBinding" workspace "root:e2e-workspace-ccdmr:service-provider-2" name "tenant-user-maximum-permission-policy"
    authorizer_test.go:207: bind cowboys and claimed sherriffs in the tenant workspace "root:e2e-workspace-ccdmr:tenant"
    authorizer_test.go:209: applying "apis.kcp.io/v1alpha1, Kind=APIBinding" workspace "root:e2e-workspace-ccdmr:tenant" name "wild.wild.west"
    authorizer_test.go:209: applying "apis.kcp.io/v1alpha1, Kind=APIBinding" workspace "root:e2e-workspace-ccdmr:tenant" name "cowboys"
    authorizer_test.go:257: Make sure ["wildwest.dev", "wild.wild.west"] API groups shows up in consumer workspace "root:e2e-workspace-ccdmr:tenant" group discovery
    authorizer_test.go:268: Install cowboys CRD and also bind the conflicting cowboys API export in tenant workspace "root:e2e-workspace-ccdmr:tenant-shadowed-crd"
    authorizer_test.go:269: applying "apiextensions.k8s.io/v1, Kind=CustomResourceDefinition" workspace "root:e2e-workspace-ccdmr:tenant-shadowed-crd" name "cowboys.wildwest.dev"
    authorizer_test.go:289: Waiting for cowboys CRD to be ready in tenant workspace "root:e2e-workspace-ccdmr:tenant-shadowed-crd"
=== CONT  TestAPIExportAuthorizers
    authorizer_test.go:303: Create a cowboys APIBinding in consumer workspace "root:e2e-workspace-ccdmr:tenant-shadowed-crd" that points to the today-cowboys export from "root:e2e-workspace-ccdmr:service-provider-2" but shadows a local cowboys CRD at the same time
    authorizer_test.go:304: applying "apis.kcp.io/v1alpha1, Kind=APIBinding" workspace "root:e2e-workspace-ccdmr:tenant-shadowed-crd" name "cowboys"
    authorizer_test.go:337: Waiting for cowboys APIBinding in consumer workspace "root:e2e-workspace-ccdmr:tenant-shadowed-crd" to have the condition "BindingUpToDate" mentioning the conflict with the shadowing local cowboys CRD
=== CONT  TestAPIExportAuthorizers
    authorizer_test.go:355: applying "wildwest.dev/v1alpha1, Kind=Cowboy" workspace "root:e2e-workspace-ccdmr:tenant" namespace "default" name "cowboy-via-api-binding"
=== CONT  TestAPIExportAuthorizers
    authorizer_test.go:363: applying "wildwest.dev/v1alpha1, Kind=Cowboy" workspace "root:e2e-workspace-ccdmr:tenant-shadowed-crd" namespace "default" name "cowboy-via-crd"
=== CONT  TestAPIExportAuthorizers
    authorizer_test.go:371: Create virtual workspace client for "today-cowboys" APIExport in workspace "root:e2e-workspace-ccdmr:service-provider-2"
    authorizer_test.go:388: verify that service-provider-2-admin cannot list sherrifs resources via virtual apiexport apiserver because we have no local maximal permissions yet granted
    authorizer_test.go:398: applying "rbac.authorization.k8s.io/v1, Kind=ClusterRole" workspace "root:e2e-workspace-ccdmr:service-provider-1" name "service-provider-2-admin-maximum-permission-policy"
    authorizer_test.go:398: applying "rbac.authorization.k8s.io/v1, Kind=ClusterRoleBinding" workspace "root:e2e-workspace-ccdmr:service-provider-1" name "service-provider-2-admin-maximum-permission-policy"
    authorizer_test.go:412: verify that service-provider-2-admin can lists all claimed resources using a wildcard request
=== CONT  TestAPIExportAuthorizers
    assertions.go:1691: Waiting for condition, but got: error while waiting to list "wild.wild.west/v1alpha1, Resource=sheriffs": the server could not find the requested resource
=== CONT  TestAPIExportAuthorizers
    authorizer_test.go:417: 
            Error Trace:    util.go:327
                                        authorizer_test.go:417
            Error:          Condition never satisfied
            Test:           TestAPIExportAuthorizers
            Messages:       listing claimed resources failed
--- FAIL: TestAPIExportAuthorizers (48.53s)

ncdc commented 1 year ago

I can reproduce this locally. I've added extra logging to pkg/virtual/apiexport/controllers/apireconciler/apiexport_apireconciler_reconcile.go to see what's going on. What I'm seeing is that the call at https://github.com/kcp-dev/kcp/blob/61243123ebeee689dc2c71b4bb100fca7a3fbd5c/pkg/virtual/apiexport/controllers/apireconciler/apiexport_apireconciler_reconcile.go#L134 sometimes returns 0 exports, meaning kcp can't find the APIExport for sheriffs in this case, so it never adds it to discovery. Not sure why yet.

davidfestal commented 1 year ago

By any chance, is this flake the same as what @jmprusi seemed to have tried to fix in PR https://github.com/kcp-dev/kcp/pull/2417/files#diff-8d6f5573691c6741b1615098ed0f51a30afa2e1e9a906f19d5cbfceaef754930 ?

ncdc commented 1 year ago

@davidfestal yes, although that's not really a fix 😄

ncdc commented 1 year ago

More data from a local test failure (sharded, 1 shard):

ncdc commented 1 year ago

Note: the APIExport informer here is from the cache server

ncdc commented 1 year ago

Oh I think when the APIExport that has sheriffs gets updated, we need to queue all other APIExports that have claims against its (sheriffs) identity. Testing this out.

ncdc commented 1 year ago

Root cause:

lionelvillard commented 1 year ago

reopening as I ran into this flake today: https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/kcp-dev_kcp/2423/pull-ci-kcp-dev-kcp-main-e2e-sharded/1620860539901054976

ncdc commented 1 year ago

Different root cause. Let's open a new one. /close

openshift-ci[bot] commented 1 year ago

@ncdc: Closing this issue.

In response to [this](https://github.com/kcp-dev/kcp/issues/2713#issuecomment-1412662828):

>Different root cause. Let's open a new one.
>/close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

ncdc commented 1 year ago

#2732 for the new one