kcp-dev / kcp

Kubernetes-like control planes for form-factors and use-cases beyond Kubernetes and container workloads.
https://kcp.io
Apache License 2.0

bug? flake? TestWorkspaceController/add_a_shard_after_a_workspace_is_unschedulable,_expect_it_to_be_scheduled #2603

Closed · ncdc closed this issue 3 months ago

ncdc commented 1 year ago

From https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/kcp-dev_kcp/2602/pull-ci-kcp-dev-kcp-main-e2e-shared/1613279921029779456

After all the shards are deleted, we create a workspace, which should be unschedulable, but somehow it gets scheduled and then initialized? (A sketch of the condition check the test waits for follows the log below.)

```
=== RUN   TestWorkspaceController/add_a_shard_after_a_workspace_is_unschedulable,_expect_it_to_be_scheduled
=== PAUSE TestWorkspaceController/add_a_shard_after_a_workspace_is_unschedulable,_expect_it_to_be_scheduled
=== CONT  TestWorkspaceController/add_a_shard_after_a_workspace_is_unschedulable,_expect_it_to_be_scheduled
    controller_test.go:170: Saving test artifacts for test "TestWorkspaceController/add_a_shard_after_a_workspace_is_unschedulable,_expect_it_to_be_scheduled" under "/logs/artifacts/TestWorkspaceController/add_a_shard_after_a_workspace_is_unschedulable,_expect_it_to_be_scheduled/1636451888".
=== CONT  TestWorkspaceController/add_a_shard_after_a_workspace_is_unschedulable,_expect_it_to_be_scheduled
    controller_test.go:170: Starting kcp servers...
    kcp.go:430: running: /go/src/github.com/kcp-dev/kcp/bin/kcp start --root-directory /tmp/TestWorkspaceControlleradd_a_shard_after_a_workspace_is_unschedulable,_expect_it_to_be_scheduled3772238024/001/kcp/main --secure-port=37495 --embedded-etcd-client-port=43281 --embedded-etcd-peer-port=39257 --embedded-etcd-wal-size-bytes=5000 --kubeconfig-path=/tmp/TestWorkspaceControlleradd_a_shard_after_a_workspace_is_unschedulable,_expect_it_to_be_scheduled3772238024/001/kcp/main/admin.kubeconfig --feature-gates=CustomResourceValidationExpressions=true --audit-log-path /logs/artifacts/TestWorkspaceController/add_a_shard_after_a_workspace_is_unschedulable,_expect_it_to_be_scheduled/1636451888/artifacts/kcp/main/kcp.audit
=== CONT  TestWorkspaceController/add_a_shard_after_a_workspace_is_unschedulable,_expect_it_to_be_scheduled
    kcp.go:728: success contacting https://10.128.134.92:37495/readyz
    kcp.go:728: success contacting https://10.128.134.92:37495/livez
    controller_test.go:170: Started kcp servers after 19.607612204s
=== CONT  TestWorkspaceController/add_a_shard_after_a_workspace_is_unschedulable,_expect_it_to_be_scheduled
    controller_test.go:175: Created root:organization workspace root:e2e-workspace-n7kxg as /clusters/n6xcdh5ewikvarda
I0111 21:20:40.250327   30383 shared_informer.go:255] Waiting for caches to sync for TestWorkspaceController/add_a_shard_after_a_workspace_is_unschedulable,_expect_it_to_be_scheduled
I0111 21:20:40.250487   30383 reflector.go:219] Starting reflector *v1beta1.Workspace (0s) from k8s.io/client-go@v0.0.0-20230109113100-c493866a854f/tools/cache/reflector.go:167
I0111 21:20:40.250515   30383 reflector.go:255] Listing and watching *v1beta1.Workspace from k8s.io/client-go@v0.0.0-20230109113100-c493866a854f/tools/cache/reflector.go:167
I0111 21:20:40.361509   30383 shared_informer.go:285] caches populated
I0111 21:20:40.361562   30383 shared_informer.go:262] Caches are synced for TestWorkspaceController/add_a_shard_after_a_workspace_is_unschedulable,_expect_it_to_be_scheduled
I0111 21:20:40.361752   30383 shared_informer.go:255] Waiting for caches to sync for TestWorkspaceController/add_a_shard_after_a_workspace_is_unschedulable,_expect_it_to_be_scheduled
I0111 21:20:40.361920   30383 reflector.go:219] Starting reflector *v1alpha1.Shard (0s) from k8s.io/client-go@v0.0.0-20230109113100-c493866a854f/tools/cache/reflector.go:167
I0111 21:20:40.361951   30383 reflector.go:255] Listing and watching *v1alpha1.Shard from k8s.io/client-go@v0.0.0-20230109113100-c493866a854f/tools/cache/reflector.go:167
I0111 21:20:40.462862   30383 shared_informer.go:285] caches populated
I0111 21:20:40.462930   30383 shared_informer.go:262] Caches are synced for TestWorkspaceController/add_a_shard_after_a_workspace_is_unschedulable,_expect_it_to_be_scheduled
    controller_test.go:190: Get a list of current shards so that we can schedule onto a valid shard later
    controller_test.go:190: Delete all pre-configured shards, we have to control the creation of the workspace shards in this test
    controller_test.go:190: Create a workspace without shards
    controller_test.go:190: Expect workspace to be unschedulable
=== CONT  TestWorkspaceController/add_a_shard_after_a_workspace_is_unschedulable,_expect_it_to_be_scheduled
    controller_test.go:190: 
            Error Trace:    controller_test.go:116
                                        controller_test.go:190
            Error:          Received unexpected error:
                            expected state not found: context deadline exceeded, 11 errors encountered while processing 11 events: [Workspace.tenancy.kcp.io "steve" not found, expected an unschedulable workspace, got status.conditions: v1alpha1.Conditions(nil), expected an unschedulable workspace, got status.conditions: v1alpha1.Conditions{v1alpha1.Condition{Type:"WorkspaceScheduled", Status:"True", Severity:"", LastTransitionTime:time.Date(2023, time.January, 11, 21, 20, 40, 0, time.Local), Reason:"", Message:""}}, expected an unschedulable workspace, got status.conditions: v1alpha1.Conditions{v1alpha1.Condition{Type:"WorkspaceInitialized", Status:"False", Severity:"Info", LastTransitionTime:time.Date(2023, time.January, 11, 21, 20, 40, 0, time.Local), Reason:"InitializerExists", Message:"Initializers still exist: [root:universal system:apibindings]"}, v1alpha1.Condition{Type:"WorkspaceScheduled", Status:"True", Severity:"", LastTransitionTime:time.Date(2023, time.January, 11, 21, 20, 40, 0, time.Local), Reason:"", Message:""}}, expected an unschedulable workspace, got status.conditions: v1alpha1.Conditions{v1alpha1.Condition{Type:"WorkspaceInitialized", Status:"False", Severity:"Info", LastTransitionTime:time.Date(2023, time.January, 11, 21, 20, 40, 0, time.Local), Reason:"InitializerExists", Message:"Initializers still exist: [system:apibindings]"}, v1alpha1.Condition{Type:"WorkspaceScheduled", Status:"True", Severity:"", LastTransitionTime:time.Date(2023, time.January, 11, 21, 20, 40, 0, time.Local), Reason:"", Message:""}}, expected an unschedulable workspace, got status.conditions: v1alpha1.Conditions{v1alpha1.Condition{Type:"WorkspaceInitialized", Status:"True", Severity:"", LastTransitionTime:time.Date(2023, time.January, 11, 21, 20, 41, 0, time.Local), Reason:"", Message:""}, v1alpha1.Condition{Type:"WorkspaceScheduled", Status:"True", Severity:"", LastTransitionTime:time.Date(2023, time.January, 11, 21, 20, 40, 0, time.Local), Reason:"", Message:""}}]
            Test:           TestWorkspaceController/add_a_shard_after_a_workspace_is_unschedulable,_expect_it_to_be_scheduled
            Messages:       did not see workspace marked unschedulable
    kcp.go:417: cleanup: canceling context
    kcp.go:421: cleanup: waiting for shutdownComplete
I0111 21:21:10.600893   30383 reflector.go:536] k8s.io/client-go@v0.0.0-20230109113100-c493866a854f/tools/cache/reflector.go:167: Watch close - *v1beta1.Workspace total 10 items received
I0111 21:21:10.600938   30383 reflector.go:536] k8s.io/client-go@v0.0.0-20230109113100-c493866a854f/tools/cache/reflector.go:167: Watch close - *v1alpha1.Shard total 1 items received
    kcp.go:425: cleanup: received shutdownComplete
    --- FAIL: TestWorkspaceController/add_a_shard_after_a_workspace_is_unschedulable,_expect_it_to_be_scheduled (50.95s)
```
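
The trace above shows what the expecter actually saw: first no conditions at all, then `WorkspaceScheduled=True`, then the initializer conditions; a `WorkspaceScheduled=False` condition never appears. Here is a minimal sketch of the check the test is effectively polling for, using simplified stand-in types rather than the real kcp tenancy API types:

```go
package main

import "fmt"

// Condition is a simplified stand-in for kcp's tenancy v1alpha1
// condition type; the field names match what the failure output
// above prints.
type Condition struct {
	Type   string
	Status string // "True", "False", or "Unknown"
}

// isUnschedulable reports whether the conditions mark the workspace
// as not scheduled, which is what the test waits for. The test fails
// because every observed event carries WorkspaceScheduled=True or no
// conditions at all, as shown in the error trace above.
func isUnschedulable(conds []Condition) bool {
	for _, c := range conds {
		if c.Type == "WorkspaceScheduled" {
			return c.Status == "False"
		}
	}
	// No WorkspaceScheduled condition yet: scheduling is still
	// undecided, not (yet) unschedulable.
	return false
}

func main() {
	observed := []Condition{{Type: "WorkspaceScheduled", Status: "True"}}
	fmt.Println(isUnschedulable(observed)) // false: the flake in a nutshell
}
```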
nrb commented 1 year ago

This appears to be a timing issue: the shard has already been deleted from etcd but is still present in the informer cache, so the workspace controller sees a seemingly valid shard and marks the workspace as scheduled.
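
If that's the mechanism, the window looks like the sketch below, which also shows one possible guard: confirm the cached shard with a live read before committing the scheduling decision. All names here (`shardLister`, `shardClient`, `scheduleOnto`) are hypothetical stand-ins for illustration, not the actual kcp controller code:

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical, simplified stand-ins: the real controller uses a
// shared-informer lister for Shards plus a kcp clientset.
type Shard struct{ Name string }

type shardLister interface {
	Get(name string) (*Shard, error) // served from the informer cache
}

type shardClient interface {
	Get(name string) (*Shard, error) // live read against the API server
}

var errNotFound = errors.New("shard not found")

// scheduleOnto illustrates one possible guard against the race: the
// informer cache can still hold a shard that etcd has already
// deleted, so confirm with a live read before committing. This is a
// sketch, not what the kcp workspace controller actually does.
func scheduleOnto(cache shardLister, live shardClient, name string) error {
	if _, err := cache.Get(name); err != nil {
		return fmt.Errorf("cache: %w", err)
	}
	// The cache said yes; a live read closes the deletion window.
	if _, err := live.Get(name); err != nil {
		return fmt.Errorf("live check: %w", err) // requeue instead of scheduling
	}
	return nil // safe(ish) to mark the workspace scheduled
}

// staleCache simulates the flake: the object is gone from etcd, but
// the deletion event has not reached the informer yet.
type staleCache struct{}

func (staleCache) Get(name string) (*Shard, error) { return &Shard{Name: name}, nil }

// emptyAPIServer models the post-deletion state of etcd.
type emptyAPIServer struct{}

func (emptyAPIServer) Get(name string) (*Shard, error) { return nil, errNotFound }

func main() {
	err := scheduleOnto(staleCache{}, emptyAPIServer{}, "root")
	fmt.Println(err) // live check: shard not found
}
```

Whether a live read is the right fix is a separate question: it adds API-server load, and informer-based controllers usually prefer to stay eventually consistent, i.e. tolerate the stale cache and requeue once the deletion event lands.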

@sttts Does that sound like a valid explanation of why this edge case might happen?

kcp-ci-bot commented 5 months ago

Issues go stale after 90d of inactivity. After a further 30 days, they will turn rotten. Mark the issue as fresh with /remove-lifecycle stale.

If this issue is safe to close now please do so with /close.

/lifecycle stale

kcp-ci-bot commented 4 months ago

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

/lifecycle rotten

kcp-ci-bot commented 3 months ago

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten.

/close

kcp-ci-bot commented 3 months ago

@kcp-ci-bot: Closing this issue.

In response to [this](https://github.com/kcp-dev/kcp/issues/2603#issuecomment-2195621129):

> Rotten issues close after 30d of inactivity.
> Reopen the issue with `/reopen`.
> Mark the issue as fresh with `/remove-lifecycle rotten`.
>
> /close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.