Open VestigeJ opened 3 weeks ago
This appears to be related to watch requests and response pagination, I suspect the fixes from these recent kine PRs may need some additional tweaking to properly group requests in the way the apiserver expects:
@VestigeJ do you know if this can be reproduced on a single-node instance of k3s, or does it require 3+1 to fail?
I haven't tried it on a single host at all we don't usually try to do conformance tests without a 3-1 cluster.
<< Timeline [FAILED] Expected <string>: template-0002 to equal <string>: template-0001 In [It] at: k8s.io/kubernetes/test/e2e/apimachinery/chunking.go:112 @ 05/07/24 00:14:39.521
It appears that the conformance tests now expect the datastore to list things sorted by key. Previously this wasn't required, and we closed out issues where people complained about it:
There are now tests that create a large number (400) of resources, and then list them in smaller (17) chunks, and test that the items come out sorted by name (the 7th item returned should be named template-0007, for example). Ref: https://github.com/kubernetes/kubernetes/blob/v1.29.4/test/e2e/apimachinery/chunking.go#L112-L113
This fails because kine does not sort returned items by key, as etcd does. They are instead sorted by revision, as this is the native order returned by the SQL query.
Timeline >> STEP: Creating a kubernetes client @ 05/06/24 23:55:51.153 May 6 23:55:51.153: INFO: >>> kubeConfig: /tmp/kubeconfig-4089597224 STEP: Building a namespace api object, basename chunking @ 05/06/24 23:55:51.154 STEP: Waiting for a default service account to be provisioned in namespace @ 05/06/24 23:55:51.17 STEP: Waiting for kube-root-ca.crt to be provisioned in namespace @ 05/06/24 23:55:51.172 STEP: creating a large number of resources @ 05/06/24 23:55:51.174 STEP: retrieving the first page @ 05/06/24 23:56:08.864 May 6 23:56:08.935: INFO: Retrieved 40/40 results with rv 6307 and continue eyJ2IjoibWV0YS5rOHMuaW8vdjEiLCJydiI6NjMwNywic3RhcnQiOiJ0ZW1wbGF0ZS0wMDMzXHUwMDAwIn0 STEP: retrieving the second page until the token expires @ 05/06/24 23:56:08.935 May 6 23:56:28.942: INFO: Token eyJ2IjoibWV0YS5rOHMuaW8vdjEiLCJydiI6NjMwNywic3RhcnQiOiJ0ZW1wbGF0ZS0wMDMzXHUwMDAwIn0 has not expired yet May 6 23:56:48.942: INFO: Token eyJ2IjoibWV0YS5rOHMuaW8vdjEiLCJydiI6NjMwNywic3RhcnQiOiJ0ZW1wbGF0ZS0wMDMzXHUwMDAwIn0 has not expired yet May 6 23:57:08.943: INFO: Token eyJ2IjoibWV0YS5rOHMuaW8vdjEiLCJydiI6NjMwNywic3RhcnQiOiJ0ZW1wbGF0ZS0wMDMzXHUwMDAwIn0 has not expired yet May 6 23:57:28.942: INFO: Token eyJ2IjoibWV0YS5rOHMuaW8vdjEiLCJydiI6NjMwNywic3RhcnQiOiJ0ZW1wbGF0ZS0wMDMzXHUwMDAwIn0 has not expired yet May 6 23:57:48.942: INFO: Token eyJ2IjoibWV0YS5rOHMuaW8vdjEiLCJydiI6NjMwNywic3RhcnQiOiJ0ZW1wbGF0ZS0wMDMzXHUwMDAwIn0 has not expired yet May 6 23:58:08.942: INFO: Token eyJ2IjoibWV0YS5rOHMuaW8vdjEiLCJydiI6NjMwNywic3RhcnQiOiJ0ZW1wbGF0ZS0wMDMzXHUwMDAwIn0 has not expired yet May 6 23:58:28.942: INFO: Token eyJ2IjoibWV0YS5rOHMuaW8vdjEiLCJydiI6NjMwNywic3RhcnQiOiJ0ZW1wbGF0ZS0wMDMzXHUwMDAwIn0 has not expired yet May 6 23:58:48.942: INFO: Token eyJ2IjoibWV0YS5rOHMuaW8vdjEiLCJydiI6NjMwNywic3RhcnQiOiJ0ZW1wbGF0ZS0wMDMzXHUwMDAwIn0 has not expired yet May 6 23:59:08.942: INFO: Token eyJ2IjoibWV0YS5rOHMuaW8vdjEiLCJydiI6NjMwNywic3RhcnQiOiJ0ZW1wbGF0ZS0wMDMzXHUwMDAwIn0 has not expired yet May 6 23:59:28.942: INFO: Token eyJ2IjoibWV0YS5rOHMuaW8vdjEiLCJydiI6NjMwNywic3RhcnQiOiJ0ZW1wbGF0ZS0wMDMzXHUwMDAwIn0 has not expired yet May 6 23:59:48.942: INFO: Token eyJ2IjoibWV0YS5rOHMuaW8vdjEiLCJydiI6NjMwNywic3RhcnQiOiJ0ZW1wbGF0ZS0wMDMzXHUwMDAwIn0 has not expired yet May 7 00:00:08.942: INFO: Token eyJ2IjoibWV0YS5rOHMuaW8vdjEiLCJydiI6NjMwNywic3RhcnQiOiJ0ZW1wbGF0ZS0wMDMzXHUwMDAwIn0 has not expired yet May 7 00:00:28.942: INFO: Token eyJ2IjoibWV0YS5rOHMuaW8vdjEiLCJydiI6NjMwNywic3RhcnQiOiJ0ZW1wbGF0ZS0wMDMzXHUwMDAwIn0 has not expired yet May 7 00:00:48.943: INFO: Token eyJ2IjoibWV0YS5rOHMuaW8vdjEiLCJydiI6NjMwNywic3RhcnQiOiJ0ZW1wbGF0ZS0wMDMzXHUwMDAwIn0 has not expired yet May 7 00:01:08.942: INFO: Token eyJ2IjoibWV0YS5rOHMuaW8vdjEiLCJydiI6NjMwNywic3RhcnQiOiJ0ZW1wbGF0ZS0wMDMzXHUwMDAwIn0 has not expired yet May 7 00:01:28.942: INFO: Token eyJ2IjoibWV0YS5rOHMuaW8vdjEiLCJydiI6NjMwNywic3RhcnQiOiJ0ZW1wbGF0ZS0wMDMzXHUwMDAwIn0 has not expired yet May 7 00:01:48.942: INFO: Token eyJ2IjoibWV0YS5rOHMuaW8vdjEiLCJydiI6NjMwNywic3RhcnQiOiJ0ZW1wbGF0ZS0wMDMzXHUwMDAwIn0 has not expired yet May 7 00:02:08.942: INFO: Token eyJ2IjoibWV0YS5rOHMuaW8vdjEiLCJydiI6NjMwNywic3RhcnQiOiJ0ZW1wbGF0ZS0wMDMzXHUwMDAwIn0 has not expired yet May 7 00:02:28.943: INFO: Token eyJ2IjoibWV0YS5rOHMuaW8vdjEiLCJydiI6NjMwNywic3RhcnQiOiJ0ZW1wbGF0ZS0wMDMzXHUwMDAwIn0 has not expired yet May 7 00:02:48.943: INFO: Token eyJ2IjoibWV0YS5rOHMuaW8vdjEiLCJydiI6NjMwNywic3RhcnQiOiJ0ZW1wbGF0ZS0wMDMzXHUwMDAwIn0 has not expired yet May 7 00:03:08.942: INFO: Token eyJ2IjoibWV0YS5rOHMuaW8vdjEiLCJydiI6NjMwNywic3RhcnQiOiJ0ZW1wbGF0ZS0wMDMzXHUwMDAwIn0 has not expired yet May 7 00:03:28.943: INFO: Token eyJ2IjoibWV0YS5rOHMuaW8vdjEiLCJydiI6NjMwNywic3RhcnQiOiJ0ZW1wbGF0ZS0wMDMzXHUwMDAwIn0 has not expired yet May 7 00:03:48.940: INFO: got error The provided continue parameter is too old to display a consistent list result. You can start a new list without the continue parameter, or use the continue token in this response to retrieve the remainder of the results. Continuing with the provided token results in an inconsistent list - objects that were created, modified, or deleted between the time the first chunk was returned and now may show up in the list. May 7 00:03:48.940: INFO: Retrieved inconsistent continue eyJ2IjoibWV0YS5rOHMuaW8vdjEiLCJydiI6LTEsInN0YXJ0IjoidGVtcGxhdGUtMDAzM1x1MDAwMCJ9 STEP: retrieving the second page again with the token received with the error message @ 05/07/24 00:03:48.94 [FAILED] in [It] - k8s.io/kubernetes/test/e2e/apimachinery/chunking.go:202 @ 05/07/24 00:03:48.946
This error seems to stem from a similar problem, related to paginating with expired continue tokens. However in this case the problem seems to be related to consistency with the remaining item count. This was supposed to have been fixed by https://github.com/k3s-io/kine/pull/271 but perhaps there are corner cases remaining with pagination and key order. Ref: https://github.com/kubernetes/kubernetes/blob/v1.29.4/test/e2e/apimachinery/chunking.go#L202
Timeline >> STEP: Creating a kubernetes client @ 05/06/24 23:54:45.579 May 6 23:54:45.579: INFO: >>> kubeConfig: /tmp/kubeconfig-4089597224 STEP: Building a namespace api object, basename watchlist @ 05/06/24 23:54:45.579 STEP: Waiting for a default service account to be provisioned in namespace @ 05/06/24 23:54:45.604 STEP: Waiting for kube-root-ca.crt to be provisioned in namespace @ 05/06/24 23:54:45.606 STEP: Adding 5 secrets to watchlist-8617 namespace @ 05/06/24 23:54:45.608 STEP: Starting the secret informer @ 05/06/24 23:54:45.641 STEP: Waiting until the secret informer is fully synchronised @ 05/06/24 23:54:45.641 W0506 23:54:45.642543 22 reflector.go:337] The watchlist request ended with an error, falling back to the standard LIST/WATCH semantics because making progress is better than deadlocking, err = the server rejected our request due to an error in our request (get secrets) W0506 23:54:45.642578 22 reflector.go:539] runtime/asm_amd64.s:1650: failed to list *v1.Secret: unexpected list call E0506 23:54:45.642590 22 reflector.go:147] runtime/asm_amd64.s:1650: Failed to watch *v1.Secret: failed to list *v1.Secret: unexpected list call W0506 23:54:46.700065 22 reflector.go:337] The watchlist request ended with an error, falling back to the standard LIST/WATCH semantics because making progress is better than deadlocking, err = the server rejected our request due to an error in our request (get secrets) W0506 23:54:46.700096 22 reflector.go:539] runtime/asm_amd64.s:1650: failed to list *v1.Secret: unexpected list call E0506 23:54:46.700108 22 reflector.go:147] runtime/asm_amd64.s:1650: Failed to watch *v1.Secret: failed to list *v1.Secret: unexpected list call W0506 23:54:49.055521 22 reflector.go:337] The watchlist request ended with an error, falling back to the standard LIST/WATCH semantics because making progress is better than deadlocking, err = the server rejected our request due to an error in our request (get secrets) W0506 23:54:49.055553 22 reflector.go:539] runtime/asm_amd64.s:1650: failed to list *v1.Secret: unexpected list call E0506 23:54:49.055564 22 reflector.go:147] runtime/asm_amd64.s:1650: Failed to watch *v1.Secret: failed to list *v1.Secret: unexpected list call W0506 23:54:53.110684 22 reflector.go:337] The watchlist request ended with an error, falling back to the standard LIST/WATCH semantics because making progress is better than deadlocking, err = the server rejected our request due to an error in our request (get secrets) W0506 23:54:53.110713 22 reflector.go:539] runtime/asm_amd64.s:1650: failed to list *v1.Secret: unexpected list call E0506 23:54:53.110726 22 reflector.go:147] runtime/asm_amd64.s:1650: Failed to watch *v1.Secret: failed to list *v1.Secret: unexpected list call W0506 23:55:03.003099 22 reflector.go:337] The watchlist request ended with an error, falling back to the standard LIST/WATCH semantics because making progress is better than deadlocking, err = the server rejected our request due to an error in our request (get secrets) W0506 23:55:03.003129 22 reflector.go:539] runtime/asm_amd64.s:1650: failed to list *v1.Secret: unexpected list call E0506 23:55:03.003142 22 reflector.go:147] runtime/asm_amd64.s:1650: Failed to watch *v1.Secret: failed to list *v1.Secret: unexpected list call May 6 23:55:15.642: INFO: Unexpected error: Failed waiting for the secret informer in namespace to be synced: <context.deadlineExceededError>: context deadline exceeded {} [FAILED] in [It] - k8s.io/kubernetes/test/e2e/apimachinery/watchlist.go:80 @ 05/06/24 23:55:15.642
W0506 23:54:45.642543 22 reflector.go:337] The watchlist request ended with an error, falling back to the standard LIST/WATCH semantics because making progress is better than deadlocking, err = the server rejected our request due to an error in our request (get secrets)
An error is being bubbled up here, but I'm not sure if it's coming from the apiserver, or kine. I think the code in this test is small enough that we could just run it directly though. Ref: https://github.com/kubernetes/kubernetes/blob/v1.29.4/test/e2e/apimachinery/watchlist.go#L56
EDIT: This FeatureGate is still Alpha and is disabled by default, I'm not sure why this test is being run by default. The server is sending back:
The ListOptions "" is invalid: sendInitialEvents: Forbidden: sendInitialEvents is forbidden for watch unless the WatchList feature gate is enabled
With the FeatureGate enabled, the test passes:
------------------------------
[sig-api-machinery] API Streaming (aka. WatchList) [Serial] [Feature:WatchList] [sig-api-machinery, Serial, Feature:WatchList]
k8s.io/kubernetes/test/e2e/apimachinery/watchlist.go:39
should be requested when ENABLE_CLIENT_GO_WATCH_LIST_ALPHA is set
k8s.io/kubernetes/test/e2e/apimachinery/watchlist.go:41
> Enter [BeforeEach] [sig-api-machinery] API Streaming (aka. WatchList) [Serial] [Feature:WatchList] - set up framework | framework.go:200 @ 05/07/24 17:57:25.448
STEP: Creating a kubernetes client - k8s.io/kubernetes/test/e2e/framework/framework.go:220 @ 05/07/24 17:57:25.449
May 7 17:57:25.449: INFO: >>> kubeConfig: /home/brandond/.kube/k3s-server-1.yaml
STEP: Building a namespace api object, basename watchlist - k8s.io/kubernetes/test/e2e/framework/framework.go:259 @ 05/07/24 17:57:25.45
STEP: Waiting for a default service account to be provisioned in namespace - k8s.io/kubernetes/test/e2e/framework/framework.go:268 @ 05/07/24 17:57:25.459
STEP: Waiting for kube-root-ca.crt to be provisioned in namespace - k8s.io/kubernetes/test/e2e/framework/framework.go:271 @ 05/07/24 17:57:25.461
< Exit [BeforeEach] [sig-api-machinery] API Streaming (aka. WatchList) [Serial] [Feature:WatchList] - set up framework | framework.go:200 @ 05/07/24 17:57:25.463 (15ms)
> Enter [BeforeEach] [sig-api-machinery] API Streaming (aka. WatchList) [Serial] [Feature:WatchList] - k8s.io/kubernetes/test/e2e/framework/metrics/init/init.go:33 @ 05/07/24 17:57:25.463
< Exit [BeforeEach] [sig-api-machinery] API Streaming (aka. WatchList) [Serial] [Feature:WatchList] - k8s.io/kubernetes/test/e2e/framework/metrics/init/init.go:33 @ 05/07/24 17:57:25.464 (0s)
> Enter [BeforeEach] [sig-api-machinery] API Streaming (aka. WatchList) [Serial] [Feature:WatchList] - k8s.io/kubernetes/test/e2e/framework/node/init/init.go:33 @ 05/07/24 17:57:25.464
< Exit [BeforeEach] [sig-api-machinery] API Streaming (aka. WatchList) [Serial] [Feature:WatchList] - k8s.io/kubernetes/test/e2e/framework/node/init/init.go:33 @ 05/07/24 17:57:25.464 (0s)
> Enter [It] should be requested when ENABLE_CLIENT_GO_WATCH_LIST_ALPHA is set - k8s.io/kubernetes/test/e2e/apimachinery/watchlist.go:41 @ 05/07/24 17:57:25.464
STEP: Adding 5 secrets to watchlist-6031 namespace - k8s.io/kubernetes/test/e2e/apimachinery/watchlist.go:67 @ 05/07/24 17:57:25.464
STEP: Starting the secret informer - k8s.io/kubernetes/test/e2e/apimachinery/watchlist.go:73 @ 05/07/24 17:57:25.474
STEP: Waiting until the secret informer is fully synchronised - k8s.io/kubernetes/test/e2e/apimachinery/watchlist.go:76 @ 05/07/24 17:57:25.474
STEP: Verifying if the secret informer was properly synchronised - k8s.io/kubernetes/test/e2e/apimachinery/watchlist.go:82 @ 05/07/24 17:57:25.975
STEP: Listing secrets directly from the server from watchlist-6031 namespace - k8s.io/kubernetes/test/e2e/apimachinery/watchlist.go:96 @ 05/07/24 17:57:25.975
STEP: Comparing secrets retrieved directly from the server with the ones that have been streamed to the secret informer - k8s.io/kubernetes/test/e2e/apimachinery/watchlist.go:102 @ 05/07/24 17:57:25.978
STEP: Modifying a secret and checking if the update was picked up by the secret informer - k8s.io/kubernetes/test/e2e/apimachinery/watchlist.go:85 @ 05/07/24 17:57:25.979
STEP: Listing secrets directly from the server from watchlist-6031 namespace - k8s.io/kubernetes/test/e2e/apimachinery/watchlist.go:96 @ 05/07/24 17:57:25.984
STEP: Comparing secrets retrieved directly from the server with the ones that have been streamed to the secret informer - k8s.io/kubernetes/test/e2e/apimachinery/watchlist.go:102 @ 05/07/24 17:57:25.985
< Exit [It] should be requested when ENABLE_CLIENT_GO_WATCH_LIST_ALPHA is set - k8s.io/kubernetes/test/e2e/apimachinery/watchlist.go:41 @ 05/07/24 17:57:25.987 (523ms)
> Enter [DeferCleanup (Each)] [sig-api-machinery] API Streaming (aka. WatchList) [Serial] [Feature:WatchList] - k8s.io/kubernetes/test/e2e/framework/node/init/init.go:34 @ 05/07/24 17:57:25.987
May 7 17:57:25.987: INFO: Waiting up to 7m0s for all (but 0) nodes to be ready
< Exit [DeferCleanup (Each)] [sig-api-machinery] API Streaming (aka. WatchList) [Serial] [Feature:WatchList] - k8s.io/kubernetes/test/e2e/framework/node/init/init.go:34 @ 05/07/24 17:57:25.989 (2ms)
> Enter [DeferCleanup (Each)] [sig-api-machinery] API Streaming (aka. WatchList) [Serial] [Feature:WatchList] - k8s.io/kubernetes/test/e2e/framework/metrics/init/init.go:35 @ 05/07/24 17:57:25.989
< Exit [DeferCleanup (Each)] [sig-api-machinery] API Streaming (aka. WatchList) [Serial] [Feature:WatchList] - k8s.io/kubernetes/test/e2e/framework/metrics/init/init.go:35 @ 05/07/24 17:57:25.989 (0s)
> Enter [DeferCleanup (Each)] [sig-api-machinery] API Streaming (aka. WatchList) [Serial] [Feature:WatchList] - dump namespaces | framework.go:218 @ 05/07/24 17:57:25.989
< Exit [DeferCleanup (Each)] [sig-api-machinery] API Streaming (aka. WatchList) [Serial] [Feature:WatchList] - dump namespaces | framework.go:218 @ 05/07/24 17:57:25.989 (0s)
> Enter [DeferCleanup (Each)] [sig-api-machinery] API Streaming (aka. WatchList) [Serial] [Feature:WatchList] - tear down framework | framework.go:215 @ 05/07/24 17:57:25.989
STEP: Destroying namespace "watchlist-6031" for this suite. - k8s.io/kubernetes/test/e2e/framework/framework.go:360 @ 05/07/24 17:57:25.989
< Exit [DeferCleanup (Each)] [sig-api-machinery] API Streaming (aka. WatchList) [Serial] [Feature:WatchList] - tear down framework | framework.go:215 @ 05/07/24 17:57:25.992 (3ms)
• [0.544 seconds]
------------------------------
All of the failing tests should be addressed by the linked kine PR.
Note: any tests with Feature:x
in the name require the named featuregate to be enabled on the apiserver; most of these are off by default and the test failures are expected in this case.
Environmental Info: K3s Version:
Any current branch
Node(s) CPU architecture, OS, and Version:
not applicable the same bug is present regardless of the OS.
Cluster Configuration:
3 servers 1 agent 1 external MySQL DB mysql Ver 8.0.36 for Linux on x86_64 (MySQL Community Server - GPL)
Describe the bug:
Failing conformance checks only when MySQL is the external DB - the exact same setup with postgres passes conformance checks just fine.
Steps To Reproduce:
Expected behavior:
all conformance checks pass
Actual behavior:
specifically it seems that various cascading failures can occur specifically with the sig-api-machinery suite of tests.
Additional context / logs:
These commands in the test suite when run manually pass
Important note that MySQL itself does not seem bottlenecked or overburdened by the cluster in this setup