loft-sh / vcluster

vCluster - Create fully functional virtual Kubernetes clusters - Each vcluster runs inside a namespace of the underlying k8s cluster. It's cheaper than creating separate full-blown clusters and it offers better multi-tenancy and isolation than regular namespaces.
https://www.vcluster.com
Apache License 2.0

vcluster-eks: vcluster-api: "watch chan error: etcdserver: mvcc: required revision has been compacted" #1342

Open joaocc opened 8 months ago

joaocc commented 8 months ago

What happened?

Installed vcluster-eks 0.16.4 on EKS 1.27, with etcd storage on EFS. The messages start almost immediately after the vcluster-api pod starts:

```
I1101 10:33:31.743782       1 aggregator.go:164] waiting for initial CRD sync...
I1101 10:33:31.748914       1 gc_controller.go:78] Starting apiserver lease garbage collector
I1101 10:33:31.748967       1 handler_discovery.go:412] Starting ResourceDiscoveryManager
I1101 10:33:31.742914       1 controller.go:78] Starting OpenAPI AggregationController
I1101 10:33:31.750829       1 dynamic_cafile_content.go:157] "Starting controller" name="client-ca-bundle::/run/config/pki/ca.crt"
I1101 10:33:31.751017       1 dynamic_cafile_content.go:157] "Starting controller" name="request-header::/run/config/pki/front-proxy-ca.crt"
E1101 10:33:31.843347       1 controller.go:95] Found stale data, removed previous endpoints on kubernetes service, apiserver didn't exit successfully previously
I1101 10:33:31.846125       1 shared_informer.go:318] Caches are synced for cluster_authentication_trust_controller
I1101 10:33:31.849180       1 apf_controller.go:377] Running API Priority and Fairness config worker
I1101 10:33:31.849368       1 apf_controller.go:380] Running API Priority and Fairness periodic rebalancing process
I1101 10:33:31.927892       1 shared_informer.go:318] Caches are synced for node_authorizer
I1101 10:33:31.932853       1 controller.go:624] quota admission added evaluator for: leases.coordination.k8s.io
I1101 10:33:31.939522       1 cache.go:39] Caches are synced for AvailableConditionController controller
I1101 10:33:31.940661       1 shared_informer.go:318] Caches are synced for crd-autoregister
I1101 10:33:31.940723       1 aggregator.go:166] initial CRD sync complete...
I1101 10:33:31.940737       1 autoregister_controller.go:141] Starting autoregister controller
I1101 10:33:31.940745       1 cache.go:32] Waiting for caches to sync for autoregister controller
I1101 10:33:31.940757       1 cache.go:39] Caches are synced for autoregister controller
I1101 10:33:31.943259       1 cache.go:39] Caches are synced for APIServiceRegistrationController controller
I1101 10:33:31.944056       1 shared_informer.go:318] Caches are synced for configmaps
I1101 10:33:32.748362       1 storage_scheduling.go:111] all system priority classes are created successfully or already exist.
W1101 10:33:35.109679       1 watcher.go:245] watch chan error: etcdserver: mvcc: required revision has been compacted
W1101 10:33:35.712071       1 watcher.go:245] watch chan error: etcdserver: mvcc: required revision has been compacted
W1101 10:33:38.027911       1 watcher.go:245] watch chan error: etcdserver: mvcc: required revision has been compacted
W1101 10:33:38.027960       1 watcher.go:245] watch chan error: etcdserver: mvcc: required revision has been compacted
W1101 10:33:38.027983       1 watcher.go:245] watch chan error: etcdserver: mvcc: required revision has been compacted
W1101 10:33:38.028004       1 watcher.go:245] watch chan error: etcdserver: mvcc: required revision has been compacted
W1101 10:33:38.028029       1 watcher.go:245] watch chan error: etcdserver: mvcc: required revision has been compacted
W1101 10:33:38.028051       1 watcher.go:245] watch chan error: etcdserver: mvcc: required revision has been compacted
W1101 10:33:38.028070       1 watcher.go:245] watch chan error: etcdserver: mvcc: required revision has been compacted
W1101 10:33:38.029210       1 watcher.go:245] watch chan error: etcdserver: mvcc: required revision has been compacted
W1101 10:33:38.029246       1 watcher.go:245] watch chan error: etcdserver: mvcc: required revision has been compacted
W1101 10:33:38.029823       1 watcher.go:245] watch chan error: etcdserver: mvcc: required revision has been compacted
W1101 10:33:38.029858       1 watcher.go:245] watch chan error: etcdserver: mvcc: required revision has been compacted
W1101 10:33:38.029865       1 watcher.go:245] watch chan error: etcdserver: mvcc: required revision has been compacted
W1101 10:33:38.029880       1 watcher.go:245] watch chan error: etcdserver: mvcc: required revision has been compacted
W1101 10:33:38.030282       1 watcher.go:245] watch chan error: etcdserver: mvcc: required revision has been compacted
W1101 10:33:38.128476       1 watcher.go:245] watch chan error: etcdserver: mvcc: required revision has been compacted
W1101 10:33:38.128503       1 watcher.go:245] watch chan error: etcdserver: mvcc: required revision has been compacted
```
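For context on the warning: etcd keeps a history of revisions that is periodically compacted, and a watch that tries to resume from a revision older than the compaction point fails with `mvcc: required revision has been compacted`. Kubernetes watchers (including the API server's own watch cache) generally recover by re-listing and starting a fresh watch, which is one reason these warnings are often harmless. The toy model below sketches that behaviour; `ToyMVCCStore`, `CompactedError`, and `resync` are hypothetical names for illustration only, not etcd or client-go APIs, and the real systems are considerably more involved.

```python
class CompactedError(Exception):
    """Raised when a watch asks for history that compaction has discarded."""


class ToyMVCCStore:
    """Toy in-memory model of etcd-style MVCC revisions (illustrative, not etcd's API)."""

    def __init__(self):
        self.revision = 0          # latest revision, bumped on every write
        self.compact_revision = 0  # event history below this revision is gone
        self.events = []           # (revision, key, value) history
        self.data = {}             # current key -> value

    def put(self, key, value):
        self.revision += 1
        self.events.append((self.revision, key, value))
        self.data[key] = value
        return self.revision

    def compact(self, rev):
        # Drop event history older than `rev`; current values survive.
        self.compact_revision = max(self.compact_revision, rev)
        self.events = [e for e in self.events if e[0] >= self.compact_revision]

    def list(self):
        # A full read returns the current state plus the revision it reflects.
        return dict(self.data), self.revision

    def watch_from(self, start_rev):
        # A start revision below the compaction point cannot be replayed.
        if start_rev < self.compact_revision:
            raise CompactedError("mvcc: required revision has been compacted")
        return [e for e in self.events if e[0] >= start_rev]


def resync(store, last_seen_rev):
    """Hypothetical client step: resume the watch, or re-list if history was compacted."""
    try:
        return "watch", store.watch_from(last_seen_rev + 1)
    except CompactedError:
        return "relist", store.list()


store = ToyMVCCStore()
store.put("a", 1)   # revision 1
store.put("b", 2)   # revision 2
store.put("a", 3)   # revision 3
store.compact(3)
print(resync(store, 1))  # history from revision 2 is gone, so the client re-lists
print(resync(store, 2))  # revision 3 is still available, so the watch resumes
```

In this model a client that fell behind the compaction point loses nothing permanently: it re-reads the current state and continues, which mirrors why such warnings can appear without any visible malfunction.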

What did you expect to happen?

No warning messages

How can we reproduce it (as minimally and precisely as possible)?

Not sure how to reproduce in minimal environment.

Anything else we need to know?

Installed via flux2 (HelmRelease). Potentially relevant links:

Host cluster Kubernetes version

```console
$ kubectl version
Client Version: v1.28.2
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.27.6-eks-f8587cb
```

Host cluster Kubernetes distribution

```
EKS 1.27
```

vcluster version

```console
$ vcluster --version
vcluster version 0.15.7
```

vcluster Kubernetes distribution (k3s (default), k8s, k0s)

```
eks
```

OS and Arch

```
OS: macOS
Arch: arm64
```

FabianKramm commented 8 months ago

@joaocc Sorry for the delay. vcluster has problems with EFS, as it causes issues with databases in general. Is there any chance you could use EBS or something similar?

joaocc commented 8 months ago

Hi. Not really. We are using EFS as a way to simplify HA storage. Unlike Azure, where ZRS allows mountable volumes that span different AZs, AWS EBS seems to be restricted to a single AZ, so a vcluster that ends up being scheduled onto a node in another AZ would not be able to mount the EBS volume. On the other hand, we haven't noticed any practical issues. Are you saying that EFS is not supported storage for etcd? Thanks

joaocc commented 4 months ago

@FabianKramm following up on this thread...

For reference, we still have no practical issues, apart from elevated EFS billing (writes of ~440MB/sec), which we are still trying to attribute either to these writes or to something else.
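To put the reported rate in perspective, a sustained ~440 MB/s adds up quickly. The sketch below assumes the rate is constant around the clock and uses decimal units (1 TB = 1,000,000 MB); `sustained_volume_tb` is a hypothetical helper, and the calculation does not model how EFS actually bills throughput.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def sustained_volume_tb(rate_mb_per_s, seconds):
    """Total decimal terabytes written at a constant rate over `seconds`."""
    return rate_mb_per_s * seconds / 1_000_000


daily_tb = sustained_volume_tb(440, SECONDS_PER_DAY)
monthly_tb = sustained_volume_tb(440, 30 * SECONDS_PER_DAY)  # assumes a 30-day month
print(f"per day: {daily_tb:.3f} TB, per 30 days: {monthly_tb:.2f} TB")
```

At that rate the volume comes to roughly 38 TB per day, or more than 1,100 TB over a 30-day month, so comparing it against the write totals reported for the file system is a quick way to check whether the figure is a sustained rate or a peak.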