k3s-io / k3s

Lightweight Kubernetes
https://k3s.io
Apache License 2.0

K3s keeps crashing every few minutes due to leader election failure #2615

Closed cwoolum closed 3 years ago

cwoolum commented 3 years ago

Environmental Info: K3s Version:

k3s version v1.18.9+k3s1 (630bebf9), k3os version v0.11.1 <-- this occurred on k3os v0.10 as well

Node(s) CPU architecture, OS, and Version:

Linux k3s-1 4.19.97-v8+ #1294 SMP PREEMPT Thu Jan 30 13:27:08 GMT 2020 aarch64 GNU/Linux

2x Raspberry Pi 4, running on Class 10 UHS-I SD cards

Cluster Configuration:

1 master, 1 agent

Describe the bug:

After a new install of k3os/k3s, etcd keeps crashing every few minutes.

Steps To Reproduce:

Expected behavior:

I would expect Kubernetes to run stably.

Actual behavior:

Kubernetes crashes every couple of minutes after startup. Pods that rely on leader election also seem to crash.

Additional context / logs:

Logs:

```
Trace[148579825]: [661.953235ms] [661.523057ms] Object stored in database
I1202 20:45:01.040074 8976 trace.go:116] Trace[1934252170]: "GuaranteedUpdate etcd3" type:*coordination.Lease (started: 2020-12-02 20:44:59.468270221 +0000 UTC m=+81.601702519) (total time: 1.571678379s):
Trace[1934252170]: [1.571534493s] [1.569092018s] Transaction committed
I1202 20:45:01.040962 8976 trace.go:116] Trace[158077054]: "Update" url:/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/kube-scheduler,user-agent:k3s/v1.18.9+k3s1 (linux/arm64) kubernetes/630bebf/leader-election,client:127.0.0.1 (started: 2020-12-02 20:44:59.46773849 +0000 UTC m=+81.601170788) (total time: 1.573054522s):
Trace[158077054]: [1.572577086s] [1.572250481s] Object stored in database
I1202 20:45:04.703977 8976 trace.go:116] Trace[1309380047]: "List etcd3" key:/events,resourceVersion:,limit:1000,continue: (started: 2020-12-02 20:45:03.793147584 +0000 UTC m=+85.926579882) (total time: 910.53336ms):
Trace[1309380047]: [910.53336ms] [910.53336ms] END
I1202 20:45:04.968218 8976 trace.go:116] Trace[560134109]: "List" url:/api/v1/events,user-agent:Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Lens/3.6.7 Chrome/83.0.4103.122 Electron/9.1.2 Safari/537.36,client:127.0.0.1 (started: 2020-12-02 20:45:03.792894978 +0000 UTC m=+85.926327275) (total time: 1.174766967s):
Trace[560134109]: [912.909671ms] [912.674175ms] Listing from storage done
Trace[560134109]: [1.174751301s] [261.84163ms] Writing http response done count:517
E1202 20:45:10.124876 8976 controller.go:178] failed to update node lease, error: Put https://127.0.0.1:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/k3s-1?timeout=10s: context deadline exceeded (Client.Timeout exceeded while awaiting headers)
time="2020-12-02T20:45:10.130410420Z" level=error msg="error in txn: context canceled"
I1202 20:45:10.125073 8976 trace.go:116] Trace[1690869822]: "GuaranteedUpdate etcd3" type:*coordination.Lease (started: 2020-12-02 20:45:00.129753967 +0000 UTC m=+82.263186265) (total time: 9.995187714s):
Trace[1690869822]: [9.995187714s] [9.992894569s] END
I1202 20:45:10.131045 8976 trace.go:116] Trace[2116949679]: "Update" url:/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/k3s-1,user-agent:k3s/v1.18.9+k3s1 (linux/arm64) kubernetes/630bebf,client:127.0.0.1 (started: 2020-12-02 20:45:00.129205625 +0000 UTC m=+82.262637923) (total time: 10.001699452s):
Trace[2116949679]: [10.001699452s] [10.001346384s] END
I1202 20:45:11.194877 8976 trace.go:116] Trace[1638256539]: "GuaranteedUpdate etcd3" type:*core.Endpoints (started: 2020-12-02 20:45:01.181796158 +0000 UTC m=+83.315228456) (total time: 10.01293533s):
Trace[1638256539]: [9.162288138s] [9.160022085s] Transaction committed
Trace[1638256539]: [10.01293533s] [846.865017ms] END
E1202 20:45:13.063493 8976 status.go:71] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"context canceled"}
I1202 20:45:13.065564 8976 trace.go:116] Trace[1118593115]: "Update" url:/api/v1/namespaces/kube-system/endpoints/rancher.io-local-path,user-agent:local-path-provisioner/v0.0.0 (linux/arm64) kubernetes/$Format,client:10.42.0.5 (started: 2020-12-02 20:45:01.181165632 +0000 UTC m=+83.314597930) (total time: 11.884231395s):
Trace[1118593115]: [11.884231395s] [11.883804717s] END
E1202 20:45:11.696999 8976 leaderelection.go:356] Failed to update lock: Put https://127.0.0.1:6444/api/v1/namespaces/kube-system/endpoints/cloud-controller-manager: context deadline exceeded
I1202 20:45:11.697467 8976 trace.go:116] Trace[463620426]: "GuaranteedUpdate etcd3" type:*core.Endpoints (started: 2020-12-02 20:45:01.74087159 +0000 UTC m=+83.874303888) (total time: 9.956464373s):
Trace[463620426]: [9.039221883s] [9.036212381s] Transaction committed
Trace[463620426]: [9.956464373s] [915.207378ms] END
E1202 20:45:11.953410 8976 leaderelection.go:356] Failed to update lock: Put https://127.0.0.1:6444/api/v1/namespaces/kube-system/endpoints/kube-controller-manager?timeout=10s: context deadline exceeded
I1202 20:45:11.954804 8976 trace.go:116] Trace[22746264]: "GuaranteedUpdate etcd3" type:*core.Endpoints (started: 2020-12-02 20:45:01.99803945 +0000 UTC m=+84.131471747) (total time: 9.956183886s):
Trace[22746264]: [9.003243307s] [9.00081496s] Transaction committed
Trace[22746264]: [9.956183886s] [950.672304ms] END
I1202 20:45:12.000414 8976 trace.go:116] Trace[683943658]: "Update" url:/api/v1/namespaces/kube-system/endpoints/kube-controller-manager,user-agent:k3s/v1.18.9+k3s1 (linux/arm64) kubernetes/630bebf/leader-election,client:127.0.0.1 (started: 2020-12-02 20:45:01.997261574 +0000 UTC m=+84.130693872) (total time: 10.002829094s):
Trace[683943658]: [10.002829094s] [10.002248863s] END
E1202 20:45:13.067344 8976 leaderelection.go:356] Failed to update lock: Put https://127.0.0.1:6444/api/v1/namespaces/kube-system/endpoints/kube-scheduler?timeout=10s: context deadline exceeded
I1202 20:45:13.067959 8976 trace.go:116] Trace[924601370]: "GuaranteedUpdate etcd3" type:*core.Endpoints (started: 2020-12-02 20:45:03.431252866 +0000 UTC m=+85.564685256) (total time: 9.636551546s):
Trace[924601370]: [7.795962966s] [7.783669052s] Transaction committed
Trace[924601370]: [9.636551546s] [1.838588614s] END
I1202 20:45:13.078654 8976 leaderelection.go:277] failed to renew lease kube-system/cloud-controller-manager: timed out waiting for the condition
E1202 20:45:13.082806 8976 status.go:71] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"context canceled"}
I1202 20:45:13.083306 8976 leaderelection.go:277] failed to renew lease kube-system/kube-controller-manager: timed out waiting for the condition
I1202 20:45:13.088083 8976 event.go:278] Event(v1.ObjectReference{Kind:"Lease", Namespace:"kube-system", Name:"kube-controller-manager", UID:"2b095a6b-cf7c-4767-bbb2-ca5d69073af6", APIVersion:"coordination.k8s.io/v1", ResourceVersion:"6053", FieldPath:""}): type: 'Normal' reason: 'LeaderElection' k3s-1_4f23b3d9-b58d-4c79-84bd-854a7d7c6afa stopped leading
I1202 20:45:13.088647 8976 leaderelection.go:277] failed to renew lease kube-system/kube-scheduler: timed out waiting for the condition
I1202 20:45:13.091349 8976 event.go:278] Event(v1.ObjectReference{Kind:"Lease", Namespace:"kube-system", Name:"cloud-controller-manager", UID:"d10eee0b-8e31-4b7d-92ed-0035622e0c74", APIVersion:"coordination.k8s.io/v1", ResourceVersion:"6052", FieldPath:""}): type: 'Normal' reason: 'LeaderElection' k3s-1_b78da5f0-164b-4354-b477-803bd79ff913 stopped leading
E1202 20:45:13.094074 8976 status.go:71] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"context canceled"}
F1202 20:45:13.104741 8976 controllermanager.go:213] leaderelection lost
I1202 20:45:13.108478 8976 trace.go:116] Trace[955622096]: "Update" url:/api/v1/namespaces/kube-system/endpoints/cloud-controller-manager,user-agent:k3s/v1.18.9+k3s1 (linux/arm64) kubernetes/630bebf/leader-election,client:127.0.0.1 (started: 2020-12-02 20:45:01.740110714 +0000 UTC m=+83.873542975) (total time: 11.368219885s):
Trace[955622096]: [11.368219885s] [11.36765371s] END
F1202 20:45:13.109628 8976 controllermanager.go:279] leaderelection lost
I1202 20:45:13.113429 8976 endpointslice_controller.go:229] Shutting down endpoint slice controller
I1202 20:45:13.114256 8976 event.go:278] Event(v1.ObjectReference{Kind:"Endpoints", Namespace:"", Name:"", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'LeaderElection' k3s-1_4f23b3d9-b58d-4c79-84bd-854a7d7c6afa stopped leading
F1202 20:45:13.114557 8976 server.go:244] leaderelection lost
I1202 20:45:13.114653 8976 event.go:278] Event(v1.ObjectReference{Kind:"Endpoints", Namespace:"", Name:"", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'LeaderElection' k3s-1_b78da5f0-164b-4354-b477-803bd79ff913 stopped leading
I1202 20:45:13.115929 8976 trace.go:116] Trace[1502416122]: "Update" url:/api/v1/namespaces/kube-system/endpoints/kube-scheduler,user-agent:k3s/v1.18.9+k3s1 (linux/arm64) kubernetes/630bebf/leader-election,client:127.0.0.1 (started: 2020-12-02 20:45:03.429242772 +0000 UTC m=+85.562675310) (total time: 9.686487495s):
Trace[1502416122]: [9.686487495s] [9.684921448s] END
I1202 20:45:13.930719 8976 ttl_controller.go:130] Shutting down TTL controller
I1202 20:45:13.930902 8976 deployment_controller.go:165] Shutting down deployment controller
I1202 20:45:13.930962 8976 pvc_protection_controller.go:113] Shutting down PVC protection controller
I1202 20:45:13.930994 8976 daemon_controller.go:300] Shutting down daemon sets controller
I1202 20:45:13.931013 8976 expand_controller.go:331] Shutting down expand controller
I1202 20:45:13.931113 8976 pv_controller_base.go:311] Shutting down persistent volume controller
I1202 20:45:13.931177 8976 replica_set.go:194] Shutting down replicaset controller
I1202 20:45:13.931187 8976 gc_controller.go:100] Shutting down GC controller
I1202 20:45:13.931333 8976 node_lifecycle_controller.go:593] Shutting down node controller
I1202 20:45:13.931394 8976 job_controller.go:157] Shutting down job controller
I1202 20:45:13.931514 8976 clusterroleaggregation_controller.go:161] Shutting
```
brandond commented 3 years ago
I1202 20:45:13.065564    8976 trace.go:116] Trace[1118593115]: "Update" url:/api/v1/namespaces/kube-system/endpoints/rancher.io-local-path,user-agent:local-path-provisioner/v0.0.0 (linux/arm64) kubernetes/$Format,client:10.42.0.5 (started: 2020-12-02 20:45:01.181165632 +0000 UTC m=+83.314597930) (total time: 11.884231395s):
Trace[1118593115]: [11.884231395s] [11.883804717s] END
E1202 20:45:11.696999    8976 leaderelection.go:356] Failed to update lock: Put https://127.0.0.1:6444/api/v1/namespaces/kube-system/endpoints/cloud-controller-manager: context deadline exceeded
I1202 20:45:11.697467    8976 trace.go:116] Trace[463620426]: "GuaranteedUpdate etcd3" type:*core.Endpoints (started: 2020-12-02 20:45:01.74087159 +0000 UTC m=+83.874303888) (total time: 9.956464373s):
Trace[463620426]: [9.039221883s] [9.036212381s] Transaction committed
Trace[463620426]: [9.956464373s] [915.207378ms] END
E1202 20:45:11.953410    8976 leaderelection.go:356] Failed to update lock: Put https://127.0.0.1:6444/api/v1/namespaces/kube-system/endpoints/kube-controller-manager?timeout=10s: context deadline exceeded
I1202 20:45:11.954804    8976 trace.go:116] Trace[22746264]: "GuaranteedUpdate etcd3" type:*core.Endpoints (started: 2020-12-02 20:45:01.99803945 +0000 UTC m=+84.131471747) (total time: 9.956183886s):
Trace[22746264]: [9.003243307s] [9.00081496s] Transaction committed
Trace[22746264]: [9.956183886s] [950.672304ms] END
I1202 20:45:12.000414    8976 trace.go:116] Trace[683943658]: "Update" url:/api/v1/namespaces/kube-system/endpoints/kube-controller-manager,user-agent:k3s/v1.18.9+k3s1 (linux/arm64) kubernetes/630bebf/leader-election,client:127.0.0.1 (started: 2020-12-02 20:45:01.997261574 +0000 UTC m=+84.130693872) (total time: 10.002829094s):
Trace[683943658]: [10.002829094s] [10.002248863s] END

Your SD card is unable to support both workload and control-plane IO. It is overloaded to the point where datastore writes are taking > 10 seconds. After 10 seconds many internal Kubernetes components will exceed a hardcoded timeout and fatally terminate the process.

You might try adding some external storage for /var/lib/rancher; a USB-attached SSD is your best bet.
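
On a stock Linux install, the general shape would be something like the sketch below (the device name and filesystem are only examples; on k3os the mount would need to be wired up through its boot configuration instead):

```
# Sketch only - /dev/sda1 is an example device; confirm yours with lsblk first.
# Run with k3s stopped so the datastore isn't being written to mid-copy.
mkfs.ext4 /dev/sda1
mkdir -p /mnt/ssd
mount /dev/sda1 /mnt/ssd
cp -a /var/lib/rancher/. /mnt/ssd/          # carry over any existing state
umount /mnt/ssd
echo '/dev/sda1 /var/lib/rancher ext4 defaults 0 2' >> /etc/fstab
mount /var/lib/rancher
```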

cwoolum commented 3 years ago

I swapped my SD cards. I am now using a SanDisk Extreme Plus on the master node, which fixed the problem. The agent node is now using the SanDisk Ultra. I didn't realize there was such a huge performance difference between the cards, but it's very noticeable. Thanks for the help!

cwoolum commented 3 years ago

This was working until I deployed all of my workloads. After deploying the Prometheus Operator, etcd was super overloaded again. I do have an external SSD mapped to /mnt/ssd. If I'm using k3os, will I be able to symlink /var/lib/rancher to the SSD in boot_cmd or are the k3s and k3os folders pre-existing in that folder?

brandond commented 3 years ago

I believe you should be able to do that; give it a try!

cwoolum commented 3 years ago

Unfortunately, I don't think symlinking that folder will work with k3os. I'll see if the k3os team can help me.

I'm a bit surprised that this SD Card isn't fast enough to host this. Based on specs, it seems plenty performant.

https://www.bestbuy.com/site/sandisk-extreme-plus-64gb-microsdxc-uhs-i-memory-card/6282920.p?skuId=6282920

brandond commented 3 years ago

Prometheus really wants an SSD. I can't even get away with running it on a rotational disk without it DoSing other workloads on the same volume.

cwoolum commented 3 years ago

I just want to confirm that we are talking about storage for etcd and not persistent volume storage for the pods. I am already running all of the persistent storage on the SSD as an NFS share.

brandond commented 3 years ago

Ah, I had missed that you had already split off the workload volumes.

If you're only ever going to run a single server, you might stick with sqlite instead of etcd. etcd doesn't push a lot of throughput, but it blocks on an fsync after every write to the datastore. That makes it extremely latency-sensitive, and also more likely to burn through the write cycles on your SD card.
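
If you want to see whether a given card or disk can keep up with that access pattern, an fsync-heavy fio run along the lines of what the etcd docs recommend is a reasonable check (the target directory here is just an example):

```
# Writes small blocks and calls fdatasync after every one - the access
# pattern that matters for etcd/kine. Look at the fsync latency
# percentiles in the output.
fio --name=fsync-test --directory=/var/lib/rancher \
    --rw=write --ioengine=sync --fdatasync=1 \
    --size=22m --bs=2300
```

The usual guidance is that the 99th percentile fdatasync latency reported at the end should stay under roughly 10ms.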

cwoolum commented 3 years ago

I currently have 2 Raspberry Pi 4 nodes and wanted to add a 3rd and eventually move to the experimental HA embedded etcd. In the current config, I have one server and one agent.

Do you recommend against going with that approach? I was hoping to avoid setting up a separate Pi just to host a db.

brandond commented 3 years ago

You can certainly try it, but keep in mind that we haven't done any tuning of etcd for low-end systems so all of its normal hardware requirements are still valid: https://github.com/etcd-io/etcd/blob/master/Documentation/op-guide/hardware.md#disks

cwoolum commented 3 years ago

Okay, I've added a new Raspberry Pi 4 with PostgreSQL on it as a backing store, but Kubernetes is still crashing. The PostgreSQL database is on an SSD. Here are the logs from the cluster:


E1215 22:57:12.954647   26145 leaderelection.go:356] Failed to update lock: Put https://127.0.0.1:6444/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/cloud-controller-manager: context deadline exceeded
I1215 22:57:12.954927   26145 leaderelection.go:277] failed to renew lease kube-system/cloud-controller-manager: timed out waiting for the condition
F1215 22:57:12.955094   26145 controllermanager.go:213] leaderelection lost
time="2020-12-15T22:57:24.334862526Z" level=info msg="Starting k3s v1.18.9+k3s1 (630bebf9)"
time="2020-12-15T22:57:24.338415930Z" level=info msg="Cluster bootstrap already complete"
time="2020-12-15T22:57:24.626516656Z" level=info msg="Kine listening on unix://kine.sock"
time="2020-12-15T22:57:24.630327167Z" level=info msg="Running kube-apiserver --advertise-port=6443 --allow-privileged=true --anonymous-auth=false --api-audiences=unknown --authorization-mode=Node,RBAC --basic-auth-file=/var/lib/rancher/k3s/server/cred/passwd --bind-address=127.0.0.1 --cert-dir=/var/lib/rancher/k3s/server/tls/temporary-certs --client-ca-file=/var/lib/rancher/k3s/server/tls/client-ca.crt --enable-admission-plugins=NodeRestriction --etcd-servers=unix://kine.sock --insecure-port=0 --kubelet-certificate-authority=/var/lib/rancher/k3s/server/tls/server-ca.crt --kubelet-client-certificate=/var/lib/rancher/k3s/server/tls/client-kube-apiserver.crt --kubelet-client-key=/var/lib/rancher/k3s/server/tls/client-kube-apiserver.key --proxy-client-cert-file=/var/lib/rancher/k3s/server/tls/client-auth-proxy.crt --proxy-client-key-file=/var/lib/rancher/k3s/server/tls/client-auth-proxy.key --requestheader-allowed-names=system:auth-proxy --requestheader-client-ca-file=/var/lib/rancher/k3s/server/tls/request-header-ca.crt --requestheader-extra-headers-prefix=X-Remote-Extra- --requestheader-group-headers=X-Remote-Group --requestheader-username-headers=X-Remote-User --secure-port=6444 --service-account-issuer=k3s --service-account-key-file=/var/lib/rancher/k3s/server/tls/service.key --service-account-signing-key-file=/var/lib/rancher/k3s/server/tls/service.key --service-cluster-ip-range=10.43.0.0/16 --storage-backend=etcd3 --tls-cert-file=/var/lib/rancher/k3s/server/tls/serving-kube-apiserver.crt --tls-private-key-file=/var/lib/rancher/k3s/server/tls/serving-kube-apiserver.key"
Flag --basic-auth-file has been deprecated, Basic authentication mode is deprecated and will be removed in a future release. It is not recommended for production environments.
I1215 22:57:24.635041   26490 server.go:645] external host was not specified, using 192.168.40.2
I1215 22:57:24.636431   26490 server.go:162] Version: v1.18.9+k3s1
I1215 22:57:24.665884   26490 plugins.go:158] Loaded 12 mutating admission controller(s) successfully in the following order: NamespaceLifecycle,LimitRanger,ServiceAccount,NodeRestriction,TaintNodesByCondition,Priority,DefaultTolerationSeconds,DefaultStorageClass,StorageObjectInUseProtection,RuntimeClass,DefaultIngressClass,MutatingAdmissionWebhook.
I1215 22:57:24.666054   26490 plugins.go:161] Loaded 10 validating admission controller(s) successfully in the following order: LimitRanger,ServiceAccount,Priority,PersistentVolumeClaimResize,RuntimeClass,CertificateApproval,CertificateSigning,CertificateSubjectRestriction,ValidatingAdmissionWebhook,ResourceQuota.
I1215 22:57:24.672931   26490 plugins.go:158] Loaded 12 mutating admission controller(s) successfully in the following order: NamespaceLifecycle,LimitRanger,ServiceAccount,NodeRestriction,TaintNodesByCondition,Priority,DefaultTolerationSeconds,DefaultStorageClass,StorageObjectInUseProtection,RuntimeClass,DefaultIngressClass,MutatingAdmissionWebhook.
I1215 22:57:24.673100   26490 plugins.go:161] Loaded 10 validating admission controller(s) successfully in the following order: LimitRanger,ServiceAccount,Priority,PersistentVolumeClaimResize,RuntimeClass,CertificateApproval,CertificateSigning,CertificateSubjectRestriction,ValidatingAdmissionWebhook,ResourceQuota.
I1215 22:57:24.856223   26490 master.go:270] Using reconciler: lease
I1215 22:57:25.198710   26490 rest.go:113] the default service ipfamily for this cluster is: IPv4
I1215 22:57:25.782205   26490 trace.go:116] Trace[1045366144]: "List etcd3" key:/pods,resourceVersion:,limit:10000,continue: (started: 2020-12-15 22:57:25.11095305 +0000 UTC m=+2.088864669) (total time: 671.087107ms):
Trace[1045366144]: [671.087107ms] [671.087107ms] END

These are the parameters I'm passing to k3s

server --datastore-endpoint=postgres://k8s:redacted@192.168.40.1/k3s --disable=traefik,servicelb
brandond commented 3 years ago

Can you attach the full logs? There's not really enough there to see what's going on before or after the restart.

cwoolum commented 3 years ago

Here are the last few hours. Let me know if you need more. The full log file was 96 MB so I needed to slim it down.

k3s-service.log

cwoolum commented 3 years ago

I'm seeing things related to etcd3 in the logs. Is that just a labeling issue? Etcd shouldn't actually be used in this configuration, should it?

brandond commented 3 years ago

Thanks, I'll take a look at the logs. Kine is still etcd as far as Kubernetes is concerned.

brandond commented 3 years ago

Is there any reason you're still on 1.18.9? There have been quite a few changes since then, including some that improve performance of the kine storage engine. If you're running everything on one host, I would actually recommend sticking with sqlite as opposed to adding the overhead of postgres or mysql.
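
For what it's worth, falling back to the embedded sqlite datastore is just a matter of not passing --datastore-endpoint at all, for example:

```
# With no --datastore-endpoint (and no --cluster-init), k3s defaults to its
# embedded sqlite datastore under /var/lib/rancher/k3s/server/db.
k3s server --disable=traefik,servicelb
```

Note that existing state in postgres would not be migrated over automatically.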

It looks like your system is still having trouble keeping up with datastore operations. Can you share any more info on how you have your system partitioned? What is on the SSD, what is on the SD card? What does system load and IOPS look like when this occurs?
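
If the sysstat tools are available on the node, capturing something like the following while the stalls are happening would show device utilization and iowait (the 5-second interval is arbitrary):

```
# Extended per-device stats (IOPS, throughput, %util) every 5 seconds.
iostat -x 5
# Run queue, iowait and memory pressure every 5 seconds.
vmstat 5
```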

I1215 22:46:39.910212   28937 trace.go:116] Trace[44432836]: "Get" url:/api/v1/namespaces/kube-system/configmaps/cert-manager-cainjector-leader-election,user-agent:cainjector/v0.0.0 (linux/arm64) kubernetes/$Format/leader-election,client:192.168.40.3 (started: 2020-12-15 22:46:38.222428524 +0000 UTC m=+4212.259529000) (total time: 1.687353158s):
Trace[44432836]: [1.685838702s] [1.685789333s] About to write a response
E1215 22:46:40.431953   28937 leaderelection.go:356] Failed to update lock: Put https://127.0.0.1:6444/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/kube-scheduler?timeout=10s: context deadline exceeded
I1215 22:46:40.433098   28937 leaderelection.go:277] failed to renew lease kube-system/kube-scheduler: timed out waiting for the condition
F1215 22:46:40.433422   28937 server.go:244] leaderelection lost
E1215 22:46:47.083523   28937 controller.go:178] failed to update node lease, error: Put https://127.0.0.1:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/k3s-1?timeout=10s: context deadline exceeded
E1215 22:46:40.790278   28937 leaderelection.go:356] Failed to update lock: Put https://127.0.0.1:6444/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/cloud-controller-manager: context deadline exceeded
I1215 22:46:48.487898   28937 leaderelection.go:277] failed to renew lease kube-system/cloud-controller-manager: timed out waiting for the condition
F1215 22:46:48.488397   28937 controllermanager.go:213] leaderelection lost
I1215 22:46:41.039250   28937 trace.go:116] Trace[1401285353]: "GuaranteedUpdate etcd3" type:*coordination.Lease (started: 2020-12-15 22:46:37.687045851 +0000 UTC m=+4211.724146383) (total time: 3.351867763s):
Trace[1401285353]: [3.351867763s] [3.347675533s] END
I1215 22:46:41.067207   28937 trace.go:116] Trace[1829012881]: "Get" url:/api/v1/namespaces/default,user-agent:k3s/v1.18.9+k3s1 (linux/arm64) kubernetes/630bebf,client:127.0.0.1 (started: 2020-12-15 22:46:38.627750808 +0000 UTC m=+4212.664851321) (total time: 2.43922889s):
Trace[1829012881]: [2.425721222s] [2.425604003s] About to write a response
I1215 22:46:41.418948   28937 trace.go:116] Trace[1320012864]: "GuaranteedUpdate etcd3" type:*coordination.Lease (started: 2020-12-15 22:46:37.729888843 +0000 UTC m=+4211.766989411) (total time: 3.688684122s):
Trace[1320012864]: [3.688684122s] [3.68536157s] END
I1215 22:46:41.701868   28937 trace.go:116] Trace[850776731]: "List etcd3" key:/pods,resourceVersion:,limit:0,continue: (started: 2020-12-15 22:46:36.064168255 +0000 UTC m=+4210.101268768) (total time: 5.637501404s):
Trace[850776731]: [5.637501404s] [5.637501404s] END
I1215 22:46:41.735352   28937 trace.go:116] Trace[1236525513]: "GuaranteedUpdate etcd3" type:*core.ConfigMap (started: 2020-12-15 22:46:37.664163445 +0000 UTC m=+4211.701263976) (total time: 4.070898516s):
Trace[1236525513]: [4.070701595s] [4.067042405s] Transaction committed
I1215 22:46:41.821225   28937 trace.go:116] Trace[259273813]: "GuaranteedUpdate etcd3" type:*coordination.Lease (started: 2020-12-15 22:46:37.264137407 +0000 UTC m=+4211.301237920) (total time: 4.556880902s):
Trace[259273813]: [304.957755ms] [304.693244ms] Transaction prepared
Trace[259273813]: [4.556569208s] [4.251611453s] Transaction committed
I1215 22:46:44.727003   28937 trace.go:116] Trace[497163353]: "GuaranteedUpdate etcd3" type:*core.Endpoints (started: 2020-12-15 22:46:39.887678081 +0000 UTC m=+4213.924778613) (total time: 4.839078762s):
Trace[497163353]: [293.211632ms] [293.074655ms] Transaction prepared
Trace[497163353]: [4.838910082s] [4.54569845s] Transaction committed
E1215 22:46:44.945545   28937 leaderelection.go:356] Failed to update lock: resource name may not be empty
I1215 22:46:45.016542   28937 trace.go:116] Trace[1609514022]: "GuaranteedUpdate etcd3" type:*core.ConfigMap (started: 2020-12-15 22:46:40.24842112 +0000 UTC m=+4214.285521652) (total time: 4.767691527s):
Trace[1609514022]: [547.604764ms] [547.193516ms] Transaction prepared
Trace[1609514022]: [4.767459681s] [4.219854917s] Transaction committed
E1215 22:46:45.107452   28937 status.go:71] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"context canceled"}
I1215 22:46:45.294639   28937 trace.go:116] Trace[1494008030]: "GuaranteedUpdate etcd3" type:*core.ConfigMap (started: 2020-12-15 22:46:39.919424496 +0000 UTC m=+4213.956525028) (total time: 5.374877449s):
Trace[1494008030]: [5.374357612s] [5.369947795s] Transaction committed
I1215 22:46:45.301265   28937 trace.go:116] Trace[902924431]: "Get" url:/apis/apps/v1/namespaces/flux-system/replicasets/source-controller-f8fdb486,user-agent:k3s/v1.18.9+k3s1 (linux/arm64) kubernetes/630bebf/system:serviceaccount:kube-system:replicaset-controller,client:127.0.0.1 (started: 2020-12-15 22:46:38.985781817 +0000 UTC m=+4213.022882274) (total time: 6.315069611s):
Trace[902924431]: [6.314190784s] [6.314155062s] About to write a response
I1215 22:46:47.108312   28937 trace.go:116] Trace[1561342929]: "Update" url:/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/k3s-1,user-agent:k3s/v1.18.9+k3s1 (linux/arm64) kubernetes/630bebf,client:127.0.0.1 (started: 2020-12-15 22:46:36.898393326 +0000 UTC m=+4210.935493802) (total time: 10.209653714s):
Trace[1561342929]: [10.209653714s] [10.208249532s] END
I1215 22:46:47.122337   28937 trace.go:116] Trace[324341722]: "List etcd3" key:/minions,resourceVersion:,limit:0,continue: (started: 2020-12-15 22:46:42.330160539 +0000 UTC m=+4216.367261052) (total time: 4.791889517s):
Trace[324341722]: [4.791889517s] [4.791889517s] END
I1215 22:46:47.123968   28937 trace.go:116] Trace[1595965141]: "GuaranteedUpdate etcd3" type:*core.Endpoints (started: 2020-12-15 22:46:39.261320827 +0000 UTC m=+4213.298421340) (total time: 7.862508086s):
Trace[1595965141]: [7.862508086s] [7.85862568s] END
cwoolum commented 3 years ago

I now have 3 Raspberry Pis:

Pi 1:

Pi 2:

Pi 3:

I moved the SSD to a separate Raspberry Pi, which PostgreSQL now runs on. The k3s server and agent now run exclusively on microSD cards, but all pod persistent storage happens on NFS shared from the first Pi. I'm not sure how to test IOPS on k3os.

I have automatic updates enabled, but the system keeps boot-looping before the updates can occur.

viceice commented 3 years ago

Seeing this too on an Ubuntu Focal VM on Windows Hyper-V 2019 servers, using v1.20.0+k3s2.

I have one DB VM with 2 GB RAM running MariaDB (~30% usage) and one master node with 4 GB RAM. Both are backed by SSD.

Jan 13 12:35:19 docker-m02 k3s[1459387]:         /go/src/github.com/rancher/k3s/vendor/k8s.io/klog/v2/klog.go:975 +0x19b
Jan 13 12:35:19 docker-m02 k3s[1459387]: github.com/rancher/k3s/vendor/k8s.io/klog/v2.(*loggingT).output(0x701b3e0, 0xc000000003, 0x0, 0x0, 0xc00029d0a0, 0x6ca6996, 0x9, 0xcd, 0x0)
Jan 13 12:35:19 docker-m02 k3s[1459387]:         /go/src/github.com/rancher/k3s/vendor/k8s.io/klog/v2/klog.go:1026 +0xb9
Jan 13 12:35:19 docker-m02 k3s[1459387]: github.com/rancher/k3s/vendor/k8s.io/klog/v2.stacks(0xc000010001, 0xc00c35e280, 0x41, 0x123)
Jan 13 12:35:19 docker-m02 k3s[1459387]: goroutine 3602 [running]:
Jan 13 12:35:19 docker-m02 k3s[1459387]: F0113 12:35:19.502155 1459387 server.go:205] leaderelection lost
Jan 13 12:35:19 docker-m02 k3s[1459387]: I0113 12:35:19.502130 1459387 leaderelection.go:278] failed to renew lease kube-system/kube-scheduler: timed out waiting for the condition
Jan 13 12:35:19 docker-m02 k3s[1459387]: E0113 12:35:19.502006 1459387 leaderelection.go:361] Failed to update lock: Put "https://127.0.0.1:6444/apis/coordination.k8s.io/v1/namespaces/kube-system/>
Jan 13 12:35:19 docker-m02 k3s[1459387]: Trace[1873575128]: [9.994547965s] [9.994547965s] END
Jan 13 12:35:19 docker-m02 k3s[1459387]: I0113 12:35:19.501182 1459387 trace.go:205] Trace[1873575128]: "Update" url:/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/kube-scheduler,user->
Jan 13 12:35:19 docker-m02 k3s[1459387]: time="2021-01-13T12:35:19.501085416+01:00" level=error msg="error in txn: context canceled"
Jan 13 12:35:19 docker-m02 k3s[1459387]: E0113 12:35:19.500948 1459387 status.go:71] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"context canceled"}
Jan 13 12:35:19 docker-m02 k3s[1459387]: Trace[1419650026]: [9.994151447s] [9.994151447s] END
Jan 13 12:35:19 docker-m02 k3s[1459387]: I0113 12:35:19.500885 1459387 trace.go:205] Trace[1419650026]: "GuaranteedUpdate etcd3" type:*coordination.Lease (13-Jan-2021 12:35:09.506) (total time: 99>
Jan 13 12:35:19 docker-m02 k3s[1459387]: E0113 12:35:19.258422 1459387 controller.go:187] failed to update lease, error: Put "https://127.0.0.1:6443/apis/coordination.k8s.io/v1/namespaces/kube-nod>
Jan 13 12:35:19 docker-m02 k3s[1459387]: Trace[588737688]: [9.998288231s] [9.998288231s] END
Jan 13 12:35:19 docker-m02 k3s[1459387]: I0113 12:35:19.257152 1459387 trace.go:205] Trace[588737688]: "Update" url:/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/docker-m02,user-a>
Jan 13 12:35:19 docker-m02 k3s[1459387]: time="2021-01-13T12:35:19.256921672+01:00" level=error msg="error in txn: context canceled"
Jan 13 12:35:19 docker-m02 k3s[1459387]: E0113 12:35:19.256782 1459387 status.go:71] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"context canceled"}
Jan 13 12:35:19 docker-m02 k3s[1459387]: Trace[412436448]: [9.9975974s] [9.9975974s] END
Jan 13 12:35:19 docker-m02 k3s[1459387]: I0113 12:35:19.256689 1459387 trace.go:205] Trace[412436448]: "GuaranteedUpdate etcd3" type:*coordination.Lease (13-Jan-2021 12:35:09.259) (total time: 999>
Jan 13 12:35:18 docker-m02 k3s[1459387]: time="2021-01-13T12:35:18.676181581+01:00" level=error msg="error in txn: context canceled"
Jan 13 12:35:18 docker-m02 k3s[1459387]: Trace[586832888]: [9.992906492s] [9.992906492s] END
Jan 13 12:35:18 docker-m02 k3s[1459387]: I0113 12:35:18.676090 1459387 trace.go:205] Trace[586832888]: "Update" url:/api/v1/namespaces/kube-system/endpoints/rancher.io-local-path,user-agent:local->
Jan 13 12:35:18 docker-m02 k3s[1459387]: E0113 12:35:18.675719 1459387 status.go:71] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"context canceled"}

I'm seeing a constant write rate on the DB node of ~4 MB/s.

Warnings from the MariaDB server: (screenshot)

brandond commented 3 years ago

@viceice your datastore operation times are too slow, far slower than I would expect to see from SSD:

Trace[1873575128]: [9.994547965s] [9.994547965s] END
viceice commented 3 years ago

@brandond yes, that's very strange. There are 6 SATA3 SSDs with ~450 MB/s read/write capacity. On synthetic writes I can see ~2 GB/s write I/O.

I think I need to run some IOPS tests.

BTW: I have some more VMs running on that server, including two more k3s single-master nodes: one uses embedded etcd and one is a test cluster with sqlite.

Sqlite shows the best performance, but isn't clusterable.

Is there any comparison between MySQL and PostgreSQL? Which performs better for 3 k3s clusters with around 4 master and 8 worker nodes?

I'll try running the DB on a PCIe SSD next week.

viceice commented 3 years ago

Very strange, far too low IOPS for a 6x 1 TB SSD storage space: 🤔

# fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=random_read_write.fio --bs=4k --iodepth=64 --size=4G --readwrite=randrw --rwmixread=75
test: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=64
fio-3.16
Starting 1 process
test: Laying out IO file (1 file / 4096MiB)
Jobs: 1 (f=1): [m(1)][100.0%][r=41.8MiB/s,w=13.8MiB/s][r=10.7k,w=3537 IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=30295: Fri Jan 15 11:24:54 2021
  read: IOPS=9562, BW=37.4MiB/s (39.2MB/s)(3070MiB/82184msec)
   bw (  KiB/s): min=  552, max=62944, per=100.00%, avg=38740.52, stdev=12204.15, samples=162
   iops        : min=  138, max=15736, avg=9685.11, stdev=3051.03, samples=162
  write: IOPS=3195, BW=12.5MiB/s (13.1MB/s)(1026MiB/82184msec); 0 zone resets
   bw (  KiB/s): min=  112, max=20816, per=100.00%, avg=12946.37, stdev=4068.42, samples=162
   iops        : min=   28, max= 5204, avg=3236.58, stdev=1017.10, samples=162
  cpu          : usr=3.00%, sys=15.94%, ctx=346505, majf=0, minf=9
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued rwts: total=785920,262656,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
   READ: bw=37.4MiB/s (39.2MB/s), 37.4MiB/s-37.4MiB/s (39.2MB/s-39.2MB/s), io=3070MiB (3219MB), run=82184-82184msec
  WRITE: bw=12.5MiB/s (13.1MB/s), 12.5MiB/s-12.5MiB/s (13.1MB/s-13.1MB/s), io=1026MiB (1076MB), run=82184-82184msec

Disk stats (read/write):
  sda: ios=785584/262620, merge=31/117, ticks=2129386/3081022, in_queue=3775352, util=99.96%

Here is a VM on a single PCIe SSD, which is more like what I expect.

# fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=random_read_write.fio --bs=4k --iodepth=64 --size=4G --readwrite=randrw --rwmixread=75
test: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=64
fio-3.16
Starting 1 process
test: Laying out IO file (1 file / 4096MiB)
Jobs: 1 (f=1): [m(1)][100.0%][r=272MiB/s,w=90.6MiB/s][r=69.5k,w=23.2k IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=1799555: Fri Jan 15 11:23:06 2021
  read: IOPS=68.6k, BW=268MiB/s (281MB/s)(3070MiB/11450msec)
   bw (  KiB/s): min=246584, max=281480, per=99.95%, avg=274421.82, stdev=8581.23, samples=22
   iops        : min=61646, max=70370, avg=68605.45, stdev=2145.31, samples=22
  write: IOPS=22.9k, BW=89.6MiB/s (93.0MB/s)(1026MiB/11450msec); 0 zone resets
   bw (  KiB/s): min=84144, max=94408, per=100.00%, avg=91797.82, stdev=2451.21, samples=22
   iops        : min=21036, max=23602, avg=22949.45, stdev=612.80, samples=22
  cpu          : usr=9.76%, sys=43.08%, ctx=114862, majf=0, minf=10
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued rwts: total=785920,262656,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
   READ: bw=268MiB/s (281MB/s), 268MiB/s-268MiB/s (281MB/s-281MB/s), io=3070MiB (3219MB), run=11450-11450msec
  WRITE: bw=89.6MiB/s (93.0MB/s), 89.6MiB/s-89.6MiB/s (93.0MB/s-93.0MB/s), io=1026MiB (1076MB), run=11450-11450msec

Disk stats (read/write):
  sda: ios=784233/262068, merge=47/0, ticks=634898/48918, in_queue=2044, util=99.24%

This one looks really bad; it was measured on the physical host.

------------------------------------------------------------------------------
CrystalDiskMark 8.0.0 x64 (C) 2007-2020 hiyohiyo
                                  Crystal Dew World: https://crystalmark.info/
------------------------------------------------------------------------------
* MB/s = 1,000,000 bytes/s [SATA/600 = 600,000,000 bytes/s]
* KB = 1000 bytes, KiB = 1024 bytes

[Read]
  SEQ    1MiB (Q=  8, T= 1):  1059.225 MB/s [   1010.2 IOPS] <  7871.99 us>
  SEQ    1MiB (Q=  1, T= 1):   320.652 MB/s [    305.8 IOPS] <  3266.88 us>
  RND    4KiB (Q= 32, T= 1):   116.168 MB/s [  28361.3 IOPS] <  1125.77 us>
  RND    4KiB (Q=  1, T= 1):     7.341 MB/s [   1792.2 IOPS] <   555.41 us>

[Write]
  SEQ    1MiB (Q=  8, T= 1):   389.238 MB/s [    371.2 IOPS] < 21290.93 us>
  SEQ    1MiB (Q=  1, T= 1):    54.114 MB/s [     51.6 IOPS] < 18817.60 us>
  RND    4KiB (Q= 32, T= 1):    19.561 MB/s [   4775.6 IOPS] <  6686.48 us>
  RND    4KiB (Q=  1, T= 1):     1.273 MB/s [    310.8 IOPS] <  3211.28 us>

Profile: Default
   Test: 1 GiB (x5) [E: 86% (2195/2560GiB)]
   Mode: [Admin]
   Time: Measure 5 sec / Interval 5 sec 
   Date: 2021/01/15 11:46:23
     OS: Windows Server 2019 Datacenter (Full installation) [10.0 Build 17763] (x64)
viceice commented 3 years ago

Looks like the problem is disks that are too full. 4 of the 6 1 TB SSDs have only 80 GB of free space. 🙈

I've ordered two more SSDs and am moving some VMs to get more free space.

BTW: another server with 8 HDDs:

------------------------------------------------------------------------------
CrystalDiskMark 8.0.0 x64 (C) 2007-2020 hiyohiyo
                                  Crystal Dew World: https://crystalmark.info/
------------------------------------------------------------------------------
* MB/s = 1,000,000 bytes/s [SATA/600 = 600,000,000 bytes/s]
* KB = 1000 bytes, KiB = 1024 bytes

[Read]
  SEQ    1MiB (Q=  8, T= 1):    89.761 MB/s [     85.6 IOPS] < 92752.58 us>
  SEQ    1MiB (Q=  1, T= 1):    36.073 MB/s [     34.4 IOPS] < 28983.60 us>
  RND    4KiB (Q= 32, T= 1):     2.555 MB/s [    623.8 IOPS] < 50029.35 us>
  RND    4KiB (Q=  1, T= 1):     0.506 MB/s [    123.5 IOPS] <  8020.01 us>

[Write]
  SEQ    1MiB (Q=  8, T= 1):    30.824 MB/s [     29.4 IOPS] <256822.78 us>
  SEQ    1MiB (Q=  1, T= 1):    16.147 MB/s [     15.4 IOPS] < 64199.93 us>
  RND    4KiB (Q= 32, T= 1):     0.593 MB/s [    144.8 IOPS] <217217.98 us>
  RND    4KiB (Q=  1, T= 1):     0.352 MB/s [     85.9 IOPS] < 11552.70 us>

Profile: Default
   Test: 1 GiB (x5) [D: 66% (1694/2560GiB)]
   Mode: [Admin]
   Time: Measure 5 sec / Interval 5 sec 
   Date: 2021/01/15 12:39:15
     OS: Windows Server 2019 Datacenter (Full installation) [10.0 Build 17763] (x64)
stale[bot] commented 3 years ago

This repository uses a bot to automatically label issues which have not had any activity (commit/comment/label) for 180 days. This helps us manage the community issues better. If the issue is still relevant, please add a comment to the issue so the bot can remove the label and we know it is still valid. If it is no longer relevant (or possibly fixed in the latest release), the bot will automatically close the issue in 14 days. Thank you for your contributions.