k3s-io / k3s

Lightweight Kubernetes
https://k3s.io
Apache License 2.0

Allow for tuning various K3s parameters regarding Etcd timeout/leaderelection and document k3s-cloud-controller-manager parameters #8956

Closed. nshores closed this issue 1 year ago.

nshores commented 1 year ago

Is your feature request related to a problem? Please describe.

Oftentimes K3s runs on systems that are resource constrained. There appears to be a process in the K3s binary that watches for leader-election failures in both etcd and k3s-cloud-controller-manager. After a short period (<500ms) of leader-election failure in either loop, the main k3s process exits with error code 1.

Based on the documentation, the kube-controller-manager values/parameters can be modified by inserting them into the config, but the etcd parameters cannot. I believe we need to be able to tweak these values (to tolerate slow-disk hiccups), and to document them more clearly. Both of these subsystems have very tight defaults and should be able to tolerate more latency, especially in small deployments.

This appears to be a known issue, based on comments like this one.

Background:

https://kubernetes.io/docs/concepts/architecture/leases/#leader-election
https://etcd.io/docs/v3.4/tuning/#time-parameters

The etcd defaults are as follows:

100ms heartbeat interval.
1000ms election timeout.

The kube-controller-manager defaults are as follows (I am not sure these are the right values):

--leader-elect-lease-duration duration     Default: 15s
--leader-elect-renew-deadline duration     Default: 10s

Describe the solution you'd like

The ability to tweak the etcd default values, and clear documentation on tuning the kube-controller-manager values for slow disks. Ideally, I would like K3s to simply ignore these events and have a high tolerance ceiling (>60s) for leader-election failures.
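For illustration, a rough sketch of the kind of pass-through I have in mind when starting the server. I have not verified that flags like these exist or that the values are sensible; they are placeholders for the ask, not recommendations:

```bash
# Hypothetical sketch of the desired tuning: relax the etcd raft timeouts and the
# kube-controller-manager leader-election windows at startup. Values are placeholders.
k3s server \
  --etcd-arg=heartbeat-interval=500 \
  --etcd-arg=election-timeout=5000 \
  --kube-controller-manager-arg=leader-elect-lease-duration=60s \
  --kube-controller-manager-arg=leader-elect-renew-deadline=40s
```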

Additional context

This was a blip on a lower-quality SSD that resulted in the errors below. This kind of minor disk latency should be tolerated.

[Monitoring screenshot: disk latency graph for the period of the event, showing roughly 456 ms average latency]
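For context on the disk itself, etcd's tuning guidance revolves around fdatasync latency, which can be spot-checked with fio. A sketch, assuming fio is installed and /var/lib/rancher sits on the affected SSD; the scratch directory is just an example path:

```bash
# Rough fdatasync latency check in the spirit of etcd's published fio example.
# Uses a throwaway directory on the same SSD; size/bs mirror the commonly cited
# etcd test and are not tuned for this particular system.
mkdir -p /var/lib/rancher/fio-test
fio --rw=write --ioengine=sync --fdatasync=1 \
    --directory=/var/lib/rancher/fio-test \
    --size=22m --bs=2300 --name=etcd-perf
rm -rf /var/lib/rancher/fio-test
```

The fsync/fdatasync percentiles in the output are the interesting part; etcd's guidance is that the 99th percentile should stay on the order of 10ms or less.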

Logs during the event:

Nov 27 11:44:05 k3s-master-5 k3s[252355]: E1127 11:44:04.285255  252355 timeout.go:141] post-timeout activity - time-elapsed: 3.692786ms, GET "/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/kube-scheduler" result: <nil>
Nov 27 11:44:05 k3s-master-5 k3s[252355]: {"level":"warn","ts":"2023-11-27T11:44:04.327-0800","caller":"etcdserver/v3_server.go:840","msg":"waiting for ReadIndex response took too long, retrying","sent-request-id":2613648160407942655,"retry-timeout":"500>
Nov 27 11:44:05 k3s-master-5 k3s[252355]: {"level":"warn","ts":"2023-11-27T11:44:04.411-0800","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"8.339549514s","expected-duration":"1s"}
Nov 27 11:44:05 k3s-master-5 k3s[252355]: {"level":"warn","ts":"2023-11-27T11:44:04.411-0800","caller":"etcdserver/raft.go:369","msg":"leader failed to send out heartbeat on time; took too long, leader is overloaded likely from slow disk","to":"81d10acbe>
Nov 27 11:44:05 k3s-master-5 k3s[252355]: {"level":"warn","ts":"2023-11-27T11:44:04.411-0800","caller":"etcdserver/raft.go:369","msg":"leader failed to send out heartbeat on time; took too long, leader is overloaded likely from slow disk","to":"52e5c0c45>
Nov 27 11:44:05 k3s-master-5 k3s[252355]: {"level":"info","ts":"2023-11-27T11:44:04.485-0800","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"2f847f91dcb22445 [logterm: 48, index: 278498192, vote: 2f847f91dcb22445] ignored MsgVote from 52e5c0>
Nov 27 11:44:05 k3s-master-5 k3s[252355]: {"level":"info","ts":"2023-11-27T11:44:04.485-0800","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"2f847f91dcb22445 [term: 48] received a MsgAppResp message with higher term from 52e5c0c45975de62 [te>
Nov 27 11:44:05 k3s-master-5 k3s[252355]: {"level":"info","ts":"2023-11-27T11:44:04.507-0800","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"2f847f91dcb22445 became follower at term 49"}
Nov 27 11:44:05 k3s-master-5 k3s[252355]: {"level":"info","ts":"2023-11-27T11:44:04.526-0800","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"raft.node: 2f847f91dcb22445 lost leader 2f847f91dcb22445 at term 49"}
Nov 27 11:44:05 k3s-master-5 k3s[252355]: {"level":"warn","ts":"2023-11-27T11:44:04.827-0800","caller":"etcdserver/v3_server.go:840","msg":"waiting for ReadIndex response took too long, retrying","sent-request-id":2613648160407942655,"retry-timeout":"500>
Nov 27 11:44:05 k3s-master-5 k3s[252355]: {"level":"info","ts":"2023-11-27T11:44:04.827-0800","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"2f847f91dcb22445 no leader at term 49; dropping index reading msg"}
Nov 27 11:44:05 k3s-master-5 k3s[252355]: {"level":"info","ts":"2023-11-27T11:44:04.913-0800","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"2f847f91dcb22445 no leader at term 49; dropping index reading msg"}
Nov 27 11:44:05 k3s-master-5 k3s[252355]: {"level":"warn","ts":"2023-11-27T11:44:05.327-0800","caller":"etcdserver/v3_server.go:840","msg":"waiting for ReadIndex response took too long, retrying","sent-request-id":2613648160407942655,"retry-timeout":"500>
Nov 27 11:44:05 k3s-master-5 k3s[252355]: {"level":"info","ts":"2023-11-27T11:44:05.327-0800","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"2f847f91dcb22445 no leader at term 49; dropping index reading msg"}
Nov 27 11:44:05 k3s-master-5 k3s[252355]: {"level":"info","ts":"2023-11-27T11:44:05.414-0800","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"2f847f91dcb22445 no leader at term 49; dropping index reading msg"}
Nov 27 11:44:05 k3s-master-5 k3s[252355]: {"level":"warn","ts":"2023-11-27T11:44:05.827-0800","caller":"etcdserver/v3_server.go:840","msg":"waiting for ReadIndex response took too long, retrying","sent-request-id":2613648160407942655,"retry-timeout":"500>
Nov 27 11:44:05 k3s-master-5 k3s[252355]: {"level":"info","ts":"2023-11-27T11:44:05.827-0800","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"2f847f91dcb22445 no leader at term 49; dropping index reading msg"}
Nov 27 11:44:05 k3s-master-5 k3s[252355]: {"level":"warn","ts":"2023-11-27T11:44:05.909-0800","caller":"v3rpc/interceptor.go:197","msg":"request stats","start time":"2023-11-27T11:44:04.827-0800","time spent":"1.081820832s","remote":"127.0.0.1:37802","re>
Nov 27 11:44:05 k3s-master-5 k3s[252355]: {"level":"warn","ts":"2023-11-27T11:44:05.909-0800","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.440143399s","expected-duration":"1s"}
Nov 27 11:44:05 k3s-master-5 k3s[252355]: {"level":"info","ts":"2023-11-27T11:44:05.909-0800","caller":"traceutil/trace.go:171","msg":"trace[55245296] linearizableReadLoop","detail":"{readStateIndex:278498187; appliedIndex:278498187; }","duration":"9.821>
Nov 27 11:44:05 k3s-master-5 k3s[252355]: {"level":"info","ts":"2023-11-27T11:44:05.909-0800","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"2f847f91dcb22445 no leader at term 49; dropping index reading msg"}
Nov 27 11:44:05 k3s-master-5 k3s[252355]: {"level":"warn","ts":"2023-11-27T11:44:05.909-0800","caller":"etcdserver/util.go:166","msg":"apply request took too long","took":"9.821930507s","expected-duration":"100ms","prefix":"read-only range ","request":"k>
Nov 27 11:44:05 k3s-master-5 k3s[252355]: {"level":"info","ts":"2023-11-27T11:44:05.909-0800","caller":"traceutil/trace.go:171","msg":"trace[63906582] range","detail":"{range_begin:/registry/configmaps/kube-system/k3s; range_end:; response_count:1; respo>
Nov 27 11:44:05 k3s-master-5 k3s[252355]: {"level":"warn","ts":"2023-11-27T11:44:05.909-0800","caller":"v3rpc/interceptor.go:197","msg":"request stats","start time":"2023-11-27T11:43:56.087-0800","time spent":"9.82199592s","remote":"127.0.0.1:34690","res>
Nov 27 11:44:05 k3s-master-5 k3s[252355]: {"level":"info","ts":"2023-11-27T11:44:05.910-0800","caller":"traceutil/trace.go:171","msg":"trace[883236454] transaction","detail":"{read_only:false; response_revision:249656966; number_of_response:1; }","durati>
Nov 27 11:44:05 k3s-master-5 k3s[252355]: I1127 11:44:05.910148  252355 trace.go:219] Trace[784261243]: "Get" url:/api/v1/namespaces/kube-system/configmaps/k3s,user-agent:k3s-supervisor@k3s-master-5/v1.25.13+k3s1 (linux/amd64) k3s/8fcbc2bc,audit-id:cb0d6>
Nov 27 11:44:05 k3s-master-5 k3s[252355]: Trace[784261243]: ---"About to write a response" 9822ms (11:44:05.910)
Nov 27 11:44:05 k3s-master-5 k3s[252355]: Trace[784261243]: [9.8224918s] [9.8224918s] END
Nov 27 11:44:05 k3s-master-5 k3s[252355]: {"level":"info","ts":"2023-11-27T11:44:05.914-0800","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"2f847f91dcb22445 no leader at term 49; dropping index reading msg"}
Nov 27 11:44:05 k3s-master-5 k3s[252355]: {"level":"warn","ts":"2023-11-27T11:44:05.916-0800","caller":"v3rpc/interceptor.go:197","msg":"request stats","start time":"2023-11-27T11:44:03.316-0800","time spent":"2.59358597s","remote":"127.0.0.1:34820","res>
Nov 27 11:44:05 k3s-master-5 k3s[252355]: I1127 11:44:05.916637  252355 trace.go:219] Trace[1848983765]: "GuaranteedUpdate etcd3" audit-id:a08ef7cf-16bf-4531-a851-92b3cb3d7157,key:/leases/kube-node-lease/k3s-master-5,type:*coordination.Lease (27-Nov-2023>
Nov 27 11:44:05 k3s-master-5 k3s[252355]: Trace[1848983765]: ---"Txn call finished" err:<nil> 2600ms (11:44:05.916)
Nov 27 11:44:05 k3s-master-5 k3s[252355]: Trace[1848983765]: [2.600692151s] [2.600692151s] END
Nov 27 11:44:05 k3s-master-5 k3s[252355]: I1127 11:44:05.916737  252355 trace.go:219] Trace[245797042]: "Update" url:/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/k3s-master-5,user-agent:k3s/v1.25.13+k3s1 (linux/amd64) kubernetes/8fcbc2b>
Nov 27 11:44:05 k3s-master-5 k3s[252355]: Trace[245797042]: ---"Write to database call finished" len:489,err:<nil> 2600ms (11:44:05.916)
Nov 27 11:44:05 k3s-master-5 k3s[252355]: Trace[245797042]: [2.600867193s] [2.600867193s] END
Nov 27 11:44:06 k3s-master-5 k3s[252355]: E1127 11:44:06.069641  252355 leaderelection.go:330] error retrieving resource lock kube-system/k3s-cloud-controller-manager: Get "https://127.0.0.1:6444/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/>
Nov 27 11:44:06 k3s-master-5 k3s[252355]: {"level":"warn","ts":"2023-11-27T11:44:06.069-0800","logger":"etcd-client","caller":"v3@v3.5.3-k3s1/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc00474a380/127.0.>
Nov 27 11:44:06 k3s-master-5 k3s[252355]: E1127 11:44:06.069800  252355 status.go:71] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"context canceled"}: context canceled
Nov 27 11:44:06 k3s-master-5 k3s[252355]: E1127 11:44:06.069835  252355 writers.go:118] apiserver was unable to write a JSON response: http: Handler timeout
Nov 27 11:44:06 k3s-master-5 k3s[252355]: {"level":"warn","ts":"2023-11-27T11:44:06.069-0800","caller":"etcdserver/util.go:166","msg":"apply request took too long","took":"2.773428293s","expected-duration":"100ms","prefix":"read-only range ","request":"k>
Nov 27 11:44:06 k3s-master-5 k3s[252355]: {"level":"info","ts":"2023-11-27T11:44:06.069-0800","caller":"traceutil/trace.go:171","msg":"trace[512349716] range","detail":"{range_begin:/registry/leases/kube-system/k3s-cloud-controller-manager; range_end:; }>
Nov 27 11:44:06 k3s-master-5 k3s[252355]: {"level":"warn","ts":"2023-11-27T11:44:06.069-0800","caller":"v3rpc/interceptor.go:197","msg":"request stats","start time":"2023-11-27T11:44:03.296-0800","time spent":"2.773512927s","remote":"127.0.0.1:34820","re>
Nov 27 11:44:06 k3s-master-5 k3s[252355]: E1127 11:44:06.070968  252355 status.go:71] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"http: Handler timeout"}: http: Handler timeout
Nov 27 11:44:06 k3s-master-5 k3s[252355]: E1127 11:44:06.072080  252355 writers.go:131] apiserver was unable to write a fallback JSON response: http: Handler timeout
Nov 27 11:44:06 k3s-master-5 k3s[252355]: I1127 11:44:06.073220  252355 trace.go:219] Trace[1924033328]: "Get" url:/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/k3s-cloud-controller-manager,user-agent:k3s/v1.25.13+k3s1 (linux/amd64) kubernet>
Nov 27 11:44:06 k3s-master-5 k3s[252355]: Trace[1924033328]: [2.777117809s] [2.777117809s] END
Nov 27 11:44:06 k3s-master-5 k3s[252355]: E1127 11:44:06.073315  252355 timeout.go:141] post-timeout activity - time-elapsed: 3.545924ms, GET "/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/k3s-cloud-controller-manager" result: <nil>
Nov 27 11:44:06 k3s-master-5 k3s[252355]: I1127 11:44:06.079874  252355 leaderelection.go:283] failed to renew lease kube-system/k3s-cloud-controller-manager: timed out waiting for the condition
Nov 27 11:44:06 k3s-master-5 k3s[252355]: I1127 11:44:06.095743  252355 event.go:294] "Event occurred" object="" fieldPath="" kind="Lease" apiVersion="coordination.k8s.io/v1" type="Normal" reason="LeaderElection" message="k3s-master-5_2985baa5-22a0-4871->
Nov 27 11:44:06 k3s-master-5 k3s[252355]: E1127 11:44:06.096240  252355 controllermanager.go:237] "leaderelection lost"
Nov 27 11:44:10 k3s-master-5 systemd[1]: k3s.service: Main process exited, code=exited, status=1/FAILURE
Nov 27 11:44:10 k3s-master-5 systemd[1]: k3s.service: Failed with result 'exit-code'.

Related issues:
https://github.com/k3s-io/k3s/issues/2615
https://github.com/k3s-io/k3s/issues/5587
https://github.com/k3s-io/k3s/issues/5587#issuecomment-1134375622

brandond commented 1 year ago

You seem to be confusing/conflating multiple things. The core Kubernetes components have leader election timeouts that relate to how often leases are renewed for leader-elected controllers. Etcd also has leader-election and heartbeat timeouts that have to do with the raft consensus algorithm itself.

The Kubernetes options can be configured via --kube-controller-manager-arg, --kube-scheduler-arg, and so on. The etcd options can be configured via --etcd-arg.

Note that we don't document the possible values for those flags in this project; refer to the upstream Kubernetes and etcd docs for the valid CLI flags.
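As a rough sketch (the values are illustrative placeholders, not recommendations), such overrides can be dropped into /etc/rancher/k3s/config.yaml on each server, which is equivalent to passing the flags on the command line:

```bash
# Illustrative only: append the overrides to the K3s config file and restart.
# Adjust rather than append if these keys already exist in the file. Note that
# etcd expects heartbeat-interval/election-timeout to be identical on all members.
cat <<'EOF' | sudo tee -a /etc/rancher/k3s/config.yaml
etcd-arg:
  - "heartbeat-interval=500"
  - "election-timeout=5000"
kube-controller-manager-arg:
  - "leader-elect-lease-duration=60s"
  - "leader-elect-renew-deadline=40s"
kube-scheduler-arg:
  - "leader-elect-lease-duration=60s"
  - "leader-elect-renew-deadline=40s"
EOF
sudo systemctl restart k3s
```

Keep the upstream constraints in mind: the etcd election timeout should be several times the heartbeat interval, and the leader-election renew deadline must stay below the lease duration.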

> This was a blip on a lower-quality SSD that resulted in the errors below. This kind of minor disk latency should be tolerated.
>
> Nov 27 11:44:05 k3s-master-5 k3s[252355]: Trace[784261243]: [9.8224918s] [9.8224918s] END

I think that your monitoring data is obscuring the true severity of your IO stall. It shows a 456ms average latency over that period, but the max was clearly much longer, as the log shows that some operations took over 10 seconds to complete, which is definitely longer than Kubernetes is willing to wait for a lease renewal operation to complete.
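One way to see the spikes rather than the averaged number is to watch the device directly while the stall happens. A minimal sketch, assuming sysstat's iostat is installed and sda is the disk backing /var/lib/rancher:

```bash
# Print extended per-device stats every second; multi-second write stalls show up
# directly in the w_await column (average wait, in ms, for writes completed during
# that one-second interval) instead of being smoothed into a long-window average.
iostat -x sda 1
```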