k3s-io / k3s


[Release-1.27] - etcd-snapshot save times out in 10 seconds the first try #9999

Closed brandond closed 3 weeks ago

brandond commented 3 weeks ago

Backport of the fix for "etcd-snapshot save times out in 10 seconds the first try".

aganesh-suse commented 3 weeks ago

Validated on the release-1.27 branch with commit b721a3e05d09f9c59dbc78dfd67f3fb01ecf0eca

Environment Details

Infrastructure

Node(s) CPU architecture, OS, and Version:

$ cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04.2 LTS"

$ uname -m
x86_64

Cluster Configuration:

HA: 3 servers / 1 agent

Config.yaml:

token: xxxx
cluster-init: true
write-kubeconfig-mode: "0644"
node-external-ip: 1.1.1.1
node-label:
- k3s-upgrade=server

Testing Steps

  1. Copy config.yaml:
    $ sudo mkdir -p /etc/rancher/k3s && sudo cp config.yaml /etc/rancher/k3s
  2. Install k3s:
    $ curl -sfL https://get.k3s.io | sudo INSTALL_K3S_COMMIT='b721a3e05d09f9c59dbc78dfd67f3fb01ecf0eca' sh -s - server
  3. Verify Cluster Status:
    $ kubectl get nodes -o wide
    $ kubectl get pods -A
  4. Perform etcd-snapshot save with S3 details provided:
    $ sudo /usr/local/bin/k3s etcd-snapshot save --s3 --s3-bucket=<bucket> --s3-region=<region> --s3-access-key=xxxx --s3-secret-key="xxxx" --debug

    Expected Behavior: the etcd-snapshot save action should succeed and should not time out after 10 seconds.
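One way to check the expected behavior without reading timestamps by hand is to wrap the save command in a small timer. The following is a minimal sketch only: `run_timed` is a hypothetical helper, not part of k3s, and it assumes the command exits non-zero when the save fails.

```python
import subprocess
import time


def run_timed(cmd: list[str]) -> tuple[int, float]:
    """Run cmd, returning (exit code, wall-clock seconds elapsed)."""
    start = time.monotonic()
    # check=False: we want the exit code back rather than an exception.
    result = subprocess.run(cmd, check=False)
    return result.returncode, time.monotonic() - start


# Hypothetical usage with the command from step 4 (placeholders kept as-is):
#
#   code, seconds = run_timed([
#       "sudo", "/usr/local/bin/k3s", "etcd-snapshot", "save", "--s3",
#       "--s3-bucket=<bucket>", "--s3-region=<region>",
#       "--s3-access-key=xxxx", "--s3-secret-key=xxxx", "--debug",
#   ])
#   print(f"exit code {code} after {seconds:.1f}s")
```

A zero exit code together with a duration longer than 10 seconds would be consistent with the fix; a short duration on its own proves nothing, since a small snapshot may legitimately finish quickly.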

Validation Results:

$ sudo /usr/local/bin/k3s etcd-snapshot save --s3 --s3-bucket=<s3-bucket> --s3-region=<s3-region> --s3-access-key=xxxx --s3-secret-key="xxxx" --debug
time="2024-04-22T20:19:49Z" level=warning msg="Unknown flag --cluster-init found in config.yaml, skipping\n"
time="2024-04-22T20:19:49Z" level=warning msg="Unknown flag --write-kubeconfig-mode found in config.yaml, skipping\n"
time="2024-04-22T20:19:49Z" level=warning msg="Unknown flag --node-external-ip found in config.yaml, skipping\n"
time="2024-04-22T20:19:49Z" level=warning msg="Unknown flag --node-label found in config.yaml, skipping\n"
time="2024-04-22T20:19:49Z" level=warning msg="Cluster CA certificate is not trusted by the host CA bundle, but the token does not include a CA hash. Use the full token from the server's node-token file to enable Cluster CA validation."
time="2024-04-22T20:20:19Z" level=info msg="Snapshot on-demand-ip-172-31-16-180-1713817190 saved."
time="2024-04-22T20:20:19Z" level=info msg="Snapshot on-demand-ip-172-31-16-180-1713817190 saved."

As the log timestamps above show, the save did not time out after 10 seconds. The command waited for the save to complete, and the save succeeded. Closing the bug.
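The timing claim can be checked directly from the first and last log lines in the validation output. This is a small sketch, assuming every line carries a `time="..."` field in the UTC format shown above; `log_timestamp` is a hypothetical helper name.

```python
from datetime import datetime


def log_timestamp(line: str) -> datetime:
    """Extract the time="..." field from a log line like those above."""
    prefix = 'time="'
    start = line.index(prefix) + len(prefix)
    end = line.index('"', start)
    # Timestamps in the log are UTC with a trailing "Z".
    return datetime.strptime(line[start:end], "%Y-%m-%dT%H:%M:%SZ")


# First and last lines of the validation output, copied from the log above.
first = r'time="2024-04-22T20:19:49Z" level=warning msg="Unknown flag --cluster-init found in config.yaml, skipping\n"'
last = 'time="2024-04-22T20:20:19Z" level=info msg="Snapshot on-demand-ip-172-31-16-180-1713817190 saved."'

elapsed = (log_timestamp(last) - log_timestamp(first)).total_seconds()
print(elapsed)  # 30.0
```

The 30 seconds between the first log line and the "saved" message is well past the 10-second mark at which the save previously timed out.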