k3s-io / k3s

Lightweight Kubernetes
https://k3s.io
Apache License 2.0

Panic when using `k3s etcd-snapshot save` if s3 access is denied #8918

Closed maggie44 closed 11 months ago

maggie44 commented 11 months ago

I am seeing a panic at midnight every night that is making my cluster fall over. I can reproduce the panic by running `k3s etcd-snapshot save`. I am still trying to narrow down the cause and reproduce it in different environments; I will keep the ticket up to date, and any input or ideas are welcome.

Environmental Info:

K3s Version:

k3s version v1.27.7+k3s2 (575bce76)
go version go1.20.10

Node(s) CPU architecture, OS, and Version:

Linux control-plane-fsn1-ayc 6.3.9-1-default #1 SMP PREEMPT_DYNAMIC Thu Jun 22 03:53:43 UTC 2023 (0df701d) aarch64 aarch64 aarch64 GNU/Linux

Cluster Configuration: Single control plane being used as a control plane and node for development

Describe the bug:

control-plane-fsn1-ayc:~ # k3s etcd-snapshot save
INFO[0000] Saving etcd snapshot to /var/lib/rancher/k3s/server/db/snapshots/on-demand-control-plane-fsn1-ayc-1700567221 
{"level":"info","ts":"2023-11-21T11:47:00.811857Z","caller":"snapshot/v3_snapshot.go:65","msg":"created temporary db file","path":"/var/lib/rancher/k3s/server/db/snapshots/on-demand-control-plane-fsn1-ayc-1700567221.part"}
{"level":"info","ts":"2023-11-21T11:47:00.817024Z","logger":"client","caller":"v3@v3.5.9-k3s1/maintenance.go:212","msg":"opened snapshot stream; downloading"}
{"level":"info","ts":"2023-11-21T11:47:00.817089Z","caller":"snapshot/v3_snapshot.go:73","msg":"fetching snapshot","endpoint":"https://127.0.0.1:2379"}
{"level":"info","ts":"2023-11-21T11:47:00.903724Z","logger":"client","caller":"v3@v3.5.9-k3s1/maintenance.go:220","msg":"completed snapshot read; closing"}
{"level":"info","ts":"2023-11-21T11:47:00.915494Z","caller":"snapshot/v3_snapshot.go:88","msg":"fetched snapshot","endpoint":"https://127.0.0.1:2379","size":"18 MB","took":"now"}
{"level":"info","ts":"2023-11-21T11:47:00.915634Z","caller":"snapshot/v3_snapshot.go:97","msg":"saved","path":"/var/lib/rancher/k3s/server/db/snapshots/on-demand-control-plane-fsn1-ayc-1700567221"}
INFO[0000] Checking if S3 bucket k3-staging-etcd exists 
WARN[0000] Unable to initialize S3 client: Access Denied. 
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x3a8d024]

goroutine 1 [running]:
github.com/k3s-io/k3s/pkg/etcd.(*S3).snapshotRetention(0x4000e0f349?, {0x5da20f0?, 0x4000584640?})
        /go/src/github.com/k3s-io/k3s/pkg/etcd/s3.go:284 +0x34
github.com/k3s-io/k3s/pkg/etcd.(*ETCD).Snapshot(0x4000460f00, {0x5da20f0, 0x4000584640})
        /go/src/github.com/k3s-io/k3s/pkg/etcd/snapshot.go:375 +0xe48
github.com/k3s-io/k3s/pkg/cli/etcdsnapshot.save(0x4000762580, 0x0?)
        /go/src/github.com/k3s-io/k3s/pkg/cli/etcdsnapshot/etcd_snapshot.go:121 +0x84
github.com/k3s-io/k3s/pkg/cli/etcdsnapshot.Save(0x4000874b40?)
        /go/src/github.com/k3s-io/k3s/pkg/cli/etcdsnapshot/etcd_snapshot.go:104 +0x40
github.com/urfave/cli.HandleAction({0x4728a40?, 0x563a300?}, 0x4?)
        /go/pkg/mod/github.com/urfave/cli@v1.22.14/app.go:524 +0x58
github.com/urfave/cli.Command.Run({{0x51b3753, 0x4}, {0x0, 0x0}, {0x0, 0x0, 0x0}, {0x5261520, 0x22}, {0x0, ...}, ...}, ...)
        /go/pkg/mod/github.com/urfave/cli@v1.22.14/command.go:175 +0x50c
github.com/urfave/cli.(*App).RunAsSubcommand(0x4000837dc0, 0x4000566160)
        /go/pkg/mod/github.com/urfave/cli@v1.22.14/app.go:405 +0xa68
github.com/urfave/cli.Command.startApp({{0x51d6176, 0xd}, {0x0, 0x0}, {0x0, 0x0, 0x0}, {0x0, 0x0}, {0x0, ...}, ...}, ...)
        /go/pkg/mod/github.com/urfave/cli@v1.22.14/command.go:380 +0x9c4
github.com/urfave/cli.Command.Run({{0x51d6176, 0xd}, {0x0, 0x0}, {0x0, 0x0, 0x0}, {0x0, 0x0}, {0x0, ...}, ...}, ...)
        /go/pkg/mod/github.com/urfave/cli@v1.22.14/command.go:103 +0x658
github.com/urfave/cli.(*App).Run(0x4000837c00, {0x40005979e0, 0x9, 0x9})
        /go/pkg/mod/github.com/urfave/cli@v1.22.14/app.go:277 +0x7e4
main.main()
        /go/src/github.com/k3s-io/k3s/cmd/server/main.go:80 +0xa18

Relevant config file entries:

"etcd-s3": "true"
"etcd-s3-access-key": "xxxxx"
"etcd-s3-bucket": "k3-staging-etcd"
"etcd-s3-endpoint": "xxxx.r2.cloudflarestorage.com"
"etcd-s3-secret-key": "xxxx"

brandond commented 11 months ago

Seems pretty cut and dried:

INFO[0000] Checking if S3 bucket k3-staging-etcd exists
WARN[0000] Unable to initialize S3 client: Access Denied.

Ideally we wouldn't crash if we do not have valid credentials or the correct permissions, but it seems like something that you can work around easily enough for now by correcting your configuration.

maggie44 commented 11 months ago

Absolutely, I am not worried about the access denied issue, just the panic.

brandond commented 11 months ago

You mentioned you were still trying to narrow down the cause, I don't think there's any mystery there - access denied error is causing the panic because we don't handle it properly.

Until we address the panic, fix the credentials or disable s3.
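
For readers following along, the failure mode described above can be sketched in a few lines of Go. This is only an illustration under stated assumptions, not the k3s source: `s3Client`, `newS3Client`, `retention`, `saveUnsafe`, and `saveSafe` are hypothetical names. It shows a client constructor that returns a nil pointer together with an error, a caller that only logs the error as a warning and then dereferences the nil pointer, and a variant that returns the error instead.

```go
package main

import (
	"errors"
	"fmt"
)

// s3Client is a hypothetical stand-in for an S3 wrapper; it is not the k3s type.
type s3Client struct {
	bucket string
}

// newS3Client simulates initialization that fails when bucket access is denied.
// On failure it returns a nil client together with the error.
func newS3Client(bucket string, allowed bool) (*s3Client, error) {
	if !allowed {
		return nil, errors.New("access denied")
	}
	return &s3Client{bucket: bucket}, nil
}

// retention reads a field through the receiver, so a nil receiver panics here.
func (c *s3Client) retention() string {
	return "pruning old snapshots in " + c.bucket
}

// saveUnsafe logs the init error as a warning and carries on with a nil client.
// The deferred recover only exists so this demo can report the panic as an error.
func saveUnsafe(bucket string, allowed bool) (msg string, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered panic: %v", r)
		}
	}()
	c, initErr := newS3Client(bucket, allowed)
	if initErr != nil {
		fmt.Println("WARN unable to initialize S3 client:", initErr)
	}
	return c.retention(), nil
}

// saveSafe returns the init error instead of touching the nil client.
func saveSafe(bucket string, allowed bool) (string, error) {
	c, err := newS3Client(bucket, allowed)
	if err != nil {
		return "", fmt.Errorf("unable to initialize S3 client: %w", err)
	}
	return c.retention(), nil
}

func main() {
	_, err := saveUnsafe("k3-staging-etcd", false)
	fmt.Println("unsafe:", err)

	_, err = saveSafe("k3-staging-etcd", false)
	fmt.Println("safe:", err)
}
```

In the unsafe variant the error surfaces as a nil pointer dereference; in the safe variant the caller gets an ordinary error it can report and exit on.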

maggie44 commented 11 months ago

> You mentioned you were still trying to narrow down the cause, I don't think there's any mystery there - access denied error is causing the panic because we don't handle it properly.
>
> Until we address the panic, fix the credentials or disable s3.

I meant the cause of the panic (i.e. why the error isn't handled), not the scenario that leads to the panic. But it seems there is no need; I'm not used to such snappy responses. Happy to leave you to it, and I will get back to fixing my credentials.

Thanks.

brandond commented 11 months ago

This can also be reproduced just by setting --etcd-s3-insecure=true to use http when the endpoint is using https, or vice versa. Basically any failure to validate access to the bucket.

WARN[0004] Unable to initialize S3 client: Head "https://localhost:9090/test/": http: server gave HTTP response to HTTPS client 
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x41f4339]

brandond commented 11 months ago

root@k3s-server-1:/# k3s etcd-snapshot save --s3 --s3-endpoint=s3.example.com --s3-access-key=k3s --s3-secret-key=invalid --s3-bucket=invalid
INFO[0000] Saving etcd snapshot to /var/lib/rancher/k3s/server/db/snapshots/on-demand-k3s-server-1-1700603492
{"level":"info","ts":"2023-11-21T21:51:31.835391Z","caller":"snapshot/v3_snapshot.go:65","msg":"created temporary db file","path":"/var/lib/rancher/k3s/server/db/snapshots/on-demand-k3s-server-1-1700603492.part"}
{"level":"info","ts":"2023-11-21T21:51:31.837266Z","logger":"client","caller":"v3@v3.5.9-k3s1/maintenance.go:212","msg":"opened snapshot stream; downloading"}
{"level":"info","ts":"2023-11-21T21:51:31.837328Z","caller":"snapshot/v3_snapshot.go:73","msg":"fetching snapshot","endpoint":"https://127.0.0.1:2379"}
{"level":"info","ts":"2023-11-21T21:51:31.854271Z","logger":"client","caller":"v3@v3.5.9-k3s1/maintenance.go:220","msg":"completed snapshot read; closing"}
{"level":"info","ts":"2023-11-21T21:51:31.863942Z","caller":"snapshot/v3_snapshot.go:88","msg":"fetched snapshot","endpoint":"https://127.0.0.1:2379","size":"3.3 MB","took":"now"}
{"level":"info","ts":"2023-11-21T21:51:31.864014Z","caller":"snapshot/v3_snapshot.go:97","msg":"saved","path":"/var/lib/rancher/k3s/server/db/snapshots/on-demand-k3s-server-1-1700603492"}
INFO[0000] Checking if S3 bucket invalid exists
WARN[0000] Unable to initialize S3 client: Access Denied.
INFO[0000] Reconciling ETCDSnapshotFile resources
INFO[0000] Checking if S3 bucket invalid exists
WARN[0000] Unable to initialize S3 client: Access Denied.
INFO[0000] Reconciliation of ETCDSnapshotFile resources complete
FATA[0000] Access Denied.

root@k3s-server-1:/# kubectl get etcdsnapshotfile s3-on-demand-k3s-server-1-1700603492-41242b -o yaml
apiVersion: k3s.cattle.io/v1
kind: ETCDSnapshotFile
metadata:
  creationTimestamp: "2023-11-21T21:51:31Z"
  finalizers:
  - wrangler.cattle.io/managed-etcd-snapshots-controller
  generation: 1
  labels:
    etcd.k3s.cattle.io/snapshot-storage-node: s3
  name: s3-on-demand-k3s-server-1-1700603492-41242b
  resourceVersion: "1553"
  uid: 2c2f54d4-f0db-400c-82b6-831c4908059a
spec:
  location: ""
  nodeName: s3
  s3:
    bucket: invalid
    endpoint: s3.example.com
    region: us-east-1
  snapshotName: on-demand-k3s-server-1-1700603492
status:
  creationTime: "2023-11-21T21:51:32Z"
  error:
    message: Access Denied.
    time: "2023-11-21T21:51:32Z"
  readyToUse: false
  size: "0"
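
The status block above is what a script can inspect to tell a failed upload from a usable snapshot. The following is a minimal sketch, assuming the resource has been fetched as JSON (for example with `kubectl get etcdsnapshotfile <name> -o json`) and that the JSON field names mirror the YAML shown here; the type and function names (`snapshotFile`, `checkSnapshot`) are hypothetical and not part of k3s.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// snapshotFile models only the ETCDSnapshotFile fields used below.
// The field names are taken from the YAML output in this thread.
type snapshotFile struct {
	Spec struct {
		SnapshotName string `json:"snapshotName"`
	} `json:"spec"`
	Status struct {
		ReadyToUse bool `json:"readyToUse"`
		Error      *struct {
			Message string `json:"message"`
		} `json:"error"`
	} `json:"status"`
}

// checkSnapshot returns nil only when the snapshot reports no error
// and is marked ready to use.
func checkSnapshot(raw []byte) error {
	var f snapshotFile
	if err := json.Unmarshal(raw, &f); err != nil {
		return fmt.Errorf("decoding ETCDSnapshotFile: %w", err)
	}
	if f.Status.Error != nil && f.Status.Error.Message != "" {
		return fmt.Errorf("snapshot %s failed: %s", f.Spec.SnapshotName, f.Status.Error.Message)
	}
	if !f.Status.ReadyToUse {
		return fmt.Errorf("snapshot %s is not ready to use", f.Spec.SnapshotName)
	}
	return nil
}

func main() {
	// Trimmed JSON equivalent of the resource shown above.
	raw := []byte(`{
		"spec": {"snapshotName": "on-demand-k3s-server-1-1700603492"},
		"status": {"error": {"message": "Access Denied."}, "readyToUse": false}
	}`)
	if err := checkSnapshot(raw); err != nil {
		fmt.Println("not usable:", err)
		return
	}
	fmt.Println("snapshot is ready to use")
}
```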

mdrahman-suse commented 11 months ago

Validated on v1.28.4-rc1+k3s1

Environment Details

Infrastructure

Node(s) CPU architecture, OS, and Version:

Ubuntu 22.04

Cluster Configuration:

1 server

Config.yaml:

write-kubeconfig-mode: 644
token: <token>
cluster-init: true
node-name: server1

Testing Steps

  1. Copy config.yaml
    $ sudo mkdir -p /etc/rancher/k3s && sudo cp config.yaml /etc/rancher/k3s
  2. Install k3s
  3. Run `k3s etcd-snapshot save` against S3 with invalid values for the S3 options
    sudo k3s etcd-snapshot save --s3 --s3-endpoint="invalid" --s3-bucket="invalid" --s3-folder="invalid" --s3-access-key="invalid" --s3-secret-key="invalid" --s3-region="invalid"
  4. Ensure the error is handled gracefully, without a panic

Replication Results:

goroutine 1 [running]:
github.com/k3s-io/k3s/pkg/etcd.(*S3).snapshotRetention(0xc00121a349?, {0x6565e08?, 0xc000a09770?})
        /go/src/github.com/k3s-io/k3s/pkg/etcd/s3.go:284 +0x59
github.com/k3s-io/k3s/pkg/etcd.(*ETCD).Snapshot(0xc000a097c0, {0x6565e08, 0xc000a09770})
        /go/src/github.com/k3s-io/k3s/pkg/etcd/snapshot.go:375 +0x13ca
github.com/k3s-io/k3s/pkg/cli/etcdsnapshot.save(0xc00088f340, 0xc000c1d970?)
        /go/src/github.com/k3s-io/k3s/pkg/cli/etcdsnapshot/etcd_snapshot.go:121 +0x92
github.com/k3s-io/k3s/pkg/cli/etcdsnapshot.Save(0xc00088f340?)
        /go/src/github.com/k3s-io/k3s/pkg/cli/etcdsnapshot/etcd_snapshot.go:104 +0x45
github.com/urfave/cli.HandleAction({0x4ec5ea0?, 0x5e149a0?}, 0x4?)
        /go/pkg/mod/github.com/urfave/cli@v1.22.14/app.go:524 +0x50
github.com/urfave/cli.Command.Run({{0x597fc1c, 0x4}, {0x0, 0x0}, {0x0, 0x0, 0x0}, {0x5a2f781, 0x22}, {0x0, ...}, ...}, ...)
        /go/pkg/mod/github.com/urfave/cli@v1.22.14/command.go:175 +0x67b
github.com/urfave/cli.(*App).RunAsSubcommand(0xc0006ae540, 0xc00088f080)
        /go/pkg/mod/github.com/urfave/cli@v1.22.14/app.go:405 +0xe87
github.com/urfave/cli.Command.startApp({{0x59a2a52, 0xd}, {0x0, 0x0}, {0x0, 0x0, 0x0}, {0x0, 0x0}, {0x0, ...}, ...}, ...)
        /go/pkg/mod/github.com/urfave/cli@v1.22.14/command.go:380 +0xb7f
github.com/urfave/cli.Command.Run({{0x59a2a52, 0xd}, {0x0, 0x0}, {0x0, 0x0, 0x0}, {0x0, 0x0}, {0x0, ...}, ...}, ...)
        /go/pkg/mod/github.com/urfave/cli@v1.22.14/command.go:103 +0x845
github.com/urfave/cli.(*App).Run(0xc0006ae380, {0xc00088ef20, 0xd, 0x16})
        /go/pkg/mod/github.com/urfave/cli@v1.22.14/app.go:277 +0xb87
main.main()
        /go/src/github.com/k3s-io/k3s/cmd/server/main.go:81 +0xc1e


Validation Results:

- k3s version used for validation:

k3s version v1.28.4-rc1+k3s1 (3f237230)
go version go1.20.11

- Invalid s3 prop (accesskey/bucket-name)

...
{"level":"info","ts":"2023-11-22T22:05:47.709325Z","caller":"snapshot/v3_snapshot.go:97","msg":"saved","path":"/var/lib/rancher/k3s/server/db/snapshots/on-demand-server1-1700690748"}
INFO[0000] Checking if S3 bucket exists
WARN[0000] Unable to initialize S3 client: Access Denied.
INFO[0000] Reconciling ETCDSnapshotFile resources
INFO[0000] Checking if S3 bucket exists
WARN[0000] Unable to initialize S3 client: Access Denied.
INFO[0000] Reconciliation of ETCDSnapshotFile resources complete
FATA[0000] Access Denied.

$ kubectl get etcdsnapshotfile | grep s3-on-demand-server1-1700690748
s3-on-demand-server1-1700690748-41242b   on-demand-server1-1700690748   s3   0   2023-11-22T22:05:48Z

$ kubectl get etcdsnapshotfile s3-on-demand-server1-1700690748-41242b -o yaml
apiVersion: k3s.cattle.io/v1
kind: ETCDSnapshotFile
metadata:
  creationTimestamp: "2023-11-22T22:05:47Z"
  finalizers: