Environmental Info:
K3s Version:
k3s version v1.31.4+k3s1 (a562d090)
go version go1.22.9
Node(s) CPU architecture, OS, and Version:
Linux [...] 6.10.2-rt14-arch1-3-rt #1 SMP PREEMPT_RT Sat, 14 Dec 2024 12:07:28 +0000 x86_64 GNU/Linux
Cluster Configuration:
Single node cluster (server+agent) - in my opinion not that relevant within the context of this report.
Describe the bug:
The default --etcd-snapshot-dir value reported by the --help output of the server and etcd-snapshot K3s CLI commands does not match the effective path created and used at runtime.
Take a look at the --help output for both commands:
k3s server --help | grep '\-dir'
--data-dir value, -d value (data) Folder to hold state default /var/lib/rancher/k3s or ${HOME}/.rancher/k3s if not root [$K3S_DATA_DIR]
--etcd-snapshot-dir value (db) Directory to save db snapshots. (default: ${data-dir}/db/snapshots)
# -------- SNIP --------
k3s etcd-snapshot --help | grep '\-dir'
--data-dir value, -d value (data) Folder to hold state default /var/lib/rancher/k3s or ${HOME}/.rancher/k3s if not root [$K3S_DATA_DIR]
--dir value, --etcd-snapshot-dir value (db) Directory to save etcd on-demand snapshot. (default: ${data-dir}/db/snapshots)
Assuming --etcd-snapshot-dir is not provided, the effective path is actually ${data-dir}/server/db/snapshots, instead of ${data-dir}/db/snapshots (note the missing server path segment).
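To make the difference concrete, here is a small sketch that builds both paths from a given data dir. The function names are hypothetical helpers written for this report, not part of k3s; the only inputs taken from the help text are the $K3S_DATA_DIR variable and the root default /var/lib/rancher/k3s.

```shell
#!/bin/sh
# Hypothetical helpers (not part of k3s): build the two candidate snapshot
# paths from a data dir, stripping one trailing slash if present.

documented_snapshot_dir() {
    # Path as printed by --help: ${data-dir}/db/snapshots
    printf '%s/db/snapshots\n' "${1%/}"
}

effective_snapshot_dir() {
    # Path observed at runtime: ${data-dir}/server/db/snapshots
    printf '%s/server/db/snapshots\n' "${1%/}"
}

# Assumes the root default when K3S_DATA_DIR is unset.
data_dir="${K3S_DATA_DIR:-/var/lib/rancher/k3s}"
echo "documented: $(documented_snapshot_dir "$data_dir")"
echo "effective:  $(effective_snapshot_dir "$data_dir")"
```

The two results differ only by the server segment, which is the whole discrepancy reported here.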
Steps To Reproduce:
To preface this section: I initially observed this issue on a different system running NixOS. The K3s service there was installed and managed via the k3s nixpkg, which adds a layer of abstraction between the K3s distributables and the end user. To rule out configuration skew, I reproduced it on Arch via the AUR, whose install process I understand better.
Install v1.31.4+k3s1.
Add a minimal etcd snapshot configuration to the k3s server invocation in the systemd service. Just enough to enable etcd (instead of SQLite) and get a snapshot created quickly, without flooding the storage:
/usr/bin/k3s server --cluster-init --etcd-snapshot-schedule-cron="* * * * *" --etcd-snapshot-retention=1
See the "additional context" section below for the full systemd unit file.
Ensure service is enabled and reload daemons to get the above configuration running:
sudo systemctl enable --now k3s.service
sudo systemctl daemon-reload
sudo systemctl status k3s.service
● k3s.service - Lightweight Kubernetes
Loaded: loaded (/usr/lib/systemd/system/k3s.service; enabled; preset: disabled)
Active: active (running)
# -------- SNIP --------
Manually trigger an etcd snapshot via k3s etcd-snapshot:
sudo k3s etcd-snapshot save
INFO[0000] Snapshot on-demand-machine-1736613048 saved.
Wait a minute for the next etcd snapshot cron schedule tick.
[!NOTE]
The service is running as root, and therefore the effective default --data-dir directory is /var/lib/rancher/k3s.
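For reference, the default described in the --data-dir help text can be sketched as below. This only mirrors the wording of the help output; default_data_dir is a hypothetical helper, and it is an assumption that k3s decides "root" by checking for UID 0.

```shell
#!/bin/sh
# Hypothetical helper mirroring the --data-dir help text:
# /var/lib/rancher/k3s for root, ${HOME}/.rancher/k3s otherwise.
# Arguments: numeric UID, home directory.
default_data_dir() {
    uid="$1"
    home="$2"
    if [ "$uid" -eq 0 ]; then
        printf '%s\n' /var/lib/rancher/k3s
    else
        printf '%s/.rancher/k3s\n' "$home"
    fi
}

# Show the default that would apply to the current user.
default_data_dir "$(id -u)" "$HOME"
```

An explicit --data-dir or $K3S_DATA_DIR would take precedence over this default, per the same help text.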
Expected behavior:
The snapshots are saved under /var/lib/rancher/k3s/db/snapshots, since this is what k3s server --help told me.
Actual behavior:
The snapshots are saved under /var/lib/rancher/k3s/server/db/snapshots:
[root@machine k3s]# pwd
/var/lib/rancher/k3s
[root@machine k3s]# ls
agent data server # No db dir here
[root@machine k3s]# cd server/
[root@machine server]# ls
agent-token cred db etc kine.sock manifests node-token static tls token
[root@machine server]# cd db/snapshots/
[root@machine snapshots]# ls
etcd-snapshot-machine-1736614203 on-demand-machine-1736613048
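The manual directory walk above can be condensed into a quick check. This is a minimal sketch: existing_snapshot_dirs is a hypothetical helper, and it only tests the two candidate paths discussed in this report, so a custom --etcd-snapshot-dir would not be detected.

```shell
#!/bin/sh
# Hypothetical helper: print every candidate snapshot directory that exists
# under the given data dir, one per line. Checks the documented path first,
# then the path observed at runtime.
existing_snapshot_dirs() {
    data_dir="${1%/}"
    for candidate in "$data_dir/db/snapshots" "$data_dir/server/db/snapshots"; do
        if [ -d "$candidate" ]; then
            printf '%s\n' "$candidate"
        fi
    done
}

# Assumes the root default when K3S_DATA_DIR is unset; run as root so the
# directories under /var/lib/rancher/k3s are readable.
existing_snapshot_dirs "${K3S_DATA_DIR:-/var/lib/rancher/k3s}"
```

On the system described here, only the server/db/snapshots path should be printed, given that the listing above shows no db directory directly under the data dir.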
Suffixing --data-dir with server for server-originating artifacts totally makes sense, and imo it's the CLI help that needs to be updated. I'll submit a PR.
Additional context / logs:
I've looked in the sources, and the effective data dir appears to be resolved here, on server startup:
https://github.com/k3s-io/k3s/blob/a562d090b05cf8d55b6a8b57556787c24c8ce21a/pkg/server/server.go#L466-L486
https://github.com/k3s-io/k3s/blob/a562d090b05cf8d55b6a8b57556787c24c8ce21a/pkg/server/server.go#L40-L42
The above applies to both the server and etcd-snapshot commands since, to my understanding, the snapshots invoked manually by etcd-snapshot save send a POST /db/snapshot to the server, which in turn calculates the snapshot write path using the DataDir config value resolved on startup.
Full systemd unit file