k3s-io / k3s

Lightweight Kubernetes
https://k3s.io
Apache License 2.0

k3s daemon fails to launch after reboot #5669

Closed graytonio closed 2 years ago

graytonio commented 2 years ago

Environmental Info: K3s Version:

k3s version v1.22.3+k3s1 (61a2aab2)
go version go1.16.8

Node(s) CPU architecture, OS, and Version: Linux kube-master 5.4.0-100-generic #113-Ubuntu SMP Thu Feb 3 18:43:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Cluster Configuration: 1 server, 3 agents, all running in VMs on a Proxmox cluster.

Describe the bug: Following a reboot of the k3s server, the daemon fails to start. Restoring a previous backup of the VM did not fix the issue.

Steps To Reproduce:

Expected behavior: Server runs correctly and allows agents and kubectl to connect.

Actual behavior: Server fails to start, causing agents and kubectl to fail to connect.
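
A quick way to observe the failure after a reboot (a minimal sketch, assuming the stock k3s.service unit shown further down) is to ask systemd directly:

# Unit state and the most recent start attempt
systemctl status k3s.service

# Full startup log with explanatory text, as in the excerpt below
journalctl -xeu k3s.service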

Additional context / logs:

-- A start job for unit k3s.service has begun execution.
--
-- The job identifier is 79289.
Jun 12 05:24:27 kube-master k3s[112088]: time="2022-06-12T05:24:27Z" level=info msg="Starting k3s v1.22.3+k3s1 (61a2aab2)"
Jun 12 05:24:27 kube-master k3s[112088]: time="2022-06-12T05:24:27Z" level=info msg="Cluster bootstrap already complete"
Jun 12 05:24:27 kube-master k3s[112088]: time="2022-06-12T05:24:27Z" level=info msg="Configuring sqlite3 database connection pooling: maxIdleConns=2, maxOpenConns=0, connMaxLifetime=0s"
Jun 12 05:24:27 kube-master k3s[112088]: time="2022-06-12T05:24:27Z" level=info msg="Configuring database table schema and indexes, this may take a moment..."
Jun 12 05:24:27 kube-master k3s[112088]: time="2022-06-12T05:24:27Z" level=info msg="Database tables and indexes are up to date"
Jun 12 05:24:27 kube-master k3s[112088]: time="2022-06-12T05:24:27Z" level=info msg="Kine available at unix://kine.sock"
Jun 12 05:24:27 kube-master k3s[112088]: time="2022-06-12T05:24:27Z" level=info msg="Running kube-apiserver --advertise-port=6443 --allow-privileged=true --anonymous-auth=false --api-audiences=https://kubernetes.default.svc.cluster.local,k3s --authorization-mode=Node,RBAC --bind-address=127.0.0.1 --cert-dir=/var/lib/rancher/k3s/server/tls/temporary-certs --client-ca-file=/var/lib/rancher/k3s/server/tls/client-ca.crt --enable-admission-plugins=NodeRestriction --etcd-servers=unix://kine.sock --feature-gates=JobTrackingWithFinalizers=true --ins>
Jun 12 05:24:27 kube-master k3s[112088]: Flag --insecure-port has been deprecated, This flag has no effect now and will be removed in v1.24.
Jun 12 05:24:27 kube-master k3s[112088]: I0612 05:24:27.600985  112088 server.go:581] external host was not specified, using 10.0.0.25
Jun 12 05:24:27 kube-master k3s[112088]: I0612 05:24:27.601416  112088 server.go:175] Version: v1.22.3+k3s1
Jun 12 05:24:27 kube-master k3s[112088]: time="2022-06-12T05:24:27Z" level=info msg="Running kube-scheduler --authentication-kubeconfig=/var/lib/rancher/k3s/server/cred/scheduler.kubeconfig --authorization-kubeconfig=/var/lib/rancher/k3s/server/cred/scheduler.kubeconfig --bind-address=127.0.0.1 --kubeconfig=/var/lib/rancher/k3s/server/cred/scheduler.kubeconfig --leader-elect=false --profiling=false --secure-port=10259"
Jun 12 05:24:27 kube-master k3s[112088]: time="2022-06-12T05:24:27Z" level=info msg="Waiting for API server to become available"
Jun 12 05:24:27 kube-master k3s[112088]: I0612 05:24:27.606263  112088 shared_informer.go:240] Waiting for caches to sync for node_authorizer
Jun 12 05:24:27 kube-master k3s[112088]: time="2022-06-12T05:24:27Z" level=info msg="Running kube-controller-manager --allocate-node-cidrs=true --authentication-kubeconfig=/var/lib/rancher/k3s/server/cred/controller.kubeconfig --authorization-kubeconfig=/var/lib/rancher/k3s/server/cred/controller.kubeconfig --bind-address=127.0.0.1 --cluster-cidr=10.42.0.0/16 --cluster-signing-kube-apiserver-client-cert-file=/var/lib/rancher/k3s/server/tls/client-ca.crt --cluster-signing-kube-apiserver-client-key-file=/var/lib/rancher/k3s/server/tls/client-c>
Jun 12 05:24:27 kube-master k3s[112088]: time="2022-06-12T05:24:27Z" level=info msg="Running cloud-controller-manager --allocate-node-cidrs=true --authentication-kubeconfig=/var/lib/rancher/k3s/server/cred/cloud-controller.kubeconfig --authorization-kubeconfig=/var/lib/rancher/k3s/server/cred/cloud-controller.kubeconfig --bind-address=127.0.0.1 --cloud-provider=k3s --cluster-cidr=10.42.0.0/16 --configure-cloud-routes=false --kubeconfig=/var/lib/rancher/k3s/server/cred/cloud-controller.kubeconfig --leader-elect=false --node-status-update-freq>
Jun 12 05:24:27 kube-master k3s[112088]: time="2022-06-12T05:24:27Z" level=info msg="Node token is available at /var/lib/rancher/k3s/server/token"
Jun 12 05:24:27 kube-master k3s[112088]: time="2022-06-12T05:24:27Z" level=info msg="To join node to cluster: k3s agent -s https://10.0.0.25:6443 -t ${NODE_TOKEN}"
Jun 12 05:24:27 kube-master k3s[112088]: I0612 05:24:27.609027  112088 plugins.go:158] Loaded 12 mutating admission controller(s) successfully in the following order: NamespaceLifecycle,LimitRanger,ServiceAccount,NodeRestriction,TaintNodesByCondition,Priority,DefaultTolerationSeconds,DefaultStorageClass,StorageObjectInUseProtection,RuntimeClass,DefaultIngressClass,MutatingAdmissionWebhook.
Jun 12 05:24:27 kube-master k3s[112088]: I0612 05:24:27.609063  112088 plugins.go:161] Loaded 11 validating admission controller(s) successfully in the following order: LimitRanger,ServiceAccount,PodSecurity,Priority,PersistentVolumeClaimResize,RuntimeClass,CertificateApproval,CertificateSigning,CertificateSubjectRestriction,ValidatingAdmissionWebhook,ResourceQuota.
Jun 12 05:24:27 kube-master k3s[112088]: time="2022-06-12T05:24:27Z" level=info msg="Wrote kubeconfig /etc/rancher/k3s/k3s.yaml"
Jun 12 05:24:27 kube-master k3s[112088]: time="2022-06-12T05:24:27Z" level=info msg="Run: k3s kubectl"
Jun 12 05:24:27 kube-master k3s[112088]: I0612 05:24:27.621355  112088 plugins.go:158] Loaded 12 mutating admission controller(s) successfully in the following order: NamespaceLifecycle,LimitRanger,ServiceAccount,NodeRestriction,TaintNodesByCondition,Priority,DefaultTolerationSeconds,DefaultStorageClass,StorageObjectInUseProtection,RuntimeClass,DefaultIngressClass,MutatingAdmissionWebhook.
Jun 12 05:24:27 kube-master k3s[112088]: I0612 05:24:27.621394  112088 plugins.go:161] Loaded 11 validating admission controller(s) successfully in the following order: LimitRanger,ServiceAccount,PodSecurity,Priority,PersistentVolumeClaimResize,RuntimeClass,CertificateApproval,CertificateSigning,CertificateSubjectRestriction,ValidatingAdmissionWebhook,ResourceQuota.
Jun 12 05:24:27 kube-master k3s[112088]: W0612 05:24:27.637377  112088 genericapiserver.go:455] Skipping API apiextensions.k8s.io/v1beta1 because it has no resources.
Jun 12 05:24:27 kube-master k3s[112088]: I0612 05:24:27.639167  112088 instance.go:278] Using reconciler: lease
Jun 12 05:24:27 kube-master k3s[112088]: time="2022-06-12T05:24:27Z" level=error msg="runtime core not ready"
Jun 12 05:24:27 kube-master k3s[112088]: I0612 05:24:27.696709  112088 rest.go:130] the default service ipfamily for this cluster is: IPv4
Jun 12 05:24:27 kube-master k3s[112088]: time="2022-06-12T05:24:27Z" level=info msg="certificate CN=kube-master signed by CN=k3s-server-ca@1645499820: notBefore=2022-02-22 03:17:00 +0000 UTC notAfter=2023-06-12 05:24:27 +0000 UTC"
Jun 12 05:24:27 kube-master k3s[112088]: time="2022-06-12T05:24:27Z" level=info msg="certificate CN=system:node:kube-master,O=system:nodes signed by CN=k3s-client-ca@1645499820: notBefore=2022-02-22 03:17:00 +0000 UTC notAfter=2023-06-12 05:24:27 +0000 UTC"
Jun 12 05:24:27 kube-master systemd[1]: var-lib-rancher-k3s-agent-containerd-multiple\x2dlowerdir\x2dcheck613509344-merged.mount: Succeeded.
-- Subject: Unit succeeded
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- The unit var-lib-rancher-k3s-agent-containerd-multiple\x2dlowerdir\x2dcheck613509344-merged.mount has successfully entered the 'dead' state.
Jun 12 05:24:27 kube-master systemd[4288]: var-lib-rancher-k3s-agent-containerd-multiple\x2dlowerdir\x2dcheck613509344-merged.mount: Succeeded.
-- Subject: Unit succeeded
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- The unit UNIT has successfully entered the 'dead' state.
Jun 12 05:24:27 kube-master k3s[112088]: time="2022-06-12T05:24:27Z" level=info msg="Module overlay was already loaded"
Jun 12 05:24:27 kube-master k3s[112088]: time="2022-06-12T05:24:27Z" level=info msg="Module nf_conntrack was already loaded"
Jun 12 05:24:27 kube-master k3s[112088]: time="2022-06-12T05:24:27Z" level=info msg="Module br_netfilter was already loaded"
Jun 12 05:24:27 kube-master k3s[112088]: time="2022-06-12T05:24:27Z" level=info msg="Module iptable_nat was already loaded"
Jun 12 05:24:27 kube-master k3s[112088]: time="2022-06-12T05:24:27Z" level=info msg="Logging containerd to /var/lib/rancher/k3s/agent/containerd/containerd.log"
Jun 12 05:24:27 kube-master k3s[112088]: time="2022-06-12T05:24:27Z" level=info msg="Running containerd -c /var/lib/rancher/k3s/agent/etc/containerd/config.toml -a /run/k3s/containerd/containerd.sock --state /run/k3s/containerd --root /var/lib/rancher/k3s/agent/containerd"
Jun 12 05:24:28 kube-master k3s[112088]: W0612 05:24:28.194373  112088 genericapiserver.go:455] Skipping API authentication.k8s.io/v1beta1 because it has no resources.
Jun 12 05:24:28 kube-master k3s[112088]: W0612 05:24:28.197727  112088 genericapiserver.go:455] Skipping API authorization.k8s.io/v1beta1 because it has no resources.
Jun 12 05:24:28 kube-master k3s[112088]: W0612 05:24:28.219674  112088 genericapiserver.go:455] Skipping API certificates.k8s.io/v1beta1 because it has no resources.
Jun 12 05:24:28 kube-master k3s[112088]: W0612 05:24:28.222534  112088 genericapiserver.go:455] Skipping API coordination.k8s.io/v1beta1 because it has no resources.
Jun 12 05:24:28 kube-master k3s[112088]: W0612 05:24:28.233120  112088 genericapiserver.go:455] Skipping API networking.k8s.io/v1beta1 because it has no resources.
Jun 12 05:24:28 kube-master k3s[112088]: W0612 05:24:28.238634  112088 genericapiserver.go:455] Skipping API node.k8s.io/v1alpha1 because it has no resources.
Jun 12 05:24:28 kube-master k3s[112088]: W0612 05:24:28.250778  112088 genericapiserver.go:455] Skipping API rbac.authorization.k8s.io/v1beta1 because it has no resources.
Jun 12 05:24:28 kube-master k3s[112088]: W0612 05:24:28.250822  112088 genericapiserver.go:455] Skipping API rbac.authorization.k8s.io/v1alpha1 because it has no resources.
Jun 12 05:24:28 kube-master k3s[112088]: W0612 05:24:28.253616  112088 genericapiserver.go:455] Skipping API scheduling.k8s.io/v1beta1 because it has no resources.
Jun 12 05:24:28 kube-master k3s[112088]: W0612 05:24:28.253660  112088 genericapiserver.go:455] Skipping API scheduling.k8s.io/v1alpha1 because it has no resources.
Jun 12 05:24:28 kube-master k3s[112088]: W0612 05:24:28.261311  112088 genericapiserver.go:455] Skipping API storage.k8s.io/v1alpha1 because it has no resources.
Jun 12 05:24:28 kube-master k3s[112088]: W0612 05:24:28.265366  112088 genericapiserver.go:455] Skipping API flowcontrol.apiserver.k8s.io/v1alpha1 because it has no resources.
Jun 12 05:24:28 kube-master k3s[112088]: W0612 05:24:28.276518  112088 genericapiserver.go:455] Skipping API apps/v1beta2 because it has no resources.
Jun 12 05:24:28 kube-master k3s[112088]: W0612 05:24:28.276562  112088 genericapiserver.go:455] Skipping API apps/v1beta1 because it has no resources.
Jun 12 05:24:28 kube-master k3s[112088]: W0612 05:24:28.280271  112088 genericapiserver.go:455] Skipping API admissionregistration.k8s.io/v1beta1 because it has no resources.
Jun 12 05:24:28 kube-master k3s[112088]: I0612 05:24:28.287521  112088 plugins.go:158] Loaded 12 mutating admission controller(s) successfully in the following order: NamespaceLifecycle,LimitRanger,ServiceAccount,NodeRestriction,TaintNodesByCondition,Priority,DefaultTolerationSeconds,DefaultStorageClass,StorageObjectInUseProtection,RuntimeClass,DefaultIngressClass,MutatingAdmissionWebhook.
Jun 12 05:24:28 kube-master k3s[112088]: I0612 05:24:28.287566  112088 plugins.go:161] Loaded 11 validating admission controller(s) successfully in the following order: LimitRanger,ServiceAccount,PodSecurity,Priority,PersistentVolumeClaimResize,RuntimeClass,CertificateApproval,CertificateSigning,CertificateSubjectRestriction,ValidatingAdmissionWebhook,ResourceQuota.
Jun 12 05:24:28 kube-master k3s[112088]: W0612 05:24:28.296533  112088 genericapiserver.go:455] Skipping API apiregistration.k8s.io/v1beta1 because it has no resources.
Jun 12 05:24:28 kube-master k3s[112088]: containerd: exit status 2
Jun 12 05:24:28 kube-master systemd[1]: k3s.service: Main process exited, code=exited, status=1/FAILURE
-- Subject: Unit process exited
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- An ExecStart= process belonging to unit k3s.service has exited.
--
-- The process' exit code is 'exited' and its exit status is 1.
Jun 12 05:24:28 kube-master systemd[1]: k3s.service: Failed with result 'exit-code'.
-- Subject: Unit failed
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- The unit k3s.service has entered the 'failed' state with result 'exit-code'.
Jun 12 05:24:28 kube-master systemd[1]: Failed to start Lightweight Kubernetes.
-- Subject: A start job for unit k3s.service has failed
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- A start job for unit k3s.service has finished with a failure.
--
-- The job identifier is 79289 and the job result is failed.

Systemd Unit File

[Unit]
Description=Lightweight Kubernetes
Documentation=https://k3s.io
After=network-online.target

[Service]
Type=notify
ExecStartPre=-/sbin/modprobe br_netfilter
ExecStartPre=-/sbin/modprobe overlay
ExecStart=/usr/local/bin/k3s server --kubelet-arg=allowed-unsafe-sysctls=net.ipv4.conf.all.src_valid_mark --data-dir /var/lib/rancher/k3s --disable servicelb --disable local-storage
KillMode=process
Delegate=yes
# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNOFILE=1048576
LimitNPROC=infinity
LimitCORE=infinity
TasksMax=infinity
TimeoutStartSec=0
Restart=always
RestartSec=5s

[Install]
WantedBy=multi-user.target
brandond commented 2 years ago

The logs say that containerd is crashing. Can you check the containerd log to see why?
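
For reference, the containerd log path is the one k3s prints in the journal above; a minimal sketch for inspecting it:

# Path reported by the "Logging containerd to ..." line in the k3s journal
tail -n 100 /var/lib/rancher/k3s/agent/containerd/containerd.log

# Or follow it live while restarting the service
tail -f /var/lib/rancher/k3s/agent/containerd/containerd.log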

graytonio commented 2 years ago

Meant to put those in the original post.

It's repeating this over and over again:

time="2022-06-12T13:38:45Z" level=warning msg="deprecated version : `1`, please switch to version `2`"
time="2022-06-12T13:38:45.583392464Z" level=info msg="starting containerd" revision= version=v1.5.7-k3s2
time="2022-06-12T13:38:45.610618941Z" level=info msg="loading plugin \"io.containerd.content.v1.content\"..." type=io.containerd.content.v1
time="2022-06-12T13:38:45.610679466Z" level=info msg="loading plugin \"io.containerd.snapshotter.v1.native\"..." type=io.containerd.snapshotter.v1
time="2022-06-12T13:38:45.610724352Z" level=info msg="loading plugin \"io.containerd.snapshotter.v1.overlayfs\"..." type=io.containerd.snapshotter.v1
time="2022-06-12T13:38:45.610923331Z" level=info msg="loading plugin \"io.containerd.snapshotter.v1.fuse-overlayfs\"..." type=io.containerd.snapshotter.v1
time="2022-06-12T13:38:45.611000584Z" level=info msg="loading plugin \"io.containerd.snapshotter.v1.stargz\"..." type=io.containerd.snapshotter.v1
time="2022-06-12T13:38:45.611552709Z" level=info msg="loading plugin \"io.containerd.metadata.v1.bolt\"..." type=io.containerd.metadata.v1
time="2022-06-12T13:38:45.611602259Z" level=info msg="metadata content store policy set" policy=shared
time="2022-06-12T13:38:45.611719805Z" level=info msg="loading plugin \"io.containerd.differ.v1.walking\"..." type=io.containerd.differ.v1
time="2022-06-12T13:38:45.611765560Z" level=info msg="loading plugin \"io.containerd.gc.v1.scheduler\"..." type=io.containerd.gc.v1
time="2022-06-12T13:38:45.611823151Z" level=info msg="loading plugin \"io.containerd.service.v1.introspection-service\"..." type=io.containerd.service.v1
time="2022-06-12T13:38:45.611870832Z" level=info msg="loading plugin \"io.containerd.service.v1.containers-service\"..." type=io.containerd.service.v1
time="2022-06-12T13:38:45.611903385Z" level=info msg="loading plugin \"io.containerd.service.v1.content-service\"..." type=io.containerd.service.v1
time="2022-06-12T13:38:45.611924504Z" level=info msg="loading plugin \"io.containerd.service.v1.diff-service\"..." type=io.containerd.service.v1
time="2022-06-12T13:38:45.611946446Z" level=info msg="loading plugin \"io.containerd.service.v1.images-service\"..." type=io.containerd.service.v1
time="2022-06-12T13:38:45.611980220Z" level=info msg="loading plugin \"io.containerd.service.v1.leases-service\"..." type=io.containerd.service.v1
time="2022-06-12T13:38:45.612017159Z" level=info msg="loading plugin \"io.containerd.service.v1.namespaces-service\"..." type=io.containerd.service.v1
time="2022-06-12T13:38:45.612040938Z" level=info msg="loading plugin \"io.containerd.service.v1.snapshots-service\"..." type=io.containerd.service.v1
time="2022-06-12T13:38:45.612066668Z" level=info msg="loading plugin \"io.containerd.runtime.v1.linux\"..." type=io.containerd.runtime.v1
time="2022-06-12T13:38:45.612133428Z" level=info msg="loading plugin \"io.containerd.runtime.v2.task\"..." type=io.containerd.runtime.v2
time="2022-06-12T13:38:45.612237162Z" level=info msg="loading plugin \"io.containerd.monitor.v1.cgroups\"..." type=io.containerd.monitor.v1
time="2022-06-12T13:38:45.612737206Z" level=info msg="loading plugin \"io.containerd.service.v1.tasks-service\"..." type=io.containerd.service.v1
time="2022-06-12T13:38:45.612781973Z" level=info msg="loading plugin \"io.containerd.internal.v1.restart\"..." type=io.containerd.internal.v1
time="2022-06-12T13:38:45.612859482Z" level=info msg="loading plugin \"io.containerd.grpc.v1.containers\"..." type=io.containerd.grpc.v1
time="2022-06-12T13:38:45.612893316Z" level=info msg="loading plugin \"io.containerd.grpc.v1.content\"..." type=io.containerd.grpc.v1
time="2022-06-12T13:38:45.612914714Z" level=info msg="loading plugin \"io.containerd.grpc.v1.diff\"..." type=io.containerd.grpc.v1
time="2022-06-12T13:38:45.612934477Z" level=info msg="loading plugin \"io.containerd.grpc.v1.events\"..." type=io.containerd.grpc.v1
time="2022-06-12T13:38:45.612965509Z" level=info msg="loading plugin \"io.containerd.grpc.v1.healthcheck\"..." type=io.containerd.grpc.v1
time="2022-06-12T13:38:45.613006648Z" level=info msg="loading plugin \"io.containerd.grpc.v1.images\"..." type=io.containerd.grpc.v1
time="2022-06-12T13:38:45.613039457Z" level=info msg="loading plugin \"io.containerd.grpc.v1.leases\"..." type=io.containerd.grpc.v1
time="2022-06-12T13:38:45.613072223Z" level=info msg="loading plugin \"io.containerd.grpc.v1.namespaces\"..." type=io.containerd.grpc.v1
time="2022-06-12T13:38:45.613095030Z" level=info msg="loading plugin \"io.containerd.internal.v1.opt\"..." type=io.containerd.internal.v1
time="2022-06-12T13:38:45.613160748Z" level=info msg="loading plugin \"io.containerd.grpc.v1.snapshots\"..." type=io.containerd.grpc.v1
time="2022-06-12T13:38:45.613183995Z" level=info msg="loading plugin \"io.containerd.grpc.v1.tasks\"..." type=io.containerd.grpc.v1
time="2022-06-12T13:38:45.613203670Z" level=info msg="loading plugin \"io.containerd.grpc.v1.version\"..." type=io.containerd.grpc.v1
time="2022-06-12T13:38:45.613222379Z" level=info msg="loading plugin \"io.containerd.grpc.v1.cri\"..." type=io.containerd.grpc.v1
panic: invalid page type: 195: 10

goroutine 212 [running]:
github.com/rancher/k3s/vendor/go.etcd.io/bbolt.(*Cursor).search(0xc0011c5300, 0x7f77c98, 0x9, 0x9, 0xc3)
        /go/src/github.com/rancher/k3s/vendor/go.etcd.io/bbolt/cursor.go:250 +0x345
github.com/rancher/k3s/vendor/go.etcd.io/bbolt.(*Cursor).searchPage(0xc0011c5300, 0x7f77c98, 0x9, 0x9, 0x7f6c68785000)
        /go/src/github.com/rancher/k3s/vendor/go.etcd.io/bbolt/cursor.go:308 +0x16c
github.com/rancher/k3s/vendor/go.etcd.io/bbolt.(*Cursor).search(0xc0011c5300, 0x7f77c98, 0x9, 0x9, 0x67)
        /go/src/github.com/rancher/k3s/vendor/go.etcd.io/bbolt/cursor.go:265 +0x194
github.com/rancher/k3s/vendor/go.etcd.io/bbolt.(*Cursor).seek(0xc0011c5300, 0x7f77c98, 0x9, 0x9, 0x7f6c68782070, 0x9, 0x9, 0x7f6c68782079, 0xf, 0xf, ...)
        /go/src/github.com/rancher/k3s/vendor/go.etcd.io/bbolt/cursor.go:159 +0x7d
github.com/rancher/k3s/vendor/go.etcd.io/bbolt.(*Bucket).Get(0xc0011c64c0, 0x7f77c98, 0x9, 0x9, 0x0, 0x0, 0xf)
        /go/src/github.com/rancher/k3s/vendor/go.etcd.io/bbolt/bucket.go:262 +0xbb
github.com/rancher/k3s/vendor/github.com/containerd/containerd/metadata/boltutil.ReadTimestamps(0xc0011c64c0, 0xc0011c5520, 0xc0011c5538, 0x6, 0xc000fc4870)
        /go/src/github.com/rancher/k3s/vendor/github.com/containerd/containerd/metadata/boltutil/helpers.go:119 +0x107
github.com/rancher/k3s/vendor/github.com/containerd/containerd/metadata.readContainer(0xc0011c54b8, 0xc0011c64c0, 0x40, 0xc000ae0e80)
        /go/src/github.com/rancher/k3s/vendor/github.com/containerd/containerd/metadata/containers.go:320 +0xad
github.com/rancher/k3s/vendor/github.com/containerd/containerd/metadata.(*containerStore).List.func1.1(0x7f6c68855320, 0x40, 0x40, 0x0, 0x0, 0x0, 0x0, 0x0)
        /go/src/github.com/rancher/k3s/vendor/github.com/containerd/containerd/metadata/containers.go:101 +0x11a
github.com/rancher/k3s/vendor/go.etcd.io/bbolt.(*Bucket).ForEach(0xc0011c6240, 0xc0011c56e0, 0x3, 0x3)
        /go/src/github.com/rancher/k3s/vendor/go.etcd.io/bbolt/bucket.go:390 +0x103
github.com/rancher/k3s/vendor/github.com/containerd/containerd/metadata.(*containerStore).List.func1(0xc000178380, 0x47cc800, 0x8124cb8)
        /go/src/github.com/rancher/k3s/vendor/github.com/containerd/containerd/metadata/containers.go:94 +0x1ba
github.com/rancher/k3s/vendor/github.com/containerd/containerd/metadata.view(0x58f5bc0, 0xc000fc42d0, 0x58a6320, 0xc0002423f0, 0xc000fc4300, 0x0, 0x0)
        /go/src/github.com/rancher/k3s/vendor/github.com/containerd/containerd/metadata/bolt.go:48 +0x73
github.com/rancher/k3s/vendor/github.com/containerd/containerd/metadata.(*containerStore).List(0xc0011ba670, 0x58f5bc0, 0xc000fc42d0, 0xc0011c8060, 0x1, 0x1, 0xc00168e868, 0x4a94c6, 0x4b84500, 0xc000fc42d0, ...)
        /go/src/github.com/rancher/k3s/vendor/github.com/containerd/containerd/metadata/containers.go:88 +0x174
github.com/rancher/k3s/vendor/github.com/containerd/containerd/services/containers.(*local).ListStream.func1(0x58f5bc0, 0xc000fc42d0, 0x47cc800, 0x8124cb8)
        /go/src/github.com/rancher/k3s/vendor/github.com/containerd/containerd/services/containers/local.go:102 +0x71
github.com/rancher/k3s/vendor/github.com/containerd/containerd/services/containers.(*local).withStore.func1(0xc000178380, 0xc00168e900, 0xc000178380)
        /go/src/github.com/rancher/k3s/vendor/github.com/containerd/containerd/services/containers/local.go:198 +0x8b
github.com/rancher/k3s/vendor/go.etcd.io/bbolt.(*DB).View(0xc0002f2fc0, 0xc0011ca060, 0x0, 0x0)
        /go/src/github.com/rancher/k3s/vendor/go.etcd.io/bbolt/db.go:772 +0x93
github.com/rancher/k3s/vendor/github.com/containerd/containerd/metadata.(*DB).View(...)
        /go/src/github.com/rancher/k3s/vendor/github.com/containerd/containerd/metadata/db.go:238
github.com/rancher/k3s/vendor/github.com/containerd/containerd/services/containers.(*local).withStoreView(0xc001193c20, 0x58f5bc0, 0xc000fc4270, 0xc0011ca040, 0x4b0d3c0, 0x1)
        /go/src/github.com/rancher/k3s/vendor/github.com/containerd/containerd/services/containers/local.go:203 +0x68
github.com/rancher/k3s/vendor/github.com/containerd/containerd/services/containers.(*local).ListStream(0xc001193c20, 0x58f5bc0, 0xc000fc4270, 0xc0011c6140, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
        /go/src/github.com/rancher/k3s/vendor/github.com/containerd/containerd/services/containers/local.go:101 +0xc5
github.com/rancher/k3s/vendor/github.com/containerd/containerd.(*remoteContainers).stream(0xc0017fa3e0, 0x58f5bc0, 0xc000fc4270, 0xc0011c8060, 0x1, 0x1, 0x20300000000000, 0x7f6c69534fff, 0x7f6c68b9c640, 0x7f6c68b9c640, ...)
        /go/src/github.com/rancher/k3s/vendor/github.com/containerd/containerd/containerstore.go:80 +0xc3
github.com/rancher/k3s/vendor/github.com/containerd/containerd.(*remoteContainers).List(0xc0017fa3e0, 0x58f5bc0, 0xc000fc4270, 0xc0011c8060, 0x1, 0x1, 0x203000, 0x5, 0xc00168ecb0, 0x419d77, ...)
        /go/src/github.com/rancher/k3s/vendor/github.com/containerd/containerd/containerstore.go:57 +0x70
github.com/rancher/k3s/vendor/github.com/containerd/containerd.(*Client).Containers(0xc0007467e0, 0x58f5bc0, 0xc000fc4270, 0xc0011c8060, 0x1, 0x1, 0x25, 0x58f5bc0, 0xc000fc40f0, 0x47cd2e0, ...)
        /go/src/github.com/rancher/k3s/vendor/github.com/containerd/containerd/client.go:257 +0x9a
github.com/rancher/k3s/vendor/github.com/containerd/containerd/runtime/restart/monitor.(*monitor).monitor(0xc0011ba6d0, 0x58f5bc0, 0xc000fc4270, 0x6, 0x58f5bc0, 0xc000fc4270, 0x0, 0x0)
        /go/src/github.com/rancher/k3s/vendor/github.com/containerd/containerd/runtime/restart/monitor/monitor.go:199 +0x128
github.com/rancher/k3s/vendor/github.com/containerd/containerd/runtime/restart/monitor.(*monitor).reconcile.func1(0xc0013e6060, 0x58f5b50, 0xc000074040, 0xc0013e6030, 0x6, 0xc0011ba6d0)
        /go/src/github.com/rancher/k3s/vendor/github.com/containerd/containerd/runtime/restart/monitor/monitor.go:175 +0xcd
created by github.com/rancher/k3s/vendor/github.com/containerd/containerd/runtime/restart/monitor.(*monitor).reconcile
        /go/src/github.com/rancher/k3s/vendor/github.com/containerd/containerd/runtime/restart/monitor/monitor.go:172 +0x165
graytonio commented 2 years ago

Found the resolution to the issue in https://github.com/containerd/containerd/issues/3347

The restart corrupted the meta.db file at /var/lib/rancher/k3s/agent/containerd/io.containerd.metadata.v1.bolt/meta.db.

Deleting the file and restarting the service caused everything to come up correctly.
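
For anyone hitting the same bbolt panic, the recovery is roughly: stop k3s, move the corrupted metadata database aside, and start the service again. A minimal sketch, assuming the path quoted above; moving rather than deleting keeps a copy for later inspection, and containerd recreates the file on startup, though the image and container metadata stored in it is lost and images may need to be re-pulled:

systemctl stop k3s.service

# Move the corrupted containerd bolt metadata store aside
mv /var/lib/rancher/k3s/agent/containerd/io.containerd.metadata.v1.bolt/meta.db \
   /var/lib/rancher/k3s/agent/containerd/io.containerd.metadata.v1.bolt/meta.db.bak

systemctl start k3s.service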

Probably should have created this issue in the containerd repo, but thank you for the help.