e3b0c442 opened this issue 1 week ago (status: Open)
I was finally able to capture the podspec from the backup mover pod:
```yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: "2024-06-28T01:58:54Z"
  labels:
    velero.io/data-upload: one-backup-8cv5q
    velero.io/exposer-pod-group: snapshot-exposer
  name: one-backup-8cv5q
  namespace: velero
  ownerReferences:
  - apiVersion: velero.io/v2alpha1
    controller: true
    kind: DataUpload
    name: one-backup-8cv5q
    uid: a9f1adae-b258-4fd6-bb0c-63a44a5b4105
  resourceVersion: "2931830"
  uid: 8c834182-aaaf-4c99-9ee1-6a1d8afe2899
spec:
  containers:
  - command:
    - /velero-helper
    - pause
    image: velero/velero:v1.14.0
    imagePullPolicy: Never
    name: a9f1adae-b258-4fd6-bb0c-63a44a5b4105
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeDevices:
    - devicePath: /a9f1adae-b258-4fd6-bb0c-63a44a5b4105
      name: a9f1adae-b258-4fd6-bb0c-63a44a5b4105
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-jr452
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: go
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: velero-server
  serviceAccountName: velero-server
  terminationGracePeriodSeconds: 0
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  topologySpreadConstraints:
  - labelSelector:
      matchLabels:
        velero.io/exposer-pod-group: snapshot-exposer
    maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: ScheduleAnyway
  volumes:
  - name: a9f1adae-b258-4fd6-bb0c-63a44a5b4105
    persistentVolumeClaim:
      claimName: one-backup-8cv5q
  - name: kube-api-access-jr452
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2024-06-28T01:58:55Z"
    status: "False"
    type: PodReadyToStartContainers
  - lastProbeTime: null
    lastTransitionTime: "2024-06-28T01:58:55Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2024-06-28T01:58:55Z"
    message: 'containers with unready status: [a9f1adae-b258-4fd6-bb0c-63a44a5b4105]'
    reason: ContainersNotReady
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2024-06-28T01:58:55Z"
    message: 'containers with unready status: [a9f1adae-b258-4fd6-bb0c-63a44a5b4105]'
    reason: ContainersNotReady
    status: "False"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2024-06-28T01:58:55Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - image: velero/velero:v1.14.0
    imageID: ""
    lastState: {}
    name: a9f1adae-b258-4fd6-bb0c-63a44a5b4105
    ready: false
    restartCount: 0
    started: false
    state:
      waiting:
        reason: ContainerCreating
  hostIP: 192.168.227.5
  hostIPs:
  - ip: 192.168.227.5
  phase: Pending
  qosClass: BestEffort
  startTime: "2024-06-28T01:58:55Z"
```
Please check the directory where the kubelet saves the pod data. The node-agent tries to read that directory at `/var/lib/kubelet`. Your environment may not use the same directory; please check and update the node-agent setting accordingly.
```yaml
volumes:
- hostPath:
    path: /var/lib/kubelet/pods
    type: ""
  name: host-pods
```
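If your nodes keep pod data somewhere else, "updating the node-agent setting" means pointing that hostPath at the actual location. A sketch (the path `/data/kubelet/pods` is a made-up example, not a real default of any distribution):

```yaml
# Hypothetical override for a node whose kubelet root is not /var/lib/kubelet.
# Replace /data/kubelet/pods with the directory your kubelet actually uses.
volumes:
- hostPath:
    path: /data/kubelet/pods
    type: ""
  name: host-pods
```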
I created a debug pod with a mount for /var/lib/kubelet/pods matching node-agent pod spec and was able to interact with no issue, so I do not believe this is an OS issue.
I also tried setting the node-agent container securityContext to privileged: true, with no improvement. I tried rolling back to Velero 1.13 as well; no improvement.
Verifying the correct directory:
➜ ~ talosctl ls -d3 -n go -Hl /var/lib/kubelet
NODE MODE UID GID SIZE(B) LASTMOD NAME
go drwx------ 0 0 215 B 1 day ago kubelet
go -rw------- 0 0 62 B 4 days ago kubelet/cpu_manager_state
go drwxr-xr-x 0 0 173 B 1 day ago kubelet/device-plugins
go Srwxr-xr-x 0 0 0 B 1 day ago kubelet/device-plugins/gpu.intel.com-i915.sock
go Srwxr-xr-x 0 0 0 B 1 day ago kubelet/device-plugins/kubelet.sock
go -rw------- 0 0 34 kB 1 day ago kubelet/device-plugins/kubelet_internal_checkpoint
go Srwxr-xr-x 0 0 0 B 1 day ago kubelet/device-plugins/kubevirt-kvm.sock
go Srwxr-xr-x 0 0 0 B 1 day ago kubelet/device-plugins/kubevirt-tun.sock
go Srwxr-xr-x 0 0 0 B 1 day ago kubelet/device-plugins/kubevirt-vhost-net.sock
go -rw-r--r-- 0 0 89 B 1 day ago kubelet/graceful_node_shutdown_state
go -rw------- 0 0 61 B 4 days ago kubelet/memory_manager_state
go drwxr-xr-x 0 0 124 B 1 day ago kubelet/pki
go -rw------- 0 0 842 B 4 days ago kubelet/pki/kubelet-client-2024-06-23-20-17-53.pem
go Lrwxrwxrwx 0 0 59 B 4 days ago kubelet/pki/kubelet-client-current.pem -> /var/lib/kubelet/pki/kubelet-client-2024-06-23-20-17-53.pem
go -rw-r--r-- 0 0 2.2 kB 1 day ago kubelet/pki/kubelet.crt
go -rw------- 0 0 1.7 kB 1 day ago kubelet/pki/kubelet.key
go drwxr-x--- 0 0 98 B 4 days ago kubelet/plugins
go drwxr-x--- 0 0 17 B 4 days ago kubelet/plugins/kubernetes.io
go drwxr-xr-x 0 0 39 B 1 day ago kubelet/plugins/rook-ceph.cephfs.csi.ceph.com
go drwxr-xr-x 0 0 22 B 1 day ago kubelet/plugins/rook-ceph.rbd.csi.ceph.com
go drwxr-x--- 0 0 95 B 1 day ago kubelet/plugins_registry
go Srwx------ 0 0 0 B 1 day ago kubelet/plugins_registry/rook-ceph.cephfs.csi.ceph.com-reg.sock
go Srwx------ 0 0 0 B 1 day ago kubelet/plugins_registry/rook-ceph.rbd.csi.ceph.com-reg.sock
go drwxr-x--- 0 0 26 B 1 day ago kubelet/pod-resources
go Srwxr-xr-x 0 0 0 B 1 day ago kubelet/pod-resources/kubelet.sock
go drwxr-x--- 0 0 8.2 kB 7 minutes ago kubelet/pods
go drwxr-x--- 0 0 94 B 1 day ago kubelet/pods/00fe8484-abc6-4778-a58a-3995a0686214
go drwxr-x--- 0 0 71 B 1 day ago kubelet/pods/01800aea-6dbc-4d1b-ab75-c9f852627e5d
go drwxr-x--- 0 0 71 B 1 day ago kubelet/pods/0330bf48-d894-438b-bf50-e56f544fd6c0
go drwxr-x--- 0 0 71 B 1 day ago kubelet/pods/0943d7f9-dd30-47c7-8a6b-8df1e193a12a
go drwxr-x--- 0 0 71 B 1 day ago kubelet/pods/1c949679-a7d1-443d-8654-ee09e4df1798
go drwxr-xr-x 0 0 71 B 1 day ago kubelet/pods/1cd491e3-b677-40e8-a121-4e246dea0135
go drwxr-x--- 0 0 71 B 1 day ago kubelet/pods/21bce9fc-3557-4aea-b6ef-3ebd382a9528
go drwxr-x--- 0 0 71 B 1 day ago kubelet/pods/2e4fb0b5-b00e-42a6-8bed-3d8574d77e6d
go drwxr-x--- 0 0 92 B 1 day ago kubelet/pods/30b88225-f47e-4309-9ac9-b5d985f582d8
go drwxr-x--- 0 0 71 B 1 day ago kubelet/pods/36cfc4bf-1b99-4102-8882-94b01b4f8208
go drwxr-x--- 0 0 94 B 1 day ago kubelet/pods/375d7bf5-1b0e-40e2-88db-707cc8932d7c
go drwxr-x--- 0 0 71 B 1 day ago kubelet/pods/37aa08cb-84ed-4042-acf6-29ddbcef2ed8
go drwxr-x--- 0 0 71 B 1 day ago kubelet/pods/39233f8b-82eb-4377-a844-997bac9d30f8
go drwxr-x--- 0 0 94 B 1 day ago kubelet/pods/394ab777-f75c-4a75-95a5-cfb0281c2eb8
go drwxr-x--- 0 0 71 B 1 day ago kubelet/pods/3c853306-2cc8-48a0-80cf-65782e1a189a
go drwxr-x--- 0 0 71 B 1 day ago kubelet/pods/4e3978bf-3850-4427-bfe3-033704babab8
go drwxr-x--- 0 0 71 B 1 day ago kubelet/pods/551602cd-db8f-4459-a085-1ac342f44e74
go drwxr-x--- 0 0 94 B 13 hours ago kubelet/pods/585bed97-dbdd-40a3-a525-ee76e50c9325
go drwxr-x--- 0 0 71 B 1 day ago kubelet/pods/5df5d242-d8ba-4f11-bb02-f5c37843cbb8
go drwxr-x--- 0 0 71 B 1 day ago kubelet/pods/61189325-b7c0-442f-95e7-2faf43d4317d
go drwxr-x--- 0 0 71 B 1 day ago kubelet/pods/61a0d23e-5697-4335-b7a3-f4eef9a33e0b
go drwxr-x--- 0 0 71 B 1 day ago kubelet/pods/63dd0b78-8c36-46b6-ab7d-ddedf2cf9151
go drwxr-x--- 0 0 71 B 1 day ago kubelet/pods/6702d1ad-be5d-4470-84be-fc62babedae2
go drwxr-x--- 0 0 71 B 1 day ago kubelet/pods/6c36f47e-06b4-4b48-844e-4fd4de313bed
go drwxr-x--- 0 0 71 B 1 day ago kubelet/pods/6cc0bfd8-02fc-4156-b929-e47ceba7a76e
go drwxr-x--- 0 0 71 B 1 day ago kubelet/pods/75245a84-8fbe-4463-8a1b-8573f3903161
go drwxr-x--- 0 0 71 B 1 day ago kubelet/pods/75cbcc8a-3c7f-4e18-ba74-e417b33a89b4
go drwxr-x--- 0 0 71 B 1 day ago kubelet/pods/76c25ca7-c55d-4763-ab39-ae779a76d7eb
go drwxr-x--- 0 0 71 B 1 day ago kubelet/pods/77f243b2-7d26-4a0a-b10a-2bb24443b185
go drwxr-x--- 0 0 71 B 1 day ago kubelet/pods/788fe973-4d51-4896-a419-e1f6b0f1117d
go drwxr-x--- 0 0 71 B 1 day ago kubelet/pods/7b6a34b4-f63b-489f-9ccb-4f882035df06
go drwxr-x--- 0 0 71 B 1 day ago kubelet/pods/7d99e6be-ebc9-4374-9389-5f2c02ecc7de
go drwxr-x--- 0 0 71 B 1 day ago kubelet/pods/7ff62351-8169-4af0-90f1-be0574a9b631
go drwxr-x--- 0 0 71 B 1 day ago kubelet/pods/869991e4-e584-4f5e-9ddf-526db1d29747
go drwxr-x--- 0 0 71 B 1 day ago kubelet/pods/8d3beb6a-d8a5-4a0f-9d19-3228627b7019
go drwxr-x--- 0 0 71 B 1 day ago kubelet/pods/904de1c0-ef44-4d2b-aad4-7c76dd898829
go drwxr-xr-x 0 0 71 B 1 day ago kubelet/pods/9266b9a2-1189-4ae3-91b9-f11a343af1a4
go drwxr-x--- 0 0 71 B 1 day ago kubelet/pods/9607c310-e85c-47e5-8ede-ba1328c4b515
go drwxr-x--- 0 0 71 B 1 day ago kubelet/pods/961e0e8d-5d60-4d8e-97f1-5d003376b87e
go drwxr-x--- 0 0 71 B 1 day ago kubelet/pods/965cecd5-b749-4a20-a327-f9bd76007eed
go drwxr-x--- 0 0 71 B 1 day ago kubelet/pods/a6a8f181-7060-42a3-bd05-d30d3592db47
go drwxr-x--- 0 0 71 B 1 day ago kubelet/pods/a8f49850-fc11-4680-b50a-170da2da07b2
go drwxr-x--- 0 0 71 B 1 day ago kubelet/pods/afc92b25-6be3-4fbe-b9a2-c87fdaec2220
go drwxr-x--- 0 0 71 B 1 day ago kubelet/pods/b23280c8-4819-432e-b645-3895a83dc604
go drwxr-x--- 0 0 71 B 1 day ago kubelet/pods/b6241eca-425f-492e-a388-7092c320e061
go drwxr-x--- 0 0 71 B 1 day ago kubelet/pods/b7e866c4-0585-4236-9dc5-cfe3c0a9fa5b
go drwxr-x--- 0 0 94 B 1 day ago kubelet/pods/bc8f408b-7a31-4be4-9cc1-5776951148cf
go drwxr-x--- 0 0 71 B 1 day ago kubelet/pods/bfa93d74-7c1f-4421-afff-7a442e9b26c2
go drwxr-x--- 0 0 71 B 1 day ago kubelet/pods/c885be9f-003d-4e52-b9f0-0fb72979d0d6
go drwxr-x--- 0 0 71 B 23 hours ago kubelet/pods/c9c7949d-c27c-4eb6-b884-e87edef9d89a
go drwxr-x--- 0 0 71 B 1 day ago kubelet/pods/cffc298d-36b5-40c1-8d3f-34110bdcf5b1
go drwxr-x--- 0 0 71 B 1 day ago kubelet/pods/d0c7438a-d5f2-489b-8d95-19e44d776625
go drwxr-x--- 0 0 71 B 1 day ago kubelet/pods/d3d7cffa-1e3c-4777-b919-fbd58cbb559b
go drwxr-x--- 0 0 71 B 1 day ago kubelet/pods/d9ba9327-cde8-4573-8b3a-90b338766919
go drwxr-x--- 0 0 71 B 1 day ago kubelet/pods/e24a3315-5459-49ad-ba79-ee587699fe47
go drwxr-x--- 0 0 71 B 1 day ago kubelet/pods/e3644ff3-04df-478a-a60b-704f635ba1d2
go drwxr-x--- 0 0 71 B 1 day ago kubelet/pods/e56ffb9c-4244-4d4e-a52a-f9a6810f93d6
go drwxr-x--- 0 0 71 B 1 day ago kubelet/pods/e612f8b6-4b03-47e1-adda-257f0d255b89
go drwxr-x--- 0 0 71 B 8 minutes ago kubelet/pods/e67e97eb-7bbe-4475-98b2-d2f6175d1eb7
go drwxr-x--- 0 0 71 B 1 day ago kubelet/pods/e900eda3-9f43-4e5c-ad28-a4e156b121a9
go drwxr-x--- 0 0 71 B 1 day ago kubelet/pods/e9ccf61f-25c2-409d-953a-15eca13a8c21
go drwxr-x--- 0 0 71 B 1 day ago kubelet/pods/f4926bb6-f14f-40ad-9be1-2cdc533f5f00
go drwxr-x--- 0 0 71 B 1 day ago kubelet/pods/f5e9ad10-b438-4bd3-9e09-3ce97cea619c
go drwxr-x--- 0 0 71 B 1 day ago kubelet/pods/f95e5f63-b239-4fa1-8b5b-dff909adb81b
go drwx------ 0 0 22 B 4 days ago kubelet/seccomp
go drwx------ 0 0 6 B 4 days ago kubelet/seccomp/profiles
Cons:
- It backs up data from the live file system, so the data is not captured at a single point in time and is less consistent than the snapshot approaches.
- It accesses the file system from the mounted hostPath directory, so Velero node-agent pods need to run as the root user, and even under privileged mode in some environments.
Please check whether the node-agent is run as the root user. This can be done by setting runAsUser to 0 for the node-agent DaemonSet:

```yaml
securityContext:
  runAsUser: 0
```
Yes, node-agent is run as root.
OK, making some headway here.
I'm fairly sure this message originates from a file in the kubevirt PVC that I'm trying to back up, which symlinks back to /var/lib/kubelet:

```
./volumeDevices/kubernetes.io~csi:
total 0
drwxr-x--- 2 root root  54 Jul  1 20:48 .
drwxr-x--- 3 root root  31 Jul  1 20:48 ..
lrwxrwxrwx 1 root root 142 Jul  1 20:48 pvc-eecd1d33-cefd-42bf-a99b-c45f6cae5759 -> /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/publish/pvc-eecd1d33-cefd-42bf-a99b-c45f6cae5759/88e93be8-befb-4a7f-a670-1a87b081aedb
```
(where `.` is /var/lib/kubelet/pods/POD_UUID on the node the pod is currently running on)
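To illustrate why an absolute symlink like that cannot resolve from inside the node-agent container, here is a small self-contained simulation (pvc-x and POD_UUID are placeholder names; temp directories stand in for the host and container filesystems):

```python
import os
import shutil
import tempfile

# On the node, the CSI driver publishes the raw block device under
# <kubelet-root>/plugins/..., and the pod directory holds an ABSOLUTE
# symlink back to it. Recreate that host layout in a temp directory.
host = tempfile.mkdtemp(prefix="host-")
publish = os.path.join(host, "var/lib/kubelet/plugins/kubernetes.io/csi",
                       "volumeDevices/publish/pvc-x")
os.makedirs(os.path.dirname(publish))
with open(publish, "w") as f:
    f.write("block device node")

pod_csi = os.path.join(host, "var/lib/kubelet/pods/POD_UUID",
                       "volumeDevices/kubernetes.io~csi")
os.makedirs(pod_csi)
os.symlink("/var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/publish/pvc-x",
           os.path.join(pod_csi, "pvc-x"))

# The node-agent bind-mounts the host's /var/lib/kubelet/pods at /host_pods;
# simulate the container's view by copying the pods subtree, symlinks preserved.
container = tempfile.mkdtemp(prefix="container-")
shutil.copytree(os.path.join(host, "var/lib/kubelet/pods"),
                os.path.join(container, "host_pods"), symlinks=True)

link = os.path.join(container, "host_pods/POD_UUID/volumeDevices",
                    "kubernetes.io~csi/pvc-x")
# The symlink itself is visible, but its target is an absolute path outside
# the /host_pods mount, so it dangles inside the container:
print(os.path.islink(link))
print(os.readlink(link))
```

The target string recorded in the link is resolved against the container's root, where /var/lib/kubelet does not exist, which is consistent with kopia failing only on these Block-mode volumes.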
What I am still uncertain of is why this is an issue. I feel like there is something at the intersection of Talos, kubevirt, and velero/kopia that makes this failure unique to this combination. I suspect that if node-agent mounted the host /var/lib/kubelet at the same path in the container instead of at /host_pods, this would not fail; however, I'm not familiar enough with the Kubernetes machinery to know whether this might cause other unintended consequences.
I am going to create an issue in the kubevirt-velero-plugin repo so that they can take a look as well.
Adding an extra volume mount to mount /var/lib/kubelet in the pod at the same location as on the host allowed the backup to succeed, confirming my hypothesis. This doesn't seem like it should be normal practice, though.
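For reference, the extra mount I added looks roughly like this (a sketch against the node-agent DaemonSet; the volume name host-kubelet is my own choice, not part of the stock spec):

```yaml
# Extra hostPath mount so absolute symlinks into /var/lib/kubelet
# resolve inside the node-agent container at the same path as on the host.
volumeMounts:
- mountPath: /var/lib/kubelet
  name: host-kubelet
volumes:
- hostPath:
    path: /var/lib/kubelet
    type: ""
  name: host-kubelet
```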
Had some time to dive a little deeper here.
This issue can be reproduced reliably with any Block mode PVC attached to a pod via spec.containers[].volumeDevices. I am unsure if this is unique to Talos and am working on finding infrastructure to spin up a non-Talos test case.
Confirmed this issue exists in other k8s/Linux distributions and is not unique to Talos (tested on k3s v1.29.6 on Rocky Linux 8).
What steps did you take and what happened:
Attempt a backup with a CSI snapshot data mover. Command:
```
velero backup create one-backup --include-namespaces vms -l kubevirt.io/vm=one --snapshot-move-data --wait
```
The backup PartiallyFailed, with the volume data not on the remote backup location.

What did you expect to happen:
Backup completed successfully.
The following information will help us better understand what's going on: bundle-2024-06-27-20-50-21.tar.gz
Anything else you would like to add:
I created a debug pod with a mount for /var/lib/kubelet/pods matching the node-agent pod spec and was able to interact with no issue, so I do not believe this is an OS issue. Unfortunately the backup mover pod disappears so fast I wasn't able to grab the pod spec.

Environment:
- Velero version (use `velero version`):
- Velero features (use `velero client config get features`):
- Kubernetes version (use `kubectl version`):
- OS (e.g. from `/etc/os-release`): Talos Linux v1.7.5

Vote on this issue!
This is an invitation to the Velero community to vote on issues, you can see the project's top voted issues listed here.
Use the "reaction smiley face" up to the right of this comment to vote.