Closed: pseymournutanix closed this issue 10 months ago.
@pseymournutanix fyi your bundle hasn't uploaded.
@rnarenpujari
I'm putting the feedback from the Slack conversation here so that other contributors can also weigh in.
time="2023-11-22T19:34:55Z" level=info msg="1 errors encountered backup up item" backup=velero/manual-bk-test logSource="pkg/backup/backup.go:444" name=plants-xq-qa-consul-consul-server-0
time="2023-11-22T19:34:55Z" level=error msg="Error backing up item" backup=velero/manual-bk-test error="node name is empty" error.file="/go/src/github.com/vmware-tanzu/velero/pkg/nodeagent/node_agent.go:57" error.function=github.com/vmware-tanzu/velero/pkg/nodeagent.IsRunningInNode logSource="pkg/backup/backup.go:448" name=plants-xq-qa-consul-consul-server-0
In some cases, the backup is marked as PartiallyFailed because pods whose volume data needs to be backed up by the filesystem uploader don't have a node assigned yet. The error is generated here: https://github.com/vmware-tanzu/velero/blob/7320bb76744bc7052d839644fcbe34eb746a0f20/pkg/podvolume/backupper.go#L173
The discussion is about whether the Velero server needs to fail the backup here. A less strict check follows: when the pod is not in the Running state, the backupper returns without an error. https://github.com/vmware-tanzu/velero/blob/7320bb76744bc7052d839644fcbe34eb746a0f20/pkg/podvolume/backupper.go#L223
I think the Velero server could likewise just log some information, without returning an error, when the pod doesn't have a node name.
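For context, here is a minimal sketch of the two checks being discussed, using simplified stand-in types rather than the actual Velero structs (the real code lives around backupper.go#L173 and #L223 linked above):

```go
package main

import (
	"errors"
	"fmt"
)

// Pod is a simplified stand-in for corev1.Pod, for illustration only.
type Pod struct {
	Name     string
	NodeName string // empty while the pod is still unscheduled
	Phase    string // "Pending", "Running", ...
}

// backupPodVolumes sketches the current ordering: an empty node name
// fails the item (making the backup PartiallyFailed) even though a
// later check would have skipped the pod for not being Running anyway.
func backupPodVolumes(pod Pod) error {
	// Strict check (roughly backupper.go#L173): an unscheduled pod
	// has no node name, so a Pending pod errors out here first.
	if pod.NodeName == "" {
		return errors.New("node name is empty")
	}

	// Lenient check (roughly backupper.go#L223): a non-running pod
	// is skipped with a log message, not an error.
	if pod.Phase != "Running" {
		fmt.Printf("skipping volume backup: pod %s is not running\n", pod.Name)
		return nil
	}

	// ... run the filesystem uploader for the pod's volumes ...
	return nil
}

func main() {
	// A Pending pod never reaches the lenient check.
	pending := Pod{Name: "consul-server-0", Phase: "Pending"}
	fmt.Println(backupPodVolumes(pending))
}
```

This illustrates why the ordering matters: the Pending pod trips the strict check before the lenient one ever runs.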
@reasonerjt Please take a look.
name: /canaveral-analytics-prometheus-565f54bbf8-8zwmj error: /pod volume backup failed: data path backup failed: Failed to run kopia backup: Unable to read dir in path /host_pods/f3f91a0a-4f7a-499f-918e-7681825e33e6/volumes/kubernetes.io~csi/pvc-dc7341b6-d500-4b41-a7d0-b80870d9dea8/mount: open /host_pods/f3f91a0a-4f7a-499f-918e-7681825e33e6/volumes/kubernetes.io~csi/pvc-dc7341b6-d500-4b41-a7d0-b80870d9dea8/mount: input/output error
name: /canaveral-analytics-pushgateway-8c884d5c-tkfv2 error: /pod volume backup failed: data path backup failed: Failed to run kopia backup: Unable to read dir in path /host_pods/69d66904-5573-4c0b-a79f-e7b2964a5cf8/volumes/kubernetes.io~csi/pvc-f4d5f669-1764-496c-bf01-0e655fc850b9/mount: open /host_pods/69d66904-5573-4c0b-a79f-e7b2964a5cf8/volumes/kubernetes.io~csi/pvc-f4d5f669-1764-496c-bf01-0e655fc850b9/mount: input/output error
name: /canaveral-config-store-867d947fbc-hb2vj error: /node name is empty
name: /canaveral-onboarding-new-77fb56c49f-5jtn7 error: /node name is empty
The backup has four errors. Two of them are related to the empty node name issue discussed above. The other two are failures to read the mount directory with an input/output error. May I ask which provider hosts your k8s environment? It looks like a hardware issue. https://unix.stackexchange.com/questions/39905/input-output-error-when-accessing-a-directory
Thank you. Yes, I fixed the volume errors (it's Nutanix, BTW); for those I would expect a failure condition :)
@blackpiglet
> I think the Velero server can log some information without returning an error when the pod doesn't have a node name too.
I see the log already says `node name is empty`.
What do you mean by 'log some information'?
Per discussion with @Lyndon-Li, we move the chunk at https://github.com/vmware-tanzu/velero/blob/fea22bbbc9faf5a28c2570313df715ffb5721d11/pkg/podvolume/backupper.go#L223 before the node-agent status check, to avoid passing an empty node name to that function.
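A hedged sketch of that reordering, again with simplified stand-in types rather than the actual Velero code (`isRunningInNode` here is a hypothetical stand-in for `nodeagent.IsRunningInNode`): the pod-phase check runs first, so an unscheduled pod is skipped with a log line and the node-agent check never sees an empty node name.

```go
package main

import "fmt"

// Pod is a simplified stand-in for corev1.Pod, for illustration only.
type Pod struct {
	Name     string
	NodeName string
	Phase    string
}

// isRunningInNode is a hypothetical stand-in for
// nodeagent.IsRunningInNode, which errors on an empty node name.
func isRunningInNode(nodeName string) error {
	if nodeName == "" {
		return fmt.Errorf("node name is empty")
	}
	// ... verify a node-agent pod is running on that node ...
	return nil
}

// backupPodVolumes sketches the proposed ordering: skip non-running
// pods (which includes unscheduled ones) before consulting the node
// agent, so an empty node name is never passed down.
func backupPodVolumes(pod Pod) error {
	if pod.Phase != "Running" {
		fmt.Printf("skipping volume backup: pod %s is in phase %s\n", pod.Name, pod.Phase)
		return nil
	}
	if err := isRunningInNode(pod.NodeName); err != nil {
		return err
	}
	// ... run the filesystem uploader for the pod's volumes ...
	return nil
}

func main() {
	// With the reordered checks, a Pending pod is logged and skipped
	// instead of failing the backup item.
	pending := Pod{Name: "config-store", Phase: "Pending"}
	fmt.Println(backupPodVolumes(pending))
}
```

With this ordering, the backup is no longer marked PartiallyFailed for pods that simply haven't been scheduled yet, while a Running pod on an unreachable node still surfaces an error.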
What steps did you take and what happened: Pods in the Pending state (because someone put bad taints on them) produce errors in backups.
What did you expect to happen: Non-running pods should be ignored.
[Uploading bundle-2023-11-21-13-57-28.tar.gz…]()