Closed: surekhakallam closed this issue 4 years ago
Hi @surekhakallam!
Based on "pod volume backup failed: error getting volume path on host: expected one matching path, got 0", I'm guessing you're using restic. Is that correct?
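For context on what this error means: velero's restic integration resolves a pod volume's host path by globbing /host_pods/&lt;pod-uid&gt;/volumes/*/&lt;volume-name&gt; and requires exactly one match. A minimal local sketch of that check, with a hypothetical UID, volume name, and plugin directory (simulated under a temp dir, not a real kubelet tree):

```shell
#!/bin/sh
set -eu
# Simulated host_pods tree under a temp dir; all names below are hypothetical.
ROOT=$(mktemp -d)
POD_UID="b8bdb36a-986a-4388-9a9b-dbc2680bbbf9"
VOLUME="ignite-work"

# Count paths matching /host_pods/<uid>/volumes/*/<volume>. Zero matches is
# what produces "expected one matching path, got 0" in the velero logs.
count_matches() {
  set -- "$ROOT"/host_pods/"$POD_UID"/volumes/*/"$VOLUME"
  if [ -e "$1" ]; then echo "$#"; else echo 0; fi
}

echo "matches before the volume dir exists: $(count_matches)"
mkdir -p "$ROOT/host_pods/$POD_UID/volumes/kubernetes.io~vsphere-volume/$VOLUME"
echo "matches after the volume dir exists:  $(count_matches)"
```

The point of the sketch: if the glob finds nothing, either the pod UID directory or the volume directory is missing from the restic pod's view of the host.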
Can you tell me more about the environments where your backups work and where they don't? Are there any common things you've identified?
Can you provide the output of velero backup logs <backupname> for a backup that failed?
Yes, I am using restic.
The information above is from the VM where the backup is not working; the information below is from the VM where the backup is working.
velero version
Client: Version: v1.3.2, Git commit: 55a9914a3e4719fb1578529c45430a8c11c28145
Server: Version: v1.3.2
velero client config get features
features:
kubectl version
Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.4", GitCommit:"67d2fcf276fcd9cf743ad4be9a9ef5828adc082f", GitTreeState:"clean", BuildDate:"2019-09-18T14:51:13Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.11", GitCommit:"d94a81c724ea8e1ccc9002d89b7fe81d58f89ede", GitTreeState:"clean", BuildDate:"2020-03-12T21:00:06Z", GoVersion:"go1.12.17", Compiler:"gc", Platform:"linux/amd64"}
/etc/os-release
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"
CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
Both clusters are in vSphere, in the same region but on different ESXi hosts.
time="2020-04-29T06:14:49Z" level=info msg="Setting up backup temp file" backup=velero/infra-distributed-db-backup logSource="pkg/controller/backup_controller.go:494"
time="2020-04-29T06:14:49Z" level=info msg="Setting up plugin manager" backup=velero/infra-distributed-db-backup logSource="pkg/controller/backup_controller.go:501"
time="2020-04-29T06:14:49Z" level=info msg="Getting backup item actions" backup=velero/infra-distributed-db-backup logSource="pkg/controller/backup_controller.go:505"
time="2020-04-29T06:14:49Z" level=info msg="Setting up backup store" backup=velero/infra-distributed-db-backup logSource="pkg/controller/backup_controller.go:511"
time="2020-04-29T06:14:49Z" level=info msg="Writing backup version file" backup=velero/infra-distributed-db-backup logSource="pkg/backup/backup.go:213"
time="2020-04-29T06:14:49Z" level=info msg="Including namespaces: " backup=velero/infra-distributed-db-backup logSource="pkg/backup/backup.go:219"
time="2020-04-29T06:14:49Z" level=info msg="Excluding namespaces:
@surekhakallam Could you please provide the following:
1. The UID of the pod whose backup causes this error.
2. A recursive listing of that pod's volumes directory: ls -lR /host_pods/<UID_OF_POD_FROM_STEP_1>/volumes/
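The two steps above can be sketched as follows. The namespace and pod name are placeholders, the UID is an example from this thread, and the kubectl commands assume a live cluster (shown as comments so the path construction itself runs anywhere):

```shell
#!/bin/sh
set -eu
# Step 1: on a live cluster you would fetch the UID like this (names hypothetical):
#   POD_UID=$(kubectl -n <namespace> get pod infra-distributed-db-0 \
#     -o jsonpath='{.metadata.uid}')
POD_UID="b8bdb36a-986a-4388-9a9b-dbc2680bbbf9"   # example UID from this thread

# Step 2: run the listing from inside the restic daemonset pod that is
# scheduled on the same node, against its hostPath mount:
TARGET="/host_pods/${POD_UID}/volumes/"
echo "ls -lR ${TARGET}"
```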
Actually, the setup on which the Velero backup was not working has been deleted. I will provide the details tomorrow by creating another VM and trying a backup on it; I did not find time today as I was engaged in other work.
ls -lR /host_pods/b8bdb36a-986a-4388-9a9b-dbc2680bbbf9/volumes/
ls: cannot access /host_pods/b8bdb36a-986a-4388-9a9b-dbc2680aaaf9/volumes/: No such file or directory
This is the output I got for step 2.
@surekhakallam So the problem is likely caused by the restic daemon set being unable to mount the pod's volume in order to back it up. Because you are running Kubernetes 1.17.3, I am assuming that mount propagation is enabled.
What you might be missing is that the restic daemon set pods don't have access to the pod's volumes on the host. This is because the restic pod may not be running in privileged mode.
Can you try following https://github.com/vmware-tanzu/velero/issues/1638#issuecomment-510703864 to update your restic daemonset, and then try again?
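For reference, the change suggested in that linked comment amounts to running the restic container privileged. A sketch of the relevant daemonset fragment (field placement assumed from the linked comment; verify against your actual manifest before applying):

```yaml
# Fragment of the restic DaemonSet pod template (not a complete manifest).
spec:
  template:
    spec:
      containers:
      - name: restic
        securityContext:
          privileged: true   # lets the pod read volumes under the kubelet root
```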
Hi @ashish-amarnath, I have made the changes you suggested, but I am still getting the same error:
pod-volume-backup error="expected one matching path, got 0"
@surekhakallam I made a mistake in one of my earlier suggestions. Can you please inspect the following:
1. If /var/lib/kubelet/pods does not exist, please check whether /var/vcap/data/kubelet/pods exists instead.
2. Exec into the restic pod on that VM, identified in step 2, and inspect whether /host_pods/<POD-UID>/* exists.
3. Paste the YAML for the restic daemon set.
4. If you run into a "No such file or directory" when running ls, please investigate which part of the path we are trying to list actually exists.
5. Also please compare this between VMs where the backups are working and the VMs where you are seeing this failure.
You may also want to check out our instructions at https://velero.io/docs/v1.4-pre/restic/#instructions
The above is the restic YAML file. Since I was concentrating on another task, I could not go through the steps you mentioned above thoroughly; I will do that tomorrow and let you know the update. Meanwhile, could you please tell me if there is anything to be taken care of or changed in the above YAML file?
Thanks in advance.
The difference I found is that the working VM runs Kubernetes 1.15 and the non-working VM runs Kubernetes 1.17.
@surekhakallam I am curious to know whether /var/lib/kubelet is the kubelet root dir on each of the clusters. Is it possible for you to SSH into the nodes (particularly on the non-working cluster) and see if this directory exists? @ashish-amarnath added some more detailed instructions at https://github.com/vmware-tanzu/velero/issues/2506#issuecomment-628150069
Yes, the folder /var/lib/kubelet is present, and the path /var/lib/kubelet/pods is also present on all the nodes. As instructed by @ashish-amarnath, I have followed all the steps, and all the directories were present as expected. The only difference between the working cluster and the non-working cluster is that the working cluster uses Kubernetes 1.15 and the non-working cluster uses Kubernetes 1.17. Do we need to add any feature gate or any other prerequisite to the cluster when it is running Kubernetes 1.17?
I'm not aware of any feature flags or any other config issues that would arise with 1.17.
So the next thing I would look at is: for one of the volumes that's failing, compare what you see if you SSH into the node it's running on and go to /var/lib/kubelet/pods/<pod-uid>/volumes with what you see via the velero hostPath mount, in the restic daemonset pod running on that same node, under /host_pods/<pod-uid>/volumes. These should be different views of the same directory structure, but it's possible something weird is going on with mount propagation or something else.
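One way to do that comparison is to capture both listings and diff them. A sketch with stand-in listings (on a real cluster, the two commands in the comments would produce the files; all names here are hypothetical):

```shell
#!/bin/sh
set -u
# On the node itself:
#   ls /var/lib/kubelet/pods/<pod-uid>/volumes        > node_view.txt
# Inside the restic pod on that same node:
#   kubectl -n velero exec <restic-pod> -- \
#     ls /host_pods/<pod-uid>/volumes                 > restic_view.txt
# Here we simulate a node that sees the plugin dir while the restic pod sees nothing.
NODE_VIEW=$(mktemp); RESTIC_VIEW=$(mktemp)
printf 'kubernetes.io~vsphere-volume\n' > "$NODE_VIEW"
printf '' > "$RESTIC_VIEW"

if diff -u "$NODE_VIEW" "$RESTIC_VIEW" > /dev/null; then
  echo "views match"
else
  echo "views differ: likely a mount propagation problem"
fi
```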
Hi, I have checked both folders, from the node as well as from the restic daemonset pod running on the same node, and found that whatever data is on the node in /var/lib/kubelet/pods/
Can you give me the full volume directory for the volume that's failing to back up? It should be something like: /host_pods/<pod-UID>/volumes/<storage-plugin-name>/<volume-name>
Can you also attach the full YAML for the pod that's got the volume that's failing to back up, as well as the YAML for the PVC (if you are in fact using a PVC here)?
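The shape of the requested directory can be assembled from values already in this thread; the storage plugin directory name is an assumption (for in-tree vSphere volumes it is typically kubernetes.io~vsphere-volume):

```shell
#!/bin/sh
set -eu
POD_UID="b8bdb36a-986a-4388-9a9b-dbc2680bbbf9"   # from the earlier ls attempt
PLUGIN="kubernetes.io~vsphere-volume"            # assumed plugin dir name
VOLUME="ignite-work"                             # from backup.velero.io/backup-volumes
FULL_PATH="/host_pods/${POD_UID}/volumes/${PLUGIN}/${VOLUME}"
echo "$FULL_PATH"
```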
host_pods/
apiVersion: v1
kind: Pod
metadata:
  annotations:
    backup.velero.io/backup-volumes: ignite-work,ignite-hdd
    cni.projectcalico.org/podIP: 10.200.36.76/32
    cni.projectcalico.org/podIPs: 10.200.36.76/32
  creationTimestamp: "2020-05-20T16:05:43Z"
  generateName: infra-distributed-db-
  labels:
    app: infra-distributed-db
    controller-revision-hash: infra-distributed-db-5sdfsc799c
    statefulset.kubernetes.io/pod-name: infra-distributed-db-0
  name: infra-distributed-db-0
  namespace:
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: StatefulSet
    name: infra-distributed-db
    uid:
spec:
  affinity:
    # (nodeAffinity fields truncated in the original paste)
        matchExpressions:
        - key: distributed_db
          operator: In
          values:
          - deploy
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - infra-distributed-db
        topologyKey: kubernetes.io/hostname
  containers:
  - env:
    - name: POD_NAME
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: metadata.name
    - name: LOGSTASH_HOST
      value: logstash:5044
    image:
    # (probe and other container fields truncated in the original paste)
    timeoutSeconds: 1
    name: infra-distributed-db
    ports:
    - containerPort:
  - image: busybox
    imagePullPolicy: Always
    name: disable-thp
    resources: {}
    terminationMessagePath: /folder1/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /host-sys
      name: host-sys
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name:
How do we fix this? Can you please guide us?
@surekhakallam I can't really tell from the info you've provided what the problem might be and without a more interactive session, I'm not sure if we'll be able to get to the bottom of this.
If you're interested, VMware does offer professional support around Velero and our other open-source projects. Since it sounds like you're already a VMware customer, this might be a route to look into. Let me know if you'd like me to put you in touch with someone.
Sure, I will talk to my team regarding this. Sorry for the late reply; I was looking at different issues and couldn't respond sooner. Thank you so much for this opportunity.
I am closing this issue for inactivity. Please feel free to reach out if you need further assistance.
Hi, we are able to take backups and restore them on a few setups, but we are unable to do so on some other setups. We use the same OS for all our clusters and we are using vSphere. Still, we are able to back up and restore on a few of our clusters but not on others. Is there anything that could stop or block backups from being taken? The one error we found is as follows:
pod volume backup failed: error getting volume path on host: expected one matching path, got 0
velero version
Client: Version: v1.3.1, Git commit: 0665b05321eefeb7b7fdd6984750745b7429774f
Server: Version: v1.3.1
velero client config get features
features:
kubectl version
Client Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.3", GitCommit:"06ad960bfd03b39c8310aaf92d1e7c12ce618213", GitTreeState:"clean", BuildDate:"2020-02-11T18:14:22Z", GoVersion:"go1.13.6", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.5", GitCommit:"e0fccafd69541e3750d460ba0f9743b90336f24f", GitTreeState:"clean", BuildDate:"2020-04-16T11:35:47Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
/etc/os-release
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"
CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"