How can I continue troubleshooting?
After configuring things according to https://github.com/openebs/velero-plugin#setting-targetip-in-replica, the service is running.
This should be a bug: autoSetTargetIP: "true" does not take effect.
I have set this up on all 3 pools.
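For context, autoSetTargetIP is set on the VolumeSnapshotLocation as described in the README linked above. A minimal sketch of the relevant fields (the bucket/S3 settings here are placeholders, not my real values):

kubectl apply -f - <<'EOF'
apiVersion: velero.io/v1
kind: VolumeSnapshotLocation
metadata:
  name: default
  namespace: velero
spec:
  provider: openebs.io/cstor-blockstore
  config:
    namespace: openebs        # namespace where the cStor pools run
    autoSetTargetIP: "true"   # should set io.openebs:targetip on restore
    # bucket / prefix / region / s3Url settings omitted
EOF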
# kubectl get svc -n openebs pvc-b0e72129-dcc5-48ac-b891-b052bd38ad74 -ojsonpath='{.spec.clusterIP}'
10.43.48.33
# kubectl exec cstor-disk-pool-4zzx-64db8bcdc4-7b9l2 -n openebs -c cstor-pool -it bash
root@cstor-disk-pool-4zzx-64db8bcdc4-7b9l2:/# zfs get io.openebs:targetip
NAME PROPERTY VALUE SOURCE
cstor-88ce2e6e-326a-460f-bb9e-89d5d2fd0b11 io.openebs:targetip - -
cstor-88ce2e6e-326a-460f-bb9e-89d5d2fd0b11/pvc-396b5958-1f96-4d73-a44f-aebb86f1fcf1 io.openebs:targetip 10.43.135.194 local
cstor-88ce2e6e-326a-460f-bb9e-89d5d2fd0b11/pvc-601cf7af-8945-4a4a-b68a-afcedfff1011 io.openebs:targetip 10.43.168.62 local
io.openebs:targetip - -
cstor-88ce2e6e-326a-460f-bb9e-89d5d2fd0b11/pvc-b0e72129-dcc5-48ac-b891-b052bd38ad74 io.openebs:targetip default
cstor-88ce2e6e-326a-460f-bb9e-89d5d2fd0b11/pvc-e24f6f79-91be-45a0-826d-c737616ec4f2 io.openebs:targetip 10.43.134.198 local
cstor-88ce2e6e-326a-460f-bb9e-89d5d2fd0b11/pvc-f67805ed-313a-4bbe-91d4-b95399d514c6 io.openebs:targetip 10.43.148.10 local
cstor-88ce2e6e-326a-460f-bb9e-89d5d2fd0b11/pvc-f67805ed-313a-4bbe-91d4-b95399d514c6@snapshot-8729a537-0e4f-40b0-96c0-5320f77a668b io.openebs:targetip - -
root@cstor-disk-pool-4zzx-64db8bcdc4-7b9l2:/# zfs set io.openebs:targetip=10.43.48.33 cstor-88ce2e6e-326a-460f-bb9e-89d5d2fd0b11/pvc-b0e72129-dcc5-48ac-b891-b052bd38ad74
root@cstor-disk-pool-4zzx-64db8bcdc4-7b9l2:/# zfs get io.openebs:targetip
NAME PROPERTY VALUE SOURCE
cstor-88ce2e6e-326a-460f-bb9e-89d5d2fd0b11 io.openebs:targetip - -
cstor-88ce2e6e-326a-460f-bb9e-89d5d2fd0b11/pvc-396b5958-1f96-4d73-a44f-aebb86f1fcf1 io.openebs:targetip 10.43.135.194 local
cstor-88ce2e6e-326a-460f-bb9e-89d5d2fd0b11/pvc-601cf7af-8945-4a4a-b68a-afcedfff1011 io.openebs:targetip 10.43.168.62 local
io.openebs:targetip - -
cstor-88ce2e6e-326a-460f-bb9e-89d5d2fd0b11/pvc-b0e72129-dcc5-48ac-b891-b052bd38ad74 io.openebs:targetip 10.43.48.33 local
cstor-88ce2e6e-326a-460f-bb9e-89d5d2fd0b11/pvc-b0e72129-dcc5-48ac-b891-b052bd38ad74@rebuild_snap io.openebs:targetip - -
cstor-88ce2e6e-326a-460f-bb9e-89d5d2fd0b11/pvc-b0e72129-dcc5-48ac-b891-b052bd38ad74_rebuild_clone io.openebs:targetip default
cstor-88ce2e6e-326a-460f-bb9e-89d5d2fd0b11/pvc-e24f6f79-91be-45a0-826d-c737616ec4f2 io.openebs:targetip 10.43.134.198 local
cstor-88ce2e6e-326a-460f-bb9e-89d5d2fd0b11/pvc-f67805ed-313a-4bbe-91d4-b95399d514c6 io.openebs:targetip 10.43.148.10 local
cstor-88ce2e6e-326a-460f-bb9e-89d5d2fd0b11/pvc-f67805ed-313a-4bbe-91d4-b95399d514c6@snapshot-8729a537-0e4f-40b0-96c0-5320f77a668b io.openebs:targetip - -
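Since there are 3 pools, the same property has to be set on the volume's replica dataset in every pool pod, not just this one. A rough loop for doing that (the app=cstor-pool label selector is an assumption; adjust it to the actual pool pod labels):

PV=pvc-b0e72129-dcc5-48ac-b891-b052bd38ad74
TARGET_IP=$(kubectl get svc -n openebs "$PV" -o jsonpath='{.spec.clusterIP}')
for pod in $(kubectl get pods -n openebs -l app=cstor-pool -o name); do
  # match only the volume dataset itself, not its snapshots or rebuild clones
  kubectl exec -n openebs "$pod" -c cstor-pool -- bash -c \
    "zfs list -H -o name | grep '${PV}\$' | xargs -n1 zfs set io.openebs:targetip=${TARGET_IP}"
done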
But the data is lost (I made sure the data had been created before the backup...):
# kubectl get po -n nginx-example
NAME READY STATUS RESTARTS AGE
nginx-deployment-79bcd4b657-lgrzp 2/2 Running 0 42m
# kubectl exec -n nginx-example $(kubectl get po -n nginx-example -l app=nginx -o name |head -n 1) -it -- cat /var/log/nginx/access.log
Defaulting container name to nginx.
Use 'kubectl describe pod/nginx-deployment-79bcd4b657-lgrzp -n nginx-example' to see all of the containers in this pod.
# kubectl exec -n nginx-example $(kubectl get po -n nginx-example -l app=nginx -o name |head -n 1) -it -- cat /var/log/nginx/error.log
Defaulting container name to nginx.
Use 'kubectl describe pod/nginx-deployment-79bcd4b657-lgrzp -n nginx-example' to see all of the containers in this pod.
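To rule out the exec output above being misleading, a more explicit check that the restored log files are really empty (the container name nginx comes from the message above; the deployment name is inferred from the pod name):
# kubectl exec -n nginx-example deploy/nginx-deployment -c nginx -- ls -l /var/log/nginx/
# kubectl exec -n nginx-example deploy/nginx-deployment -c nginx -- wc -c /var/log/nginx/access.log /var/log/nginx/error.log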
I have tried many methods but cannot recover the data of the cStor volume.
Scenario 1) restore after removing the data
"Restored 1 items out of an estimated total of 10 (estimate will change throughout the restore)" logSource="pkg/restore/restore.go:669" name=pvc-b0e72129-dcc5-48ac-b891-b052bd38ad74 namespace= progress= resource=persistentvolumes restore=velero/defaultbackup3-20230921110342
time="2023-09-21T11:03:43Z" level=info msg="Getting client for /v1, Kind=PersistentVolumeClaim" logSource="pkg/restore/restore.go:918" restore=velero/defaultbackup3-20230921110342
time="2023-09-21T11:03:43Z" level=info msg="restore status includes excludes: <nil>" logSource="pkg/restore/restore.go:1189" restore=velero/defaultbackup3-20230921110342
time="2023-09-21T11:03:43Z" level=info msg="Executing item action for persistentvolumeclaims" logSource="pkg/restore/restore.go:1196" restore=velero/defaultbackup3-20230921110342
time="2023-09-21T11:03:43Z" level=info msg="Executing AddPVFromPVCAction" cmd=/velero logSource="pkg/restore/add_pv_from_pvc_action.go:44" pluginName=velero restore=velero/defaultbackup3-20230921110342
time="2023-09-21T11:03:43Z" level=info msg="Adding PV pvc-b0e72129-dcc5-48ac-b891-b052bd38ad74 as an additional item to restore" cmd=/velero logSource="pkg/restore/add_pv_from_pvc_action.go:66" pluginName=velero restore=velero/defaultbackup3-20230921110342
time="2023-09-21T11:03:43Z" level=info msg="Skipping persistentvolumes/pvc-b0e72129-dcc5-48ac-b891-b052bd38ad74 because it's already been restored." logSource="pkg/restore/restore.go:1028" restore=velero/defaultbackup3-20230921110342
Scenario 2) restore after deleting the namespace
time="2023-09-21T11:20:13Z" level=error msg="Cluster resource restore error: error executing PVAction for persistentvolumes/pvc-b0e72129-dcc5-48ac-b891-b052bd38ad74: rpc error: code = Unknown desc = Failed to read PVC for volumeID=pvc-b0e72129-dcc5-48ac-b891-b052bd38ad74 snap=defaultbackup3: PVC{nginx-example/nginx-logs} is not bounded!" logSource="pkg/controller/restore_controller.go:494" restore=velero/defaultbackup3-20230921111152
I'll check the code to see whether the problem is there. If possible, could you please tell me whether it's a configuration problem on my side?
There is a very conspicuous error in the log:
time="2023-09-21T11:11:52Z" level=info msg="Creating PVC for volumeID:pvc-b0e72129-dcc5-48ac-b891-b052bd38ad74 snapshot:defaultbackup3 in namespace=nginx-example" cmd=/plugins/velero-blockstore-openebs logSource="/go/src/github.com/openebs/velero-plugin/pkg/cstor/pvc_operation.go:131" pluginName=velero-blockstore-openebs restore=velero/defaultbackup3-20230921111152
time="2023-09-21T11:20:13Z" level=error msg="CreatePVC returned error=PVC{nginx-example/nginx-logs} is not bounded!" cmd=/plugins/velero-blockstore-openebs logSource="/go/src/github.com/openebs/velero-plugin/pkg/cstor/pv_operation.go:205" pluginName=velero-blockstore-openebs restore=velero/defaultbackup3-20230921111152
When deleting the namespace, or restoring the cluster from scratch:
https://github.com/openebs/velero-plugin/blob/cea57783e3ed887d2b7b0e7bafc436ff26bd9a7b/pkg/cstor/pvc_operation.go#L171
Since resource restore is serial, the PVC is restored first. Because nothing then triggers the creation of the PV, this wait always hits the 500s timeout.
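While the restore is stuck in that 500s wait, the restored PVC can be seen sitting in Pending with no matching PV (diagnostic commands; the PVC name comes from the error above):
# kubectl get pvc nginx-logs -n nginx-example
# kubectl get pv | grep nginx-logs
# kubectl get events -n nginx-example --field-selector involvedObject.name=nginx-logs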
What steps did you take and what happened: [A clear and concise description of what the bug is, and what commands you ran.]
STATUS=PartiallyFailed
What did you expect to happen:
STATUS=Completed
The output of the following commands will help us better understand what's going on: (Pasting long output into a GitHub gist or other Pastebin is fine.)
kubectl logs deployment/velero -n velero
kubectl logs deployment/maya-apiserver -n openebs
velero backup describe <backupname>
or kubectl get backup/<backupname> -n velero -o yaml
velero backup logs <backupname>
velero restore describe <restorename>
or kubectl get restore/<restorename> -n velero -o yaml
velero restore logs <restorename>
Anything else you would like to add: [Miscellaneous information that will assist in solving the issue.]
Steps: disaster simulation, check backups, restore, check result (command outputs were in collapsed sections).
The PV (pvc-b0e72129-dcc5-48ac-b891-b052bd38ad74) is dynamically created by the volume provisioner, and its name is different from the one velero expects (pvc-a117021e-6232-4e85-8e4f-133114466a24), causing the recovery to fail? But all the relationships look correct, so why does the pod still fail to start because of the mounted volumes? (autoSetTargetIP: "true" is set.)
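To double-check those relationships, one can compare the volume name the PVC points at with the claimRef and phase on the PV, and then look at the pod events for mount errors:
# kubectl get pvc nginx-logs -n nginx-example -o jsonpath='{.spec.volumeName}{"\n"}'
# kubectl get pv pvc-b0e72129-dcc5-48ac-b891-b052bd38ad74 -o jsonpath='{.spec.claimRef.namespace}/{.spec.claimRef.name} {.status.phase}{"\n"}'
# kubectl describe pod -n nginx-example -l app=nginx | grep -A 10 Events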
Environment:
velero version: velero 1.11.1
velero client config get features:
kubectl version:
/etc/os-release: