vmware-tanzu / velero

Backup and migrate Kubernetes applications and their persistent volumes
https://velero.io
Apache License 2.0

PVCs are restored as lost on AWS #34

Closed ivelichkovich closed 7 years ago

ivelichkovich commented 7 years ago

Hello,

I tried an example cluster migration using the stable/prometheus Helm chart with persistent volumes enabled.

I ran: ark backup create cluster-backup --selector backup=ark --snapshot-volumes

The backup worked fine and was created but when I run: ark restore create cluster-backup --restore-volumes

It restores everything, however the PVCs come back as lost.

kubectl describe on the PVC says: "Warning ClaimLost Bound claim has lost its PersistentVolume. Data on the volume is lost!"
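
For reference, a quick way to check the claim/volume state is roughly the following (the PVC name and namespace are just examples from this chart):

kubectl get pvc --all-namespaces
kubectl get pv
kubectl describe pvc metrics-grafana -n default

A Lost claim with no matching entry in kubectl get pv would suggest the PersistentVolume objects were never recreated on the target cluster.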

ncdc commented 7 years ago

Hi @ivelichkovich, could you please share a copy of the logs from the ark pod (kubectl -n heptio-ark logs deployment/ark)? Preferably include the times when you did the backup and when you tried the restore.

ivelichkovich commented 7 years ago

I0810 18:59:56.461013 1 server.go:151] Ensuring heptio-ark namespace exists for backups I0810 18:59:56.481447 1 server.go:163] Namespace already exists I0810 18:59:56.481465 1 server.go:168] Retrieving Ark configuration I0810 18:59:56.485662 1 server.go:184] Successfully retrieved Ark configuration I0810 18:59:56.485678 1 server.go:217] Using default resource priorities: [namespaces persistentvolumes persistentvolumeclaims secrets configmaps] I0810 18:59:56.485749 1 server.go:244] Configuring cloud provider for backup service I0810 18:59:56.485878 1 server.go:254] Configuring cloud provider for snapshot service I0810 18:59:56.485987 1 server.go:361] Starting controllers I0810 18:59:56.485998 1 server.go:367] Caching cloud backups every 30m0s I0810 18:59:56.486190 1 backup_sync_controller.go:56] Running backup sync controller I0810 18:59:56.486206 1 backup_sync_controller.go:62] Syncing backups from object storage I0810 18:59:56.486245 1 backup_cache.go:67] refreshing all cached backup lists from object storage I0810 18:59:56.486273 1 backup_cache.go:84] bucket "kubernetes-ark-test" is not in cache - doing a live lookup I0810 18:59:56.647116 1 backup_controller.go:132] Starting BackupController I0810 18:59:56.647139 1 backup_controller.go:135] Waiting for caches to sync I0810 18:59:56.647129 1 gc_controller.go:82] Waiting for caches to sync I0810 18:59:56.647260 1 schedule_controller.go:123] Starting ScheduleController I0810 18:59:56.647276 1 schedule_controller.go:126] Waiting for caches to sync I0810 18:59:57.041093 1 server.go:474] Server started successfully I0810 18:59:57.041529 1 reflector.go:198] Starting reflector v1.Backup (0s) from github.com/heptio/ark/pkg/generated/informers/externalversions/factory.go:45 I0810 18:59:57.041538 1 reflector.go:198] Starting reflector v1.Schedule (0s) from github.com/heptio/ark/pkg/generated/informers/externalversions/factory.go:45 I0810 18:59:57.041621 1 reflector.go:236] Listing and watching v1.Backup from github.com/heptio/ark/pkg/generated/informers/externalversions/factory.go:45 I0810 18:59:57.041682 1 restore_controller.go:128] Starting RestoreController I0810 18:59:57.041712 1 restore_controller.go:131] Waiting for caches to sync I0810 18:59:57.041621 1 reflector.go:236] Listing and watching v1.Schedule from github.com/heptio/ark/pkg/generated/informers/externalversions/factory.go:45 I0810 18:59:57.042084 1 reflector.go:198] Starting reflector v1.Restore (0s) from github.com/heptio/ark/pkg/generated/informers/externalversions/factory.go:45 I0810 18:59:57.042130 1 reflector.go:236] Listing and watching v1.Restore from github.com/heptio/ark/pkg/generated/informers/externalversions/factory.go:45 I0810 18:59:57.042537 1 reflector.go:198] Starting reflector v1.Config (0s) from github.com/heptio/ark/pkg/generated/informers/externalversions/factory.go:45 I0810 18:59:57.042569 1 reflector.go:236] Listing and watching v1.Config from github.com/heptio/ark/pkg/generated/informers/externalversions/factory.go:45 I0810 18:59:57.047299 1 shared_informer.go:116] caches populated I0810 18:59:57.047316 1 backup_controller.go:139] Caches are synced I0810 18:59:57.047306 1 shared_informer.go:116] caches populated I0810 18:59:57.047360 1 gc_controller.go:86] Caches are synced I0810 18:59:57.047384 1 backup_cache.go:84] bucket "kubernetes-ark-test" is not in cache - doing a live lookup I0810 18:59:57.141950 1 shared_informer.go:116] caches populated I0810 18:59:57.141976 1 restore_controller.go:135] Caches are synced I0810 18:59:57.147453 1 
shared_informer.go:116] caches populated I0810 18:59:57.147470 1 schedule_controller.go:130] Caches are synced I0810 19:00:06.714962 1 gc_controller.go:105] garbage-collecting backups that have expired as of 2017-08-10 19:00:06.714954056 +0000 UTC I0810 19:00:06.714996 1 gc_controller.go:129] Backup heptio-ark/cluster-backup1 has not expired yet, skipping I0810 19:00:06.715004 1 gc_controller.go:129] Backup heptio-ark/test2 has not expired yet, skipping I0810 19:00:06.715011 1 gc_controller.go:129] Backup heptio-ark/test2pv has not expired yet, skipping I0810 19:00:06.715016 1 gc_controller.go:129] Backup heptio-ark/test3pv has not expired yet, skipping I0810 19:00:06.721470 1 backup_sync_controller.go:68] Found 4 backups I0810 19:00:06.721487 1 backup_sync_controller.go:71] Syncing backup heptio-ark/cluster-backup1 I0810 19:00:06.736500 1 backup_sync_controller.go:71] Syncing backup heptio-ark/test2 I0810 19:00:06.736793 1 backup_controller.go:96] Backup heptio-ark/cluster-backup1 has phase Completed - skipping I0810 19:00:06.756553 1 backup_sync_controller.go:71] Syncing backup heptio-ark/test2pv I0810 19:00:06.756705 1 backup_controller.go:96] Backup heptio-ark/test2 has phase Completed - skipping I0810 19:00:06.775329 1 backup_controller.go:96] Backup heptio-ark/test2pv has phase Completed - skipping I0810 19:00:06.775759 1 backup_sync_controller.go:71] Syncing backup heptio-ark/test3pv I0810 19:00:06.788992 1 backup_controller.go:96] Backup heptio-ark/test3pv has phase Completed - skipping I0810 19:00:41.484311 1 restore_controller.go:183] processRestore for key "heptio-ark/cluster-backup1-20170810190045" I0810 19:00:41.484339 1 restore_controller.go:190] Getting restore heptio-ark/cluster-backup1-20170810190045 I0810 19:00:41.484349 1 restore_controller.go:211] Cloning restore heptio-ark/cluster-backup1-20170810190045 I0810 19:00:41.498689 1 restore_controller.go:242] running restore for heptio-ark/cluster-backup1-20170810190045 I0810 19:00:41.677155 1 restore_controller.go:325] copied 10970 bytes I0810 19:00:41.931092 1 restore.go:538] end of tar I0810 19:00:41.931258 1 restore.go:286] Restoring namespace default I0810 19:00:41.944715 1 restore.go:349] Restoring resource persistentvolumeclaims into namespace default I0810 19:00:41.944899 1 restore.go:382] Getting client for /v1, Kind=PersistentVolumeClaim I0810 19:00:41.945041 1 restore.go:401] Using custom restorer for persistentvolumeclaims I0810 19:00:41.947486 1 restore.go:444] Restoring item metrics-grafana I0810 19:01:11.976569 1 restore.go:349] Restoring resource secrets into namespace default I0810 19:01:11.976799 1 restore.go:382] Getting client for /v1, Kind=Secret I0810 19:01:11.976828 1 restore.go:398] Using default restorer for secrets I0810 19:01:11.976885 1 restore.go:444] Restoring item metrics-grafana I0810 19:01:11.998642 1 restore.go:349] Restoring resource configmaps into namespace default I0810 19:01:11.998919 1 restore.go:382] Getting client for /v1, Kind=ConfigMap I0810 19:01:11.998946 1 restore.go:398] Using default restorer for configmaps I0810 19:01:11.999004 1 restore.go:444] Restoring item metrics-grafana-config I0810 19:01:12.012134 1 restore.go:444] Restoring item metrics-grafana-dashs I0810 19:01:12.031339 1 restore.go:444] Restoring item monitoring-influxdb I0810 19:01:12.047921 1 restore.go:349] Restoring resource daemonsets.extensions into namespace default I0810 19:01:12.048186 1 restore.go:382] Getting client for extensions/v1beta1, Kind=DaemonSet I0810 19:01:12.048288 1 restore.go:398] Using 
default restorer for daemonsets.extensions I0810 19:01:12.048501 1 restore.go:444] Restoring item fluentd I0810 19:01:12.064168 1 restore.go:349] Restoring resource deployments.apps into namespace default I0810 19:01:12.067348 1 restore.go:382] Getting client for apps/v1beta1, Kind=Deployment I0810 19:01:12.070582 1 restore.go:398] Using default restorer for deployments.apps I0810 19:01:12.070948 1 restore.go:444] Restoring item cluster-dashboard-kubernetes-dashboard I0810 19:01:12.087972 1 restore.go:444] Restoring item cluster-kube-lego-kube-lego I0810 19:01:12.115128 1 restore.go:444] Restoring item cluster-ops-view-kube-ops-view I0810 19:01:12.145513 1 restore.go:444] Restoring item cluster-scaling-aws-cluster-autoscaler I0810 19:01:12.176653 1 restore.go:444] Restoring item metrics-grafana I0810 19:01:12.222450 1 restore.go:444] Restoring item monitoring-influxdb I0810 19:01:12.274063 1 restore.go:444] Restoring item nginx-default-backend I0810 19:01:12.315571 1 restore.go:349] Restoring resource endpoints into namespace default I0810 19:01:12.315808 1 restore.go:382] Getting client for /v1, Kind=Endpoints I0810 19:01:12.315838 1 restore.go:398] Using default restorer for endpoints I0810 19:01:12.315913 1 restore.go:444] Restoring item cluster-dashboard-kubernetes-dashboard I0810 19:01:12.365724 1 restore.go:444] Restoring item cluster-ops-view-kube-ops-view I0810 19:01:12.434970 1 restore.go:444] Restoring item cluster-scaling-aws-cluster-autoscaler I0810 19:01:12.482565 1 restore.go:444] Restoring item ingress-nginx I0810 19:01:12.515055 1 restore.go:444] Restoring item metrics-grafana I0810 19:01:12.537496 1 restore.go:444] Restoring item monitoring-influxdb I0810 19:01:12.574617 1 restore.go:444] Restoring item nginx-default-backend I0810 19:01:12.593534 1 restore.go:349] Restoring resource ingresses.extensions into namespace default I0810 19:01:12.594013 1 restore.go:382] Getting client for extensions/v1beta1, Kind=Ingress I0810 19:01:12.594041 1 restore.go:398] Using default restorer for ingresses.extensions I0810 19:01:12.594175 1 restore.go:444] Restoring item cluster-dashboard-kubernetes-dashboard I0810 19:01:12.610874 1 restore.go:444] Restoring item cluster-ops-view-kube-ops-view I0810 19:01:12.630593 1 restore.go:444] Restoring item metrics-grafana I0810 19:01:12.649125 1 restore.go:349] Restoring resource services into namespace default I0810 19:01:12.649815 1 restore.go:382] Getting client for /v1, Kind=Service I0810 19:01:12.649842 1 restore.go:401] Using custom restorer for services I0810 19:01:12.649919 1 restore.go:444] Restoring item cluster-dashboard-kubernetes-dashboard I0810 19:01:12.699838 1 restore.go:444] Restoring item cluster-ops-view-kube-ops-view I0810 19:01:12.750997 1 restore.go:444] Restoring item cluster-scaling-aws-cluster-autoscaler I0810 19:01:12.804377 1 restore.go:444] Restoring item ingress-nginx I0810 19:01:12.945227 1 request.go:638] Throttling request took 140.654527ms, request: POST:https://100.64.0.1:443/api/v1/namespaces/default/services I0810 19:01:13.019272 1 restore.go:444] Restoring item metrics-grafana I0810 19:01:13.148523 1 request.go:638] Throttling request took 129.050128ms, request: POST:https://100.64.0.1:443/api/v1/namespaces/default/services I0810 19:01:13.194829 1 restore.go:444] Restoring item monitoring-influxdb I0810 19:01:13.348482 1 request.go:638] Throttling request took 151.019073ms, request: POST:https://100.64.0.1:443/api/v1/namespaces/default/services I0810 19:01:13.414191 1 restore.go:444] Restoring item 
nginx-default-backend I0810 19:01:13.547668 1 request.go:638] Throttling request took 133.329449ms, request: POST:https://100.64.0.1:443/api/v1/namespaces/default/services I0810 19:01:13.607022 1 restore.go:286] Restoring namespace kube-system I0810 19:01:13.652766 1 restore.go:349] Restoring resource persistentvolumeclaims into namespace kube-system I0810 19:01:13.652989 1 restore.go:382] Getting client for /v1, Kind=PersistentVolumeClaim I0810 19:01:13.662863 1 restore.go:401] Using custom restorer for persistentvolumeclaims I0810 19:01:13.674903 1 restore.go:444] Restoring item cluster-prometheus-prometheus-alertmanager I0810 19:01:13.745316 1 request.go:638] Throttling request took 70.228989ms, request: POST:https://100.64.0.1:443/api/v1/namespaces/kube-system/persistentvolumeclaims I0810 19:01:13.765230 1 restore.go:444] Restoring item cluster-prometheus-prometheus-server I0810 19:01:13.945225 1 request.go:638] Throttling request took 179.773917ms, request: POST:https://100.64.0.1:443/api/v1/namespaces/kube-system/persistentvolumeclaims I0810 19:01:43.973291 1 restore.go:349] Restoring resource configmaps into namespace kube-system I0810 19:01:43.973597 1 restore.go:382] Getting client for /v1, Kind=ConfigMap I0810 19:01:43.973628 1 restore.go:398] Using default restorer for configmaps I0810 19:01:43.973691 1 restore.go:444] Restoring item cluster-prometheus-prometheus-alertmanager I0810 19:01:43.988806 1 restore.go:444] Restoring item cluster-prometheus-prometheus-server I0810 19:01:44.004516 1 restore.go:349] Restoring resource deployments.apps into namespace kube-system I0810 19:01:44.005416 1 restore.go:382] Getting client for apps/v1beta1, Kind=Deployment I0810 19:01:44.005476 1 restore.go:398] Using default restorer for deployments.apps I0810 19:01:44.005583 1 restore.go:444] Restoring item cluster-heapster-heapster I0810 19:01:44.022716 1 restore.go:444] Restoring item cluster-prometheus-prometheus-alertmanager I0810 19:01:44.042238 1 restore.go:444] Restoring item cluster-prometheus-prometheus-server I0810 19:01:44.078137 1 restore.go:444] Restoring item external-dns I0810 19:01:44.126437 1 restore.go:349] Restoring resource endpoints into namespace kube-system I0810 19:01:44.126649 1 restore.go:382] Getting client for /v1, Kind=Endpoints I0810 19:01:44.126679 1 restore.go:398] Using default restorer for endpoints I0810 19:01:44.126785 1 restore.go:444] Restoring item cluster-prometheus-prometheus-alertmanager I0810 19:01:44.177356 1 restore.go:444] Restoring item cluster-prometheus-prometheus-server I0810 19:01:44.234639 1 restore.go:444] Restoring item heapster I0810 19:01:44.281363 1 restore.go:349] Restoring resource ingresses.extensions into namespace kube-system I0810 19:01:44.281711 1 restore.go:382] Getting client for extensions/v1beta1, Kind=Ingress I0810 19:01:44.281798 1 restore.go:398] Using default restorer for ingresses.extensions I0810 19:01:44.281892 1 restore.go:444] Restoring item cluster-prometheus-prometheus-server I0810 19:01:44.321622 1 restore.go:349] Restoring resource services into namespace kube-system I0810 19:01:44.321901 1 restore.go:382] Getting client for /v1, Kind=Service I0810 19:01:44.321993 1 restore.go:401] Using custom restorer for services I0810 19:01:44.322103 1 restore.go:444] Restoring item cluster-prometheus-prometheus-alertmanager I0810 19:01:44.388858 1 restore.go:444] Restoring item cluster-prometheus-prometheus-server I0810 19:01:44.439823 1 restore.go:444] Restoring item heapster I0810 19:01:44.512936 1 restore_controller.go:246] 
restore heptio-ark/cluster-backup1-20170810190045 completed I0810 19:01:44.513013 1 restore_controller.go:249] updating restore heptio-ark/cluster-backup1-20170810190045 final status I0810 19:07:04.061720 1 reflector.go:405] github.com/heptio/ark/pkg/generated/informers/externalversions/factory.go:45: Watch close - v1.Restore total 3 items received I0810 19:07:08.057852 1 reflector.go:405] github.com/heptio/ark/pkg/generated/informers/externalversions/factory.go:45: Watch close - v1.Config total 0 items received I0810 19:08:16.058921 1 reflector.go:405] github.com/heptio/ark/pkg/generated/informers/externalversions/factory.go:45: Watch close - v1.Backup total 4 items received I0810 19:08:23.060312 1 reflector.go:405] github.com/heptio/ark/pkg/generated/informers/externalversions/factory.go:45: Watch close - v1.Schedule total 0 items received I0810 19:09:19.230012 1 restore_controller.go:183] processRestore for key "heptio-ark/cluster-backup1-20170810190923" I0810 19:09:19.230046 1 restore_controller.go:190] Getting restore heptio-ark/cluster-backup1-20170810190923 I0810 19:09:19.230057 1 restore_controller.go:211] Cloning restore heptio-ark/cluster-backup1-20170810190923 I0810 19:09:19.243430 1 restore_controller.go:242] running restore for heptio-ark/cluster-backup1-20170810190923 I0810 19:09:19.425334 1 restore_controller.go:325] copied 10970 bytes I0810 19:09:19.688924 1 restore.go:538] end of tar I0810 19:09:19.689104 1 restore.go:286] Restoring namespace default I0810 19:09:19.713593 1 restore.go:349] Restoring resource persistentvolumeclaims into namespace default I0810 19:09:19.713756 1 restore.go:382] Getting client for /v1, Kind=PersistentVolumeClaim I0810 19:09:19.713781 1 restore.go:401] Using custom restorer for persistentvolumeclaims I0810 19:09:19.716168 1 restore.go:444] Restoring item metrics-grafana I0810 19:09:49.746789 1 restore.go:349] Restoring resource secrets into namespace default I0810 19:09:49.747021 1 restore.go:382] Getting client for /v1, Kind=Secret I0810 19:09:49.747047 1 restore.go:398] Using default restorer for secrets I0810 19:09:49.747105 1 restore.go:444] Restoring item metrics-grafana I0810 19:09:49.762123 1 restore.go:349] Restoring resource configmaps into namespace default I0810 19:09:49.762420 1 restore.go:382] Getting client for /v1, Kind=ConfigMap I0810 19:09:49.762452 1 restore.go:398] Using default restorer for configmaps I0810 19:09:49.762499 1 restore.go:444] Restoring item metrics-grafana-config I0810 19:09:49.776669 1 restore.go:444] Restoring item metrics-grafana-dashs I0810 19:09:49.796208 1 restore.go:444] Restoring item monitoring-influxdb I0810 19:09:49.810185 1 restore.go:349] Restoring resource daemonsets.extensions into namespace default I0810 19:09:49.810416 1 restore.go:382] Getting client for extensions/v1beta1, Kind=DaemonSet I0810 19:09:49.810447 1 restore.go:398] Using default restorer for daemonsets.extensions I0810 19:09:49.810613 1 restore.go:444] Restoring item fluentd I0810 19:09:49.824376 1 restore.go:349] Restoring resource deployments.apps into namespace default I0810 19:09:49.824625 1 restore.go:382] Getting client for apps/v1beta1, Kind=Deployment I0810 19:09:49.824640 1 restore.go:398] Using default restorer for deployments.apps I0810 19:09:49.824717 1 restore.go:444] Restoring item cluster-dashboard-kubernetes-dashboard I0810 19:09:49.845165 1 restore.go:444] Restoring item cluster-kube-lego-kube-lego I0810 19:09:49.859198 1 restore.go:444] Restoring item cluster-ops-view-kube-ops-view I0810 19:09:49.874780 1 
restore.go:444] Restoring item cluster-scaling-aws-cluster-autoscaler I0810 19:09:49.897129 1 restore.go:444] Restoring item metrics-grafana I0810 19:09:49.911148 1 restore.go:444] Restoring item monitoring-influxdb I0810 19:09:49.925232 1 restore.go:444] Restoring item nginx-default-backend I0810 19:09:49.951045 1 restore.go:349] Restoring resource endpoints into namespace default I0810 19:09:49.951261 1 restore.go:382] Getting client for /v1, Kind=Endpoints I0810 19:09:49.951288 1 restore.go:398] Using default restorer for endpoints I0810 19:09:49.951360 1 restore.go:444] Restoring item cluster-dashboard-kubernetes-dashboard I0810 19:09:49.964779 1 restore.go:444] Restoring item cluster-ops-view-kube-ops-view I0810 19:09:49.977863 1 restore.go:444] Restoring item cluster-scaling-aws-cluster-autoscaler I0810 19:09:49.991619 1 restore.go:444] Restoring item ingress-nginx I0810 19:09:50.011962 1 restore.go:444] Restoring item metrics-grafana I0810 19:09:50.024865 1 restore.go:444] Restoring item monitoring-influxdb I0810 19:09:50.037706 1 restore.go:444] Restoring item nginx-default-backend I0810 19:09:50.059223 1 restore.go:349] Restoring resource ingresses.extensions into namespace default I0810 19:09:50.059415 1 restore.go:382] Getting client for extensions/v1beta1, Kind=Ingress I0810 19:09:50.059441 1 restore.go:398] Using default restorer for ingresses.extensions I0810 19:09:50.059502 1 restore.go:444] Restoring item cluster-dashboard-kubernetes-dashboard I0810 19:09:50.072022 1 restore.go:444] Restoring item cluster-ops-view-kube-ops-view I0810 19:09:50.084290 1 restore.go:444] Restoring item metrics-grafana I0810 19:09:50.096830 1 restore.go:349] Restoring resource services into namespace default I0810 19:09:50.097041 1 restore.go:382] Getting client for /v1, Kind=Service I0810 19:09:50.097067 1 restore.go:401] Using custom restorer for services I0810 19:09:50.097178 1 restore.go:444] Restoring item cluster-dashboard-kubernetes-dashboard I0810 19:09:50.219451 1 restore.go:444] Restoring item cluster-ops-view-kube-ops-view I0810 19:09:50.345236 1 request.go:638] Throttling request took 125.590528ms, request: POST:https://100.64.0.1:443/api/v1/namespaces/default/services I0810 19:09:50.423041 1 restore.go:444] Restoring item cluster-scaling-aws-cluster-autoscaler I0810 19:09:50.545215 1 request.go:638] Throttling request took 122.058077ms, request: POST:https://100.64.0.1:443/api/v1/namespaces/default/services I0810 19:09:50.616813 1 restore.go:444] Restoring item ingress-nginx I0810 19:09:50.745223 1 request.go:638] Throttling request took 128.291199ms, request: POST:https://100.64.0.1:443/api/v1/namespaces/default/services I0810 19:09:50.868838 1 restore.go:444] Restoring item metrics-grafana I0810 19:09:50.945232 1 request.go:638] Throttling request took 76.257296ms, request: POST:https://100.64.0.1:443/api/v1/namespaces/default/services I0810 19:09:51.019070 1 restore.go:444] Restoring item monitoring-influxdb I0810 19:09:51.145132 1 request.go:638] Throttling request took 125.942021ms, request: POST:https://100.64.0.1:443/api/v1/namespaces/default/services I0810 19:09:51.223543 1 restore.go:444] Restoring item nginx-default-backend I0810 19:09:51.345218 1 request.go:638] Throttling request took 121.557413ms, request: POST:https://100.64.0.1:443/api/v1/namespaces/default/services I0810 19:09:51.421355 1 restore.go:286] Restoring namespace kube-system I0810 19:09:51.434384 1 restore.go:349] Restoring resource persistentvolumeclaims into namespace kube-system I0810 19:09:51.434523 1 
restore.go:382] Getting client for /v1, Kind=PersistentVolumeClaim I0810 19:09:51.434538 1 restore.go:401] Using custom restorer for persistentvolumeclaims I0810 19:09:51.436856 1 restore.go:444] Restoring item cluster-prometheus-prometheus-alertmanager I0810 19:09:51.546481 1 request.go:638] Throttling request took 109.490598ms, request: POST:https://100.64.0.1:443/api/v1/namespaces/kube-system/persistentvolumeclaims I0810 19:09:51.562376 1 restore.go:444] Restoring item cluster-prometheus-prometheus-server I0810 19:09:51.745223 1 request.go:638] Throttling request took 182.640448ms, request: POST:https://100.64.0.1:443/api/v1/namespaces/kube-system/persistentvolumeclaims I0810 19:10:21.772623 1 restore.go:349] Restoring resource configmaps into namespace kube-system I0810 19:10:21.772954 1 restore.go:382] Getting client for /v1, Kind=ConfigMap I0810 19:10:21.772984 1 restore.go:398] Using default restorer for configmaps I0810 19:10:21.773050 1 restore.go:444] Restoring item cluster-prometheus-prometheus-alertmanager I0810 19:10:21.793737 1 restore.go:444] Restoring item cluster-prometheus-prometheus-server I0810 19:10:21.810782 1 restore.go:349] Restoring resource deployments.apps into namespace kube-system I0810 19:10:21.811093 1 restore.go:382] Getting client for apps/v1beta1, Kind=Deployment I0810 19:10:21.811120 1 restore.go:398] Using default restorer for deployments.apps I0810 19:10:21.811260 1 restore.go:444] Restoring item cluster-heapster-heapster I0810 19:10:21.829836 1 restore.go:444] Restoring item cluster-prometheus-prometheus-alertmanager I0810 19:10:21.843816 1 restore.go:444] Restoring item cluster-prometheus-prometheus-server I0810 19:10:21.857946 1 restore.go:444] Restoring item external-dns I0810 19:10:21.871090 1 restore.go:349] Restoring resource endpoints into namespace kube-system I0810 19:10:21.871321 1 restore.go:382] Getting client for /v1, Kind=Endpoints I0810 19:10:21.871349 1 restore.go:398] Using default restorer for endpoints I0810 19:10:21.871423 1 restore.go:444] Restoring item cluster-prometheus-prometheus-alertmanager I0810 19:10:21.884012 1 restore.go:444] Restoring item cluster-prometheus-prometheus-server I0810 19:10:21.902892 1 restore.go:444] Restoring item heapster I0810 19:10:21.915273 1 restore.go:349] Restoring resource ingresses.extensions into namespace kube-system I0810 19:10:21.915440 1 restore.go:382] Getting client for extensions/v1beta1, Kind=Ingress I0810 19:10:21.915469 1 restore.go:398] Using default restorer for ingresses.extensions I0810 19:10:21.915522 1 restore.go:444] Restoring item cluster-prometheus-prometheus-server I0810 19:10:21.931078 1 restore.go:349] Restoring resource services into namespace kube-system I0810 19:10:21.931262 1 restore.go:382] Getting client for /v1, Kind=Service I0810 19:10:21.931290 1 restore.go:401] Using custom restorer for services I0810 19:10:21.931364 1 restore.go:444] Restoring item cluster-prometheus-prometheus-alertmanager I0810 19:10:21.995046 1 restore.go:444] Restoring item cluster-prometheus-prometheus-server I0810 19:10:22.067319 1 restore.go:444] Restoring item heapster I0810 19:10:22.148072 1 restore_controller.go:246] restore heptio-ark/cluster-backup1-20170810190923 completed I0810 19:10:22.148095 1 restore_controller.go:249] updating restore heptio-ark/cluster-backup1-20170810190923 final status I0810 19:13:09.056838 1 reflector.go:405] github.com/heptio/ark/pkg/generated/informers/externalversions/factory.go:45: Watch close - v1.Config total 0 items received I0810 19:15:49.055226 1 
reflector.go:405] github.com/heptio/ark/pkg/generated/informers/externalversions/factory.go:45: Watch close - v1.Restore total 3 items received I0810 19:16:52.051583 1 reflector.go:405] github.com/heptio/ark/pkg/generated/informers/externalversions/factory.go:45: Watch close - v1.Schedule total 0 items received I0810 19:17:35.049060 1 reflector.go:405] github.com/heptio/ark/pkg/generated/informers/externalversions/factory.go:45: Watch close - v1.Backup total 0 items received I0810 19:19:00.050728 1 reflector.go:405] github.com/heptio/ark/pkg/generated/informers/externalversions/factory.go:45: Watch close - v1.Config total 0 items received I0810 19:23:31.045720 1 reflector.go:405] github.com/heptio/ark/pkg/generated/informers/externalversions/factory.go:45: Watch close - v1.Restore total 0 items received I0810 19:23:58.041575 1 reflector.go:405] github.com/heptio/ark/pkg/generated/informers/externalversions/factory.go:45: Watch close - *v1.Backup total 0 items received

So, looking at my AWS snapshots, I see 3 snapshots, which is expected since I had 3 PVCs (1 Grafana, 2 Prometheus). I assume these are from the backup (though they could be from a different backup test run; I can confirm this in a second), but I don't see any existing volumes made from those snapshots.

ncdc commented 7 years ago

Can you please also share the output of ark backup get <backup name> -o yaml and ark restore get <restore name> -o yaml? If there's any private data in there, feel free to sanitize it.

ivelichkovich commented 7 years ago

OK, I'm not sure where those snapshots came from. I just ran the backup again and it did not create snapshots.

The backup looks bad: the backup from my run that worked had volumeBackups populated with a list of objects, but this one has null:

apiVersion: ark.heptio.com/v1
kind: Backup
metadata:
  creationTimestamp: 2017-08-10T19:43:53Z
  name: cluster-backup3
  namespace: heptio-ark
  resourceVersion: "1395"
  selfLink: /apis/ark.heptio.com/v1/namespaces/heptio-ark/backups/cluster-backup3
  uid: 3d747630-7e04-11e7-8cad-065991c28584
spec:
  excludedNamespaces: null
  excludedResources: null
  includedNamespaces:
  - '*'
  includedResources:
  - '*'
  labelSelector:
    matchLabels:
      backup: ark
  snapshotVolumes: true
  ttl: 24h0m0s
status:
  expiration: 2017-08-11T19:43:53Z
  phase: Completed
  validationErrors: null
  version: 1
  volumeBackups: null

The restore had errors related to waiting too long for the volume to attach, or something similar, but I nuked that cluster so I can't check them.

ncdc commented 7 years ago

Assuming you're using our example deployment, which runs the server at log level 4, I would expect to see something like this when it's snapshotting a volume:

Executing action on persistentvolumes, ns=, name=<pv name>
Backup <backup name>: snapshotting PersistenVolume <pv name>, volume-id <volume id>, expiration <expiration>

Or if there was an error determining the volume ID, you'd see

unable to determine volume ID for backup <backup name>, PersistentVolume <pv name>

I didn't see either of those in your logs. Are you seeing any errors anywhere?

skriss commented 7 years ago

@ivelichkovich do the PVs that you're trying to back up/restore have the backup=ark label applied? Since you're specifying a label selector in the backup, they'd need to be labeled as such in order to be captured.
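
If the label is the issue, a rough workaround sketch would be to label the PersistentVolumes directly so the selector matches them (the PV name below is just one taken from this thread):

kubectl label pv pvc-efa26d87-7e03-11e7-a939-12f0b6779a20 backup=ark
kubectl get pv --show-labels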

ncdc commented 7 years ago

Also see #15 for an idea of how to work around this (if the label is in fact the issue here).

ivelichkovich commented 7 years ago

Ahh, that's it. No errors or snapshot logs, but the backup that did snapshot correctly had no selector.

I have the label on a PVC, but it's not inherited by the PV. A nice feature would be to get the volume from the PVC and back up that volume.

A workaround that I imagine will work is to back up without a selector, restore that first to get the volumes, and then bring all the Kubernetes resources back.
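
Roughly, that workaround with the same flags as above would look like this (the backup name is just an example):

ark backup create full-cluster-backup --snapshot-volumes
ark restore create full-cluster-backup --restore-volumes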

Thank you for the help!

ncdc commented 7 years ago

A nice feature would be to get the volume from the PVC and back up that volume.

That's what we're aiming to do as part of #15.

skriss commented 7 years ago

No problem, and please let us know if you have any further feedback and/or issues!

ivelichkovich commented 7 years ago

Ahh, that's cool. I didn't realize not specifying a selector would back everything up. Great tool! You released it right when I started looking into disaster recovery, and Ark has made the job much, much easier :)

ivelichkovich commented 7 years ago

OK, now it created the snapshots with no selector, but it's telling me:

"error preparing /tmp/093010796/cluster/persistentvolumes/pvc-efa26d87-7e03-11e7-a939-12f0b6779a20.json: InvalidParameterCombination: The parameter iops is not supported for gp2 volumes.\n\tstatus code: 400, request id: fddc16e9-e12b-4b2e-9dd5-0b5cefdd3f0f"

ncdc commented 7 years ago

OK, that is definitely a bug that we can fix. @skriss, since you worked on adding iops support, can you take this?

skriss commented 7 years ago

Sure thing. @ivelichkovich, can you send the results of ark backup get <backup name> -o yaml again?

ivelichkovich commented 7 years ago

The successful backup? Here it is:

apiVersion: ark.heptio.com/v1
kind: Backup
metadata:
  creationTimestamp: 2017-08-11T14:48:02Z
  name: volume-migrate1
  namespace: heptio-ark
  resourceVersion: "139898"
  selfLink: /apis/ark.heptio.com/v1/namespaces/heptio-ark/backups/volume-migrate1
  uid: 135cefc2-7ea4-11e7-8c22-12d0660ff8fe
spec:
  excludedNamespaces: null
  excludedResources: null
  includedNamespaces:
  - '*'
  includedResources:
  - '*'
  labelSelector: null
  snapshotVolumes: true
  ttl: 24h0m0s
status:
  expiration: 2017-08-11T20:15:21Z
  phase: Completed
  validationErrors: null
  version: 1
  volumeBackups:
    pvc-efa26d87-7e03-11e7-a939-12f0b6779a20:
      iops: 100
      snapshotID: snap-0019f473ba6ffdcfc
      type: gp2
    pvc-efa4591a-7e03-11e7-a939-12f0b6779a20:
      iops: 100
      snapshotID: snap-00b49cb855ec00434
      type: gp2
    pvc-f36ed605-7e03-11e7-a939-12f0b6779a20:
      iops: 100
      snapshotID: snap-0a0fa6555a234e212
      type: gp2

skriss commented 7 years ago

@ivelichkovich I see the issue and am working on a fix. The possible workarounds for now would be to not use gp2 volumes, or to manually edit the backup YAML and delete the iops: 100 lines before performing a restore.
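
For reference, after deleting the iops: 100 lines, the volumeBackups section of the backup above would look roughly like this (same snapshot IDs as in the earlier output):

  volumeBackups:
    pvc-efa26d87-7e03-11e7-a939-12f0b6779a20:
      snapshotID: snap-0019f473ba6ffdcfc
      type: gp2
    pvc-efa4591a-7e03-11e7-a939-12f0b6779a20:
      snapshotID: snap-00b49cb855ec00434
      type: gp2
    pvc-f36ed605-7e03-11e7-a939-12f0b6779a20:
      snapshotID: snap-0a0fa6555a234e212
      type: gp2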

ncdc commented 7 years ago

@ivelichkovich #37 fixes the gp2/iops issue. Are you OK if we close this issue now?