vmware-tanzu / velero

Backup and migrate Kubernetes applications and their persistent volumes
https://velero.io
Apache License 2.0

Add Generic Find/Replace Plugin for Restores #474

Open jordanwilson230 opened 6 years ago

jordanwilson230 commented 6 years ago

First of all, this is an incredible tool.

Issue: When restoring a backup onto a new cluster using e.g. --namespace-mappings staging:staging-test or --namespace-mappings staging:staging, any ingresses or LBs that are deployed will overwrite DNS entries that are currently in place for the cloned cluster. This isn't necessarily undesirable (e.g., for disaster recovery), but it prevents ark from being used easily for cloning into new test environments. Is it possible to search for these resource-specific fields and replace them with a value specified with e.g. --hosted-zone=test.example.com or --hosted-zone=staging-test.example.com?

I tried downloading the backup locally, unpacking it, and running sed replacements, but there were issues after restoring from the raw json via kubectl. I can't remember what the issues were, but I suspect it had to do with the ordering in which resources were created (I was pretty lazy in issuing a kubectl apply -f on all directories).

This may be something that cannot be easily resolved, but I thought it still worth asking. If not possible, the best method might be to exclude those resources and manually create them after a restore...I will test that as well.

ncdc commented 6 years ago

@jordanwilson230 could you please give us a bit more information about what specific fields need to have their values changed? An example with yaml/json would be quite helpful. Thank you!

jordanwilson230 commented 6 years ago

@ncdc No problem, and thanks for looking at it. Here is an example for an ingress we're using:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: management-ingress
  annotations:
    kubernetes.io/ingress.allow-http: "false"
spec:
  tls:
    - secretName: kubernetes.bitbrew.com
  rules:
  - host: api-service-staging.kubernetes.bitbrew.com  # <-- needs to change
    http:
      paths:
      - path: /v1/*
        backend:
          serviceName: management-svc
          servicePort: 80
  - host: api-management-staging.hub.bitbrew.com  # <-- needs to change
    http:
      paths:
      - path: /v1/*
        backend:
          serviceName: management-svc
          servicePort: 80

Here is an example of a LB we're using:

---
apiVersion: v1
kind: Service
metadata:
  name: rabbitmq-lb
  annotations:
    external-dns.alpha.kubernetes.io/hostname: staging-rabbitmq.kubernetes.bitbrew.com.  # <-- needs to change
spec:
  type: LoadBalancer
  ports:
  - name: client-port
    protocol: TCP
    port: 4443
    targetPort: 5672
  - name: srmqp-mgmt-port
    protocol: TCP
    port: 15671
    targetPort: 15671
  - name: sclient-port
    protocol: TCP
    port: 5671
    targetPort: 5671
  selector:
    app: rabbitmq

Note that for the last example, we're using an annotation for external-dns to pick up which of course is different than the ingress. It'd be great to have, but I don't expect support for anything outside of the core K8s resources :)

Ideally, something such as --hosted-zone=test.kubernetes.bitbrew.com would replace (in the above example) all DNS hostname fields, e.g., staging-api.kubernetes.bitbrew.com with staging-api.test.kubernetes.bitbrew.com. This way, the cluster we restore into will not overwrite the DNS records for the cluster we've cloned from (which is still running). Of course, this assumes that the hosted zone, test.kubernetes.bitbrew.com, has been created ahead of time by us, which is fine.
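
The rewrite asked for here amounts to inserting one extra label in front of a known base zone. A minimal Python sketch of that idea follows; `insert_zone` is a hypothetical helper (not part of ark), and it assumes the base zones are known up front and that the user supplies only the new label (e.g. test) rather than the full hosted zone:

```python
def insert_zone(hostname: str, base_zones: list[str], zone: str) -> str:
    """Insert `zone` as a new DNS label directly in front of the first matching base zone.

    Hypothetical helper for illustration only; not part of ark.
    """
    # Keep a trailing dot (as in the external-dns annotation) out of the comparison.
    trailing_dot = "." if hostname.endswith(".") else ""
    bare = hostname.rstrip(".")
    for base in base_zones:
        suffix = "." + base
        if bare.endswith(suffix):
            prefix = bare[: -len(suffix)]
            return f"{prefix}.{zone}{suffix}{trailing_dot}"
    # Hostnames outside the listed zones are left untouched.
    return hostname
```

Called with the two zones from the examples above, the ingress hosts and the external-dns annotation value would each gain a test label, while unrelated hostnames stay as they are.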

ncdc commented 6 years ago

@jordanwilson230 ok this is much clearer now, thanks. Would something like a generic find/replace plugin work for you, where you could configure it somewhat like this?

changes:
- resource: ingresses
  field: spec.rules[].host
  find: *.kubernetes.bitbrew.com
  replace: xyz...
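
To make the idea concrete, here is a rough Python sketch of how one entry of such a `changes` list might be applied to a single resource. It is only an illustration under several assumptions: `apply_change` is a hypothetical name, `find` is treated as a shell-style glob, a trailing `[]` on a path segment means "every element of this list", and a match replaces the whole field value. Selecting resources by kind (`resource: ingresses`) is left out.

```python
from fnmatch import fnmatchcase


def apply_change(obj, path, find, replace):
    """Replace every string found at `path` that matches the glob `find`.

    Returns the number of values changed. Hypothetical sketch, not an ark API.
    """

    def walk(node, segments):
        if not segments:
            return 0
        head, rest = segments[0], segments[1:]
        is_list = head.endswith("[]")
        key = head[:-2] if is_list else head
        if not isinstance(node, dict) or key not in node:
            return 0
        if is_list:
            items = node[key]
            if not isinstance(items, list):
                return 0
            if rest:
                # Descend into every element of the list.
                return sum(walk(item, rest) for item in items)
            changed = 0
            for i, value in enumerate(items):
                if isinstance(value, str) and fnmatchcase(value, find):
                    items[i] = replace
                    changed += 1
            return changed
        if rest:
            return walk(node[key], rest)
        value = node[key]
        if isinstance(value, str) and fnmatchcase(value, find):
            node[key] = replace
            return 1
        return 0

    return walk(obj, path.split("."))
```

With the ingress shown earlier, `apply_change(ingress, "spec.rules[].host", "*.kubernetes.bitbrew.com", new_host)` would touch only the first rule, because the second host lives under hub.bitbrew.com.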

jordanwilson230 commented 6 years ago

@ncdc Yes, that would work, but how would the example you gave be applied? I've taken a stab at building from source (working with collections in the service-restore.go file to do a replace), and I've looked at the plugin examples, but I'm not sure where that fits in. Also, I'd never looked at Golang until now, so I'm not very proficient. Is the snippet you provided part of a k8s merge definition or something? Thanks again.

jordanwilson230 commented 6 years ago

Right now I'm using a pretty nasty-looking sed command as part of my script. After downloading a backup:

    tar xzf "${BACKUP}-data.tar.gz"
    rm "${BACKUP}-data.tar.gz"
    # Search for DNS/hostnames in the backup and prefix them with a user-specified value (e.g., test to create test.kubernetes.bitbrew.com)
    read -r -p "Enter a name to prefix to kubernetes.bitbrew.com and hub.bitbrew.com (e.g., test): " hosted_zone
    # Collect every backed-up resource file that mentions a bitbrew.com hostname
    files=($(grep -lrie '\.bitbrew\.com' ./resources))
    # Blank out last-applied-configuration, then insert the prefix in front of the zone of each hostname
    for file in "${files[@]}"; do jq '.' "$file" | sed -e 's|last-applied-configuration":\(.*\)\\n"\(.*\)|last-applied-configuration":"{\\"\\"}"\2|g; s|\(.*\)\.\(.*\).bitbrew.com\(.*\)|\1.'"${hosted_zone}"'.\2.bitbrew.com\3|g' | jq -r '. | @json' > "${file}.copy"; done
    for file in "${files[@]}"; do mv "${file}.copy" "${file}"; done
    tar czf "${BACKUP}.tar.gz" resources
    gsutil cp "${BACKUP}.tar.gz" "gs://${BUCKET}/${BUCKET}/"

It seems to work, but I certainly prefer efficiency and simplicity!

ncdc commented 6 years ago

This is a hypothetical configuration for a restore item action plugin that doesn't currently exist 😄.

neith00 commented 6 years ago

I have the same use case. 2 clusters: 1 for Dev, 1 for Prod. 2 different domains for ingresses. We would like to clone an env from the Prod cluster to the Dev cluster, for debugging purposes for example. The ideal case would be ark modifying the ingress domains on the fly when restoring. This could also apply to storageClass the day ark supports it. It's not our primary way to deploy resources on k8s of course; we usually do that through our CI/CD pipeline. I was wondering if there is any way to use the ksonnet engine to do this in ark.

ncdc commented 6 years ago

We'll work with @bryanl & team to figure out the best way to implement this.

rosskukulinski commented 6 years ago

This is a great customer use-case for resources that are typically tied to a specific environment. External-dns, as described here, is one example; I would think kube-lego/cert-manager are other workloads that are highly dependent on fields like these.

+1 for unlocking new use cases!

Suggest renaming title of issue to Add Generic Find/Replace plugin for restores

Evesy commented 6 years ago

A generic find/replace plugin would be great for us too. Thinking about certain applications deployed that query GCP resources, these often reference specific zones/regions.

In the event of a region outage, we may choose to deploy a cluster (and other GCP resources) into another region. Any application that has this region set via config will now be incorrect; a find/replace would allow us to use common arguments/environment variables for these settings, and replace them with the correct values on a restore.

jordanwilson230 commented 6 years ago

Any update on this? We're at the point of deciding whether to implement an internal backup/restore process for spinning up test environments, but would greatly prefer using ark and the features that come with it! Thanks!

ncdc commented 6 years ago

@jordanwilson230 we have not implemented this yet. We'd greatly appreciate your feedback as to how you think users should specify the transformations. This is something where the UI/UX really needs to be clear and easy.

jordanwilson230 commented 6 years ago

Perhaps this functionality would be a good fit at the plugin level? Would it be possible to provide examples specific to transforming (perhaps an example for each resource type)?

This plugin, for example: https://github.com/heptio/ark-plugin-example/blob/de9801def1466e73a72a62dd5c1a71dc479117b2/ark-restoreitemaction-log-and-annotate/myrestoreplugin.go#L45 shows how to add an annotation via k8s.io/apimachinery/pkg/api/meta and k8s.io/apimachinery/pkg/runtime. For those of us who don't know Go or are inexperienced with the k8s API, it would be awesome if you guys were able to add other examples.

Plugins might offer users more flexibility and in-house customization without adding complexity to ark core. In that case, perhaps all that is needed are some code examples for search/replacing in https://github.com/heptio/ark-plugin-example

@ncdc, @rosskukulinski, what are your thoughts on using plugins for this?

ncdc commented 6 years ago

I do think this can and should be a plugin. I'd like it to be generic enough that it meets the needs of as close to 100% of users as possible.

Do you anticipate that the same set of transformations would apply to every restore in a cluster? My guess is "no" but I'd love to hear the community's thoughts on this.

If the answer is "no", then we need a way to tell a restore which set of transformations to use. I think the easiest way to do this is probably with label selectors. We could store the transformation configurations as configmaps, and one of the pieces of data in each configuration would be a label selector to match against restores. The plugin would load all the transformation configmaps into memory, and then do selector matching against the labels on the restore.
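
The selector matching described here could look roughly like the Python sketch below. The names `selector_matches` and `transformations_for`, and the `name`/`matchLabels` layout of each loaded configuration, are assumptions made for illustration; only equality-based matching is shown, and an empty selector is treated as matching every restore.

```python
def selector_matches(match_labels: dict, labels: dict) -> bool:
    """True when every key/value pair in the selector is present in `labels`."""
    return all(labels.get(key) == value for key, value in match_labels.items())


def transformations_for(restore_labels: dict, configs: list[dict]) -> list[str]:
    """Return the names of the transformation configs whose selector matches the restore.

    `configs` stands in for the transformation configmaps loaded into memory;
    the dictionary layout is hypothetical.
    """
    return [
        config["name"]
        for config in configs
        if selector_matches(config.get("matchLabels", {}), restore_labels)
    ]
```

A restore labelled color=green would then pick up only the configurations whose selector asks for that label (plus any configuration with an empty selector), which keeps different sets of transformations apart within one cluster.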

jordanwilson230 commented 6 years ago

@ncdc I think pinning the transformations to labels is a great idea. I too would like to hear others' thoughts @Evesy @neith00

To answer your question, you're right; for it to be applicable to the larger community, I think users will eventually find a need to apply different sets of transformations (as you mentioned, for example, via config).

I'm starting to read more on the basics of Golang in an effort to help, but if anyone at Heptio eventually has the time for this feature, that'd be awesome! Love your guys' work.

ncdc commented 6 years ago

We'd love to hear your thoughts on the structure of the configuration. In a previous comment, I suggested something like this:

changes:
- resource: ingresses
  field: spec.rules[].host
  find: *.kubernetes.bitbrew.com
  replace: xyz...

We'd probably want to take advantage of Kubernetes api machinery so it would probably look more like:

apiVersion: ark.heptio.com/v1
kind: RestoreTransformation
metadata:
  namespace: heptio-ark
  name: my-transformation-1
selector:
  matchLabels:
    color: green
changes:
- resource: ingresses
  field: spec.rules[].host
  find: *.kubernetes.bitbrew.com
  replace: xyz...

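I'm primarily interested in brainstorming on the changes section:

  • How do we specify the name of the field to transform

    • What if there's an array in the path?
    • What if there's a map in the path?
    • Where do we allow wildcard matching?
  • Should find be optional? (my vote is yes)
  • etc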

neith00 commented 6 years ago

In my understanding you are proposing to store transformations in configmaps, but I don't see how it could operate. Do you want to store the configmaps with transformations in the same namespace as ark or in the destination namespace?

ncdc commented 6 years ago

In the same namespace as ark

neith00 commented 6 years ago

How would you treat the case where you only want to change the domain name in an ingress, using your example, without using find?

ncdc commented 6 years ago

If you want to match a specific ingress & its domain name and make a change, you would need to use find. Perhaps what I wrote above was confusing when I said find should be optional?

neith00 commented 6 years ago

Got it, you meant the find field is optional. Now I get it.

neith00 commented 5 years ago

@ncdc I think https://github.com/google/kasane could be useful for replacing complex values

jrnt30 commented 5 years ago

If you squint, this is somewhat similar to the "general" problem we have of customizing manifests for deployment in the first place. Having the ability to compose those transformations via Plug-Ins would be very powerful indeed.

@neith00 's suggestion of Kasane was very similar to where my mind was going with this problem with something like https://github.com/kubernetes-sigs/kustomize

There are several projects that attempt to support patching/overlaying of resources and do some of the "hard work" around this.

ncdc commented 5 years ago

cc @bryanl - would love to get your input & we should chat about this soon-ish

justinhauer commented 5 years ago

I was asked by @nrb to add my use case's requirements to this issue. My organization is migrating to a new flavor of kubernetes, and unfortunately all resource definitions don't align with our new offering, so we cannot simply do an Ark/Velero backup and restore. Example: Tectonic has a custom ingress that won't line up with what we have in either a plain vanilla k8s offering or a more opinionated offering. It would be great to have a 'genericizer' for these resource types that checks the new and old clusters when the old resource definition isn't in the new cluster. Or, if the new cluster has its own 'opinionated' option for that resource type, it could make its best guess to line that up in the new cluster (example: openshift CRDs).

rajakshay commented 5 years ago

I have a similar use-case. We create 2 ingresses for apps on our k8s cluster:

Example: If my application name is nginx-app, the generic ingress for the stage environment would be nginx-app.stage.domain.net and the cluster-specific domains would be nginx-app.cluster1.stage.domain.net and nginx-app.cluster2.stage.domain.net, where cluster1 and cluster2 are the names of my k8s clusters.

While migrating workloads from cluster1 to cluster2, the generic domain remains the same, and hence no changes are needed there. But the cluster-specific ingress will still have the old hostname, and the app will not be reachable on the new cluster using the cluster-specific hostname.

> We'd love to hear your thoughts on the structure of the configuration. In a previous comment, I suggested something like this:
>
> changes:
> - resource: ingresses
>   field: spec.rules[].host
>   find: *.kubernetes.bitbrew.com
>   replace: xyz...
>
> We'd probably want to take advantage of Kubernetes api machinery so it would probably look more like:
>
> apiVersion: ark.heptio.com/v1
> kind: RestoreTransformation
> metadata:
>   namespace: heptio-ark
>   name: my-transformation-1
> selector:
>   matchLabels:
>     color: green
> changes:
> - resource: ingresses
>   field: spec.rules[].host
>   find: *.kubernetes.bitbrew.com
>   replace: xyz...
>
> I'm primarily interested in brainstorming on the changes section:
>
>   • How do we specify the name of the field to transform
>
>     • What if there's an array in the path?
>     • What if there's a map in the path?
>     • Where do we allow wildcard matching?
>   • Should find be optional? (my vote is yes)
>   • etc

What you have explained here would probably work for all use-cases described in this issue.

zikes commented 5 years ago

Use-case: stripping specific annotations from backup/restore operations.

I had an incident recently where I created a backup of a cluster in DigitalOcean, then tested restoring that backup into a separate cluster. Unfortunately DigitalOcean stores the UUID of their load balancer as an annotation in the LoadBalancer service, which means that when the snapshot was restored instead of provisioning a new DO Load Balancer it fought the initial cluster for control of the existing one.

Being able to remove this annotation would make the backup/restore viable for DigitalOcean clusters.

https://github.com/digitalocean/digitalocean-cloud-controller-manager/blob/016600fb188a6c2b9082f45ea67514e8ef9509b9/docs/getting-started.md#load-balancer-id-annotations
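
As a rough illustration of this use case, the Python sketch below strips a set of annotations from a resource before it is re-created. `strip_annotations` is a hypothetical helper, and the annotation key used in the usage note is an assumption that should be checked against the DigitalOcean documentation linked above.

```python
def strip_annotations(obj: dict, keys: set[str]) -> list[str]:
    """Remove the given annotation keys from a resource; return the keys actually removed.

    Hypothetical helper for illustration; not part of Velero.
    """
    annotations = (obj.get("metadata") or {}).get("annotations")
    if not annotations:
        return []
    removed = sorted(key for key in keys if key in annotations)
    for key in removed:
        del annotations[key]
    return removed
```

For the DigitalOcean case this might be called as `strip_annotations(service, {"kubernetes.digitalocean.com/load-balancer-id"})` (key name assumed), so that the restored Service provisions a fresh load balancer instead of claiming the existing one.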

debianmaster commented 5 years ago

+1 for this feature

Sushma10037017 commented 4 years ago

+1 for the feature! It would be very helpful in our use case as well.

skriss commented 4 years ago

See #2090 for a related request using kustomize.

panpan0000 commented 4 years ago

from https://github.com/vmware-tanzu/velero/issues/2892

My application scenario is as follows (changing the replica count):

In a DR/migration scenario, the backup cluster usually has fewer resources than the main cluster, so we have to change the deployment's replicas, say from 10 to 2. That is, in the original cluster replicas = 10, but in the new cluster replicas should be 2 because the new cluster is smaller. This is very practical when jumping into DR territory. So far, only the storage class and PVC node selector can be overridden when migrating.
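
A minimal Python sketch of this replica override, assuming a hypothetical `cap_replicas` helper that is applied to each Deployment or StatefulSet object during a restore (this is not an existing Velero option):

```python
def cap_replicas(obj: dict, max_replicas: int) -> bool:
    """Lower spec.replicas to `max_replicas` if it is higher; return True when changed.

    Hypothetical helper for illustration; not a Velero feature.
    """
    spec = obj.get("spec")
    if not isinstance(spec, dict):
        return False
    replicas = spec.get("replicas")
    if isinstance(replicas, int) and replicas > max_replicas:
        spec["replicas"] = max_replicas
        return True
    return False
```

Capping rather than overwriting leaves workloads that already run with fewer replicas untouched.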

hervedevos commented 2 years ago

Is there any update on this topic? Has anyone found a better way than extracting and modifying the backup? It would really be nice to have this feature available out of the box.

R-Studio commented 2 years ago

Is there any news about this feature? +1 for this feature!

redzioch commented 2 years ago

+1

panpan0000 commented 1 year ago

I cannot believe this useful feature request is still open after so many years.

panpan0000 commented 1 year ago

Some use cases again:

  1. migrate from an older k8s cluster to a new one
  2. migrate across different clusters with different CNIs (some CNIs use annotations to specify an IP pool or other attributes)
  3. master-standby (backup) disaster recovery: the master cluster has more resources, so an app may have 5 replicas there, but when restoring to the backup cluster, which usually has fewer resources, the replicas should be reduced as well, to e.g. 1~2
  4. more

Those kinds of features are called "Transform" when restoring in Kasten K10, and they are very useful.

qmaraval-csgroup commented 1 year ago

Hello, I have a problem with nodeSelector that is related to this issue. One StatefulSet has a nodeSelector, so when I restore onto another cluster the nodeSelector prevents the scheduling of the restored pod (which is normal, because it can't be scheduled there). When I remove the nodeSelector from the tar.gz resource, the issue remains: the StatefulSet is successfully deployed, but the restore init container is not even added to the pod resources... Does somebody have a trick for this? I also tried to add the init container directly, but it runs indefinitely...

EDIT: I have found a way to make it work; in fact, I had only deleted the nodeSelector value and not the whole key itself. So if you face this issue: download yourbackup.tar.gz; in the Pod directory, remove the nodeSelector line from the JSON resources; re-upload the modified yourbackup.tar.gz to the S3 bucket. Enjoy.
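
The manual steps above could be scripted along these lines. This is only a sketch under assumptions: the backup is a .tar.gz whose resources are individual .json files, the function names are hypothetical, and downloading from and re-uploading to the bucket are left out.

```python
import io
import json
import tarfile


def drop_node_selector(obj: dict) -> dict:
    """Remove the whole nodeSelector key (not just its value) from a Pod or a pod template."""
    spec = obj.get("spec") or {}
    spec.pop("nodeSelector", None)
    template_spec = (spec.get("template") or {}).get("spec") or {}
    template_spec.pop("nodeSelector", None)
    return obj


def rewrite_backup(src: bytes) -> bytes:
    """Return a new .tar.gz in which every .json member has had nodeSelector removed."""
    out = io.BytesIO()
    with tarfile.open(fileobj=io.BytesIO(src), mode="r:gz") as tin, \
         tarfile.open(fileobj=out, mode="w:gz") as tout:
        for member in tin.getmembers():
            if not member.isfile():
                # Directories and other non-regular entries are copied as-is.
                tout.addfile(member)
                continue
            data = tin.extractfile(member).read()
            if member.name.endswith(".json"):
                data = json.dumps(drop_node_selector(json.loads(data))).encode()
                member.size = len(data)  # the tar header must match the new length
            tout.addfile(member, io.BytesIO(data))
    return out.getvalue()
```

Working on the parsed JSON rather than deleting a line by hand avoids leaving a dangling comma or an empty value behind.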

kaovilai commented 1 year ago

fyi this issue can be partially solved with https://github.com/vmware-tanzu/velero/issues/5809 which has since been addressed.

find/replace sounds like something to define at restore time as a configmap similar to #5809's implementation.

red8888 commented 6 months ago

I would much prefer this to resource modifiers which require me to specify a patch for all possible resources.

This is all just text, right? Can't we have a plugin that merely does a big find/replace on all text files for a restore, either mutating the backup or at restore time? It might take a while, but it shouldn't be too hard to just recurse through the bucket prefix and edit all text files, right?

I have used community charts that hard code the namespace (through Helm refs) in configs. I just need to do a big find replace of that namespace value.

Obviously it's a dangerous operation, but I would at least like the option.
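
For a sense of scale, a blunt find/replace over an unpacked backup directory is only a few lines of Python. The sketch below is hypothetical (`replace_in_tree` is not a Velero feature), works on a local directory rather than a bucket, and makes no attempt to understand what it is rewriting, which is exactly why it is dangerous:

```python
from pathlib import Path


def replace_in_tree(root: str, old: str, new: str) -> list[str]:
    """Replace `old` with `new` in every UTF-8 text file under `root`.

    Returns the relative paths of the files that were modified.
    """
    changed = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except UnicodeDecodeError:
            continue  # skip files that are not valid UTF-8 text
        if old in text:
            path.write_text(text.replace(old, new), encoding="utf-8")
            changed.append(str(path.relative_to(root)))
    return changed
```

Because the match is purely textual, a short namespace name would also be rewritten wherever it happens to appear inside other strings.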

kaovilai commented 6 months ago

There's already a ton of options in velero for non-dangerous operations. Ideally, we don't want to risk anyone's production. Most people don't read every doc's caveats until it's too late.

kaovilai commented 6 months ago

This is something someone could go out and create a velero plugin for. I believe that if people can misunderstand the safety of the feature, and it can be a standalone plugin, it should be implemented outside the main velero repo.

kaovilai commented 6 months ago

See restore plugin example. https://github.com/vmware-tanzu/velero-plugin-example/blob/main/internal/plugin/restorepluginv2.go

You can create a plugin that performs find/replace on items.

WRKT commented 2 months ago

@kaovilai is it possible to get help on it? I tried to implement this feature because we really need it for our use case.

However, it edits every text entry, sed-style, but podvolumerestores are not triggered, so numerous stateful apps are stuck in the init phase.

If someone would like to enhance it, please do, because I'm getting lost on it.

Also +1 on this feature

Link of the custom plugin: https://github.com/WRKT/velero-custom-plugins

kaovilai commented 2 months ago

Not soon, there has been no business push from my team for this. There are other things in the pipe atm being worked on.