vmware-tanzu / velero

Backup and migrate Kubernetes applications and their persistent volumes
https://velero.io
Apache License 2.0
8.41k stars 1.37k forks

Only one change-image rule is applied when restoring with image-change rules #6344

Open wawa0210 opened 1 year ago

wawa0210 commented 1 year ago

What steps did you take and what happened:

When I restore with Velero's change-image rules, only one of multiple matching rules is applied.

What did you expect to happen:

All matching rules should be applied to the image so that every substitution is completed.

The following information will help us better understand what's going on:

This is the original Pod YAML:

apiVersion: v1
kind: Pod
metadata:
  name: demo-pod
spec:
  containers:
  - name: container1
    image: docker.m.daocloud.io/abc:test
    command: ['sh', '-c', 'echo The app is running! && sleep 3600']

Below is my change-rule ConfigMap:

apiVersion: v1
kind: ConfigMap
metadata:
  # any name can be used; Velero uses the labels (below)
  # to identify it rather than the name
  name: change-image-name-config
  # must be in the velero namespace
  namespace: velero
  # the below labels should be used verbatim in your
  # ConfigMap.
  labels:
    # this value-less label identifies the ConfigMap as
    # config for a plugin (i.e. the built-in restore item action plugin)
    velero.io/plugin-config: ""
    # this label identifies the name and kind of plugin
    # that this ConfigMap is for.
    velero.io/change-image-name: RestoreItemAction
data:
  "case1": "docker.m.daocloud.io,docker.m1.daocloud.io"
  "case5": ":test,:v1.2.0"

The result I expect after the restore is:

apiVersion: v1
kind: Pod
metadata:
  name: demo-pod
spec:
  containers:
  - name: container1
    image: docker.m1.daocloud.io/abc:v1.2.0
    command: ['sh', '-c', 'echo The app is running! && sleep 3600']

but the actual result is:

apiVersion: v1
kind: Pod
metadata:
  name: demo-pod
spec:
  containers:
  - name: container1
    image: docker.m1.daocloud.io/abc:test
    command: ['sh', '-c', 'echo The app is running! && sleep 3600']

After checking the code, I found that once a rule matches, the function returns immediately, ignoring any subsequent matches:

https://github.com/vmware-tanzu/velero/blob/8427a9fdb35375353de438363305ba331659cd13/pkg/restore/change_image_name_action.go#L196-L206
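To illustrate the difference, here is a minimal stdlib Go sketch (illustrative only, not Velero's actual implementation; the rule pairs come from the ConfigMap above, and the function names are hypothetical) contrasting the current first-match behavior with applying every rule in order:

```go
package main

import (
	"fmt"
	"strings"
)

// Each rule is an "old,new" pair, as in the ConfigMap's data section.
// Ordered here as a slice; note a ConfigMap's data is an unordered map.
var rules = [][2]string{
	{"docker.m.daocloud.io", "docker.m1.daocloud.io"},
	{":test", ":v1.2.0"},
}

// firstMatch mirrors the current behavior: return after the first rule that hits.
func firstMatch(image string) string {
	for _, r := range rules {
		if strings.Contains(image, r[0]) {
			return strings.Replace(image, r[0], r[1], 1)
		}
	}
	return image
}

// allMatches applies every rule in order, which is what this issue requests.
func allMatches(image string) string {
	for _, r := range rules {
		image = strings.Replace(image, r[0], r[1], 1)
	}
	return image
}

func main() {
	img := "docker.m.daocloud.io/abc:test"
	fmt.Println(firstMatch(img)) // docker.m1.daocloud.io/abc:test
	fmt.Println(allMatches(img)) // docker.m1.daocloud.io/abc:v1.2.0
}
```

The first output matches the actual result above; the second matches the expected result.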

Anything else you would like to add:

Environment:

velero v1.11.0

Vote on this issue!

This is an invitation to the Velero community to vote on issues; you can see the project's top-voted issues listed here.
Use the "reaction smiley face" up to the right of this comment to vote.

blackpiglet commented 1 year ago

I'm not sure about this proposal. Could you give some examples of why merging the two rules into one does not work for you?

wenterjoy commented 1 year ago

I think it is really a bug that only the first matching rule is applied.

blackpiglet commented 1 year ago

My understanding of the ConfigMap rules is

  1. It can have multiple rules.
  2. Each rule is supposed to replace a specific container, and there shouldn't be more than one rule applied to the same image.

I'm not sure about allowing multiple rules to apply to the same image. Of course, that can make the rules more flexible, but the rules are less readable to users. Users cannot tell the result until they read through all the existing rules.

@sseago @qiuming-best What's your opinion?

wawa0210 commented 1 year ago

> I think it is really a bug that the rule is only matched with the first one that is found.

I agree this is a bug; the rules here should be more flexible.

> Each rule is supposed to replace a specific container

This is not realistic in real scenarios. For example, a user's images may come from multiple registries, projects, and tags, which makes precise matching very difficult. The ordering of rules also creates conflicts: to guarantee a new rule takes effect, it has to be placed first, which lowers the priority of every earlier rule and makes the outcome unpredictable.

For example:

apiVersion: v1
kind: Pod
metadata:
  name: demo-pod
spec:
  containers:
  - name: container1
    image: dev-harbor.xx.io/app/abc:v1.2.0
    command: ['']
  - name: container2
    image: dev-harbor.xx.io/app/abc-init:v1.2.0
    command: ['']
  - name: container3
    image: dev1-harbor.xx.io/common/envoy:v1.2.0
    command: ['']
  - name: container4
    image: dev1-harbor.xx.io/common/proxy:v1.2.0
    command: ['']

// Some real scenarios:
// - business registry change
// - base/common tools change registry and project
// - image version bump required by a CVE security policy
dev-harbor.xx.io --> prod-harbor.xx.io
dev1-harbor.xx.io/common/ --> prod-harbor.xx.io/commonv2/
envoy:v1.2.0 --> envoy:v1.3.0

// Previously, I had to write exact-match rules like the ones below. As cases
// increase, each rule must be written precisely, even though the registry and
// project replacements are often identical, so the maintenance cost grows.

// dev-harbor.xx.io/app/abc, prod-harbor.xx.io/app/abc
// dev-harbor.xx.io/app/abc-init, prod-harbor.xx.io/app/abc-init
// dev1-harbor.xx.io/common/envoy:v1.2.0, prod-harbor.xx.io/commonv2/envoy:v1.3.0
// dev1-harbor.xx.io/common/proxy:v1.2.0, prod-harbor.xx.io/commonv2/proxy:v1.2.0

// New rules (more readable and easier to maintain):
// dev-harbor.xx.io,prod-harbor.xx.io
// dev1-harbor.xx.io/common,prod-harbor.xx.io/commonv2
// envoy:v1.2.0,envoy:v1.3.0
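To make the requested semantics concrete, here is a small stdlib Go sketch (illustrative only, not Velero code; the helper name is hypothetical) applying the three new rules in order to the four container images above:

```go
package main

import (
	"fmt"
	"strings"
)

// The three proposed rules, applied in order to every image.
var newRules = [][2]string{
	{"dev-harbor.xx.io", "prod-harbor.xx.io"},
	{"dev1-harbor.xx.io/common", "prod-harbor.xx.io/commonv2"},
	{"envoy:v1.2.0", "envoy:v1.3.0"},
}

// applyRules runs every rule sequentially over the image string.
func applyRules(image string) string {
	for _, r := range newRules {
		image = strings.Replace(image, r[0], r[1], 1)
	}
	return image
}

func main() {
	for _, img := range []string{
		"dev-harbor.xx.io/app/abc:v1.2.0",
		"dev-harbor.xx.io/app/abc-init:v1.2.0",
		"dev1-harbor.xx.io/common/envoy:v1.2.0",
		"dev1-harbor.xx.io/common/proxy:v1.2.0",
	} {
		fmt.Printf("%s -> %s\n", img, applyRules(img))
	}
}
```

The envoy image picks up both the registry/project rule and the tag-bump rule, which is exactly the behavior that first-match-only processing cannot produce.
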
blackpiglet commented 1 year ago

There is a proposed design to add generic substitution to the Restore workflow. It can also modify images, but it doesn't have the capability this issue requests: https://github.com/vmware-tanzu/velero/pull/5880/

Could this requested function be integrated into that design's workflow?

reasonerjt commented 10 months ago

I think we will support whatever the resource modifier can handle. If it can't, we'll first see if we can enhance the resource modifier.

@anshulahuja98 Is it possible for resource modifier to achieve the configmap for change-image-name-config? https://velero.io/docs/v1.12/restore-reference/#changing-poddeploymentstatefulsetdaemonsetreplicasetreplicationcontrollerjobcronjob-image-repositories

In addition, any ideas on supporting the use case proposed by the OP?

anshulahuja98 commented 10 months ago

Logically, the act of changing images is fully supported by the resource modifier design.

Practically, the only issue might be that the user has to define multiple substitutions, whereas with "image-change" they could provide a prefix.

To improve the end-user experience, we can add regex support to match on the prefix of the image instead of the whole image and then substitute it. This was already called out as future scope in the design.

It won't be exactly the same functionality as "image-change" or the request in this issue, since those focus on specific rule-based substitution, but with explicit substitutions the use case is still supported.

We can think through other approaches besides regex matching on the value to see if we can add more value here using the resource modifier, or whether regex is enough.

qiuming-best commented 10 months ago

version: v1
resourceModifierRules:
- conditions:
    groupKind: deployments.apps
    resourceNameRegex: "resource-modifiers-.*"
  patches:
  - operation: replace
    path: "/spec/template/spec/containers/0"
    value: "{\"name\": \"nginx\", \"image\": \"nginx:1.14.2\"}"

@anshulahuja98 Take this YAML for example: it will replace the deployment's first container with image nginx:1.14.2, because it replaces the value directly.

In this issue, @wawa0210 wants to replace container images that contain the keywords docker.m.daocloud.io and :test with docker.m1.daocloud.io and :v1.2.0 respectively.

Currently, we can only replace images with one specific value; there is no further matching and replacing of target JSON values.

Below is a shell command that performs this kind of replacement with jq:

kubectl get pod my-pod -o json |
  jq '.spec.containers[0].image |= (sub("docker.m.daocloud.io"; "docker.m1.daocloud.io")
      | sub(":test"; ":v1.2.0"))' |
  kubectl patch pod my-pod -p "$(cat -)"

For this image matching-and-replacement requirement, what about enhancing the handling of the target replacement JSON values?

We could support jq expressions:

version: v1
resourceModifierRules:
- conditions:
    groupKind: deployments.apps
    resourceNameRegex: "resource-modifiers-.*"
  patches:
  - operation: replace
    path: "/spec/template/spec/containers/0"
    valueReplaceRegex: 'sub("docker.m.daocloud.io"; "docker.m1.daocloud.io") | sub("test"; "v1.2.0")'

jq also supports regular expressions; below is one example of replacing the keyword daocloud with abccloud:

version: v1
resourceModifierRules:
- conditions:
    groupKind: deployments.apps
    resourceNameRegex: "resource-modifiers-.*"
  patches:
  - operation: replace
    path: "/spec/template/spec/containers/0"
    valueReplaceRegex: 'sub("([^/]+)/daocloud"; "\\1/abccloud")'
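For reference, jq's `sub` maps naturally onto Go's regexp package: the jq capture reference `\\1` corresponds to `${1}` in Go. A small hedged sketch of the same substitution, using a hypothetical image path and helper name:

```go
package main

import (
	"fmt"
	"regexp"
)

// Mirrors jq's sub("([^/]+)/daocloud"; "\\1/abccloud");
// ${1} is Go's reference to the first capture group.
var daocloudRe = regexp.MustCompile(`([^/]+)/daocloud`)

func rewrite(image string) string {
	return daocloudRe.ReplaceAllString(image, "${1}/abccloud")
}

func main() {
	// hypothetical image path used only for illustration
	fmt.Println(rewrite("registry.io/daocloud/app:v1")) // registry.io/abccloud/app:v1
}
```

One difference worth noting: jq's `sub` replaces only the first match, while Go's `ReplaceAllString` replaces every match (jq's `gsub` is the all-matches equivalent).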
anshulahuja98 commented 10 months ago

Thanks for this proposal @qiuming-best. I think what you shared above makes sense, and I agree with it in principle.

Initially I also evaluated libraries that could give good regex support for value checks, and hence started with JSONPath, but I couldn't get the required functionality through it. jq was not something I was able to evaluate as an alternative at the time.

One way is to deprecate the current JSON Patch based implementation in favour of a pure jq based approach.

Or we extend the current patch contract with another field, "jqQuery", and keep both JSON Patch and jqQuery as supported scenarios.

Before we jump into Velero-side changes to support or extend resource modifiers, though, we should look at the feature completeness and gaps of Go implementations of jq, and also be wary of the maintenance status of such libraries. One library I see upfront: https://github.com/itchyny/gojq

qiuming-best commented 8 months ago

@anshulahuja98 Currently, we are considering which requirements should be included in 1.13.

Do you think it's necessary to include the jq enhancement in the 1.13 development cycle? So far, we haven't gathered many user requests for matching and replacing target JSON values.

anshulahuja98 commented 8 months ago

Hi @qiuming-best. We can consider this based on demand; there is no specific push from my end, and it depends entirely on whether there are significant asks around this. Whoever picks this up and prioritizes it can start a design on top of the current resource modifier conditions, and I can help review it.

anshulahuja98 commented 8 months ago

I won't be able to commit to implementing this.

qiuming-best commented 8 months ago

@wawa0210 Flexible conditions for matching specific fields and performing replacements is a good requirement.

However, not many users have raised this requirement, so we will consider adding it later.

sseago commented 2 months ago

I wonder whether there are edge cases here that we're not handling. For example, is the assumption that rules are applied sequentially to the result of the previous rule, or do we find all matches against the original value?

Starting with the case defined in the original issue, what about the following more complicated scenarios:

scenario 1:

    image: docker.m.daocloud.io/abc:test
...
data:
  "case1": "docker.m.daocloud.io,docker.m1.daocloud.io"
  "case2": "docker.m.daocloud.io,docker.m2.daocloud.io"

In this case there are two rules that match the image from backup. I imagine here we'd want just the first match to be applied.

scenario 2:

    image: docker.m.daocloud.io/abc:test
...
data:
  "case1": "docker.m.daocloud.io,docker.m1.daocloud.io"
  "case2": "docker.m1.daocloud.io,docker.m2.daocloud.io"

In this case, case2 matches the image after the case1 rule is applied, but if we determine matches before making any changes, we won't find a match for case2.

The proposed PR applies the rules on top of prior substitutions, which means scenario 1 would make only the first replacement, and scenario 2 would make both replacements, resulting in a final value of "docker.m2.daocloud.io" from the initial "docker.m.daocloud.io".

One use case that the proposed change would break, though, is this:

Say you have pods with two images, "docker.io/image1" and "docker.io/image2", and you want to swap these values on restore. The current code accomplishes that with the following rules:

data:
  "case1": "docker.io/image1,docker.io/image2"
  "case2": "docker.io/image2,docker.io/image1"

The current code would swap these values on restore -- i.e. image1 pods get image2, and image2 pods get image1. However, with the proposed change, all pods end up with image1 -- i.e. image2 pods get image1, and image1 pods are unchanged, since processing changes image1 to image2 and then back to image1 again.
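The swap example can be checked with a minimal stdlib Go sketch (illustrative only, not Velero's code; function names are hypothetical):

```go
package main

import (
	"fmt"
	"strings"
)

// The swap rules from the example above.
var swapRules = [][2]string{
	{"docker.io/image1", "docker.io/image2"},
	{"docker.io/image2", "docker.io/image1"},
}

// swapFirstMatch mirrors the current behavior: stop at the first rule that hits.
func swapFirstMatch(image string) string {
	for _, r := range swapRules {
		if strings.Contains(image, r[0]) {
			return strings.Replace(image, r[0], r[1], 1)
		}
	}
	return image
}

// swapSequential mirrors the proposed behavior: later rules see earlier output.
func swapSequential(image string) string {
	for _, r := range swapRules {
		image = strings.Replace(image, r[0], r[1], 1)
	}
	return image
}

func main() {
	fmt.Println(swapFirstMatch("docker.io/image1")) // docker.io/image2 (swap works)
	fmt.Println(swapFirstMatch("docker.io/image2")) // docker.io/image1 (swap works)
	fmt.Println(swapSequential("docker.io/image1")) // docker.io/image1 (swapped back)
	fmt.Println(swapSequential("docker.io/image2")) // docker.io/image1
}
```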

This may not be a valid or useful use case at all -- but I have a feeling there are edge cases here that we need to be careful about before making changes to the behavior of existing code.