openkruise / kruise

Automated management of large-scale applications on Kubernetes (incubating project under CNCF)
https://openkruise.io

[BUG] Rollback OAM Application in the Rollout scenario does not work as expected #1792

Open lujiajing1126 opened 2 weeks ago

lujiajing1126 commented 2 weeks ago

What happened:

We use Kruise together with Kruise Rollouts for canary releases. However, because the workload resources (e.g. the Deployment) are controlled by Kruise (the operator), a canary release cannot be rolled back.

What you expected to happen:

Rollback should work.

How to reproduce it (as minimally and precisely as possible):

  1. Create a core.oam.dev/v1beta1.Application with Kruise. A Deployment (containing a container and an init container), for example, will be generated.
  2. Create a Rollout and declare the Deployment generated above to be controlled by the Rollout operator.
  3. Make a change to the "Component Definition", for example, bump the version of the init container.
  4. Make a change to the core.oam.dev/v1beta1.Application, for example, change the image of the container, in order to trigger a canary release. During this step, a new canary Deployment is created (together with the new init container).
  5. Roll back the container image, while still keeping the new init container.

So the rollback fails, because the init container has been updated as well.
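
For reference, step 2 might look roughly like the manifest below. This is a minimal sketch, not the manifest used in this report: it assumes the `rollouts.kruise.io/v1alpha1` API, and the Rollout name, Deployment name, and step size are hypothetical placeholders.

```yaml
# Sketch of a Kruise Rollout that takes over a Deployment generated by the Application.
# All names below are hypothetical; adjust them to the generated workload.
apiVersion: rollouts.kruise.io/v1alpha1
kind: Rollout
metadata:
  name: app-rollout            # hypothetical Rollout name
  namespace: default
spec:
  objectRef:
    workloadRef:               # the workload the Rollout operator should control
      apiVersion: apps/v1
      kind: Deployment
      name: my-app             # hypothetical name of the generated Deployment
  strategy:
    canary:
      steps:
        - replicas: 25%        # release a quarter of the replicas as canary
          pause: {}            # then wait for manual confirmation
```

With a Rollout like this in place, any change to the Deployment's pod template (step 4) is released in canary steps, which is why a pod template that differs from the stable one in step 5 is not recognized as a rollback.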

Anything else we need to know?:

Environment:

furykerry commented 2 weeks ago

plz clarify how step 5 is performed, and what result you expected.

AiRanthem commented 1 week ago

@lujiajing1126 I've sent you an email to confirm the scenario of this case. Please provide some feedback after confirming.

lujiajing1126 commented 1 week ago

> @lujiajing1126 I've sent you an email to confirm the scenario of this case. Please provide some feedback after confirming.

I've confirmed the case

AiRanthem commented 1 week ago

@lujiajing1126 The issue seems to be caused by improper usage: Components should be completely decoupled from business logic. It's recommended to modify the Component to parameterize the init container’s image like the business container’s image, managing them uniformly in the Application. Here is a demo:

```yaml
# component.yaml
apiVersion: core.oam.dev/v1beta1
kind: ComponentDefinition
metadata:
  name: rollout-test
spec:
  workload:
    definition:
      apiVersion: apps/v1
      kind: Deployment
  schematic:
    cue:
      template: |
        parameter: {
          mainImage: string
          initImage: string
        }
        output: {
          apiVersion: "apps/v1"
          kind:       "Deployment"
          metadata: {
            name: context.name
          }
          spec: {
            selector: matchLabels: {
              app: context.name
            }
            template: {
              metadata: labels: {
                app: context.name
              }
              spec: {
                initContainers: [{
                  name:  "init-container"
                  image: parameter.initImage
                  command: ["sh", "-c", "echo Init Container Running"]
                }]
                containers: [{
                  name:  "main-container"
                  image: parameter.mainImage
                }]
              }
            }
          }
        }
```

```yaml
# app.yaml
apiVersion: core.oam.dev/v1beta1
kind: Application
metadata:
  name: vela-app
spec:
  components:
    - name: app
      type: rollout-test
      properties:
        initImage: busybox:1
        mainImage: hello-world:v1
      traits:
        - type: scaler
          properties:
            replicas: 4
  policies:
    - name: target-default
      type: topology
      properties:
        clusters: ["local"]
        namespace: "default"
  workflow:
    steps:
      - name: deploy2default
        type: deploy
        properties:
          policies: ["target-default"]
```
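
To illustrate the suggestion: with both images exposed as parameters, a canary release and its rollback would each be an edit to the Application's `properties` only, leaving the ComponentDefinition untouched. The fragment below is a sketch under that assumption; the `hello-world:v2` tag is a hypothetical example and does not appear in this thread.

```yaml
# app.yaml (fragment): only the component properties change between revisions.
spec:
  components:
    - name: app
      type: rollout-test
      properties:
        initImage: busybox:1        # kept identical during the release and the rollback
        mainImage: hello-world:v2   # hypothetical new tag; set back to hello-world:v1 to roll back
```

Because `initImage` stays the same, reverting `mainImage` should produce the same pod template as before the release, which is the condition under which the rollback in step 5 is expected to work.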

lujiajing1126 commented 1 week ago

> @lujiajing1126 The issue seems to be caused by improper usage: Components should be completely decoupled from business logic. It's recommended to modify the Component to parameterize the init container’s image like the business container’s image, managing them uniformly in the Application. Here is a demo:

That doesn't make sense. The OAM template can always be updated.

Is it possible to detect whether the workload is controlled by the Rollout operator? If so, we may be able to keep the component revision during the canary stage.
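
One possible signal for such a check, offered as an unverified assumption rather than a confirmed API: Kruise Rollouts appears to mark a workload that is in the middle of a rollout with an annotation. The key and value format below are recalled from the Rollouts source and should be checked against the installed version; the Deployment and Rollout names are hypothetical.

```yaml
# Sketch of what a Deployment under an in-progress rollout might look like.
# The annotation key and its JSON value are assumptions to verify, not documented guarantees.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                                  # hypothetical workload name
  annotations:
    rollouts.kruise.io/in-progressing: '{"rolloutName":"app-rollout"}'  # assumed marker
```

If that assumption holds, a controller could treat the presence of this annotation as "canary in progress" and avoid switching to a new component revision until it is removed.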