pipe-cd / pipecd

The One CD for All {applications, platforms, operations}
https://pipecd.dev
Apache License 2.0

SCRIPT_RUN_ROLLBACK failed when executing multiple SCRIPT_RUN stages. #5163

Open ffjlabo opened 2 months ago

ffjlabo commented 2 months ago

What happened:

When a deployment that contains multiple SCRIPT_RUN stages is rolled back, the SCRIPT_RUN_ROLLBACK stage fails.


What you expected to happen:

The SCRIPT_RUN_ROLLBACK stage finishes successfully.

How to reproduce it:

Run a deployment with multiple SCRIPT_RUN stages and cancel it while some of them are still executing. The following app config reproduces the problem:

```yaml
apiVersion: pipecd.dev/v1beta1
kind: KubernetesApp
spec:
  name: script-run-like-jenkins
  labels:
    env: example
    team: product
  pipeline:
    stages:
      - name: SCRIPT_RUN
        with:
          run: |
            sh script.sh
          onRollback: |
            echo rollback
      - name: SCRIPT_RUN
        with:
          run: |
            sleep 10
            sh script.sh
          onRollback: |
            echo $SR_DEPLOYMENT_ID
            echo $SR_APPLICATION_ID
            echo $SR_APPLICATION_NAME
            echo $SR_TRIGGERED_AT
            echo $SR_TRIGGERED_COMMIT_HASH
            echo $SR_REPOSITORY_URL
            echo $SR_SUMMARY
            echo $SR_CONTEXT_RAW
            sh script.sh
      - name: SCRIPT_RUN
        with:
          run: |
            sleep 10
            sh script.sh
```

Environment:

ffjlabo commented 2 months ago

[Root cause] The error occurs when piped tries to store the stage log in a SCRIPT_RUN_ROLLBACK stage that has already completed.

piped identifies the target stage by its stage ID when storing the stage log.

The ID of a predefined stage is a constant value.

So if there are multiple predefined stages of the same kind, they all share that ID, and piped refers to the one that has already completed.
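A minimal Go sketch of this collision, assuming a lookup that returns the first stage with a matching ID. The `Stage` type, `findStage`, and `storeLog` are hypothetical stand-ins, not piped's real API; `storeLog` simply refuses to write to a stage that is no longer running, to model the reported error.

```go
package main

import "fmt"

// Stage is a hypothetical, simplified model of a deployment stage;
// piped's real types differ.
type Stage struct {
	ID     string
	Index  int
	Status string
}

// Predefined stages share a constant ID (the value here is illustrative).
const scriptRunRollbackID = "SCRIPT_RUN_ROLLBACK"

// findStage returns the first stage whose ID matches. With duplicate IDs
// it can never reach the second match.
func findStage(stages []Stage, id string) (*Stage, bool) {
	for i := range stages {
		if stages[i].ID == id {
			return &stages[i], true
		}
	}
	return nil, false
}

// storeLog is a hypothetical stand-in for persisting a log line; it refuses
// to write to a stage that is no longer running, modeling the reported error.
func storeLog(stages []Stage, id, line string) error {
	s, ok := findStage(stages, id)
	if !ok {
		return fmt.Errorf("stage %q not found", id)
	}
	if s.Status != "RUNNING" {
		return fmt.Errorf("cannot store %q: stage %q (index %d) is already %s", line, id, s.Index, s.Status)
	}
	return nil
}

func main() {
	stages := []Stage{
		{ID: scriptRunRollbackID, Index: 0, Status: "SUCCESS"}, // finished earlier
		{ID: scriptRunRollbackID, Index: 1, Status: "RUNNING"}, // the one piped is executing
	}
	// The log is meant for index 1, but the lookup resolves to index 0.
	if err := storeLog(stages, scriptRunRollbackID, "rollback"); err != nil {
		fmt.Println(err)
	}
}
```

The running stage at index 1 is unreachable by ID alone, which is why the ID (or the lookup key) has to become unique per stage.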

ffjlabo commented 2 months ago

I tried adding a suffix to the stage ID of the SCRIPT_RUN_ROLLBACK stage, as in this commit: https://github.com/pipe-cd/pipecd/commit/7a475a687e9eed85cd105a542db4b663f8415f66

However, the rollback still failed.


The error comes from looking up the stage config by stage ID for the executing stage: https://github.com/pipe-cd/pipecd/blob/7a475a687e9eed85cd105a542db4b663f8415f66/pkg/app/piped/controller/scheduler.go#L532-L547
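A rough Go sketch of why the suffix alone does not help, assuming predefined stage configs are resolved from a table keyed by the constant ID. The table, `StageConfig`, `findPredefinedStageConfig`, and the `-1` suffix format are all hypothetical illustrations, not the code in the linked lines.

```go
package main

import "fmt"

// StageConfig is a hypothetical, trimmed-down stage config.
type StageConfig struct {
	Name string
}

// predefinedStages models a table of predefined stage configs keyed by
// their constant IDs (assumption; names are illustrative only).
var predefinedStages = map[string]StageConfig{
	"SCRIPT_RUN_ROLLBACK": {Name: "SCRIPT_RUN_ROLLBACK"},
}

// findPredefinedStageConfig resolves the config of an executing stage by ID.
func findPredefinedStageConfig(stageID string) (StageConfig, error) {
	cfg, ok := predefinedStages[stageID]
	if !ok {
		return StageConfig{}, fmt.Errorf("stage config for %q not found", stageID)
	}
	return cfg, nil
}

func main() {
	// The unsuffixed ID resolves fine.
	if _, err := findPredefinedStageConfig("SCRIPT_RUN_ROLLBACK"); err != nil {
		fmt.Println(err)
	}
	// A suffixed ID is unique per stage but no longer matches the table key,
	// so the config lookup fails when the rollback stage starts executing.
	if _, err := findPredefinedStageConfig("SCRIPT_RUN_ROLLBACK-1"); err != nil {
		fmt.Println(err)
	}
}
```

Making the ID unique fixes the log target but breaks every other place that treats the ID as the config key, so both sides would need to change together.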

ffjlabo commented 2 months ago

Currently, SCRIPT_RUN_ROLLBACK is a predefined stage, and the pipeline assumes there can be more than one of them. However, we should change the spec so that only one SCRIPT_RUN_ROLLBACK stage is executed, for the reasons below.
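One possible shape for a single SCRIPT_RUN_ROLLBACK stage, sketched in Go: it gathers the `onRollback` scripts of the SCRIPT_RUN stages in one place, so only one stage with one ID is needed. The type and function names are hypothetical, and two behaviors are assumptions rather than the agreed spec: only stages that had started are rolled back, and their scripts run in reverse pipeline order.

```go
package main

import "fmt"

// ScriptRunStage is a hypothetical view of a SCRIPT_RUN stage in a deployment.
type ScriptRunStage struct {
	Index      int
	Started    bool   // whether the stage began executing before cancellation
	OnRollback string // the stage's onRollback script, empty if not set
}

// collectRollbackScripts gathers the onRollback scripts that a single
// SCRIPT_RUN_ROLLBACK stage would run. Assumptions: only stages that
// started are rolled back, and scripts run in reverse pipeline order.
func collectRollbackScripts(stages []ScriptRunStage) []string {
	var scripts []string
	for i := len(stages) - 1; i >= 0; i-- {
		s := stages[i]
		if s.Started && s.OnRollback != "" {
			scripts = append(scripts, s.OnRollback)
		}
	}
	return scripts
}

func main() {
	// Loosely mirrors the reproduction config: the third stage never started.
	stages := []ScriptRunStage{
		{Index: 0, Started: true, OnRollback: "echo rollback"},
		{Index: 1, Started: true, OnRollback: "sh script.sh"}, // abbreviated
		{Index: 2, Started: false},
	}
	for _, script := range collectRollbackScripts(stages) {
		fmt.Println(script)
	}
}
```

With a single rollback stage, the constant predefined ID stays unique within the pipeline, so both the log lookup and the config lookup resolve to the stage that is actually executing.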