kyma-project / test-infra

Test infrastructure for the Kyma project.
https://status.build.kyma-project.io/
Apache License 2.0
38 stars 182 forks source link

Research and POC using required checks in gh workflows #12139

Open dekiel opened 1 week ago

dekiel commented 1 week ago

The required checks in github actions works different than required prowjobs in prow. Because we are moving out of Prow we must understand consequences and decide on our approach.

dekiel commented 1 week ago

Description of github workflows behaviour when set as required check. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/collaborating-on-repositories-with-code-quality-features/troubleshooting-required-status-checks#handling-skipped-but-required-checks

  1. Use Github action to implement path filtering on a job or step level with required checks defined in GitHub. The workflows are set to run for all pull requests. This prevents required checks stuck in pending state. The workflows are marked as required in GitHub. This guarantee all expected workflows must run and report status. The workflow uses a job with github action to evaluate if changed files match a path filter. This let us do a real job conditionally. Example github actions.

    Other jobs in a workflow must use needs to depend on a job evaluating path filters and if condition to decide if the real work should be done. If the if condition doesn't resolve to true, the job statuses will be skipped which effectively will not block pr. Pros:

    • We can use GitHub required checks. This guarantee we must get a result for them.
    • No need for additional software developed in house.
    • No need to host additional service.
    • The workflow definition is self contained. No external or additional source of information about workflow configuration. However required checks are defined in GitHub.

    Cons:

    • We will run all workflows for all PR and push events. Part will finish after path filter evaluation but agents must be allocated what will consume a minutes.
    • The list of checks on PR might be super long.
    • We will define path filtering in a custom way. This requires additional github action for path filtering and potentially gh action for failing in case problems. However this can be implemented as a reusable workflow once and later reused with one simple step.
    • Maintaining a list of required checks must be automated to prevent an unintentional workflows that are not marked as required.
    • This require a custom solution to mark a workflow as optional. However this can be as simple as dedicated custom comment in a workflow file.
  2. We can set workflows with paths filtering but not as required and use external software or separate required workflow to control status of executed workflows and block merging if any failed. This is the same approach as using Tide in Prow. This allows blocking merge if unknown check appear on a PR, but will not guarantee that all expected checks are present. This is approach used by telemetry-manager module. The workflows are set with path or branch filtering. This let us run only workflows that has something to do for introduced changes. The workflows are not marked as required. This prevents stuck required checks in pending state. We defined one required check which runs always. This check is running a gh action which guarantee all checks running for an event are skipped or success. Example GitHub action https://github.com/wechuli/allcheckspassed.

    Pros:

    • We can use a native path filtering feature. No need to use dedicated gh action.
    • No need for additional software developed in house.
    • No need to host additional service.
    • The workflow definition is self contained. No external or additional source of information about workflow configuration.

    Cons:

    • All checks are considered a required.
    • If expected check will not run for PR this will not be detected and PR merge blocked. This can be solved by adding another process analysing if all expected checks reported its status.
    • Additional workflow must run for required check to report if all statuses are success.
  3. We can set workflows with paths filtering and as required and use external software or separate required workflow to set pending checks to skipped. The workflows are set with path or branch filtering. This let us run only workflows that has something to do for introduced changes. The workflows are marked as required in GitHub. This guarantee all expected workflows must run and report status. External service or additional workflow must be executed to analyse which required checks should run. The required checks which should not be executed are set as skipped by this external process.

    Pros:

    • We can use GitHub required checks. This guarantee we must get a result for them.
    • The workflow definition is self contained. No external or additional source of information about workflow configuration.
    • We can use a native path filtering feature. No need to use dedicated gh action.
    • The external service or additional workflow can detect if unknown check is present for event.

    Cons:

    • Additional workflow or external service must run to set skipped statuses where needed.
    • This require a custom solution to mark a workflow as optional. However this can be as simple as dedicated custom comment in a workflow file.

Prevent skip job when parent job failed

Docs about required checks advice to use always() status check function together with needs. This is the first dependent-job which got executed even parent job report failed. There is no simple way to fail depending job when parent failed.

on:
  push:
    branches:
      - main
  pull_request:
    branches:
      - main

jobs:
  failed-job:
    runs-on: ubuntu-latest
    outputs:
      output1: ${{ steps.step1.outputs.test }}
    steps:
      - name: failed-step-with-output
        run: |
          echo "This job will always fail"
          echo "test=hello" >> "$GITHUB_OUTPUT"
          exit 1

  dependent-job:
    runs-on: ubuntu-latest
    needs: failed-job
    if: ${{ always() }}
    steps:
      - name: dependent-step
        run: echo "This job is dependent on failed-job"

  dependent-job-2:
    runs-on: ubuntu-latest
    needs: failed-job
    if: ${{ always() }}
    steps:
      - name: dependent-step
        if: ${{ needs.failed-job.outputs.output1 == 'hello' }}
        run: echo "This job is dependent on failed-job"

  dependent-job-3:
    runs-on: ubuntu-latest
    needs: failed-job
    if: ${{ always() && needs.failed-job.outputs.output1 == 'hello' }}
    steps:
      - name: dependent-step
        run: echo "This job is dependent on failed-job"

  dependent-job-4:
    runs-on: ubuntu-latest
    needs: failed-job
    if: ${{ (always() && needs.failed-job.outputs.output1 == 'hello') || failure() }}
    steps:
      - name: dependent-step
        if: ${{ failure() }}
        run: |
            echo "This job should fail"
            exit 1
      - name: this-not-fail
        run: echo "do not fail"
image

Conclusion

The most secure is option 3, however it requires most complex additional process interacting with checks statuses. The option 1 offer high security level too without requirement for external service or additional workflow. However it changes the interface for user the most, introducing highly customised way for workflow configuration. Option 2 is the simplest one, with lowest secure level.

dekiel commented 6 days ago

A github action we could use to solve the problem. https://github.com/wechuli/allcheckspassed It's already used in telemetry-manager repository.