woodpecker-ci / woodpecker

Woodpecker is a simple, yet powerful CI/CD engine with great extensibility.
https://woodpecker-ci.org
Apache License 2.0
4.17k stars 360 forks source link

Matrix and `when.evaluate` regression (not evaluating matrix variables) #2068

Closed ambroisie closed 11 months ago

ambroisie commented 1 year ago

Component

server

Describe the bug

I used to be able to use the following configuration:

matrix:
  include:
    - TYPE: dev
      MAKE_TARGET: build-dev
      SSH_TARGET: ssh_target_dev
    - TYPE: prod
      MAKE_TARGET: build-prod
      SSH_TARGET: ssh_target

# Run the correct matrix build on the correct branch
when:
  evaluate: |
    CI_PIPELINE_EVENT in ["push", "cron", "deployment", "manual"]
    and ((CI_COMMIT_BRANCH == CI_REPO_DEFAULT_BRANCH) == ("${TYPE}" == "prod"))

To be able to have a workflow which has 95% overlap between prod and dev, but only runs prod on main and dev on other branches.

When updating to 1.0.0, I noticed that this configuration stopped working.

From the logs, I can see that the filter is evaluated once for the whole workflow, and ${TYPE} is empty/not evaluated. I would expect it to evaluate the filter once for each matrix variant instead, as it used to be ~6 months ago.

System Info

Version: 1.0.0
Run on a NixOS host.

Additional context

Evaluate:\"CI_PIPELINE_EVENT in [\\\"push\\\", \\\"cron\\\", \\\"deployment\\\", \\\"manual\\\"]\\nand ((CI_COMMIT_BRANCH != CI_REPO_DEFAULT_BRANCH) == (\\\"\\\" == \\\"prod\\\"))\\n\"}}}

Validations

6543 commented 1 year ago

well It's a regression, but as per label definition: it got released and so it's a bug

qwerty287 commented 11 months ago

I tried to fix this, but I cannot reproduce it.

matrix:
  include: 
    - MATRIX_ENV: a
    - MATRIX_ENV: b

steps:
  - name: test
    image: bash
    commands:
        - echo $MATRIX_ENV

when:
  evaluate: '"${MATRIX_ENV}" == "a"'

works fine with the local backend, it is only executed for MATRIX_ENV=a.

ambroisie commented 11 months ago

Let me try again and report back.

ambroisie commented 11 months ago

I still have the same issue with the exact configuration I posted.

Nov 01 18:39:55 porthos woodpecker-server[245158]: {"level":"error","error":"Error #01: ignoring hook: global when filter of all workflows do skip this pipeline\n","ip":"104.132.45.72","latency":1091.193646,"method":"POST","path":"/api/repos/37/pipelines","status":500,"time":"2023-11-01T17:39:55Z","user-agent":"Mozilla/5.0 (X11; CrOS x86_64 14541.0.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36","time":"2023-11-01T18:39:55+01:00","caller":"github.com/woodpecker-ci/woodpecker/server/router/middleware/logger.go:43"}

Though it doesn't log the exact when.evaluate query.

ambroisie commented 11 months ago

Ah, just had to set the log level to trace.

So it still shows that ${TYPE} is substituted to an empty string :-(.

Evaluate:\"CI_PIPELINE_EVENT in [\\\"push\\\", \\\"cron\\\", \\\"deployment\\\", \\\"manual\\\"]\\nand ((CI_COMMIT_BRANCH == CI_REPO_DEFAULT_BRANCH) == (\\\"\\\" == \\\"prod\\\"))\\n\"}}}

I wonder if the issue is the name of the variable (TYPE is quite generic)? Though it isn't prefixed by CI_ so it shouldn't be reserved by Woodpecker. If that is the case, I would argue that it should still work with such a generic variable name (as long as it isn't CI_* prefixed).

qwerty287 commented 11 months ago

I don't think that's the issue. Can you try using TYPE without interpolation, so just TYPE instead of "${TYPE}"?

ambroisie commented 11 months ago

Using just TYPE shows:

Evaluate:\"CI_PIPELINE_EVENT in [\\\"push\\\", \\\"cron\\\", \\\"deployment\\\", \\\"manual\\\"]\\nand ((CI_COMMIT_BRANCH == CI_REPO_DEFAULT_BRANCH) == (TYPE == \\\"prod\\\"))\\n\"}}}

And

Nov 01 19:10:51 porthos woodpecker-server[1315242]: {"level":"debug","repo":"ambroisie/blog","time":"2023-11-01T19:10:51+01:00","caller":"github.com/woodpecker-ci/woodpecker/server/pipeline/create.go:70","message":"ignoring hook: global when filter of all workflows do skip this pipeline"}

EDIT: I also tried just (CI_COMMIT_BRANCH == CI_REPO_DEFAULT_BRANCH) == (TYPE == "prod") which also failed.

qwerty287 commented 11 months ago

Can you add some print statements for debugging?

What I'd like to know: In https://github.com/woodpecker-ci/woodpecker/blob/main/pipeline/frontend/yaml/constraint/constraint.go#L190, print out env - make sure that there are no secrets or similar things in there when posting.

ambroisie commented 11 months ago

After removing all CI_* variables (if you saw my previous message, I deleted it because I had misread the logs):

map[CI:woodpecker  HTTPS_PROXY: HTTP_PROXY: MAKE_TARGET:build-prod NO_PROXY: SSH_TARGET:ssh_target TYPE:prod http_proxy: https_proxy: no_proxy:]

So TYPE does appear in the env and subsequent call to Compile... I'm not sure what the issue is here.

ambroisie commented 11 months ago

Ah wait, the previous comment was when run on the test branch.

But when I run it manually on main it doesn't have the same content:

map[CI:woodpecker CI_BUILD_CREATED:0 CI_BUILD_DEPLOY_TARGET: CI_BUILD_EVENT:manual CI_BUILD_FINISHED:0 CI_BUILD_LINK:https://git.belanyi.fr/ambroisie/blog CI_BUILD_NUMBER:0 CI_BUILD_PARENT:0 CI_BUILD_STARTED:0 CI_BUILD_STATUS: CI_COMMIT_AUTHOR:ambroisie CI_COMMIT_AUTHOR_AVATAR:https://git.belanyi.fr/avatars/29759761821fe477041d8dd54df22e3a CI_COMMIT_AUTHOR_EMAIL:bruno@belanyi.fr CI_COMMIT_BRANCH:main CI_COMMIT_LINK:https://git.belanyi.fr/ambroisie/blog CI_COMMIT_MESSAGE:MANUAL PIPELINE @ main CI_COMMIT_PULL_REQUEST: CI_COMMIT_PULL_REQUEST_LABELS: CI_COMMIT_REF:main CI_COMMIT_REFSPEC: CI_COMMIT_SHA:3286e92f594db0082457be93aa638249ea728daf CI_COMMIT_SOURCE_BRANCH: CI_COMMIT_TAG: CI_COMMIT_TARGET_BRANCH: CI_COMMIT_URL:https://git.belanyi.fr/ambroisie/blog CI_FORGE_TYPE:gitea CI_FORGE_URL:https://git.belanyi.fr CI_JOB_FINISHED: CI_JOB_NUMBER:0 CI_JOB_STARTED: CI_JOB_STATUS: CI_PIPELINE_CREATED:0 CI_PIPELINE_DEPLOY_TARGET: CI_PIPELINE_EVENT:manual CI_PIPELINE_FINISHED:0 CI_PIPELINE_LINK:https://git.belanyi.fr/ambroisie/blog CI_PIPELINE_NUMBER:0 CI_PIPELINE_PARENT:0 CI_PIPELINE_STARTED:0 CI_PIPELINE_STATUS: CI_PIPELINE_URL:https://git.belanyi.fr/ambroisie/blog CI_PREV_BUILD_CREATED:0 CI_PREV_BUILD_DEPLOY_TARGET: CI_PREV_BUILD_EVENT: CI_PREV_BUILD_FINISHED:0 CI_PREV_BUILD_LINK: CI_PREV_BUILD_NUMBER:0 CI_PREV_BUILD_PARENT:0 CI_PREV_BUILD_STARTED:0 CI_PREV_BUILD_STATUS: CI_PREV_COMMIT_AUTHOR: CI_PREV_COMMIT_AUTHOR_AVATAR: CI_PREV_COMMIT_AUTHOR_EMAIL: CI_PREV_COMMIT_BRANCH: CI_PREV_COMMIT_LINK: CI_PREV_COMMIT_MESSAGE: CI_PREV_COMMIT_REF: CI_PREV_COMMIT_REFSPEC: CI_PREV_COMMIT_SHA: CI_PREV_COMMIT_URL: CI_PREV_PIPELINE_CREATED:0 CI_PREV_PIPELINE_DEPLOY_TARGET: CI_PREV_PIPELINE_EVENT: CI_PREV_PIPELINE_FINISHED:0 CI_PREV_PIPELINE_LINK: CI_PREV_PIPELINE_NUMBER:0 CI_PREV_PIPELINE_PARENT:0 CI_PREV_PIPELINE_STARTED:0 CI_PREV_PIPELINE_STATUS: CI_PREV_PIPELINE_URL: CI_REPO:ambroisie/blog CI_REPO_CLONE_URL:https://git.belanyi.fr/ambroisie/blog.git CI_REPO_DEFAULT_BRANCH:main CI_REPO_LINK:https://git.belanyi.fr/ambroisie/blog CI_REPO_NAME:blog CI_REPO_OWNER:ambroisie CI_REPO_PRIVATE:false CI_REPO_REMOTE:https://git.belanyi.fr/ambroisie/blog.git CI_REPO_REMOTE_ID:5 CI_REPO_SCM:git CI_REPO_TRUSTED:false CI_REPO_URL:https://git.belanyi.fr/ambroisie/blog CI_STEP_FINISHED: CI_STEP_NAME: CI_STEP_NUMBER:0 CI_STEP_STARTED: CI_STEP_STATUS: CI_SYSTEM_ARCH: CI_SYSTEM_HOST: CI_SYSTEM_LINK: CI_SYSTEM_NAME:woodpecker CI_SYSTEM_PLATFORM: CI_SYSTEM_URL: CI_SYSTEM_VERSION:1.0.3 CI_WORKFLOW_NAME: CI_WORKFLOW_NUMBER:0]

I can see that it doesn't have any of the non-CI_* variables from the previous run :thinking:.

ambroisie commented 11 months ago

Another difference: when manually triggered on test I can see that the Printf is output multiple times (looks like 9? what a weird number).

Whereas when triggered on main I only see it once.


Because of the weird number in the test case (9 is odd, which I didn't expect, as the matrix has two "axes" I would expect an even number) I decided to take a closer look.

Looking over the log once again, the first invocation in the test case also does not have the TYPE and other variables from the matrix. But since it doesn't get filtered out (since TYPE == "prod" evaluates to false and CI_COMMIT_BRANCH == CI_REPO_DEFAULT_BRANCH also) the pipeline can go on to be evaluated. This does not happen on the main trigger, as TYPE == "prod" is false but CI_COMMIT_BRANCH == CI_REPO_DEFAULT_BRANCH is true.

qwerty287 commented 11 months ago

Another difference: when manually triggered on test I can see that the Printf is output multiple times (looks like 9? what a weird number).

I don't know why it's only once for main, but AFAIK woodpecker compiles the pipelines multiple times for different reasons (e.g. env var substitution).

Can you add another print and print environ in https://github.com/woodpecker-ci/woodpecker/blob/main/pipeline/stepBuilder.go#L125?

ambroisie commented 11 months ago

It doesn't log anything on main but does log on test (I removed the usual CI_* values):

map[CI: woodpecker MAKE_TARGET:build-dev SSH_TARGET:ssh_target_dev TYPE:dev]

The environ logging appears once per 2 "matrix-enabled" logging in constraint.go (the ones that do have TYPE etc...). No environ log for the very first constraint.go (which only has CI_* values).

qwerty287 commented 11 months ago

Ah, I think I just got an idea what's going wrong here (and it also explains why I can't reproduce).

On 1.0 versions, YAMLs were parsed multiple time, and one of these parsings had the task to determine whether there are any workflows to run, it checks the global when filters. However, this did not parse the matrix - so TYPE is empty and TYPE == "prod" always false. It works on test as you wrote because the other condition is false too.

I refactored this whole parsing process in next though to only parse the pipelines once (https://github.com/woodpecker-ci/woodpecker/pull/2527), and as a side-effect (which I didn't know that it fixes this issue when I wrote that refactoring), it also respects the matrix now when evaluating global whens when checking whether there are any workflows to run.

Can you check your configuration on next? (This will run DB migrations so rollback is not possible.) You can also wait, we're going to release the next release in a few days which will contain this.

qwerty287 commented 11 months ago

I'm going to close this for now, if you test it with next and it still does not work just comment.

This is the same underlying issue as #2002 and this is fixed in next.

ambroisie commented 11 months ago

Ah, good to know it's fixed :-). I'll try it when I update the package to the next version and report back if there are issues.

I would just suggest if possible to add a test to ensure this won't regress in the future? Since there's no Jsonnet/Starlark configuration (at least without first writing a plug-in etc...) this is probably the easiest way I know to define multiple workflows that have very similar, but slighly different configurations.

qwerty287 commented 11 months ago

At all, this is pretty complex to test, but it could be covered by something like https://github.com/woodpecker-ci/woodpecker/issues/6

ambroisie commented 11 months ago

I've been wanting to create a NixOS test for Woodpecker, but that requires some non-trivial work to setup the OAuth application in Gitea (there's no CLI for it that I know of).

I wonder if we could converge on something here. If I ever shave that yak and get a test working it could serve as a basis for your end-to-end testing initiative (NixOS tests are very versatile).

qwerty287 commented 11 months ago

setup the OAuth application in Gitea (there's no CLI for it that I know of).

There is - take a look at our gitpod setup which does something like this: https://github.com/woodpecker-ci/woodpecker/blob/main/.gitpod.yml

ambroisie commented 11 months ago

Yes sorry I meant no tea command for it.

It does have a REST API so curl is always an option and that was what I had in mind. Thank you for pointing me towards your gitpod setup, that will make it much easier to just copy-paste it! :-)