runatlantis / atlantis

Terraform Pull Request Automation
https://www.runatlantis.io
Other
7.52k stars 1.02k forks source link

Gitlab: duplicate pipelines in gitlab merge requests #3373

Open paulozava opened 1 year ago

paulozava commented 1 year ago

Community Note


Overview of the Issue

We updated our Atlantis from v0.20.1 to v0.23.5 and since then we detect that the Atlantis steps are running in a different pipeline than the project, so we now have double pipelines.

Reproduction Steps

Logs

Environment details

Atlantis server-side config file:

atlantis server --gitlab-hostname="$git_hostname" \
                         --gitlab-token="$git_token" \
                         --gitlab-user="$git_username" \
                         --gitlab-webhook-secret="$git_webhook" \
                         --checkout-strategy='merge' \
                         --enable-diff-markdown-format \
                         --repo-whitelist='*' \
                         --autoplan-file-list='**/*.tf,**/*.tfvars,**/*.hcl,*.tf,*.tfvars,*.hcl' \
                         --repo-config="/atlantis/repos.yaml"
repos:
- id: /.*/
  allowed_overrides: [apply_requirements, workflow]
  allow_custom_workflows: true
  pre_workflow_hooks:
    - run: python3 /scripts/inject_policy_checker.py
- id: /.*comp\/realestate.*/
  allowed_overrides: [apply_requirements, workflow]
  allow_custom_workflows: true
  pre_workflow_hooks:
    - run: python3 /scripts/inject_policy_checker.py
    - run: python3 /scripts/inject_infracost.py
  post_workflow_hooks:
    - run: |
        if [ ! -d "/tmp/$BASE_REPO_OWNER-$BASE_REPO_NAME-$PULL_NUM" ]; then
          exit 0
        fi

        infracost comment gitlab --repo $BASE_REPO_OWNER/$BASE_REPO_NAME \
                                 --merge-request $PULL_NUM \
                                 --path /tmp/$BASE_REPO_OWNER-$BASE_REPO_NAME-$PULL_NUM/'*'-infracost.json \
                                 --gitlab-server-url https://git.example.com \
                                 --gitlab-token $ATLANTIS_GITLAB_TOKEN \
                                 --behavior new

        rm -rf /tmp/$BASE_REPO_OWNER-$BASE_REPO_NAME-$PULL_NUM
- id: /.*infrastructure\/vv\/.*/
  allowed_overrides: [apply_requirements, workflow]
  allow_custom_workflows: true
  pre_workflow_hooks:
    - run: python3 /scripts/inject_policy_checker.py
    - run: python3 /scripts/inject_infracost.py
  post_workflow_hooks:
    - run: |
        if [ ! -d "/tmp/$BASE_REPO_OWNER-$BASE_REPO_NAME-$PULL_NUM" ]; then
          exit 0
        fi

        infracost comment gitlab --repo $BASE_REPO_OWNER/$BASE_REPO_NAME \
                                 --merge-request $PULL_NUM \
                                 --path /tmp/$BASE_REPO_OWNER-$BASE_REPO_NAME-$PULL_NUM/'*'-infracost.json \
                                 --gitlab-server-url https://git.example.com \
                                 --gitlab-token $ATLANTIS_GITLAB_TOKEN \
                                 --behavior new

        rm -rf /tmp/$BASE_REPO_OWNER-$BASE_REPO_NAME-$PULL_NUM
      description: Sending Gitlab comment with the table
- id: /.*comp\/motors\/.*/
  allowed_overrides: [apply_requirements, workflow]
  allow_custom_workflows: true
  pre_workflow_hooks:
    - run: python3 /scripts/inject_policy_checker.py
    - run: python3 /scripts/inject_infracost.py
  post_workflow_hooks:
    - run: |
        if [ ! -d "/tmp/$BASE_REPO_OWNER-$BASE_REPO_NAME-$PULL_NUM" ]; then
          exit 0
        fi

        infracost comment gitlab --repo $BASE_REPO_OWNER/$BASE_REPO_NAME \
                                 --merge-request $PULL_NUM \
                                 --path /tmp/$BASE_REPO_OWNER-$BASE_REPO_NAME-$PULL_NUM/'*'-infracost.json \
                                 --gitlab-server-url https://git.example.com \
                                 --gitlab-token $ATLANTIS_GITLAB_TOKEN \
                                 --behavior new

        rm -rf /tmp/$BASE_REPO_OWNER-$BASE_REPO_NAME-$PULL_NUM

Repo atlantis.yaml file:

version: 3
automerge: false
parallel_plan: true
parallel_apply: true
projects:
  - name: production
    dir: infrastructure
    terraform_version: v1.0.7
    workflow: production
    workspace: production
    apply_requirements:
      - approved
    autoplan:
      when_modified: [../**/*]
  - name: staging
    dir: infrastructure
    terraform_version: v1.0.7
    workflow: staging
    workspace: staging
    apply_requirements:
      - approved
    autoplan:
      when_modified: [../**/*]
  - name: terragrunt-example
    dir: infrastructure-terragrunt
    workflow: terragrunt
    terraform_version: v1.0.7
    apply_requirements:
      - approved
    autoplan:
      when_modified: [../**/*]
workflows:
  production:
    plan:
      steps:
        - run: rm -rf .terraform*
        - init:
            extra_args:
              - -backend-config=../backends/production.backend.tfvars
        - plan:
            extra_args:
              - -compact-warnings
              - -var 'region=us-west-1'
              - -var 'account=xxx'
              - -var 'role_name=atlantis'
    apply:
      steps:
        - apply
  staging:
    plan:
      steps:
        - run: rm -rf .terraform*
        - init:
            extra_args:
              - -backend-config=../backends/staging.backend.tfvars
        - plan:
            extra_args:
              - -compact-warnings
              - -var 'region=us-west-1'
              - -var 'account=xxx'
              - -var 'role_name=atlantis'
    apply:
      steps:
        - apply
  terragrunt:
    plan:
      steps:
        - env:
            name: TF_IN_AUTOMATION
            value: true
        - env:
            name: TERRAGRUNT_IAM_ROLE
            value: arn:aws:iam::xxx:role/atlantis
        - run: terragrunt init --terragrunt-source-update --terragrunt-non-interactive
        - run: terragrunt plan -out=$PLANFILE
    apply:
      steps:
        - env:
            name: TF_IN_AUTOMATION
            value: true
        - env:
            name: TERRAGRUNT_IAM_ROLE
            value: arn:aws:iam::xxx:role/atlantis
        - run: terragrunt apply $PLANFILE
jamengual commented 1 year ago

did you check the other issues? there was a PR that fixed this a while back.

paulozava commented 1 year ago

I didn't find anything similar open, but found this PR https://github.com/runatlantis/atlantis/pull/2745

paulozava commented 1 year ago

It is the same behaviour reported here https://github.com/runatlantis/atlantis/issues/2743

jamengual commented 1 year ago

@jukie do you have any ideas?

michelmzs commented 1 year ago

@paulozava Please, could you confirm the GitLab version are you using (gitlab.com or self-hosted)? And the .gitlab-ci.yaml config too.

paulozava commented 1 year ago

Sure, we are using GitLab self-hosted at version 15.9.4-ee

The .gitlab-ci.yaml:

image: registry.corp.com/infrastructure/verticals/linters:latest

stages:
  - Lint
  - Deploy

YAML lint:
  stage: Lint
  script:
    - find . -name "*.yml" -o -name "*.yaml" | xargs -n 1 yamllint -d relaxed --no-warnings
  only:
    - merge_requests
  tags: [k8s-motors]

Kustomize Lint:
  stage: Lint
  variables:
    K8S_INFRASTRUCTURE_FILES_FOLDER: k8s-clusterconfig
  script:
    - kustomize build $K8S_INFRASTRUCTURE_FILES_FOLDER/staging > /dev/null
    - kustomize build $K8S_INFRASTRUCTURE_FILES_FOLDER/production > /dev/null
    - kustomize build argocd-apps/production > /dev/null
    - kustomize build argocd-apps/staging > /dev/null
  only:
    - merge_requests
  tags: [k8s-motors]
rjmsilveira commented 1 year ago

Hi everyone,

Meanwhile I was able to perform some tests and identify why we get duplicate pipelines and I believe the problem may be the way the implementation of https://github.com/runatlantis/atlantis/pull/2745 was done.

Let me try to explain.

If you have a project which contains workflows to create MR pipelines, the MR 2745 works just fine and Atlantis pipeline will appear in the same MR pipeline (as expected)

If you enable in gitlab MR options the setting "Enable merged results pipelines" and then push a commit, the MR pipeline will run in a different pipeline than the Atlantis which will now run a branch pipeline. This in reality is not a big deal. The problem happens when you revert the setting above

When you revert it, you would expect the normal behavior (both Atlantis and the project pipeline in the same line) but in reality what happens is that the behavior above is still kept (2 separate pipelines)

From what I could understand the issue is with this bit of code which gets the head pipeline from the MR

mr, err := g.GetMergeRequest(pull.BaseRepo.FullName, pull.Num)
if err != nil {
    return err
}

The issue is that the api call above will return the Atlantis pipeline as being the head pipeline and mr.Pipeline.Source will be external

From that moment on, unless you trigger a new pipeline for the project, the last pipeline will always be the one from Atlantis which is identified as external and the pipelines will never "merge"

I was also able to test that if the function is just

func (g *GitlabClient) UpdateStatus(repo models.Repo, pull models.PullRequest, state models.CommitStatus, src string, description string, url string) error {
    gitlabState := gitlab.Pending
    switch state {
    case models.PendingCommitStatus:
        gitlabState = gitlab.Running
    case models.FailedCommitStatus:
        gitlabState = gitlab.Failed
    case models.SuccessCommitStatus:
        gitlabState = gitlab.Success
    }

    options := &gitlab.SetCommitStatusOptions{
        State:       gitlabState,
        Context:     gitlab.String(src),
        Description: gitlab.String(description),
        TargetURL:   &url,
    }

    _, _, err := g.Client.Commits.SetCommitStatus(repo.FullName, pull.HeadCommit, options)

    return err
}

Deliberately not passing refTarget seems to work for all cases and I guess Gitlab will always try to see what use-case it fits best? :man_shrugging:

I don't have any use-case or example of when an event is anything else than a merge_request_event to test and would be cool if someone could present some examples of the pipelines that originally were being duplicated and that got solved with the MR above.

I also noticed that (at least for Gitlab) the supported Webhook events are Comments and Merge Request events and I am wondering if all the logic from the MR is ever going to be needed.

Thank you in advance for your time and help, and happy to propose an MR with any sort of fix but just want to better understand first the motivation for the code above and how we can make it work for all use-cases :smiley:

jseiser commented 11 months ago

We self host 15.11.2-ee and have this same problem.

rjmsilveira commented 11 months ago

Pinging @michelmzs (I should've done it sooner) as you were the author of the initial MR

WarpRat commented 8 months ago

Is anyone currently working on this? This has been a pain point for us for a while. I'd be willing to take a shot at implementing a fix if nobody else has work in flight.

jamengual commented 8 months ago

I thought this was fixed already, have you tested 0.26.0?

On Wed, Oct 18, 2023, 5:37 p.m. WarpRat @.***> wrote:

Is anyone currently working on this? This has been a pain point for us for a while. I'd be willing to take a shot at implementing a fix if nobody else has work in flight.

— Reply to this email directly, view it on GitHub https://github.com/runatlantis/atlantis/issues/3373#issuecomment-1769710310, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAQ3ERHYRL7I5DTVSBFHDYTYABY55AVCNFSM6AAAAAAXUP3FNGVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTONRZG4YTAMZRGA . You are receiving this because you commented.Message ID: @.***>

WarpRat commented 8 months ago

Yes, we're still seeing this issue on 0.26.0 with merged results pipelines enabled. Based on the description in this comment I'm not sure if it's even possible to have converged merged results pipelines and external status pipelines which would be ideal. However, we still see the behavior described in that comment where even disabling merged results pipelines still results in a merge request pipeline with our jobs and a separate branch pipeline with the Altantis jobs. If we don't use merge request pipelines at all then everything works as expected in the branch pipeline. If we have to stick to branch pipelines only we can do that but it sounds like it would be a minor fix to get things at least working in merge request pipelines again.

We're on Gitlab SaaS which is currently 16.5.

michelmzs commented 8 months ago

Hi everyone,

Meanwhile I was able to perform some tests and identify why we get duplicate pipelines and I believe the problem may be the way the implementation of #2745 was done.

Let me try to explain.

If you have a project which contains workflows to create MR pipelines, the MR 2745 works just fine and Atlantis pipeline will appear in the same MR pipeline (as expected)

If you enable in gitlab MR options the setting "Enable merged results pipelines" and then push a commit, the MR pipeline will run in a different pipeline than the Atlantis which will now run a branch pipeline. This in reality is not a big deal. The problem happens when you revert the setting above

When you revert it, you would expect the normal behavior (both Atlantis and the project pipeline in the same line) but in reality what happens is that the behavior above is still kept (2 separate pipelines)

From what I could understand the issue is with this bit of code which gets the head pipeline from the MR

mr, err := g.GetMergeRequest(pull.BaseRepo.FullName, pull.Num)
if err != nil {
  return err
}

The issue is that the api call above will return the Atlantis pipeline as being the head pipeline and mr.Pipeline.Source will be external

From that moment on, unless you trigger a new pipeline for the project, the last pipeline will always be the one from Atlantis which is identified as external and the pipelines will never "merge"

I was also able to test that if the function is just

func (g *GitlabClient) UpdateStatus(repo models.Repo, pull models.PullRequest, state models.CommitStatus, src string, description string, url string) error {
  gitlabState := gitlab.Pending
  switch state {
  case models.PendingCommitStatus:
      gitlabState = gitlab.Running
  case models.FailedCommitStatus:
      gitlabState = gitlab.Failed
  case models.SuccessCommitStatus:
      gitlabState = gitlab.Success
  }

  options := &gitlab.SetCommitStatusOptions{
      State:       gitlabState,
      Context:     gitlab.String(src),
      Description: gitlab.String(description),
      TargetURL:   &url,
  }

  _, _, err := g.Client.Commits.SetCommitStatus(repo.FullName, pull.HeadCommit, options)

  return err
}

Deliberately not passing refTarget seems to work for all cases and I guess Gitlab will always try to see what use-case it fits best? 🤷‍♂️

I don't have any use-case or example of when an event is anything else than a merge_request_event to test and would be cool if someone could present some examples of the pipelines that originally were being duplicated and that got solved with the MR above.

I also noticed that (at least for Gitlab) the supported Webhook events are Comments and Merge Request events and I am wondering if all the logic from the MR is ever going to be needed.

Thank you in advance for your time and help, and happy to propose an MR with any sort of fix but just want to better understand first the motivation for the code above and how we can make it work for all use-cases 😃

Sorry for the very late response, guys.

Removing the refTarget from setCommitStatus will work for Merge Result pipelines, but the behavior described in https://github.com/runatlantis/atlantis/issues/2484 will occur again. Without the ref from head_pipeline, when a user checkout from a branch, the CommitStatus will be issued to the HEAD branch.

Also, I think that the duplicated pipeline problem not solved by the mentioned PRs is the race condition of GitLab CI Pipeline creation x Atlantis Webhook process, sometimes Atlantis will be faster than GitLab Pipeline creation, so the CommitStatus will be set in a different Pipeline ID. There is a discussion about it in https://github.com/runatlantis/atlantis/pull/3887

jamengual commented 8 months ago

@X-Guardian @lukemassa

rjmsilveira commented 1 week ago

@michelmzs thank you very much for helping understand the issue

We will do some tests and hopefully we will be able to overcome the issue. We also had the oportunity to update our Atlantis to one of the latest versions and try it out

Thanks once again!

jseiser commented 1 week ago

I assume this is related to our issue: https://github.com/runatlantis/atlantis/issues/4615