runatlantis / atlantis

Terraform Pull Request Automation
https://www.runatlantis.io

Atlantis is not properly running plan anymore after Gitlab upgrade to 15.11.13 #5131

Open c0debreaker opened 2 days ago

c0debreaker commented 2 days ago

Overview of the Issue

We recently upgraded GitLab to version 15.11.13, and since then autoplan has been failing for projects that use Atlantis. The GitLab webhook to the Atlantis /events endpoint is still being delivered, as confirmed by the Atlantis logs in AWS ECS.

Previously, when creating a merge request, autoplan would automatically trigger, and the Terraform plan output would display within the merge request.

However, after upgrading GitLab to version 15.11.13 last week, we've noticed that autoplan no longer functions as expected. I contacted GitLab support to determine if there were any updates to their webhook functionality. Their response was: "We didn't update the payload of our webhooks. Please reach out to the Atlantis community for assistance."

Interestingly, if I manually comment atlantis plan on the merge request, it works correctly and the output is displayed as expected. Likewise, if I push new commits to the merge request branch, autoplan runs correctly. The problem only occurs when the merge request is first created, and this did not happen before the upgrade.

Today I continued debugging. Our Atlantis instance runs in a container on AWS ECS. The relevant Atlantis log lines are below, and the full log is attached.

Logs

December 01, 2024 at 11:38 (UTC-6:00) | {"level":"info","ts":"2024-12-01T17:38:33.274Z","caller":"events/project_command_builder.go:238","msg":"successfully parsed atlantis.yaml file","json":{"repo":"terraform/dev-vpc","pull":"455"}} | atlantis
December 01, 2024 at 11:38 (UTC-6:00) | {"level":"info","ts":"2024-12-01T17:38:33.274Z","caller":"events/project_command_builder.go:243","msg":"0 projects are to be planned based on their when_modified config","json":{"repo":"terraform/dev-vpc","pull":"455"}} | atlantis
December 01, 2024 at 11:38 (UTC-6:00) | {"level":"info","ts":"2024-12-01T17:38:33.274Z","caller":"events/plan_command_runner.go:84","msg":"determined there was no project to run plan in","json":{"repo":"terraform/dev-vpc","pull":"455"}} | atlantis
December 01, 2024 at 11:38 (UTC-6:00) | {"level":"info","ts":"2024-12-01T17:38:32.513Z","caller":"events/working_dir.go:202","msg":"creating dir \"/home/atlantis/.atlantis/repos/terraform/dev-vpc/455/default\"","json":{"repo":"terraform/dev-vpc","pull":"455"}} | atlantis
December 01, 2024 at 11:38 (UTC-6:00) | {"level":"info","ts":"2024-12-01T17:38:32.196Z","caller":"events/events_controller.go:461","msg":"identified event as type \"opened\"","json":{}} | atlantis
December 01, 2024 at 11:38 (UTC-6:00) | {"level":"info","ts":"2024-12-01T17:38:32.196Z","caller":"events/events_controller.go:346","msg":"executing autoplan","json":{}} | atlantis

I've also attached more complete logs from CloudWatch: log-events-viewer-result.csv

Environment details

Atlantis server-side config file: I couldn't find the server-side config file.

Repo atlantis.yaml file:

version: 2
automerge: true
projects:
  - name: dev-vpc
    dir: .
    workflow: cross-account
    autoplan:
      when_modified: ["**/*.tf", "**/*.json", "**/*.txt"]
workflows:
  cross-account:
    plan:
      steps:
        - init:
        - plan:
            extra_args: ["-var", "atlantis_assume_role_arn=arn:aws:iam::1234567890:role/terraform_atlantis_service_role"]
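The log line "0 projects are to be planned based on their when_modified config" means that none of the files Atlantis considered modified matched the project's when_modified patterns. One possible (unconfirmed) explanation is that Atlantis received an empty list of modified files for the newly opened merge request. The Go sketch below is a simplified, hypothetical reimplementation of that kind of glob matching, not Atlantis's actual code; shouldAutoplan and matchSegments are illustrative names. It assumes paths are relative to the project dir (".", the repo root here) and that "**" matches zero or more directories.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// matchSegments reports whether the slash-separated segments of name match the
// pattern segments. A "**" segment matches zero or more whole segments; every
// other segment is matched with path.Match, so "*" never crosses a "/".
func matchSegments(pattern, name []string) bool {
	if len(pattern) == 0 {
		return len(name) == 0
	}
	if pattern[0] == "**" {
		// Let "**" absorb 0, 1, 2, ... leading segments of name.
		for i := 0; i <= len(name); i++ {
			if matchSegments(pattern[1:], name[i:]) {
				return true
			}
		}
		return false
	}
	if len(name) == 0 {
		return false
	}
	if ok, err := path.Match(pattern[0], name[0]); err != nil || !ok {
		return false
	}
	return matchSegments(pattern[1:], name[1:])
}

// shouldAutoplan (hypothetical) returns true if any modified file, given
// relative to the project dir, matches any when_modified pattern.
// An empty file list can never match.
func shouldAutoplan(modified, whenModified []string) bool {
	for _, f := range modified {
		for _, p := range whenModified {
			if matchSegments(strings.Split(p, "/"), strings.Split(f, "/")) {
				return true
			}
		}
	}
	return false
}

func main() {
	// Patterns copied from the atlantis.yaml above.
	patterns := []string{"**/*.tf", "**/*.json", "**/*.txt"}

	fmt.Println(shouldAutoplan([]string{"main.tf"}, patterns))                  // true
	fmt.Println(shouldAutoplan([]string{"modules/vpc/variables.tf"}, patterns)) // true
	fmt.Println(shouldAutoplan([]string{"README.md"}, patterns))                // false
	// If the merge request's file list comes back empty, nothing is planned.
	fmt.Println(shouldAutoplan(nil, patterns)) // false
}
```

Under this reading, the same patterns would match as soon as a later push produces a non-empty file list, which is consistent with autoplan working on subsequent pushes. This is a hypothesis, not a confirmed diagnosis.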


c0debreaker commented 1 day ago

After upgrading Atlantis to 0.31.0, autoplan works again. However, the upgrade broke the infrastructure-as-code (IaC) project we use to deploy Atlantis itself to ECS: that Git project uses Atlantis to run its own terraform plan and apply. I'm aware this is an unusual setup, and I don't know why our previous DevOps team chose it when the Atlantis installation docs describe several other approaches. When we open a merge request on that project, GitLab sends a webhook to our Atlantis server on ECS, and with version 0.31.0 the following error is now posted in the merge request:

 Error: failed to get shared config profile, build

I searched online, and most answers suggested removing the profile argument from the aws provider block. I tried that, but I still get the same error.

When I reverted to Atlantis v0.17.4 on ECS, the plan worked again.

What could I be missing?