runatlantis / atlantis

Terraform Pull Request Automation
https://www.runatlantis.io
Other
7.68k stars 1.05k forks source link

Stuck failed check if a non-existent directory or project is given #2557

Open nitrocode opened 1 year ago

nitrocode commented 1 year ago

Community Note


Overview of the Issue

A permanent failed check is shown if a non-existent directory or project is given to atlantis plan.

atlantis plan -d global/testing123
atlantis plan -p testing123

Even though seeing the red check marks is annoying, since only the atlantis/plan is required, the new failed status checks do not prevent merging of the PR.

Instead, it would be nice to do one of the following

  1. Allow clearing all status checks when either running atlantis unlock OR closing the PR and reopening the PR. I tried this and it did not work.
  2. When a non-existent dir/project is given, update only the atlantis/plan context status (instead of atlantis/plan: <dir/project given> so it can be easily updated by running atlantis plan without -d or -p
  3. Allow subsequent atlantis plans (without -d or -p) to set the status for any pr checks that failed to be planned due to a missing dir/project
  4. Better option ? A mixture of the above ?

Status

✗ gh api \
  -H "Accept: application/vnd.github+json" \
  /repos/{owner}/{repo}/commits/{ref}/statuses | \
  jq '.[] | select(.state=="failure")'
{
  "url": "https://api.github.com/repos/{owner}/{repo}/statuses/{ref}",
  "avatar_url": "https://avatars.githubusercontent.com/u/<redacted>?v=4",
  "id": "<redacted>",
  "node_id": "<redacted>",
  "state": "failure",
  "description": "Plan failed.",
  "target_url": "http://<redacted>.com/jobs/<redacted>",
  "context": "atlantis/plan: <redacted>",
  "created_at": "2022-10-05T23:58:50Z",
  "updated_at": "2022-10-05T23:58:50Z",
}

Workaround is to update the status for now

gh api \
  --method POST \
  -H "Accept: application/vnd.github+json" \
  /repos/{owner}/{repo}/commits/{ref}/statuses \
  -f state='success' \
  -f target_url="http://<redacted>.com/jobs/<redacted>" \
  -f description="skipped" \
  -f context="atlantis/plan: <redacted>"

Reproduction Steps

  1. Do not create a directory called global/testing123
  2. Do not create a project called testing123
  3. Run the commands above to show a failed check

Logs

N/A

Environment details

N/A

Additional Context

N/A

nicolst commented 10 months ago

We are having a problem where someone might try planning either a non-existent directory, or a directory with no Terraform configuration. For context we are using apply_requirements: [mergeable, approved] with --gh-allow-mergeable-bypass-apply, and the following GitHub branch protection rules:

Running for a non-existent directory fails with the comment dir "non-existent-dir" does not exist Running for an existent directory without Terraform fails with Error: No configuration files (truncated error message)

Both cases add a failed atlantis/plan: my-directory/default check to the PR. Running a generic atlantis plan or planning a valid project/directory adds a successful overall atlantis/plan status, but the failed check persists. The only practical way to get rid of it is to push a new commit to the PR branch.

This failed check prevents Atlantis from running apply on any other projects or directories as it does not consider the PR mergeable: Apply Failed: Pull request must be mergeable before running apply.. The GitHub API reports the PR as being mergable:

  "mergeable": true,
  "rebaseable": true,
  "mergeable_state": "blocked",

So I assume Atlantis has another check where any failed atlantis/plan statuses renders the PR unmergeable/unappliable.

Attempting to plan a non-existent project does not add a failed check for that project, only a failed generic atlantis/plan and comments the error no project with name "my-project" is defined in atlantis.yaml.

It seems like your second solution is already implemented for projects, so I feel like it would make sense to do this for directories as well. I am unsure about the third solution as this could allow invalid/failing Terraform to be merged by just planning another valid project in the repo?

lukemassa commented 10 months ago

I've looked into the code a bit, and one issue with solution 2 is that before the plan is even run, we update the "atlantis plan: missingdirectory" status with "Plan in progress...". It's only after this that we learn that the directory doesn't exist. https://github.com/runatlantis/atlantis/blob/main/server/events/project_command_runner.go#L179

The reason this is possible for project is that the project missing is determined while "building" the command: https://github.com/runatlantis/atlantis/blob/main/server/events/plan_command_runner.go#L194

I agree with @nicolst about issues with solution 3, also because by the time we rerun we don't know that the plan failed because of missing directory (other than looking at the description string, which seems a bit brittle).

Here's my summary what I think our current options are:

  1. Clearing all statuses on unlock 2a. We continue to initially comment "Plan in progress" on PRs, then if we find out that that directory turned out to not exist we delete the status 2b. Same as above, but we set the plan to "success"

My though process for 2a is that, it's then functionally equivalent to not having a check for project (as is the case now). It's a bit awkward though since we have to delete a status (I'm not sure that's even implemented for VCSs?)

2b seems weird at the outset, but you could make the argument that "planning nonexistentdir" is "done" (since there's nothing to do). We'd just have to make sure apply doesn't try to do anything with it.