Atlantis apply all after a failed apply; outputs Ran Apply for 0 projects

mlehner616 commented 5 years ago

I have a repo that uses the default workspace but there are a number of different project folders.

Atlantis version: 0.8.3
Terraform version: v0.12.8

version: 3
projects:
  - name: qa
    dir: qa_acct/qa_env
    terraform_version: v0.12.8
    autoplan:
      when_modified: ["../../projects/*", "*.tf*", "../../modules/*"]
      enabled: false
  - name: staging
    dir: prod_acct/staging_env
    terraform_version: v0.12.8
    autoplan:
      when_modified: ["../../projects/*", "*.tf*", "../../modules/*"]
      enabled: false
  - name: prod
    dir: prod_acct/prod_env
    terraform_version: v0.12.8
    autoplan:
      when_modified: ["../../projects/*", "*.tf*", "../../modules/*"]
      enabled: false

Plans are generated for all three projects as normal after commenting exactly atlantis plan. Immediately afterward, commenting atlantis apply attempts to apply all three environments as expected. In this case, there was an apply error due to a misconfigured AWS IAM policy, and the plans were not successfully applied. A commit was pushed to fix this issue and another atlantis apply was submitted. Note that there was not another atlantis plan after the fix commit was pushed. Atlantis behaved as if it had forgotten about the failed plans and assumed they had been applied successfully when, in fact, they had not been. I believe the expected behavior should be to reject the apply since new commits were made and force another plan to be run, correct?

The result was the following:

Ran Apply for 0 projects:

Automatically merging because all plans have been successfully applied.

Locks and plans deleted for the projects and workspaces modified in this pull request:

* dir: `prod_acct/prod_env` workspace: `default`
* dir: `prod_acct/staging_env` workspace: `default`
* dir: `qa_acct/qa_env` workspace: `default`

lkysow commented 5 years ago

Yeah it's a bug. If autoplan had been enabled then there would have been new plans generated and the apply wouldn't have worked.

mlehner616 commented 5 years ago

@lkysow Thanks for the confirmation. This bug is killing us right now. We want people to be able to see non-locking plans being run (in our normal CI pipeline) before approvals are submitted so they can actually validate their code before blocking other development. If we wanted to dig into solving this, where would be a good place to start looking? I took a really quick glance through the repo and nothing jumped out at me.

Thank you for building this tool by the way, I really appreciate the work that went into this.

lkysow commented 5 years ago

After re-reading the ticket, this isn't technically a bug (although for your use case it may as well be). Atlantis is just doing what you told it; it's up to you to run atlantis plan if you've pushed a new commit and don't have autoplan running. The real issue here is the combination with automerge. If you didn't have automerge you'd quickly realize that you didn't re-run plan and there wouldn't be an issue.

Also if you were running with the -d or -p flags you'd get an error that "the plan doesn't exist for that project, please run plan". When we added the apply-all command (i.e. atlantis apply) we didn't replicate the behaviour. I'm not sure if it ever makes sense to not give an error in this case but I'd at least like to add a flag that lets you keep the old behaviour in case people were relying on it.

If we were to add some functionality to detect this case, it would be here: https://github.com/runatlantis/atlantis/blob/master/server/events/project_command_builder.go#L204 after Atlantis has found no pending plans. It could then exit with an error in this case.

I think a path forward may be:

* new flag --allow-no-plan-apply which defaults to false now (breaking change)
* thread that flag through and then check it at the line above
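
The proposed check could look roughly like the following sketch. It is written in Python for brevity rather than in Atlantis's own Go, and every name in it (`PendingPlan`, `NoPendingPlansError`, `select_plans_for_apply_all`, the `allow_no_plan_apply` parameter) is hypothetical; only the flag name --allow-no-plan-apply and its default of false come from the proposal above.

```python
from dataclasses import dataclass
from typing import Iterable, List


class NoPendingPlansError(Exception):
    """Raised when apply-all is requested but no saved plans exist."""


@dataclass(frozen=True)
class PendingPlan:
    # Hypothetical record of one saved plan (not an Atlantis type).
    repo_rel_dir: str
    workspace: str


def select_plans_for_apply_all(
    pending: Iterable[PendingPlan],
    allow_no_plan_apply: bool = False,
) -> List[PendingPlan]:
    """Return the plans to apply; refuse an empty set unless explicitly allowed."""
    plans = list(pending)
    if not plans and not allow_no_plan_apply:
        # Mirrors the error already given for -d / -p when no plan exists.
        raise NoPendingPlansError(
            "no pending plans found; run `atlantis plan` before `atlantis apply`"
        )
    return plans
```

With the flag left at its default, an apply-all with zero pending plans would fail loudly instead of reporting "Ran Apply for 0 projects" and proceeding to automerge; passing `allow_no_plan_apply=True` would keep the old behaviour.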

mlehner616 commented 5 years ago

Well, I actually think your original interpretation made sense to me. To clarify, we would never want atlantis to apply without having the most up to date plan saved and locked.

What we’re doing instead is just running validate, fmt, lint and terraform plan --lock=false in vanilla GitLab CI. Devs open an MR and need to fix any issues there, and get all approvals first, before the atlantis plan. The problem we were solving by doing it this way was autoplan opening locks too early in the process and thus blocking other MRs that were ready to be applied.

I still think this is a bug. Yes I wanted autoplan disabled but that just means I want the developers to run it if and only if all the pre-apply requirements are met. I would expect the apply step to run the same validation that the plan is locked and up to date and apply based on that. Turning off autoplan shouldn’t affect those checks. What seems to be happening with autoplan disabled is the apply is ignoring the plans and ultimately just applies nothing.

I can confirm that plans and locks are created when they are supposed to be. It appears that the atlantis apply step is just ignoring those if a second apply is run after the first one fails. Expected behavior would be for the apply step to either force a replan if the MR was updated, or attempt to re-apply the original plan. It’s doing neither of these right now.
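
To make the expected behavior concrete, here is a minimal sketch of the decision described above. It assumes the commit SHA each plan was generated at is recorded somewhere; `ApplyDecision`, `decide_apply`, and both parameters are hypothetical names used for illustration, not Atlantis internals.

```python
from enum import Enum
from typing import Optional


class ApplyDecision(Enum):
    APPLY = "apply"                # saved plan is current, safe to apply
    REQUIRE_PLAN = "require_plan"  # reject the apply and ask for a new plan


def decide_apply(planned_commit: Optional[str], head_commit: str) -> ApplyDecision:
    """Decide what `atlantis apply` should do for a single project."""
    if planned_commit is None:
        # No saved plan at all (for example, deleted after a failed apply).
        return ApplyDecision.REQUIRE_PLAN
    if planned_commit != head_commit:
        # New commits were pushed after the plan was generated.
        return ApplyDecision.REQUIRE_PLAN
    return ApplyDecision.APPLY
```

Under this sketch, the scenario in this issue (failed apply, fix commit pushed, apply again) ends in `REQUIRE_PLAN` rather than a silent no-op.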

mlehner616 commented 5 years ago

One thing I did notice was that if the apply does fail, the saved plans are deleted but the locks are left open (this may be the actual bug here). If we removed those after a failed apply, that would basically force the plan step. I don’t know if that’s the best solution but I think it would work.

kipkoan commented 5 years ago

> it's up to you to run atlantis plan if you've pushed a new commit and don't have autoplan running.

Our team has autoplan on, but pushing a new commit doesn't cause Atlantis to redo the plan (because Bitbucket).

> One thing I did notice was that if the apply does fail, the saved plans are deleted but the locks are left open (this may be the actual bug here). If we removed those after a failed apply, that would basically force the plan step. I don’t know if that’s the best solution but I think it would work.

I agree we should either have Atlantis not delete the plans, or error if an apply is attempted without any plans.

@lkysow - what's the reason for Atlantis to delete the plans after a failed apply? It could have failed because of a transient provider issue, and re-running apply on the same plan would later succeed.
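
As a rough illustration of the first option (keep the plan when the apply fails), the sketch below deletes the saved plan file only after a successful apply. The function name and the `run_apply` callback are hypothetical, and this is not how Atlantis is implemented.

```python
import os
from typing import Callable


def apply_and_keep_plan_on_failure(
    plan_path: str,
    run_apply: Callable[[str], bool],
) -> bool:
    """Run an apply and remove the plan file only if the apply succeeded.

    `run_apply` is a stand-in for whatever executes `terraform apply` on the
    saved plan; it returns True on success and False on failure.
    """
    succeeded = run_apply(plan_path)
    if succeeded:
        os.remove(plan_path)
    # On failure the plan file is left in place so the same plan can be retried.
    return succeeded
```

A retry after a transient provider error would then find the plan still present, instead of reporting zero projects.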

ishallbethat commented 3 years ago

Hi everyone, I hit this issue too. Is there any work in progress to fix this bug? I removed the locks as mentioned above and re-ran "atlantis plan". It still shows "Ran Plan for 0 projects:"

jamengual commented 3 years ago

Running plan again on the same PR after a failed apply should give the same result as if Atlantis had not deleted the plan; it is just an extra step.

But if someone else modifies, in another PR, the environment you are running plan against, you will have a problem no matter what; by re-running plan you could actually find the drift.

I do not think this is a bug. It is a bit annoying to run plan again, but since Terraform is idempotent it should only apply the difference.

evanstachowiak commented 2 years ago

I can run atlantis plan again and I am still getting the output "Ran Plan for 0 projects:"

If I run with atlantis plan -p *-production it will apply.

jamengual commented 2 years ago

With autoplan, you need to define every directory you want autoplan on/off for in your atlantis.yaml, otherwise it does not work. Is that what you are doing?

If this were a bug, no one would be using Atlantis, so I want to find out whether this is specific to a multi-dir structure, etc. For that, we need to see the atlantis.yaml files and the directory structure so we can get a better idea.

This could be as simple as better documentation of autoplan with some examples.

evanstachowiak commented 2 years ago

@jamengual I am using an atlantis.yaml that was previously working. I think around v0.19.* this started breaking. It is about 50 projects, each with its own project name so that the -p wildcard flag can be used. The pattern for the naming is ${service_name}-${environment}.

I discovered that if I run atlantis apply -p *-environment, then the command will run, but it will run for ALL projects, regardless of what files have changed.

I have autoplan on, but if I run atlantis plan manually, it doesn't seem to make a difference.

Also of note, I am using custom workflows, not sure if that makes a difference.
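
The effect of the ${service_name}-${environment} naming scheme combined with a wildcard can be illustrated with a small sketch. It uses Python's `fnmatch` purely as a stand-in for glob matching; how Atlantis itself evaluates -p patterns is not shown in this thread, and the project names are made up.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def matching_projects(project_names: Iterable[str], pattern: str) -> List[str]:
    """Return the project names selected by a glob pattern such as *-production."""
    return [name for name in project_names if fnmatchcase(name, pattern)]


# Hypothetical project names following ${service_name}-${environment}.
projects = ["billing-production", "billing-staging", "auth-production"]
selected = matching_projects(projects, "*-production")
```

Because the selection is driven only by the name, every project in that environment is picked up whether or not its files changed in the pull request, which matches the behaviour described above.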

jamengual commented 2 years ago

@evanstachowiak Please test with the pre-release image, we did some bug fixes there and I wonder if that could be the issue:

docker pull ghcr.io/runatlantis/atlantis:v0.19.3-pre.20220408

jamengual commented 2 years ago

is this still an issue with v0.19.8?

GusAntoniassi commented 2 years ago

Hello @jamengual I was able to reproduce this issue on v0.19.8, using the testdrive repository.

It only happened when using pre workflow hooks, such as the following:

---
repos:
  - id: /.*/
    pre_workflow_hooks:
    - run: echo "hello world"

The server logs for the execution:

{"level":"info","ts":"2022-09-22T13:58:58.502-0300","caller":"server/server.go:869","msg":"Atlantis started - listening on port 4141","json":{}}
{"level":"info","ts":"2022-09-22T13:58:58.502-0300","caller":"scheduled/executor_service.go:46","msg":"Scheduled Executor Service started","json":{}}
{"level":"info","ts":"2022-09-22T13:59:09.305-0300","caller":"events/events_controller.go:533","msg":"parsed comment as command=\"apply\" verbose=false dir=\"\" workspace=\"\" project=\"\" flags=\"\"","json":{"gh-request-id":"X-Github-Delivery=dfb30ec0-3a97-11ed-9f80-6ecf217e25c6"}}
{"level":"info","ts":"2022-09-22T13:59:14.712-0300","caller":"events/working_dir.go:225","msg":"creating dir \"/home/gus/workspace/opensource/apply-for-0-projects-test/atlantis_linux_amd64/data/repos/GusAntoniassi/atlantis-example/1/default\"","json":{"repo":"GusAntoniassi/atlantis-example","pull":"1"}}
{"level":"info","ts":"2022-09-22T13:59:15.360-0300","caller":"runtime/pre_workflow_hook_runner.go:50","msg":"successfully ran \"echo \\\"hello world\\\"\" in \"/home/gus/workspace/opensource/apply-for-0-projects-test/atlantis_linux_amd64/data/repos/GusAntoniassi/atlantis-example/1/default\"","json":{"repo":"GusAntoniassi/atlantis-example","pull":"1"}}

evanstachowiak commented 2 years ago

yes it's still an issue @jamengual

jamengual commented 2 years ago

I wonder if this is related to this: https://github.com/runatlantis/atlantis/pull/1633

jamengual commented 2 years ago

pre_workflow_hooks run before any atlantis.yaml file is parsed.

After that, if no atlantis.yaml is defined, it will do nothing.
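
Since hooks run before atlantis.yaml is parsed, one way to make sure a config exists is for a hook to generate it. The following is only a sketch of such a generator: `render_atlantis_yaml` and the rule for deriving project names from directories are assumptions made for illustration.

```python
from typing import Iterable


def render_atlantis_yaml(project_dirs: Iterable[str]) -> str:
    """Render a minimal repo-level atlantis.yaml listing one project per directory."""
    lines = ["version: 3", "projects:"]
    for directory in sorted(project_dirs):
        # Hypothetical naming rule: replace path separators with dashes.
        name = directory.replace("/", "-")
        lines.append(f"  - name: {name}")
        lines.append(f"    dir: {directory}")
    return "\n".join(lines) + "\n"
```

A hook script could write the returned text to atlantis.yaml in the root of the cloned repository, so that the file is present by the time Atlantis looks for it.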

github-actions[bot] commented 1 year ago

This issue is stale because it has been open for 1 month with no activity. Remove stale label or comment or this will be closed in 1 month.

scytem commented 1 year ago

@jamengual Hello! I recently reproduced this problem on v0.25.0. I'm also using pre-workflow hooks as described above. Is it possible to reopen this issue to fix this bug?

jamengual commented 1 year ago

can you describe the steps you took to reproduce it?

scytem commented 1 year ago

Sure!

atlantis-0:/$ atlantis version
atlantis v0.25.0 (commit: a12823e) (build date: 2023-08-11T20:51:19.440Z)

Repos config:

repos:
  - id: "/.*/"
    branch: "/.*/"
    workflow: check
    allow_custom_workflows: true
    allowed_overrides: [workflow, delete_source_branch_on_merge]
    apply_requirements: [approved]
    pre_workflow_hooks:
      - run: python3 code/atlantis_config_merge.py # script for generating atlantis.yaml
  workflows:
    check:
      plan:
        steps:
        - run: echo "check passed"
    terragrunt-tst:
      plan:
        steps:
        - env:
          ...
        - run: |
            if [ ! -d "/tmp/$BASE_REPO_OWNER-$BASE_REPO_NAME-$PULL_NUM" ]; then
              mkdir -p /tmp/$BASE_REPO_OWNER-$BASE_REPO_NAME-$PULL_NUM
            fi
        - run: terragrunt run-all plan -out ./plan.tfplan --terragrunt-non-interactive &> /tmp/$BASE_REPO_OWNER-$BASE_REPO_NAME-$PULL_NUM/output.txt || cat /tmp/$BASE_REPO_OWNER-$BASE_REPO_NAME-$PULL_NUM/output.txt
        - run: terragrunt run-all show -json ./plan.tfplan --terragrunt-non-interactive 2> /tmp/$BASE_REPO_OWNER-$BASE_REPO_NAME-$PULL_NUM/show_stderr.txt 1> ./plan.json || cat /tmp/$BASE_REPO_OWNER-$BASE_REPO_NAME-$PULL_NUM/show_stderr.txt
        - run: /tmp/infracost breakdown --path=. --format=json --log-level=info --out-file=./infracost.json --project-name=$REPO_REL_DIR 2>> /tmp/$BASE_REPO_OWNER-$BASE_REPO_NAME-$PULL_NUM/output.txt 1>> /tmp/$BASE_REPO_OWNER-$BASE_REPO_NAME-$PULL_NUM/output.txt || cat /tmp/$BASE_REPO_OWNER-$BASE_REPO_NAME-$PULL_NUM/output.txt
        - run: /tmp/infracost output --path=./infracost.json --format=json --out-file=./infracost-report.json 2>> /tmp/$BASE_REPO_OWNER-$BASE_REPO_NAME-$PULL_NUM/output.txt 1>> /tmp/$BASE_REPO_OWNER-$BASE_REPO_NAME-$PULL_NUM/output.txt || cat /tmp/$BASE_REPO_OWNER-$BASE_REPO_NAME-$PULL_NUM/output.txt
        - run: |
            /tmp/infracost comment gitlab --repo $BASE_REPO_OWNER/$BASE_REPO_NAME \
              --merge-request $PULL_NUM \
              --path ./infracost-report.json \
              --gitlab-token $ATLANTIS_GITLAB_TOKEN \
              --behavior new \
              --show-all-projects
        # script for output formatting. Not sure if it's relevant for this issue. Just to share
        - run: python3 /opt/terragrunt_output_formatter.py --file /tmp/$BASE_REPO_OWNER-$BASE_REPO_NAME-$PULL_NUM/output.txt --output-file /tmp/$BASE_REPO_OWNER-$BASE_REPO_NAME-$PULL_NUM/fmt_output.txt && cat /tmp/$BASE_REPO_OWNER-$BASE_REPO_NAME-$PULL_NUM/fmt_output.txt
        - run: rm -rf /tmp/$BASE_REPO_OWNER-$BASE_REPO_NAME-$PULL_NUM
      apply:
        steps:
        - env:
          ...
        - run: terragrunt run-all apply ./plan.tfplan --terragrunt-non-interactive

atlantis.yaml example:

projects:
- autoplan:
    when_modified:
    - '**/*.hcl'
    - '*.hcl'
  dir: accounts/...
  name: ...
  workflow: terragrunt-tst

As a result, I have an MR message:

Ran Apply for 0 projects:

atlantis apply -p ... solves the problem, but it's not comfortable to use it every time

github-actions[bot] commented 1 year ago

This issue is stale because it has been open for 1 month with no activity. Remove stale label or comment or this will be closed in 1 month.

valentindeaconu commented 1 month ago

I also face this issue on v0.28.1. I have a multi-dir structure (mono-repository) with a lot of projects (> 300). I recently introduced a CI stage that has to run before Atlantis. Because there's no other way of pipelining this (running the CI stage before Atlantis), I disabled the autoplan feature from atlantis.yaml everywhere and added a step in our pipeline that comments atlantis plan if everything is fine.

The issue occurred after Atlantis had successfully generated the plans: the PR was then rebased and, before Atlantis finished re-planning, another user commented atlantis apply. With auto planning enabled, Atlantis would have commented back something like "another command is already running for this PR" and rejected the atlantis apply command. Now it just commented Ran Apply for 0 projects and then merged and deleted the source branch, because that's how we have it configured. In our case, that means manually commenting atlantis plan and atlantis apply again doesn't work.

Is there any chance this issue will be reopened?

Later edit: I found a workaround for now, at least for our use case. I added a pre-workflow hook with commands: plan that checks whether the CI pipeline has finished. If the pre-workflow hook fails, no comment is posted. When the CI finishes, it comments atlantis plan and the normal flow begins. With this setup, I was able to re-enable the autoplan feature and avoid this bug.
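
The core of such a hook could be as small as the sketch below. It assumes the pipeline status is already available as a string (how it is fetched from the CI system is left out) and that "success" is the value reported for a finished, passing pipeline; `ci_gate_exit_code` is a hypothetical name.

```python
def ci_gate_exit_code(ci_status: str) -> int:
    """Return 0 when the CI pipeline has finished successfully, 1 otherwise.

    A non-zero exit code makes the hook fail, which stops planning from
    starting before CI is done.
    """
    return 0 if ci_status.strip().lower() == "success" else 1
```

A wrapper script used as the hook's run command would look up the status for the pull request and pass the result of `ci_gate_exit_code(...)` to `sys.exit`.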

Atlantis apply all after a failed apply; outputs Ran Apply for 0 projects #773