mlehner616 opened 5 years ago
Yeah it's a bug. If autoplan had been enabled then there would have been new plans generated and the apply wouldn't have worked.
@lkysow Thanks for the confirmation. This bug is killing us right now. We want people to be able to see non-locking plans being run (in our normal CI pipeline) before approvals are submitted, so they can actually validate their code before blocking other development. If we wanted to dig into solving this, where would be a good place to start looking? I took a really quick glance through the repo and nothing jumped out at me.
Thank you for building this tool by the way, I really appreciate the work that went into this.
After re-reading the ticket, this isn't technically a bug (although for your use-case it may as well be). Atlantis is just doing what you told it: it's up to you to run `atlantis plan` if you've pushed a new commit and don't have autoplan running. Coupled with automerge is the real issue here. If you didn't have automerge you'd quickly realize that you didn't re-run plan and there wouldn't be an issue.
Also, if you were running with the `-d` or `-p` flags you'd get an error that "the plan doesn't exist for that project, please run plan". When we added the apply-all command (i.e. `atlantis apply`) we didn't replicate that behaviour. I'm not sure if it ever makes sense to not give an error in this case, but I'd at least like to add a flag that lets you keep the old behaviour in case people were relying on it.
If we were to add some functionality to detect this case, it would be here: https://github.com/runatlantis/atlantis/blob/master/server/events/project_command_builder.go#L204 after Atlantis has found no pending plans. It could then exit with an error in this case.
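For illustration, a minimal sketch in Go of what that check could look like (all names here are hypothetical, not Atlantis's actual API; the real logic would live in `project_command_builder.go`):

```go
package main

import (
	"errors"
	"fmt"
)

// buildApplyCommands is a hypothetical stand-in for the apply-all logic:
// it turns the pending plans found for a pull request into apply commands,
// and errors out when none exist instead of silently applying 0 projects.
func buildApplyCommands(pendingPlans []string, allowNoPlanApply bool) ([]string, error) {
	if len(pendingPlans) == 0 && !allowNoPlanApply {
		// Behaviour proposed in this thread: fail loudly rather than
		// reporting "Ran Apply for 0 projects".
		return nil, errors.New("no plans found: run 'atlantis plan' first")
	}
	cmds := make([]string, 0, len(pendingPlans))
	for _, p := range pendingPlans {
		cmds = append(cmds, "apply "+p)
	}
	return cmds, nil
}

func main() {
	if _, err := buildApplyCommands(nil, false); err != nil {
		fmt.Println("error:", err)
	}
	cmds, _ := buildApplyCommands([]string{"staging", "production"}, false)
	fmt.Println(cmds)
}
```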
I think a path forward may be a `--allow-no-plan-apply` flag which defaults to false now (breaking change).

Well, I actually think your original interpretation made sense to me. To clarify, we would never want Atlantis to apply without having the most up-to-date plan saved and locked.
What we're doing instead is just running validate, fmt, lint and `terraform plan --lock=false` in vanilla GitLab CI. Devs open an MR and need to fix any issues there, and get all approvals first, before the atlantis plan. The problem we were solving by doing it this way was autoplan opening locks too early in the process and thus blocking other MRs that were ready to be applied.
I still think this is a bug. Yes I wanted autoplan disabled but that just means I want the developers to run it if and only if all the pre-apply requirements are met. I would expect the apply step to run the same validation that the plan is locked and up to date and apply based on that. Turning off autoplan shouldn’t affect those checks. What seems to be happening with autoplan disabled is the apply is ignoring the plans and ultimately just applies nothing.
I can confirm there are plans and locks are created when they are supposed to be. It appears that the atlantis apply step is just ignoring those if a second apply is run after this first one fails. Expected behavior would be for the apply step to either force a replan if the MR was updated, or attempt to re-apply the original plan. It’s doing neither of these right now.
One thing I did notice was that if the apply does fail, the saved plans are deleted but the locks are left open (this may be the actual bug here). If we removed those after a failed apply, that would basically force the plan step. I don’t know if that’s the best solution but I think it would work.
> it's up to you to run `atlantis plan` if you've pushed a new commit and don't have autoplan running.
Our team has autoplan on, but pushing a new commit doesn't cause Atlantis to redo the plan (because Bitbucket).
> One thing I did notice was that if the apply does fail, the saved plans are deleted but the locks are left open (this may be the actual bug here). If we removed those after a failed apply, that would basically force the plan step.
I agree we should either have Atlantis not delete the plans, or error if an apply is attempted without any plans.
@lkysow - what's the reason for Atlantis to delete the plans after a failed apply? It could have failed because of a transient provider issue, and re-running apply on the same plan would later succeed.
Hi everyone, I hit this issue too. Is there any work in progress to fix this bug? I removed the locks as mentioned above and re-ran `atlantis plan`. It still shows "Ran Plan for 0 projects:".
Running plan on the same PR after a failed apply should not be any different than if Atlantis did not delete the plan; it is just an extra step.
But if someone else in another PR modifies the environment you are running plan against, you will have a problem no matter what, and by re-running a plan you could actually find the drift.
I do not think this is a bug. It is a bit annoying to run plan again, but since Terraform is idempotent it should only apply the difference.
I can run `atlantis plan` again and I am still getting the output "Ran Plan for 0 projects:". If I run with `atlantis plan -p *-production`, it will apply.
With autoplan, you need to define every directory you want autoplan on/off for in your atlantis.yaml, otherwise it does not work. Is that what you are doing?
If this were a bug, no one would be using Atlantis, so I want to make sure whether this is specific to a multi-dir structure etc. For that, we need to see the atlantis.yaml files and the directory structure so we can have a better idea.
This could be as simple as better documentation of autoplan with some examples.
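As a sketch of what such documentation might show (the directory names here are hypothetical), an atlantis.yaml that configures autoplan explicitly per project looks like:

```yaml
version: 3
projects:
- dir: project1
  autoplan:
    enabled: true
    when_modified: ["*.tf", "../modules/**/*.tf"]
- dir: project2
  autoplan:
    enabled: false  # this project must be planned manually with -d/-p
```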
@jamengual I am using an atlantis.yaml that was previously working. I think around v0.19.* this started breaking. It is about 50 projects, each with its own project name so that the `-p` wildcard flag can be used. The pattern for the naming is ${service_name}-${environment}.

I discovered that if I run `atlantis apply -p *-environment`, then the command will run, but it will run for ALL projects, regardless of what files have changed.

I have autoplan on, but if I run `atlantis plan` manually, it doesn't seem to make a difference. Also of note, I am using custom workflows; not sure if that makes a difference.
@evanstachowiak Please test with the pre-release image; we did some bug fixes there and I wonder if that could be the issue: `docker pull ghcr.io/runatlantis/atlantis:v0.19.3-pre.20220408`
Is this still an issue with v0.19.8?
Hello @jamengual, I was able to reproduce this issue on v0.19.8, using the testdrive repository.
It only happened when using pre workflow hooks, such as the following:
```yaml
---
repos:
- id: /.*/
  pre_workflow_hooks:
    - run: echo "hello world"
```
The server logs for the execution:
{"level":"info","ts":"2022-09-22T13:58:58.502-0300","caller":"server/server.go:869","msg":"Atlantis started - listening on port 4141","json":{}}
{"level":"info","ts":"2022-09-22T13:58:58.502-0300","caller":"scheduled/executor_service.go:46","msg":"Scheduled Executor Service started","json":{}}
{"level":"info","ts":"2022-09-22T13:59:09.305-0300","caller":"events/events_controller.go:533","msg":"parsed comment as command=\"apply\" verbose=false dir=\"\" workspace=\"\" project=\"\" flags=\"\"","json":{"gh-request-id":"X-Github-Delivery=dfb30ec0-3a97-11ed-9f80-6ecf217e25c6"}}
{"level":"info","ts":"2022-09-22T13:59:14.712-0300","caller":"events/working_dir.go:225","msg":"creating dir \"/home/gus/workspace/opensource/apply-for-0-projects-test/atlantis_linux_amd64/data/repos/GusAntoniassi/atlantis-example/1/default\"","json":{"repo":"GusAntoniassi/atlantis-example","pull":"1"}}
{"level":"info","ts":"2022-09-22T13:59:15.360-0300","caller":"runtime/pre_workflow_hook_runner.go:50","msg":"successfully ran \"echo \\\"hello world\\\"\" in \"/home/gus/workspace/opensource/apply-for-0-projects-test/atlantis_linux_amd64/data/repos/GusAntoniassi/atlantis-example/1/default\"","json":{"repo":"GusAntoniassi/atlantis-example","pull":"1"}}
yes it's still an issue @jamengual
I wonder if this is related to this: https://github.com/runatlantis/atlantis/pull/1633
> I was able to reproduce this issue on v0.19.8, using the testdrive repository. It only happened when using pre workflow hooks [...]
pre_workflow_hooks run before any atlantis.yaml file is parsed.
After that, if no atlantis.yaml is defined, it will do nothing.
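For context, a sketch of the setup being discussed (the generator script path is hypothetical): the hook writes atlantis.yaml into the checkout before Atlantis parses config, so if the hook fails or writes nothing, Atlantis finds no projects and reports "Ran Apply for 0 projects".

```yaml
# Server-side repo config - illustrative sketch only.
repos:
- id: /.*/
  pre_workflow_hooks:
    # generate_atlantis_yaml.sh is a hypothetical script that writes
    # atlantis.yaml into the clone; it runs before the config is parsed.
    - run: ./scripts/generate_atlantis_yaml.sh > atlantis.yaml
```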
This issue is stale because it has been open for 1 month with no activity. Remove stale label or comment or this will be closed in 1 month.
@jamengual Hello! Recently I reproduced this problem on v0.25.0. I'm also using pre-workflow hooks as described above. Is it possible to reopen this issue to fix this bug?
can you describe the steps you took to reproduce it?
Sure!
```
atlantis-0:/$ atlantis version
atlantis v0.25.0 (commit: a12823e) (build date: 2023-08-11T20:51:19.440Z)
```
Repos config:
```yaml
repos:
- id: "/.*/"
  branch: "/.*/"
  workflow: check
  allow_custom_workflows: true
  allowed_overrides: [workflow, delete_source_branch_on_merge]
  apply_requirements: [approved]
  pre_workflow_hooks:
    - run: python3 code/atlantis_config_merge.py # script for generating atlantis.yaml
workflows:
  check:
    plan:
      steps:
        - run: echo "check passed"
  terragrunt-tst:
    plan:
      steps:
        - env:
            ...
        - run: |
            if [ ! -d "/tmp/$BASE_REPO_OWNER-$BASE_REPO_NAME-$PULL_NUM" ]; then
              mkdir -p /tmp/$BASE_REPO_OWNER-$BASE_REPO_NAME-$PULL_NUM
            fi
        - run: terragrunt run-all plan -out ./plan.tfplan --terragrunt-non-interactive &> /tmp/$BASE_REPO_OWNER-$BASE_REPO_NAME-$PULL_NUM/output.txt || cat /tmp/$BASE_REPO_OWNER-$BASE_REPO_NAME-$PULL_NUM/output.txt
        - run: terragrunt run-all show -json ./plan.tfplan --terragrunt-non-interactive 2> /tmp/$BASE_REPO_OWNER-$BASE_REPO_NAME-$PULL_NUM/show_stderr.txt 1> ./plan.json || cat /tmp/$BASE_REPO_OWNER-$BASE_REPO_NAME-$PULL_NUM/show_stderr.txt
        - run: /tmp/infracost breakdown --path=. --format=json --log-level=info --out-file=./infracost.json --project-name=$REPO_REL_DIR 2>> /tmp/$BASE_REPO_OWNER-$BASE_REPO_NAME-$PULL_NUM/output.txt 1>> /tmp/$BASE_REPO_OWNER-$BASE_REPO_NAME-$PULL_NUM/output.txt || cat /tmp/$BASE_REPO_OWNER-$BASE_REPO_NAME-$PULL_NUM/output.txt
        - run: /tmp/infracost output --path=./infracost.json --format=json --out-file=./infracost-report.json 2>> /tmp/$BASE_REPO_OWNER-$BASE_REPO_NAME-$PULL_NUM/output.txt 1>> /tmp/$BASE_REPO_OWNER-$BASE_REPO_NAME-$PULL_NUM/output.txt || cat /tmp/$BASE_REPO_OWNER-$BASE_REPO_NAME-$PULL_NUM/output.txt
        - run: |
            /tmp/infracost comment gitlab --repo $BASE_REPO_OWNER/$BASE_REPO_NAME \
              --merge-request $PULL_NUM \
              --path ./infracost-report.json \
              --gitlab-token $ATLANTIS_GITLAB_TOKEN \
              --behavior new \
              --show-all-projects
        # script for output formatting. Not sure if it's relevant for this issue. Just to share
        - run: python3 /opt/terragrunt_output_formatter.py --file /tmp/$BASE_REPO_OWNER-$BASE_REPO_NAME-$PULL_NUM/output.txt --output-file /tmp/$BASE_REPO_OWNER-$BASE_REPO_NAME-$PULL_NUM/fmt_output.txt && cat /tmp/$BASE_REPO_OWNER-$BASE_REPO_NAME-$PULL_NUM/fmt_output.txt
        - run: rm -rf /tmp/$BASE_REPO_OWNER-$BASE_REPO_NAME-$PULL_NUM
    apply:
      steps:
        - env:
            ...
        - run: terragrunt run-all apply ./plan.tfplan --terragrunt-non-interactive
```
atlantis.yaml example:
```yaml
projects:
- autoplan:
    when_modified:
      - '**/*.hcl'
      - '*.hcl'
  dir: accounts/...
  name: ...
  workflow: terragrunt-tst
```
As a result, I get this MR message: `Ran Apply for 0 projects:`. Running `atlantis apply -p ...` solves the problem, but it's not comfortable to use it every time.
I also face this issue on v0.28.1. I have a multi-dir structure (mono-repository) with a lot of projects (> 300). I recently introduced a CI stage that has to run before Atlantis. Because there's no other way of pipelining this (running the CI stage before Atlantis), I disabled the autoplan feature in atlantis.yaml everywhere and added a step in our pipeline that comments `atlantis plan` if everything is fine.

The issue occurred after Atlantis successfully generated the plans: the PR was rebased, and before Atlantis finished re-planning, another user commented `atlantis apply`. With autoplanning enabled, Atlantis would've commented back something like "another command is already running for this PR" and errored out the `atlantis apply` command. But now it just commented `Ran Apply for 0 projects` and then merged and deleted the source branch, because that's how we have it configured; so in our case, manually commenting `atlantis plan` and `atlantis apply` again doesn't work.
Is there any chance this issue will be reopened?
LE: I found a workaround for now, at least for our use-case: I added a pre-workflow-hook with `commands: plan` that checks whether the CI pipeline has finished. If the pre-workflow-hook fails, no comment is posted. When the CI finishes, it comments `atlantis plan` and the normal flow begins. With this setup, I was able to re-enable the autoplan feature and avoid this bug.
I have a repo that uses the default workspace but there are a number of different project folders.
Atlantis version: 0.8.3
Terraform version: v0.12.8
Plans are generated for all three projects as normal after commenting exactly `atlantis plan`. Immediately afterward, commenting `atlantis apply` attempts to apply all three environments as expected. In this case, there was an apply error due to an AWS IAM policy being misconfigured, and the plans were not successfully applied. A commit was pushed to fix this issue and another `atlantis apply` was submitted. Note, there was not another `atlantis plan` after the fix commit was pushed. Atlantis behaved as if it had forgotten about the failed plans and assumed they had been applied successfully when, in fact, they had not been. I believe the expected behavior should be to reject the apply since new commits were made and force another plan to be run, correct? The result was the following: