Open pantelis-karamolegkos opened 2 years ago
I think this is dupe of #1914 which is fixed by https://github.com/runatlantis/atlantis/pull/2131 (not released at the time of this comment).
Any idea when a new release might land with this fix in it?
As a side note, I saw this in my environment when I set:
parallel_plan: true
parallel_apply: true
in my atlantis.yaml. Removing these let me continue to work for the time being.
I'm seeing this happening on 0.19.3, which I think includes https://github.com/runatlantis/atlantis/pull/2131 (it's not mentioned in the actual release notes, but is in the prerelease notes). I have parallel_plan and parallel_apply enabled, and would prefer to not disable them for the moment.
Edit: I just realised I'm getting a different error message, so probably not related
Plan Error
The default workspace at path . is currently locked by another command that is running for this pull request.
Wait until the previous command is complete and try again.
I think it might still be related to https://github.com/runatlantis/atlantis/pull/2131. I just haven't opened a new issue about it, yet. @pauloconnor do you want to open an issue for it or should I?
Go for it
Is this still happening with v0.19.8?
Any news?? I'm using version 0.22.2 and it has the same bug.
I haven't seen this in a while. @adrianocanofre are you using Atlantis as a GitHub App? How are the webhooks set up? Can you check your Atlantis logs to see if it gets a double webhook call?
Not sure if this is exactly the same issue, but if I post two comments at the same time, atlantis plan -d aws/thing and atlantis plan -d aws/other, the first plan fails with the following error message:
The default workspace at path . is currently locked by another command that is running for this pull request.
Wait until the previous command is complete and try again.
They're separate projects with separate state so I was hoping to run both in parallel
And do you have parallel plan enabled? Which version of Atlantis are you running?
It may be better to enable --enable-regexp-cmd and then run something like
atlantis plan -p "aws/[other|thing]"
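One caveat about the pattern above: in regular-expression syntax, square brackets denote a character class, so aws/[other|thing] matches aws/ followed by a single character from that set, not the words other or thing. Alternation between whole words needs parentheses. The sketch below uses Python's re module purely for illustration. Atlantis is written in Go, whose regexp package treats brackets the same way, but how Atlantis anchors the pattern against project names is not shown in this thread, so the full-match semantics and the project names used here are assumptions.

```python
import re

# Hypothetical project names, based on the directories mentioned above.
projects = ["aws/thing", "aws/other", "aws/o"]


def matching(pattern: str, names: list[str]) -> list[str]:
    """Return the names that the pattern matches in full (an assumption)."""
    compiled = re.compile(pattern)
    return [n for n in names if compiled.fullmatch(n)]


# Square brackets form a character class: exactly one character from the set.
char_class = matching(r"aws/[other|thing]", projects)

# Parentheses with | form an alternation between whole words.
alternation = matching(r"aws/(other|thing)", projects)

print("character class:", char_class)
print("alternation:    ", alternation)
```

If the intent is to plan both projects with a single comment, the parenthesised form is the one that expresses it.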
parallel_plan is off (or at least not set) and it's happening on v0.22.2
Is it the same in 0.21.0?
Yes, before upgrading we saw it happen on v0.21.0 as well.
So it sounds like it's happening with or without parallel_plan in the following versions
It sounds like this is a regression.
Would anyone be able to keep going below 0.19.2 in order to find when this feature used to work? This would allow us to better pinpoint where this breaking change was introduced.
Any other details would help too. OP shows us the atlantis server config but not the repo config...
Also there are no debug logs included.
We're always looking for maintainers too to resolve issues and add tests. Please consider contributing if you want this fixed. 🙏
I'm using v0.22.2, and it's happening. Is there any news on how to fix this bug? It sometimes becomes a blocker.
@nitrocode I tried v0.19.
I am noticing dual events in the log:
{"level":"info","ts":"2023-02-10T02:31:32.146Z","caller":"events/events_controller.go:417","msg":"parsed comment as command=\"plan\" verbose=false dir=\"\" workspace=\"\" project=\"\" flags=\"\"","json":{}}
{"level":"info","ts":"2023-02-10T02:31:32.309Z","caller":"events/events_controller.go:417","msg":"parsed comment as command=\"plan\" verbose=false dir=\"\" workspace=\"\" project=\"\" flags=\"\"","json":{}}
I am using github app
Never mind, my problem was the same as https://github.com/runatlantis/atlantis/issues/1880: webhooks were set up in both the app and the repo.
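For anyone checking their own logs for double webhook delivery, this can be done mechanically. A minimal sketch, assuming the Atlantis log is available as one JSON object per line with the ts and msg fields shown above. The function name and the one-second window are arbitrary choices for illustration, not something Atlantis provides.

```python
import json
from datetime import datetime, timedelta


def find_duplicate_commands(lines, window=timedelta(seconds=1)):
    """Return (first_ts, second_ts, msg) for identical "parsed comment"
    messages that were logged within `window` of each other."""
    last_seen = {}   # msg -> timestamp of its most recent occurrence
    duplicates = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        msg = entry.get("msg", "")
        if not msg.startswith("parsed comment as"):
            continue
        # Timestamps in the log look like 2023-02-10T02:31:32.146Z
        ts = datetime.strptime(entry["ts"], "%Y-%m-%dT%H:%M:%S.%fZ")
        previous = last_seen.get(msg)
        if previous is not None and ts - previous <= window:
            duplicates.append((previous, ts, msg))
        last_seen[msg] = ts
    return duplicates


# The two log lines quoted above, as raw strings so the \" escapes survive.
sample = [
    r'{"level":"info","ts":"2023-02-10T02:31:32.146Z","caller":"events/events_controller.go:417","msg":"parsed comment as command=\"plan\" verbose=false dir=\"\" workspace=\"\" project=\"\" flags=\"\"","json":{}}',
    r'{"level":"info","ts":"2023-02-10T02:31:32.309Z","caller":"events/events_controller.go:417","msg":"parsed comment as command=\"plan\" verbose=false dir=\"\" workspace=\"\" project=\"\" flags=\"\"","json":{}}',
]

dups = find_duplicate_commands(sample)
for first, second, msg in dups:
    print(f"same command parsed twice, {second - first} apart")
```

A hit here only suggests that the same comment was delivered twice. The webhook configuration on the app and on the repo still has to be checked by hand.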
@davidsielert glad you figured it out. Thank you for closing the loop on that.
@yohannafrans there is no update here unfortunately on this issue. The last request was to try to reproduce this issue with older versions to see if/when this regression was introduced.
https://github.com/runatlantis/atlantis/issues/2200#issuecomment-1384286670
If you're willing to propose a PR, we'd be happy to review it.
I rolled all the way back to 0.19.0 and the lock creation issue went away.
Unfortunately it seems to be missing a lot of features that I would want, such as repo_config_file (introduced in v0.22.0, it seems...), as I have infra code in a big monorepo and I'd prefer not to have atlantis.yaml stuck at the root... Any ideas what's happening?
That is interesting. Now the big challenge is to find what between all those releases caused the issue.
@krrrr38 do you have any ideas about this? I think at some point you mentioned this issue.
I'm not familiar with this issue. It may be caused by having both a gh-app and a manually configured webhook.
@victor-chan-groundswell What kind of integration do you use: gh-app, gh-user, gitlab-user, or something else? Could you share your webhook event like this?
I am using gh-user....
Looking at the logs on 0.19.0, I do not see those specific log entries for the event, but I do see logs of it executing what it's supposed to do.... The logs have been modified to remove sensitive info...
{"level":"info","ts":"2023-04-18T05:49:12.523Z","caller":"server/server.go:1021","msg":"Apply Lock: {false 0001-01-01 00:00:00 +0000 UTC }","json":{}}
{"level":"info","ts":"2023-04-18T05:49:12.527Z","caller":"server/server.go:1021","msg":"Apply Lock: {false 0001-01-01 00:00:00 +0000 UTC }","json":{}}
{"level":"info","ts":"2023-04-18T05:49:12.537Z","caller":"server/server.go:1021","msg":"Apply Lock: {false 0001-01-01 00:00:00 +0000 UTC }","json":{}}
{"level":"info","ts":"2023-04-18T05:49:23.270Z","caller":"events/project_command_builder.go:330","msg":"successfully parsed atlantis.yaml file","json":{"repo":"REPO","pull":"380"}}
{"level":"info","ts":"2023-04-18T05:49:23.270Z","caller":"events/project_command_builder.go:338","msg":"1 projects are to be planned based on their when_modified config","json":{"repo":"REPO","pull":"380"}}
{"level":"info","ts":"2023-04-18T05:49:23.270Z","caller":"terraform/terraform_client.go:317","msg":"Cannot determine which version to use from terraform configuration, detected 0 possibilities.","json":{"repo":"REPO","pull":"380"}}
{"level":"info","ts":"2023-04-18T05:49:24.103Z","caller":"events/project_locker.go:86","msg":"acquired lock with id \"DIRECTORY\"","json":{"repo":"REPO","pull":"380"}}
{"level":"info","ts":"2023-04-18T05:49:24.104Z","caller":"models/shell_command_runner.go:156","msg":"successfully ran \"echo \\\"terraform${ATLANTIS_TERRAFORM_VERSION}\\\"\" in \"DIRECTORY\"","json":{"repo":"REPO","pull":"380"}}
Just for fun and games, I tried using 0.22.0 just to see what happens..... Atlantis errored out by... well... not doing anything.... and eventually dropped this error message....
{"level":"error","ts":"2023-04-18T05:53:22.926Z","caller":"logging/simple_logger.go:163","msg":"invalid key: e2e64c02-ae7b-4f2b-8bdb-ff890f611bd5","json":{},"stacktrace":"github.com/runatlantis/atlantis/server/logging.(*StructuredLogger).Log\n\tgithub.com/runatlantis/atlantis/server/logging/simple_logger.go:163\ngithub.com/runatlantis/atlantis/server/controllers.(*JobsController).respond\n\tgithub.com/runatlantis/atlantis/server/controllers/jobs_controller.go:92\ngithub.com/runatlantis/atlantis/server/controllers.(*JobsController).getProjectJobsWS\n\tgithub.com/runatlantis/atlantis/server/controllers/jobs_controller.go:70\ngithub.com/runatlantis/atlantis/server/controllers.(*JobsController).GetProjectJobsWS\n\tgithub.com/runatlantis/atlantis/server/controllers/jobs_controller.go:83\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2109\ngithub.com/gorilla/mux.(*Router).ServeHTTP\n\tgithub.com/gorilla/mux@v1.8.0/mux.go:210\ngithub.com/urfave/negroni/v3.Wrap.func1\n\tgithub.com/urfave/negroni/v3@v3.0.0/negroni.go:59\ngithub.com/urfave/negroni/v3.HandlerFunc.ServeHTTP\n\tgithub.com/urfave/negroni/v3@v3.0.0/negroni.go:33\ngithub.com/urfave/negroni/v3.middleware.ServeHTTP\n\tgithub.com/urfave/negroni/v3@v3.0.0/negroni.go:51\ngithub.com/runatlantis/atlantis/server.(*RequestLogger).ServeHTTP\n\tgithub.com/runatlantis/atlantis/server/middleware.go:70\ngithub.com/urfave/negroni/v3.middleware.ServeHTTP\n\tgithub.com/urfave/negroni/v3@v3.0.0/negroni.go:51\ngithub.com/urfave/negroni/v3.(*Recovery).ServeHTTP\n\tgithub.com/urfave/negroni/v3@v3.0.0/recovery.go:210\ngithub.com/urfave/negroni/v3.middleware.ServeHTTP\n\tgithub.com/urfave/negroni/v3@v3.0.0/negroni.go:51\ngithub.com/urfave/negroni/v3.(*Negroni).ServeHTTP\n\tgithub.com/urfave/negroni/v3@v3.0.0/negroni.go:111\nnet/http.serverHandler.ServeHTTP\n\tnet/http/server.go:2947\nnet/http.(*conn).serve\n\tnet/http/server.go:1991"}
I believe this is related to https://github.com/runatlantis/atlantis/pull/2253 which reverted https://github.com/runatlantis/atlantis/pull/2180. #2180 was a fix for https://github.com/runatlantis/atlantis/pull/2131 which is not reverted. So I'm going to make the initial assumption it is related.
I'll be digging into this in more depth as this is probably our biggest regression and I have a good test case (Autodesk runs Terragrunt in a mono-repo).
A good example of this is due to the changes in #2131 implementing path in the locker:
in which DefaultRepoRelDir is set to ".":
https://github.com/runatlantis/atlantis/blob/96ab0ab2e7e2d39b9b59f3486904e20a97bed359/server/events/project_command_builder.go#L27
When multiple projects are triggered in the same PR, they all hit the same path in the locker.
There are a couple of other draft PRs that have gone stale around handling cloning and the working_dir_locker.
I think overall we should probably have a solid approach to how we clone, and how we do file operations in those cloned repos.
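To make the collision concrete, here is a toy model of a lock table keyed by repo, pull request, workspace and path. It is a simplification for illustration only: the class and method names are hypothetical, and this is not the Atlantis locker code. It only shows why two projects that both resolve to path . contend for a single entry, while distinct paths would not.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LockKey:
    """Hypothetical lock key; the real key layout in Atlantis may differ."""
    repo: str
    pull: int
    workspace: str
    path: str


class ToyLocker:
    """Hypothetical in-memory try-lock table, not the real Atlantis locker."""

    def __init__(self):
        self._held = set()

    def try_lock(self, key: LockKey) -> bool:
        """Take the lock if it is free; return False instead of waiting."""
        if key in self._held:
            return False
        self._held.add(key)
        return True

    def unlock(self, key: LockKey) -> None:
        self._held.discard(key)


locker = ToyLocker()

# Two different projects, but both keyed on the default path ".".
first = locker.try_lock(LockKey("Org/repo", 1, "default", "."))
second = locker.try_lock(LockKey("Org/repo", 1, "default", "."))

# The same two projects keyed on their own directories.
thing = locker.try_lock(LockKey("Org/repo", 1, "default", "aws/thing"))
other = locker.try_lock(LockKey("Org/repo", 1, "default", "aws/other"))

print(first, second, thing, other)
```

In this model the second request for path . is refused even though it belongs to a different project, which matches the error message reported in this thread.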
So, after I started working more on Atlantis, I realized that it might be an error on my end.... Here's what I have to do:
I reworked my yaml file to ensure that each project uses a different workspace and name.....
I am one of the folks who use this to generate the Atlantis file... I just used the default options, which do not generate distinct workspace names for each project, and generating the project ID is not a default option either.....
Not blaming them but rather... myself for not knowing Atlantis better... live and learn....
I do not believe it is your fault, @victor-chan-groundswell. A PR that was merged long ago broke the lock logic and introduced a regression that affects this use case.
We are working on analyzing the whole locking code to make sure the fix to this is what we expect.
well, I spoke too soon.... after messing around with it more... even with distinct project names and distinct workspace names for each project, I'm running into the same issue again....
The acm-dev workspace at path terraform/terragrunt-live/non-prod/us-east-1/dev/acm is currently locked by another command that is running for this pull request.\nWait until the previous command is complete and try again.
Hello, any news on this issue? I'm affected by this as well and the only fix is to downgrade to v0.19.0, which has older versions of terraform which I can't use... thanks
@carlitos081 I built my own docker container (I had to do it anyway since I use a custom workflow with terragrunt and I have to have that in the container as well) and overwrote the default container's version of terraform. That should at least work around that blocker.
@victor-chan-groundswell can you share your Dockerfile? Did you install terraform and alter the symlink in the container?
This is mine, I only install terragrunt:
ARG ATLANTIS_VERSION=v0.19.0
# Needed to downgrade because of this issue https://github.com/runatlantis/atlantis/issues/2200
FROM docker-public-artifactory.somecompany.com/runatlantis/atlantis:$ATLANTIS_VERSION
ARG TERRAGRUNT_VERSION=v0.45.2
RUN --mount=type=secret,id=netrc curl --netrc-file /run/secrets/netrc \
https://artifactory.somecompany.com/artifactory/terraform-remote/gruntwork-io/terragrunt/releases/download/${TERRAGRUNT_VERSION}/terragrunt_linux_386 \
--output /usr/local/bin/terragrunt
RUN chmod +x /usr/local/bin/terragrunt
RUN terragrunt
RUN ls -la /usr/local/bin/
Thanks
@carlitos081
ARG ATLANTIS=0.19.0
FROM ghcr.io/runatlantis/atlantis:v${ATLANTIS}
RUN apk add \
aws-cli \
curl \
make \
unzip
ARG TERRAGRUNT=0.45.2
ARG TERRAFORM=1.2.9
###
### Ensure Terraform version is present, linked and validated
###
RUN set -eux \
&& if [ "${TERRAFORM}" = "latest" ]; then \
TERRAFORM="$( \
curl -sS https://releases.hashicorp.com/terraform/ \
| tac | tac \
| grep -Eo '/terraform/[0-9]+\.[0-9]+\.[0-9]+/' \
| grep -Eo '[.0-9]+' \
| sort -V \
| tail -1 \
)"; \
fi \
&& if ! terraform version | grep -qE " v${TERRAFORM}\$"; then \
cd "/tmp" \
&& curl -sS "https://releases.hashicorp.com/terraform/${TERRAFORM}/terraform_${TERRAFORM}_linux_amd64.zip" -o terraform.zip \
&& unzip terraform.zip \
&& rm terraform.zip \
&& chmod +x terraform \
&& cp terraform /usr/local/bin/terraform${TERRAFORM} \
&& mv terraform /usr/local/bin/terraform; \
fi \
&& terraform${TERRAFORM} --version | grep "v${TERRAFORM}"
###
### Ensure Terragrunt version is present and validated
###
RUN set -eux \
&& if [ "${TERRAGRUNT}" = "latest" ]; then \
TERRAGRUNT="$( \
curl -L -sS --ipv4 https://github.com/gruntwork-io/terragrunt/releases \
| tac | tac \
| grep -Eo '"/gruntwork-io/terragrunt/releases/tag/v?[0-9]+\.[0-9]+\.[0-9]+"' \
| grep -Eo '[0-9]+\.[0-9]+\.[0-9]+' \
| sort -V \
| tail -1 \
)"; \
fi \
&& curl -L -sS --ipv4 "https://github.com/gruntwork-io/terragrunt/releases/download/v${TERRAGRUNT}/terragrunt_linux_amd64" -o /usr/local/bin/terragrunt \
&& chmod +x /usr/local/bin/terragrunt \
&& terragrunt --version | grep "v${TERRAGRUNT}"
I think I have a slight hint (not sure at all) that this happens when the pre_workflow_hook runs and at the same time you comment atlantis plan; in my case (where autoplan is disabled) if I wait a bit for the pre_workflow_hook to run before commenting atlantis plan I hardly ever get into that case; I cannot back this up with data for the time being.
I configured terragrunt-atlantis-config which runs a pre_workflow_hook and this issue started happening right away. Going to try and figure out a way to wait for it.
I was able to confirm from the Atlantis log that this is caused by the pre-workflow hook. This is reproducible by making commits and pushing them sequentially to the open PR.
Atlantis log (replaced sensitive information):
{
"level": "error",
"ts": "2023-05-10T09:42:08.454Z",
"caller": "events/command_runner.go:169",
"msg": "Error running pre-workflow hooks The default workspace at path . is currently locked by another command that is running for this pull request.\nWait until the previous command is complete and try again.. Proceeding with plan command.",
"json": {
"repo": "Org/repo",
"pull": "1"
},
"stacktrace": "github.com/runatlantis/atlantis/server/events.(*DefaultCommandRunner).RunAutoplanCommand\n\t/home/runner/work/atlantis/atlantis/server/events/command_runner.go:169"
}
{
"level": "warn",
"ts": "2023-05-10T09:42:08.727Z",
"caller": "events/project_command_builder.go:323",
"msg": "workspace was locked",
"json": {
"repo": "Org/repo",
"pull": "1"
},
"stacktrace": "github.com/runatlantis/atlantis/server/events.(*DefaultProjectCommandBuilder).buildAllCommandsByCfg\n\t/home/runner/work/atlantis/atlantis/server/events/project_command_builder.go:323\ngithub.com/runatlantis/atlantis/server/events.(*DefaultProjectCommandBuilder).BuildAutoplanCommands\n\t/home/runner/work/atlantis/atlantis/server/events/project_command_builder.go:215\ngithub.com/runatlantis/atlantis/server/events.(*InstrumentedProjectCommandBuilder).BuildAutoplanCommands.func1\n\t/home/runner/work/atlantis/atlantis/server/events/instrumented_project_command_builder.go:29\ngithub.com/runatlantis/atlantis/server/events.(*InstrumentedProjectCommandBuilder).buildAndEmitStats\n\t/home/runner/work/atlantis/atlantis/server/events/instrumented_project_command_builder.go:71\ngithub.com/runatlantis/atlantis/server/events.(*InstrumentedProjectCommandBuilder).BuildAutoplanCommands\n\t/home/runner/work/atlantis/atlantis/server/events/instrumented_project_command_builder.go:26\ngithub.com/runatlantis/atlantis/server/events.(*PlanCommandRunner).runAutoplan\n\t/home/runner/work/atlantis/atlantis/server/events/plan_command_runner.go:85\ngithub.com/runatlantis/atlantis/server/events.(*PlanCommandRunner).Run\n\t/home/runner/work/atlantis/atlantis/server/events/plan_command_runner.go:288\ngithub.com/runatlantis/atlantis/server/events.(*DefaultCommandRunner).RunAutoplanCommand\n\t/home/runner/work/atlantis/atlantis/server/events/command_runner.go:174"
}
{
"level": "error",
"ts": "2023-05-10T09:42:08.727Z",
"caller": "events/instrumented_project_command_builder.go:75",
"msg": "Error building auto plan commands: The default workspace at path . is currently locked by another command that is running for this pull request.\nWait until the previous command is complete and try again.",
"json": {},
"stacktrace": "github.com/runatlantis/atlantis/server/events.(*InstrumentedProjectCommandBuilder).buildAndEmitStats\n\t/home/runner/work/atlantis/atlantis/server/events/instrumented_project_command_builder.go:75\ngithub.com/runatlantis/atlantis/server/events.(*InstrumentedProjectCommandBuilder).BuildAutoplanCommands\n\t/home/runner/work/atlantis/atlantis/server/events/instrumented_project_command_builder.go:26\ngithub.com/runatlantis/atlantis/server/events.(*PlanCommandRunner).runAutoplan\n\t/home/runner/work/atlantis/atlantis/server/events/plan_command_runner.go:85\ngithub.com/runatlantis/atlantis/server/events.(*PlanCommandRunner).Run\n\t/home/runner/work/atlantis/atlantis/server/events/plan_command_runner.go:288\ngithub.com/runatlantis/atlantis/server/events.(*DefaultCommandRunner).RunAutoplanCommand\n\t/home/runner/work/atlantis/atlantis/server/events/command_runner.go:174"
}
{
"level": "error",
"ts": "2023-05-10T09:42:09.338Z",
"caller": "events/pull_updater.go:17",
"msg": "The default workspace at path . is currently locked by another command that is running for this pull request.\nWait until the previous command is complete and try again.",
"json": {
"repo": "Org/repo",
"pull": "1"
},
"stacktrace": "github.com/runatlantis/atlantis/server/events.(*PullUpdater).updatePull\n\t/home/runner/work/atlantis/atlantis/server/events/pull_updater.go:17\ngithub.com/runatlantis/atlantis/server/events.(*PlanCommandRunner).runAutoplan\n\t/home/runner/work/atlantis/atlantis/server/events/plan_command_runner.go:90\ngithub.com/runatlantis/atlantis/server/events.(*PlanCommandRunner).Run\n\t/home/runner/work/atlantis/atlantis/server/events/plan_command_runner.go:288\ngithub.com/runatlantis/atlantis/server/events.(*DefaultCommandRunner).RunAutoplanCommand\n\t/home/runner/work/atlantis/atlantis/server/events/command_runner.go:174"
}
EDIT: Since we don't use auto plan, I was able to fix this behaviour by setting disable-autoplan; this way Atlantis doesn't run the pre-workflow hook unless it receives a command.
Thanks for confirming.
https://github.com/runatlantis/atlantis/issues/2200#issuecomment-1384286670
Please try versions prior to 0.19.2 to see where this regression began. It will make it easier to identify a fix for this.
I'm afraid I can't test older versions on this environment, this is being used by multiple teams on a daily basis.
As stated in the new ADR to address locks, it's due to conflicting working_dir locks, which are used by autoplan, pre-workflow hooks, and regular events (commits, comments), as they all need to clone the git repo.
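The timing problem can be illustrated with a small simulation. This is a toy model, not Atlantis code: a single threading.Lock stands in for the working_dir lock of one pull request, and it is assumed that a command which finds the lock held fails immediately rather than waiting, as the error message in this thread suggests. All names in the sketch are hypothetical.

```python
import threading

ERROR = ("The default workspace at path . is currently locked by another "
         "command that is running for this pull request.")

# One lock stands in for the working_dir lock of a single pull request.
working_dir_lock = threading.Lock()


def run_command(name, results, started=None, release=None):
    """Stand-in for one event: try to take the lock without waiting."""
    if not working_dir_lock.acquire(blocking=False):
        results[name] = ERROR
        return
    try:
        if started is not None:
            started.set()             # signal that the lock is now held
        if release is not None:
            release.wait(timeout=5)   # pretend to clone and run the hook
        results[name] = "ok"
    finally:
        working_dir_lock.release()


results = {}
hook_started = threading.Event()
hook_may_finish = threading.Event()

# The pre-workflow hook starts first and holds the lock.
hook = threading.Thread(
    target=run_command,
    args=("pre_workflow_hook", results, hook_started, hook_may_finish),
)
hook.start()
hook_started.wait(timeout=5)

# A plan comment arrives while the hook is still running: it fails at once.
run_command("plan_during_hook", results)

# Once the hook has finished, the same command goes through.
hook_may_finish.set()
hook.join()
run_command("plan_after_hook", results)

for name, outcome in results.items():
    print(f"{name}: {outcome}")
```

This matches the observation earlier in the thread that waiting for the pre_workflow_hook to finish before commenting atlantis plan usually avoids the error.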
I'm working to engage the community on the best way forward regarding how and why we need to lock. You can read more in https://github.com/runatlantis/atlantis/pull/3345
Community Note
Overview of the Issue
I have set up atlantis and configured multiple projects. I am not using workspaces (therefore, for each project only the default workspace should be applicable).
However, when creating a GitHub Pull Request that includes changes to multiple projects, I get the following error(s)
This is despite the fact that docs state:
Reproduction Steps
workspaces
Logs
Environment details
If not already included, please provide the following:
Atlantis server-side config file:
Repo atlantis.yaml file:
Any other information you can provide about the environment/deployment.
Additional Context