runatlantis / atlantis

Terraform Pull Request Automation
https://www.runatlantis.io

Atlantis 0.17.2 not always creating 'default' working dir #1714

Open srlightbody opened 3 years ago

srlightbody commented 3 years ago

We've noticed some odd behavior after upgrading from 0.16.1 to 0.17.2. The behavior is:

1. User creates a PR in GitHub
2. Atlantis creates the repo folder and PR number folder in /home/atlantis/.atlantis/repos
3. Atlantis does not create the default directory nor clone into it
4. Atlantis attempts to check if the default workspace exists and fails with an error.

Here's some debug level log output showing the issue -

2021-07-21 13:24:48.491 MDT{caller: events/events_controller.go:417, json: {…}, level: info, msg: parsed comment as command="plan" verbose=false dir="" workspace="company-daily" project="" flags="", ts: 2021-07-21T19:24:48.490Z}
2021-07-21 13:24:48.491 MDT{caller: events/events_controller.go:439, json: {…}, level: debug, msg: executing command, ts: 2021-07-21T19:24:48.490Z}
2021-07-21 13:24:48.491 MDT{caller: server/middleware.go:37, json: {…}, level: debug, msg: POST /events – respond HTTP 200, ts: 2021-07-21T19:24:48.490Z}
2021-07-21 13:24:48.818 MDT{caller: server/server.go:749, json: {…}, level: info, msg: Apply Lock: {false 0001-01-01 00:00:00 +0000 UTC }, ts: 2021-07-21T19:24:48.818Z}
2021-07-21 13:24:48.885 MDT{caller: server/server.go:749, json: {…}, level: info, msg: Apply Lock: {false 0001-01-01 00:00:00 +0000 UTC }, ts: 2021-07-21T19:24:48.883Z}
2021-07-21 13:24:49.245 MDT{caller: events/project_command_builder.go:287, json: {…}, level: debug, msg: building plan command, ts: 2021-07-21T19:24:49.244Z}
2021-07-21 13:24:49.245 MDT{caller: events/project_command_builder.go:294, json: {…}, level: debug, msg: cloning repository, ts: 2021-07-21T19:24:49.244Z}
2021-07-21 13:24:49.245 MDT{caller: events/working_dir.go:202, json: {…}, level: info, msg: creating dir "/home/atlantis/.atlantis/repos/company/atlantis-foo/218/company-daily", ts: 2021-07-21T19:24:49.244Z}
2021-07-21 13:24:49.884 MDT{caller: events/working_dir.go:268, json: {…}, level: debug, msg: ran: git clone --branch 5625048_daily_staging --depth=1 --single-branch https://companyatlantis:<redacted>@github.com/company/atlantis-foo.git /home/atlantis/.atlantis/repos/company/atlantis-foo/218/company-daily. Output: Cloning into '/h…
2021-07-21 13:24:49.886 MDT{caller: server/server.go:749, json: {…}, level: info, msg: Apply Lock: {false 0001-01-01 00:00:00 +0000 UTC }, ts: 2021-07-21T19:24:49.886Z}
2021-07-21 13:24:50.226 MDT{caller: events/pull_updater.go:14, json: {…}, level: error, msg: checking if workspace exists: stat /home/atlantis/.atlantis/repos/company/atlantis-foo/218/default: no such file or directory, stacktrace: github.com/runatlantis/atlantis/server/events.(*PullUpdater).updatePull /home/circleci/proje…

The full log for that last line is -

{
  "caller": "events/pull_updater.go:14",
  "json": { ... },
  "msg": "checking if workspace exists: stat /home/atlantis/.atlantis/repos/companymaps/atlantis-foo/218/default: no such file or directory",
  "stacktrace": "github.com/runatlantis/atlantis/server/events.(*PullUpdater).updatePull /home/circleci/project/server/events/pull_updater.go:14 github.com/runatlantis/atlantis/server/events.(*PlanCommandRunner).run /home/circleci/project/server/events/plan_command_runner.go:162 github.com/runatlantis/atlantis/server/events.(*PlanCommandRunner).Run /home/circleci/project/server/events/plan_command_runner.go:223 github.com/runatlantis/atlantis/server/events.(*DefaultCommandRunner).RunCommentCommand /home/circleci/project/server/events/command_runner.go:212",
  "ts": "2021-07-21T19:24:50.225Z",
  "level": "error"
}

The issue is intermittent, i.e. I can close out a PR that has had the issue, open a new one with the same commits in it, and the new one will work just fine. Rolling Atlantis back to 0.16.1 completely resolves the issue.

I've spent some time today digging around, and I think it may be related to the change introduced in #1620 in some way. It seems like Atlantis is attempting to use the default directory without it ever being initialized. We do use a custom workflow for our planning step that adds a simplified output comment for users, and an atlantis.yaml with file-specific autoplan triggers, but if it's an interaction with those, I have not figured out the issue yet.

msarvar commented 3 years ago

@srlightbody Are you triggering atlantis plan through a GitHub comment? I'm thinking that this might be caused when autoplan is not triggered due to no changes in the code and no pre_workflow_hook is present. If either autoplan or a pre_workflow_hook is present, it will create the default folder. If neither exists and you trigger the plan with a PR comment (i.e. atlantis plan -w <workspace-name>), this error will happen. Is that the case?
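
The failure mode described above can be illustrated with a small standalone sketch. This is not Atlantis code: the helper names (clone_dir, default_workspace_exists, simulate_comment_plan) are hypothetical, and the only thing taken from the thread is the directory layout repos/<repo>/<pull>/<workspace> seen in the logs. It assumes that only the requested workspace directory gets created and that the later check stats the default directory unconditionally.

```python
import os
import tempfile

DEFAULT_WORKSPACE = "default"


def clone_dir(data_dir, repo, pull, workspace):
    """Build the path layout seen in the logs: repos/<repo>/<pull>/<workspace>."""
    return os.path.join(data_dir, "repos", repo, str(pull), workspace)


def default_workspace_exists(data_dir, repo, pull):
    """Stat the 'default' directory; a missing directory raises FileNotFoundError.

    Hypothetical helper: it treats "not found" as an error on purpose,
    to mirror the "no such file or directory" message in the logs.
    """
    os.stat(clone_dir(data_dir, repo, pull, DEFAULT_WORKSPACE))
    return True


def simulate_comment_plan(workspace):
    """Create only the requested workspace dir, then check for 'default'."""
    with tempfile.TemporaryDirectory() as data_dir:
        os.makedirs(clone_dir(data_dir, "company/atlantis-foo", 218, workspace))
        try:
            default_workspace_exists(data_dir, "company/atlantis-foo", 218)
            return "ok"
        except FileNotFoundError:
            return "default workspace dir missing"


if __name__ == "__main__":
    # Only a custom workspace was cloned, so the check for 'default' fails.
    print(simulate_comment_plan("company-daily"))
    # If something created 'default' first, the same check passes.
    print(simulate_comment_plan("default"))
```

Under these assumptions, anything that creates the default directory before the check (an autoplan or a pre-workflow hook, per the comment above) would hide the problem, which would be consistent with the intermittent behavior reported.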

srlightbody commented 3 years ago

I've done some more digging and I think there were 2 distinct issues going on that made this extra confusing. A bunch of our webhooks were failing with a 301 after the upgrade. The URL we were using as the hook target had a hostname ending in a ".", i.e. https://atlantis.endpoint./events. For some reason that started causing a 301. I've since rolled out a change that fixes the hooks, and am going to retry the upgrade to 0.17.2 today so I can do more thorough testing.
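
For anyone checking their own hook targets for the trailing-dot hostname described above, a minimal sketch using only the Python standard library follows. The function names are made up for illustration, and the sketch says nothing about why the redirect started after the upgrade; it only detects and normalizes such URLs.

```python
from urllib.parse import urlsplit, urlunsplit


def has_trailing_dot(url):
    """Return True if the URL's hostname ends with a '.' (e.g. 'atlantis.endpoint.')."""
    host = urlsplit(url).hostname or ""
    return host.endswith(".")


def strip_trailing_dot(url):
    """Return the URL with a trailing dot removed from the hostname.

    Simplification: any userinfo (user:password@) in the URL is dropped,
    and the hostname is lowercased by urlsplit.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    if not host.endswith("."):
        return url
    netloc = host.rstrip(".")
    if parts.port is not None:
        netloc = f"{netloc}:{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


if __name__ == "__main__":
    hook = "https://atlantis.endpoint./events"
    print(has_trailing_dot(hook))
    print(strip_trailing_dot(hook))
```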

That being said, when the issue was occurring it was with autoplans being prompted by an atlantis.yaml in the repo. We trigger autoplans based on changed files, and select a workspace as part of that. The default workspace is unused.

msarvar commented 3 years ago

@srlightbody This is definitely a bug and needs to be fixed. I think one potential workaround could be adding a no-op pre-workflow hook. Can you try adding the following to the config:

pre_workflow_hooks:
  - run: echo "do nothing"

Let me know if that mitigates the issue for the time being.

askmike1 commented 3 years ago

I get this same error under the same conditions. We are updating from 0.16.1 -> 0.17.2. Autoplans are disabled and we currently do not have a pre_workflow_hook. As a workaround, I was able to get past this by adding the following to my repos.yaml:

  pre_workflow_hooks:
    - run: echo "workaround"

emulanob commented 2 years ago

Hi there!

I'm facing a similar situation. Upgrading from v0.16.1 to v0.17.2 or anything later makes all my plans fail with that same error:

"checking if workspace exists: stat /home/atlantis/.atlantis/repos/${repo-name}/terraform/${pull-request-id}/default: no such file or directory"

Important context:

Example command in a comment:

atlantis plan -d path/to/changes -w foo

Atlantis logs for above command:

{"level":"info","ts":"2022-08-17T12:21:15.342Z","caller":"events/events_controller.go:417","msg":"parsed comment as command=\"plan\" verbose=false dir=\"path/to/changes\" workspace=\"foo\" project=\"\" flags=\"\"","json":{}}
{"level":"info","ts":"2022-08-17T12:21:15.825Z","caller":"events/working_dir.go:202","msg":"creating dir \"/home/atlantis/.atlantis/repos/my-org/my-repo/3843/foo\"","json":{"repo":"my-org/my-repo","pull":"3843"}}
{"level":"error","ts":"2022-08-17T12:21:18.698Z","caller":"events/pull_updater.go:14","msg":"checking if workspace exists: stat /home/atlantis/.atlantis/repos/my-org/my-repo/3843/default: no such file or directory","json":{"repo":"my-org/my-repo","pull":"3843"},"stacktrace":"github.com/runatlantis/atlantis/server/events.(*PullUpdater).updatePull\n\t/home/runner/work/atlantis/atlantis/server/events/pull_updater.go:14\ngithub.com/runatlantis/atlantis/server/events.(*PlanCommandRunner).run\n\t/home/runner/work/atlantis/atlantis/server/events/plan_command_runner.go:162\ngithub.com/runatlantis/atlantis/server/events.(*PlanCommandRunner).Run\n\t/home/runner/work/atlantis/atlantis/server/events/plan_command_runner.go:223\ngithub.com/runatlantis/atlantis/server/events.(*DefaultCommandRunner).RunCommentCommand\n\t/home/runner/work/atlantis/atlantis/server/events/command_runner.go:212"}
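
When triaging logs like the ones above, it can help to list which directories were created and which missing paths were later stat'ed. The following is a rough sketch that assumes one JSON object per log line with a "msg" field, as in the excerpt. The two regular expressions are derived only from the messages shown in this thread, and the sample lines are the two relevant entries from above with the stacktrace field left out.

```python
import json
import re

# Patterns taken from the two log messages shown in this thread.
CREATED = re.compile(r'creating dir "(?P<path>[^"]+)"')
MISSING = re.compile(r"stat (?P<path>\S+): no such file or directory")

SAMPLE_LINES = [
    r'{"level":"info","ts":"2022-08-17T12:21:15.825Z","caller":"events/working_dir.go:202","msg":"creating dir \"/home/atlantis/.atlantis/repos/my-org/my-repo/3843/foo\"","json":{"repo":"my-org/my-repo","pull":"3843"}}',
    r'{"level":"error","ts":"2022-08-17T12:21:18.698Z","caller":"events/pull_updater.go:14","msg":"checking if workspace exists: stat /home/atlantis/.atlantis/repos/my-org/my-repo/3843/default: no such file or directory","json":{"repo":"my-org/my-repo","pull":"3843"}}',
]


def summarize(lines):
    """Return (created_dirs, missing_dirs_never_created) from JSON log lines."""
    created, missing = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON log line
        if not isinstance(entry, dict):
            continue
        msg = entry.get("msg", "")
        match = CREATED.search(msg)
        if match:
            created.append(match.group("path"))
        match = MISSING.search(msg)
        if match:
            missing.append(match.group("path"))
    return created, [path for path in missing if path not in created]


if __name__ == "__main__":
    created, never_created = summarize(SAMPLE_LINES)
    print("created:", created)
    print("stat'ed but never created:", never_created)
```

For the two sample lines, the created list holds the foo workspace directory and the never-created list holds the default directory, which matches the symptom reported in this issue.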

Additional comments

jamengual commented 2 years ago

is this still an issue with v0.19.8?

emulanob commented 2 years ago

Hi @jamengual. Yes, I started by upgrading from v0.16.1 to v0.19.8, which failed, and then downgraded until I reached a version that worked.

j0rzsh commented 2 years ago

I'm working with the latest version, which is currently v0.19.9-pre.2022082, and the same error is happening.

Git: Bitbucket Cloud
Using workspaces

The pre_workflow_hooks workaround mentioned earlier makes it work.

sujeets-toast commented 1 year ago

I've been using Atlantis for a year and recently encountered an error with version 0.18.2.0.

[Screenshot 2022-11-21 at 11 27 43 AM]
[Screenshot 2022-11-21 at 11 35 15 AM]
[Screenshot 2022-11-21 at 11 39 35 AM]

nitrocode commented 1 year ago

It's possible the atlantis pod ran out of space?

Please also try with the latest version 0.20.1.

sujeets-toast commented 1 year ago

> It's possible the atlantis pod ran out of space?
>
> Please also try with the latest version 0.20.1.

Thanks for your reply. I created a new repository with the same name as the one it is currently using. It's working for me. Due to time constraints, I will upgrade the Atlantis image later because it will necessitate a significant amount of testing for us. 

tekumara commented 1 year ago

This happened to me when trying to run atlantis plan via comment on an empty PR. I pushed a commit with a trivial change and the atlantis plan via comment worked.

hskrtich commented 1 year ago

My org has run into this same issue a number of times. It seems to randomly resolve itself at some point. We also use custom workspaces. This is still happening with the latest version of Atlantis (v0.23.3).

bml1g12 commented 1 year ago

Same issue here on the latest version. It occurs on all new PRs until one runs atlantis plan, e.g. one cannot run atlantis plan -p project_name without running atlantis plan first.

inkel commented 1 year ago

This is also happening for us. We were using 0.19.9 and recently upgraded to 0.24.2.

Jonathanboliveira commented 1 year ago

Any updates for this issue? We are having the same problem here in the organization, when updating from v0.17.0 to v0.24.3

kelvingl commented 1 year ago

Hello! Any updates for this issue? We are having the same problem here in the organization, on v0.25.0

jamengual commented 1 year ago

We are documenting the Locks flow, which includes part of the cloning process too. After that, we will try to figure out a way to make this more stable: https://github.com/runatlantis/atlantis/pull/3345

carmennavarreteh commented 8 months ago

Hi! In my organisation we are also facing this, and we are using the 0.25.0 version. We have some reproducible cases: