runatlantis / atlantis

Terraform Pull Request Automation
https://www.runatlantis.io

v0.22.2+: `atlantis/post_workflow_hook` errors after `apply` when using `--automerge` flag due to deleted directory #3031

Open Yasmine92 opened 1 year ago

Yasmine92 commented 1 year ago

Overview of the Issue

When using the `--automerge` flag, the post-workflow hooks are executed after the PR is merged, by which point the PR directory and the branch have already been deleted. This results in errors in the logs about being unable to read the current working directory or to fetch from origin.

Reproduction Steps

Logs

```
{"level":"info","ts":"2023-01-24T09:58:33.477Z","caller":"events/events_controller.go:542","msg":"parsed comment as command=\"apply\" verbose=false dir=\"\" workspace=\"\" project=\"\" flags=\"\"","json":{"gh-request-id":"X-Github-Delivery=a9221ee0-9bcd-11ed-9d8e-31769df0a119"}}
{"level":"info","ts":"2023-01-24T09:58:37.115Z","caller":"terraform/terraform_client.go:317","msg":"Cannot determine which version to use from terraform configuration, detected 2 possibilities.","json":{"repo":"example/sandbox-project","pull":"19"}}
{"level":"info","ts":"2023-01-24T09:58:37.116Z","caller":"terraform/terraform_client.go:317","msg":"Cannot determine which version to use from terraform configuration, detected 2 possibilities.","json":{"repo":"example/sandbox-project","pull":"19"}}
{"level":"info","ts":"2023-01-24T09:58:37.460Z","caller":"runtime/apply_step_runner.go:39","msg":"starting apply","json":{"repo":"example/sandbox-project","pull":"19"}}
{"level":"info","ts":"2023-01-24T09:58:41.988Z","caller":"models/shell_command_runner.go:156","msg":"successfully ran \"/usr/local/bin/terraform apply -input=false \\\"/atlantis-data/repos/example/sandbox-project/19/dev/terraform/dev-dev.tfplan\\\"\" in \"/atlantis-data/repos/example/sandbox-project/19/dev/terraform\"","json":{"repo":"example/sandbox-project","pull":"19"}}
{"level":"info","ts":"2023-01-24T09:58:41.989Z","caller":"runtime/apply_step_runner.go:58","msg":"apply successful, deleting planfile","json":{"repo":"example/sandbox-project","pull":"19"}}
{"level":"info","ts":"2023-01-24T09:58:42.363Z","caller":"events/instrumented_project_command_runner.go:82","msg":"apply success. output available at: https://github.com/example/sandbox-project/pull/19","json":{"repo":"example/sandbox-project","pull":"19"}}
{"level":"info","ts":"2023-01-24T09:58:45.244Z","caller":"events/automerger.go:32","msg":"automerging pull request","json":{"repo":"example/sandbox-project","pull":"19"}}
{"level":"info","ts":"2023-01-24T09:58:47.986Z","caller":"events/instrumented_pull_closed_executor.go:45","msg":"Initiating cleanup of pull data.","json":{"repository":"example/sandbox-project","pull-num":"19"}}
{"level":"warn","ts":"2023-01-24T09:58:48.154Z","caller":"events/working_dir.go:168","msg":"getting remote update failed: Fetching origin\nerror: cannot open '.git/FETCH_HEAD': No such file or directory\nerror: could not fetch origin\nFetching head\nfatal: Unable to read current working directory: No such file or directory\nerror: could not fetch head\nfatal: Unable to read current working directory: No such file or directory\n","json":{"repo":"example/sandbox-project","pull":"19"},"stacktrace":"github.com/runatlantis/atlantis/server/events.(*FileWorkspace).warnDiverged\n\tgithub.com/runatlantis/atlantis/server/events/working_dir.go:168\ngithub.com/runatlantis/atlantis/server/events.(*FileWorkspace).Clone\n\tgithub.com/runatlantis/atlantis/server/events/working_dir.go:117\ngithub.com/runatlantis/atlantis/server/events.(*DefaultPostWorkflowHooksCommandRunner).RunPostHooks\n\tgithub.com/runatlantis/atlantis/server/events/post_workflow_hooks_command_runner.go:69\ngithub.com/runatlantis/atlantis/server/events.(*DefaultCommandRunner).RunCommentCommand\n\tgithub.com/runatlantis/atlantis/server/events/command_runner.go:298"}
{"level":"info","ts":"2023-01-24T09:58:49.011Z","caller":"events/events_controller.go:470","msg":"deleted locks and workspace for repo example/sandbox-project, pull 19","json":{"gh-request-id":"X-Github-Delivery=b183cfc0-9bcd-11ed-9de7-c11f64f1d767"}}
{"level":"error","ts":"2023-01-24T09:58:49.101Z","caller":"events/command_runner.go:301","msg":"Error running post-workflow hooks chdir /atlantis-data/repos/example/sandbox-project/19/default: no such file or directory: running \"ls -l /etc/atlantis/repos.yaml\" in \"/atlantis-data/repos/example/sandbox-project/19/default\": \n.","json":{"repo":"example/sandbox-project","pull":"19"},"stacktrace":"github.com/runatlantis/atlantis/server/events.(*DefaultCommandRunner).RunCommentCommand\n\tgithub.com/runatlantis/atlantis/server/events/command_runner.go:301"}
```

Environment details

Post-workflow hook configuration from the server-side config:

```yaml
post_workflow_hooks:
  - run: ls -l /etc/atlantis/repos.yaml
```

Repo atlantis.yaml file:

```yaml
---
# atlantis.yaml
version: 3
parallel_plan: true
projects:
- name: dev
  dir: terraform
  autoplan:
    when_modified: ["*.tf*"]
  workflow: dev
  workspace: dev
workflows:
  dev:
    plan:
      steps:
      - init:
      - plan:
          extra_args: ["-var-file", "environments/dev.tfvars"]
```


nitrocode commented 1 year ago

Could you include your yaml configuration such as your post workflow hook?

Looks like this is the main error

Error running post-workflow hooks chdir /atlantis-data/repos///7/default: no such file or directory: running "rm -rf /tmp/$BASE_REPO_OWNER-$BASE_REPO_NAME-$PULL_NUM"
jamengual commented 1 year ago

It looks like you have an additional / in the path on that command.

Yasmine92 commented 1 year ago

Regarding `/atlantis-data/repos///7/default`: no, I just tried to hide the repo and organisation names and it resulted in that :facepalm:. I'll update the issue with a different log, using a simple post-workflow hook.

Yasmine92 commented 1 year ago

@jamengual and @nitrocode I updated the issue with a simpler example (a different, simpler post-workflow hook) and the log from the time of the apply.

nitrocode commented 1 year ago

@Yasmine92 can you ssh to the pod and try to access the directory it's complaining about? Can you cd to it?

cd /atlantis-data/repos/example/sandbox-project/19/default
Yasmine92 commented 1 year ago

Well, from my understanding, the PR directory is cleaned up after the automerge, at this step:

```
{"level":"info","ts":"2023-01-24T09:58:45.244Z","caller":"events/automerger.go:32","msg":"automerging pull request","json":{"repo":"example/sandbox-project","pull":"19"}}
{"level":"info","ts":"2023-01-24T09:58:47.986Z","caller":"events/instrumented_pull_closed_executor.go:45","msg":"Initiating cleanup of pull data.","json":{"repository":"example/sandbox-project","pull-num":"19"}}
```

So it's expected that the directory is already gone after the merge:

```
$ ls /atlantis-data/repos/example/sandbox-project/19
ls: /atlantis-data/repos/example/sandbox-project/19: No such file or directory
```
Yasmine92 commented 1 year ago

But the thing is that the workspace `Clone` in `events/working_dir.go` is called before running the post-workflow hook, so it always tries to `cd` into the directory of the already merged PR before executing the post-workflow action, even when that action is unrelated to that directory, like the `ls -l /etc/atlantis/repos.yaml` example I showed. So I think it would be better to change the logic to execute the post-workflow hooks before automerging.

nitrocode commented 1 year ago

@Fabianoshz for your thoughts here.

@Yasmine92 please feel free to test a change locally and propose it. What's odd is that I'm using the latest version, most of the above flags, and not experiencing this. I wonder what condition would make the cleanup happen prior to the post workflow run?

Yasmine92 commented 1 year ago

@nitrocode are you using both `--automerge` and a post-workflow hook? How does it look in the logs when you do an apply? And sure, I'll give it a try :)

Fabianoshz commented 1 year ago

@Yasmine92 can you check if you can access the PR directory?

  • This should work: ls /atlantis-data/repos/example/sandbox-project/19/ - This is the PR directory.
  • This should not: ls /atlantis-data/repos/example/sandbox-project/19/default

Taking from memory, I believe the project directory is deleted right after the apply, while the PR directory lives until we receive a merged event. Again, I'm going from memory, so I might be wrong.

Yasmine92 commented 1 year ago

Thanks @Fabianoshz, the PR directory /atlantis-data/repos/example/sandbox-project/19/ is deleted after the apply, because with the `--automerge` flag the merge event arrives automatically right after the apply. That's exactly the bug I'm pointing out: using `--automerge` together with a post-workflow hook gets the PR directory deleted before the post-workflow hook is executed.

bob-rohan commented 1 year ago

Seeing this race condition as well. Thanks for the notes. So the directory is deleted on receipt of the merge event; if that event is received before the post-workflow hook runs, the directory from which the hook command would be run no longer exists. Is this a correct summary?

jamengual commented 1 year ago

Correct.

nitrocode commented 1 year ago

I'm seeing this too on 0.22.3.

I haven't checked previous versions yet. Has anyone gone back to check whether a recent version introduced this? Has anyone tested with 0.22.2, 0.22.1, or 0.22.0? I'd go as far back as 0.21.0 to see if it's a regression and work up from there.

this issue may be related to

cilindrox commented 1 year ago

@nitrocode I made the jump from v0.19.8 (working) to v0.22.2 (merge error) and currently v0.22.3, which also has the merge error.

nitrocode commented 1 year ago

Ah, thank you, so this is definitely a regression. If anyone gets a chance, please test earlier versions so we can pinpoint the PR that introduced this breaking change.

Cc @Fabianoshz in case you or others can spot the issue without checking individual versions

https://github.com/runatlantis/atlantis/compare/v0.19.8...v0.22.2

nitrocode commented 1 year ago

@Fabianoshz is it possible that post-workflow hooks never used to run from the PR directory until a recent PR? If so, we'd just have to run the delete after the post-workflow run completes.

This deletes the dir

https://github.com/runatlantis/atlantis/blob/a1f389add5e943f700edee06645e48725ce30ff5/server/events/pull_closed_executor.go#L100

This is the function call that deletes the dir

https://github.com/runatlantis/atlantis/blob/a1f389add5e943f700edee06645e48725ce30ff5/server/events/pull_closed_executor.go#L81

https://github.com/runatlantis/atlantis/blob/a1f389add5e943f700edee06645e48725ce30ff5/server/events/instrumented_pull_closed_executor.go#L34

https://github.com/runatlantis/atlantis/blob/a1f389add5e943f700edee06645e48725ce30ff5/server/controllers/events/events_controller.go#L460

https://github.com/runatlantis/atlantis/blob/a1f389add5e943f700edee06645e48725ce30ff5/server/controllers/events/events_controller.go#L420

https://github.com/runatlantis/atlantis/blob/a1f389add5e943f700edee06645e48725ce30ff5/server/events/command_runner.go#L172-L176

weeezes commented 1 year ago

I'm not sure if this is what's causing issues for others here, but for some reason a perfectly well-functioning pre-workflow hook started to fail for me yesterday. It turns out my :magic_wand: magical diff script :magic_wand: left things in an unwanted state, which then got cleaned up by the logic maybe here? https://github.dev/runatlantis/atlantis/blob/ba7b67a42cf8105fbbbe4a1d003e06cca58fc2a0/server/events/working_dir.go#L97-L98 After I changed the pre-workflow hook to run `git checkout $HEAD_COMMIT` at the end, things started working again.

arohter commented 5 days ago

Still an issue in v0.29.0, AFAIK. Our workaround is to move the command from `post_workflow_hooks` into a `run:` step in the workflow's steps.