Open cblkwell opened 3 weeks ago
Hello, @cblkwell! I'm here to assist you with any bugs, questions, or contributions you may have. Let's work together to resolve your issue.
There are a couple of similar issues that might be relevant to your situation:
Apply fails on Gitlab 16.4.0 with "Pull request must be mergeable before running apply." - This open issue discusses problems with mergeability checks in GitLab, which might have some parallels with your GitHub issue. A suggested workaround involves manually updating the commit status using the GitLab API, which might inspire a similar approach for GitHub [1].
GitHub Mergability check should support ruleset bypasser status - This open issue addresses the need for Atlantis to support certain GitHub ruleset bypass statuses, which could be related to your problem with the ATLANTIS_GH_ALLOW_MERGEABLE_BYPASS_APPLY
flag not functioning as expected [2].
Regarding the ATLANTIS_GH_ALLOW_MERGEABLE_BYPASS_APPLY
flag, it is designed to bypass the mergeability check related to the atlantis/apply
status. However, it seems that in version 0.30.0, this behavior might not be working as intended if a previous apply operation failed. This could be due to changes in how Atlantis handles mergeability checks or failed apply operations in this version.
Rolling back to version 0.29.0 appears to have resolved the issue for you, indicating a potential regression or change in behavior in version 0.30.0. It might be helpful to monitor the Atlantis repository for any updates or fixes related to this issue.
To continue talking to Dosu, mention @dosu.
If you are running Atlantis using github app, you may need to give it new permissions - Actions/Read. See https://www.runatlantis.io/docs/access-credentials.html#permissions
I'll check (I don't have the permissions necessary to do so in our org). Was this something added in 0.30.0? If that's the case, it's probably a good thing to add breaking changes like this to the release notes. I don't see anything in https://github.com/runatlantis/atlantis/releases/tag/v0.30.0 describing this change.
Im having similar issue but mine throws an error heres my Issue https://github.com/runatlantis/atlantis/issues/5038
We're experiencing the same issue, despite our GitHub App having all the required permissions. When one of the required checks fails and is then successfully re-run for the same commit, Atlantis still doesn't allow us to apply the PR. Instead, it displays the error mentioned in this issue.
Here's what I found: We have a PR with one required check that passed after a few reruns. It's now in passed state:
When we query check runs for the commit as Atlantis does, we see the following response from the GitHub GraphQL API:
"nodes": [
{
"__typename": "CheckRun",
"name": "🔄 Validate pull request",
"status": "COMPLETED",
"conclusion": "FAILURE"
},
{
"__typename": "CheckRun",
"name": "🔄 Validate pull request",
"status": "COMPLETED",
"conclusion": "CANCELLED"
},
{
"__typename": "CheckRun",
"name": "🔄 Validate pull request",
"status": "COMPLETED",
"conclusion": "CANCELLED"
},
{
"__typename": "CheckRun",
"name": "🔄 Validate pull request",
"status": "COMPLETED",
"conclusion": "CANCELLED"
},
{
"__typename": "CheckRun",
"name": "🔄 Validate pull request",
"status": "COMPLETED",
"conclusion": "SUCCESS"
},
{
"__typename": "CheckRun",
"name": "🔄 Validate pull request",
"status": "COMPLETED",
"conclusion": "SUCCESS"
}
]
GitHub returns all check runs, including failed ones, in chronological order. After investigating the Atlantis code to understand how it evaluates a commit's PR mergeability status, I found that the issue appears to be in this function: https://github.com/runatlantis/atlantis/blob/main/server/events/vcs/github_client.go#L718-L732
Atlantis iterates through the checkRuns
here and determines the required check status based on the first run in the array ignoring the rest of the runs. In our case, it's FAILURE
.
The fix seems straightforward. We simply need to evaluate the status of the last checkRun
instead of the first one.
If the maintainers approve of this proposed solution, I'd be happy to submit a pull request with these changes.
We're experiencing the same issue, despite our GitHub App having all the required permissions. When one of the required checks fails and is then successfully re-run for the same commit, Atlantis still doesn't allow us to apply the PR. Instead, it displays the error mentioned in this issue.
Here's what I found: We have a PR with one required check that passed after a few reruns. It's now in passed state:
When we query check runs for the commit as Atlantis does, we see the following response from the GitHub GraphQL API:
"nodes": [ { "__typename": "CheckRun", "name": "🔄 Validate pull request", "status": "COMPLETED", "conclusion": "FAILURE" }, { "__typename": "CheckRun", "name": "🔄 Validate pull request", "status": "COMPLETED", "conclusion": "CANCELLED" }, { "__typename": "CheckRun", "name": "🔄 Validate pull request", "status": "COMPLETED", "conclusion": "CANCELLED" }, { "__typename": "CheckRun", "name": "🔄 Validate pull request", "status": "COMPLETED", "conclusion": "CANCELLED" }, { "__typename": "CheckRun", "name": "🔄 Validate pull request", "status": "COMPLETED", "conclusion": "SUCCESS" }, { "__typename": "CheckRun", "name": "🔄 Validate pull request", "status": "COMPLETED", "conclusion": "SUCCESS" } ]
GitHub returns all check runs, including failed ones, in chronological order. After investigating the Atlantis code to understand how it evaluates a commit's PR mergeability status, I found that the issue appears to be in this function: https://github.com/runatlantis/atlantis/blob/main/server/events/vcs/github_client.go#L718-L732
Atlantis iterates through the
checkRuns
here and determines the required check status based on the first run in the array ignoring the rest of the runs. In our case, it'sFAILURE
.The fix seems straightforward. We simply need to evaluate the status of the last
checkRun
instead of the first one. If the maintainers approve of this proposed solution, I'd be happy to submit a pull request with these changes.
@jamengual @nitrocode What do you think?
👋 @jamengual asked me to help coordinate this issue.
@ajax-ryzhyi-r I agree that's very likely where this logic is happening, and it also seems like that method was introduced in 0.30 so it fits with @cblkwell 's report that downgrading to 0.29 fixes the problem.
I haven't dug into the code yet to tell whether the specific solution you mention will fix the problem, but feel free to submit a PR and I can test and review!
@ajax-ryzhyi-r Are you using github server or cloud? Asking because we weren't able to reproduce the results your graphql query provided and we definitely don't have problems with failed atlantis/apply
not being able to be re-applied like in original post. I'm saying we haven't seen this problem because we are the ones who made these changes and have been running them in production on github cloud for better part of the year..
@ajax-ryzhyi-r Are you using github server or cloud? Asking because we weren't able to reproduce the results your graphql query provided and we definitely don't have problems with failed
atlantis/apply
not being able to be re-applied like in original post. I'm saying we haven't seen this problem because we are the ones who made these changes and have been running them in production on github cloud for better part of the year..
We're using GitHub Cloud and haven't encountered issues (or haven't noticed) with the atlantis/apply
status check. However, we're facing problems with other required checks, specifically GitHub Actions workflow checks. If the required GitHub Actions workflow check fails and we rerun it successfully, Atlantis still treats the PR as unmergeable. I can provide the GraphQL query I used for my investigation if you'd like
👋 @jamengual asked me to help coordinate this issue.
@ajax-ryzhyi-r I agree that's very likely where this logic is happening, and it also seems like that method was introduced in 0.30 so it fits with @cblkwell 's report that downgrading to 0.29 fixes the problem.
I haven't dug into the code yet to tell whether the specific solution you mention will fix the problem, but feel free to submit a PR and I can test and review!
Thanks in advance for looking into this issue. I'll try to provide the fix mentioned in my previous comment yesterday 😊
@ajax-ryzhyi-r Are you using github server or cloud? Asking because we weren't able to reproduce the results your graphql query provided and we definitely don't have problems with failed
atlantis/apply
not being able to be re-applied like in original post. I'm saying we haven't seen this problem because we are the ones who made these changes and have been running them in production on github cloud for better part of the year..We're using GitHub Cloud and haven't encountered issues (or haven't noticed) with the
atlantis/apply
status check. However, we're facing problems with other required checks, specifically GitHub Actions workflow checks. If the required GitHub Actions workflow check fails and we rerun it successfully, Atlantis still treats the PR as unmergeable. I can provide the GraphQL query I used for my investigation if you'd like
@ajax-ryzhyi-r I'm unable to reproduce the behaviour of the GraphQL API you describe. When I rerun a workflow on the same commit, only the latest run appears in the response. Could you please provide me the query you used?
@ajax-ryzhyi-r Are you using github server or cloud? Asking because we weren't able to reproduce the results your graphql query provided and we definitely don't have problems with failed
atlantis/apply
not being able to be re-applied like in original post. I'm saying we haven't seen this problem because we are the ones who made these changes and have been running them in production on github cloud for better part of the year..We're using GitHub Cloud and haven't encountered issues (or haven't noticed) with the
atlantis/apply
status check. However, we're facing problems with other required checks, specifically GitHub Actions workflow checks. If the required GitHub Actions workflow check fails and we rerun it successfully, Atlantis still treats the PR as unmergeable. I can provide the GraphQL query I used for my investigation if you'd like@ajax-ryzhyi-r I'm unable to reproduce the behaviour of the GraphQL API you describe. When I rerun a workflow on the same commit, only the latest run appears in the response. Could you please provide me the query you used?
@henriklundstrom @stasostrovskyi Sorry for misleading you. Upon reviewing my query, I realized my initial understanding was incorrect. The issue occurs when the workflow is run multiple times for single commits, not when it is rerun for a single commit.
In our case, we have a workflow that validates PRs (checking title, checklists, labels, etc.) with the following triggers:
name: Validate pull request
on:
pull_request:
types: [edited, opened, reopened, synchronize]
branches:
- master
When this workflow fails, the PR author edits the PR to address the issues, and GitHub automatically triggers the workflow again, creating new run (For some reasonsI thought it reruns failed run ). After these steps, Atlantis is unable to apply the PR, reporting the error mentioned earlier.
However, the proposed fix still appears relevant. I'll only remove the part related to statusContexts
, as the issue seems to be specifically about checkRuns
(GitHub Actions workflow checks), and create a new PR with the fix for checkRuns
only.
I'm not sure our issue is the same as originally mentioned here.
Anyway, here is query I used:
gh api graphql -f query='
query($owner: String!, $name: String!, $number: Int!, $ruleCursor: String, $contextCursor: String) {
repository(owner: $owner, name: $name) {
pullRequest(number: $number) {
reviewDecision
baseRef {
branchProtectionRule {
requiredStatusChecks {
context
}
}
rules(first: 100, after: $ruleCursor) {
pageInfo {
endCursor
hasNextPage
}
nodes {
type
repositoryRuleset {
enforcement
}
parameters {
... on RequiredStatusChecksParameters {
requiredStatusChecks {
context
}
}
... on WorkflowsParameters {
workflows {
path
}
}
}
}
}
}
commits(last: 1) {
nodes {
commit {
statusCheckRollup {
contexts(first: 100, after: $contextCursor) {
pageInfo {
endCursor
hasNextPage
}
nodes {
__typename
... on CheckRun {
name
status
conclusion
detailsUrl
}
... on StatusContext {
context
state
targetUrl
}
}
}
}
}
}
}
}
}
}' -F owner='<repo_owner>' -F name='<repo_name>' -F number='<pr_number>'
@ajax-ryzhyi-r Okay I see, it happens not on rerun, but on re-trigger on the same commit. I was indeed able to reproduce it by closing and reopening a pull request without pushing any new commit.
Community Note
Overview of the Issue
Yesterday, we upgraded from v0.28.5 to v0.30.0. Afterwards, it appears Atlantis was not respecting the
ATLANTIS_GH_ALLOW_MERGEABLE_BYPASS_APPLY
if the apply previously had failed for any reason. We had PRs where someone had tried to apply and it had failed, either because they didn't regenerate a plan after the upgrade (we are not using EFS) or because of some other reason. It then became impossible to apply the PR because the previous apply had failed, leaving the PR unmergeable.We tried running
atlantis unlock
and destroying and recreating the ECS service that Atlantis was running under, and neither fixed the issue. Finally, we rolled back to v0.29.0, and this appears to have fixed the issue; PRs once again allowed us to plan and apply as normal.Reproduction Steps
Running Atlantis v0.30.0 with branch protection requiring
atlantis/apply
to be successful in order to merge. Fail an apply, then attempt to replan and apply; you should get this error:Logs
This appears to be the only log message applying to the issue:
Logs
``` {"date":1730209894.274917,"log":"{\"level\":\"error\",\"ts\":\"2024-10-29T13:51:34.274Z\",\"caller\":\"events/instrumented_project_command_runner.go:84\",\"msg\":\"Failure running apply operation: Pull request must be mergeable before running apply.\",\"json\":{\"repo\":\"REDACTED_REPO\",\"pull\":\"3866\"},\"stacktrace\":\"github.com/runatlantis/atlantis/server/events.RunAndEmitStats\\n\\tgithub.com/runatlantis/atlantis/server/events/instrumented_project_command_runner.go:84\\ngithub.com/runatlantis/atlantis/server/events.(*InstrumentedProjectCommandRunner).Apply\\n\\tgithub.com/runatlantis/atlantis/server/events/instrumented_project_command_runner.go:46\\ngithub.com/runatlantis/atlantis/server/events.runProjectCmds\\n\\tgithub.com/runatlantis/atlantis/server/events/project_command_pool_executor.go:48\\ngithub.com/runatlantis/atlantis/server/events.(*ApplyCommandRunner).Run\\n\\tgithub.com/runatlantis/atlantis/server/events/apply_command_runner.go:167\\ngithub.com/runatlantis/atlantis/server/events.(*DefaultCommandRunner).RunCommentCommand\\n\\tgithub.com/runatlantis/atlantis/server/events/command_runner.go:389\"}","container_id":"98c140658909440aab128a263758c770-3286682525","container_name":"atlantis","source":"stderr","ecs_cluster":"atlantis-aws","ecs_task_arn":"arn:aws:ecs:us-east-1:150523386789:task/atlantis-aws/98c140658909440aab128a263758c770","ecs_task_definition":"atlantis-aws:23"} {"date":1730209895.469134,"container_id":"98c140658909440aab128a263758c770-3286682525","container_name":"atlantis","source":"stderr","log":"{\"level\":\"info\",\"ts\":\"2024-10-29T13:51:35.469Z\",\"caller\":\"events/automerger.go:20\",\"msg\":\"not automerging because project at dir \\\"REDACTED_PROJECT\\\", workspace \\\"default\\\" has status \\\"apply_errored\\\"\",\"json\":{\"repo\":\"REDACTED_REPO\",\"pull\":\"3866\"}}","ecs_cluster":"atlantis-aws","ecs_task_arn":"arn:aws:ecs:us-east-1:REDACTED_ACCT:task/atlantis-aws/98c140658909440aab128a263758c770","ecs_task_definition":"atlantis-aws:23"} ```Environment details
If not already included, please provide the following:
Atlantis server-side config file:
Repo
atlantis.yaml
file:This is just
version: 3
followed by our list of projects.