`wait-for-github ci` can report failures when GitHub's UI shows success

For an internal use case (a rollout), we ran wait-for-github on https://github.com/grafana/pyroscope/commit/6e6a07351186e7b214bc72d32bbf0c6cf7257e65.

wait-for-github failed and blocked the rollout. Running it right now shows the same:

$ GITHUB_TOKEN="$(gh auth token)" \
  go run ./cmd/wait-for-github \
    ci \
    https://github.com/grafana/pyroscope/commit/6e6a07351186e7b214bc72d32bbf0c6cf7257e65
INFO[2024-11-20 09:57:00] Checking CI status on grafana/pyroscope@6e6a07351186e7b214bc72d32bbf0c6cf7257e65
2024/11/20 09:57:00 [DEBUG] GET https://api.github.com/repos/grafana/pyroscope/commits/6e6a07351186e7b214bc72d32bbf0c6cf7257e65/check-runs?filter=latest&per_page=100
CI failed. Please check CI on the following commit: https://github.com/grafana/pyroscope/commit/6e6a07351186e7b214bc72d32bbf0c6cf7257e65
exit status 1

GitHub's UI shows this as a success.

However, if you ask the API for all of the commit's check runs, the Dependabot one has failed.

Let's take a look at what is happening. wait-for-github calls the API to list the check runs, so we can do the same. (It also looks at the commit statuses, which is fine here: there aren't any, and we handle that correctly.) Running:

gh api /repos/grafana/pyroscope/commits/6e6a07351186e7b214bc72d32bbf0c6cf7257e65/check-runs --jq '
.check_runs[] | 
{
 name,
 status,
 conclusion
} | 
select(
 .status != "completed" or 
 (
   .conclusion as $c | 
   ["success", "skipped"] | 
   index($c) | 
   not
 )
)'

gives:

{
  "conclusion": "failure",
  "name": "Dependabot",
  "status": "completed"
}

But thinking about what Dependabot is, it's not something you run from Actions directly - GitHub runs it for you. I don't know why they expose it as a check run, but it doesn't seem like we should treat the commit as failed because of it. That check run also doesn't appear in the list of checks attached to the commit in the UI.
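To make the failure mode concrete, here is a minimal Go sketch of the same rule the jq filter above applies. It is not the actual wait-for-github code: the `CheckRun` type, the `blocking` function and the `build` check run are made up for illustration.

```go
package main

import "fmt"

// CheckRun holds the three fields the jq filter selects.
// Hypothetical type, not taken from wait-for-github.
type CheckRun struct {
	Name       string
	Status     string
	Conclusion string
}

// blocking returns the check runs the filter would report: anything
// that is not yet completed, or that completed with a conclusion
// other than "success" or "skipped".
func blocking(runs []CheckRun) []CheckRun {
	var out []CheckRun
	for _, r := range runs {
		if r.Status != "completed" {
			out = append(out, r)
			continue
		}
		switch r.Conclusion {
		case "success", "skipped":
			// Treated as passing.
		default:
			out = append(out, r)
		}
	}
	return out
}

func main() {
	runs := []CheckRun{
		// "build" is an invented example of a passing check run.
		{Name: "build", Status: "completed", Conclusion: "success"},
		// This mirrors the Dependabot check run returned by the API.
		{Name: "Dependabot", Status: "completed", Conclusion: "failure"},
	}
	for _, r := range blocking(runs) {
		fmt.Printf("%s: %s/%s\n", r.Name, r.Status, r.Conclusion)
	}
}
```

With a rule like this, a single GitHub-managed check run such as Dependabot is enough to mark the whole commit as failed, even though the UI ignores it.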

Basically we want to replicate the ✔/❌ that GitHub shows you on a commit.

I had a deeper look around on the interwebs and I found a `statusCheckRollup` field on commits in the GraphQL API. What we're doing in wait-for-github at the minute is trying to replicate this using the REST API, and we haven't been able to get it quite right yet. (I just started a discussion asking if this can be made easier.)

Right now it seems to me like using that field from the GraphQL API is going to be our best path forward to get reliable results.

For that we'd use https://github.com/shurcooL/graphql (making sure to give it the same caching retryable transport we already give to the REST client), constructing a query something like this:

$ gh api graphql \
        --raw-field owner=grafana \
        --raw-field repository=pyroscope \
        --raw-field commit=6e6a07351186e7b214bc72d32bbf0c6cf7257e65 \
        --raw-field query=$'
query ($owner: String!, $repository: String!, $commit: String!) {
  repository(owner: $owner, name: $repository) {
    object(expression: $commit) {
      ... on Commit {
        statusCheckRollup {
          state
        }
      }
    }
  }
}'

which gives:

{
  "data": {
    "repository": {
      "object": {
        "statusCheckRollup": {
          "state": "SUCCESS"
        }
      }
    }
  }
}

There are other fields on that object too which might be interesting - you can select the relevant statuses and checks, so we could say which ones are causing us to fail (see the docs; the GraphQL Explorer is also very useful for experimenting).
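As a rough sketch of how the response above could be consumed in Go, the following decodes it and pulls out the rollup state. It uses only `encoding/json` rather than the shurcooL/graphql client, to stay dependency-free. The `rollupResponse` type and `rollupState` function are hypothetical names. The pointer reflects my understanding that `statusCheckRollup` can come back as `null`, which should be verified.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rollupResponse mirrors the shape of the GraphQL response shown above.
// StatusCheckRollup is a pointer so a null rollup can be told apart
// from a real state (assumption: the field is nullable).
type rollupResponse struct {
	Data struct {
		Repository struct {
			Object struct {
				StatusCheckRollup *struct {
					State string `json:"state"`
				} `json:"statusCheckRollup"`
			} `json:"object"`
		} `json:"repository"`
	} `json:"data"`
}

// rollupState returns the rollup state (for example "SUCCESS"), or an
// empty string when the response carries no rollup at all.
func rollupState(body []byte) (string, error) {
	var resp rollupResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return "", fmt.Errorf("decoding response: %w", err)
	}
	rollup := resp.Data.Repository.Object.StatusCheckRollup
	if rollup == nil {
		return "", nil
	}
	return rollup.State, nil
}

func main() {
	body := []byte(`{
	  "data": {
	    "repository": {
	      "object": {
	        "statusCheckRollup": {"state": "SUCCESS"}
	      }
	    }
	  }
	}`)
	state, err := rollupState(body)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(state) // prints "SUCCESS"
}
```

Keeping the "no rollup" case distinct from a real state lets the caller decide separately what a commit with no statuses and no checks should mean.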

For whoever works on this, I suggest checking the results of this query in a variety of situations, such as:

- combinations of statuses [Drone for us] and checks [Actions]
- situations where we have no statuses and no checks
- when they are failing and not failing, skipped and so on
- what happens if we check super fast before anything gets created (could be hard to time this one) - would we report a false positive, i.e. say CI was successful when really it was just waiting to be created

And if it all looks good / can be solved then this would be a good API to use!

grafana / wait-for-github

`wait-for-github ci` can report failures when GitHub's UI shows success #181