sorenlouv / backport

A simple CLI tool that automates the process of backporting commits on a GitHub repo
https://github.com/sqren/backport/blob/main/docs/config-file-options.md
Apache License 2.0
241 stars 57 forks source link

Query `CommitsByAuthor` gets 502 error in Github response when using large `maxNumber` #479

Closed ian-h-chamberlain closed 8 months ago

ian-h-chamberlain commented 8 months ago

I was trying this tool out for the first time and I seem to have found a case that can pretty easily trigger a timeout from Github's servers when querying for a conflict. Unfortunately it's on a private repo so I can't share all the details, but I've managed to track down what I think is the cause. Here is the problematic query, from the logs, which I think is only used when there is a merge conflict:

{
  "query": "query CommitsByAuthor($authorId: ID, $commitPath: String, $dateSince: GitTimestamp, $dateUntil: GitTimestamp, $maxNumber: Int!, $repoName: String!, $repoOwner: String!, $sourceBranch: String!) {\n  repository(owner: $repoOwner, name: $repoName) {\n    ref(qualifiedName: $sourceBranch) {\n      target {\n        ... on Commit {\n          history(\n            first: $maxNumber\n            author: {id: $authorId}\n            path: $commitPath\n            since: $dateSince\n            until: $dateUntil\n          ) {\n            edges {\n              node {\n                ...SourceCommitWithTargetPullRequestFragment\n              }\n            }\n          }\n        }\n      }\n    }\n  }\n}\n\nfragment SourceCommitWithTargetPullRequestFragment on Commit {\n  repository {\n    name\n    owner {\n      login\n    }\n  }\n  sha: oid\n  message\n  committedDate\n  author {\n    name\n    email\n  }\n  associatedPullRequests(first: 1) {\n    edges {\n      node {\n        title\n        url\n        number\n        labels(first: 50) {\n          nodes {\n            name\n          }\n        }\n        baseRefName\n        mergeCommit {\n          ...RemoteConfigHistoryFragment\n          sha: oid\n          message\n        }\n        timelineItems(last: 20, itemTypes: CROSS_REFERENCED_EVENT) {\n          edges {\n            node {\n              ... on CrossReferencedEvent {\n                targetPullRequest: source {\n                  __typename\n                  ... on PullRequest {\n                    targetMergeCommit: mergeCommit {\n                      sha: oid\n                      message\n                    }\n                    repository {\n                      name\n                      owner {\n                        login\n                      }\n                    }\n                    url\n                    title\n                    state\n                    baseRefName\n                    number\n                    commits(first: 20) {\n                      edges {\n                        node {\n                          targetCommit: commit {\n                            message\n                            sha: oid\n                          }\n                        }\n                      }\n                    }\n                  }\n                }\n              }\n            }\n          }\n        }\n      }\n    }\n  }\n}\n\nfragment RemoteConfigHistoryFragment on Commit {\n  remoteConfigHistory: history(first: 1, path: \".backportrc.json\") {\n    edges {\n      remoteConfig: node {\n        committedDate\n        file(path: \".backportrc.json\") {\n          ... on TreeEntry {\n            object {\n              ... on Blob {\n                text\n              }\n            }\n          }\n        }\n      }\n    }\n  }\n}",
  "variables": {
    "repoOwner": "[redacted]",
    "repoName": "[redacted]",
    "sourceBranch": "develop",
    "maxNumber": 50,
    "authorId": null,
    "commitPath": "[redacted]",
    "dateSince": null,
    "dateUntil": "2023-10-13T01:54:32Z"
  }
}

and the response:

{
  "data": {
    "data": null,
    "errors": [
      {
        "message": "Something went wrong while executing your query. This may be the result of a timeout, or it could be a GitHub bug. Please include `0B60:297D:1890481:32291DB:652961CA` when reporting this issue."
      }
    ]
  },
  "headers": {
    "access-control-allow-origin": "*",
    "access-control-expose-headers": "ETag, Link, Location, Retry-After, X-GitHub-OTP, X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset, X-OAuth-Scopes, X-Accepted-OAuth-Scopes, X-Poll-Interval, X-GitHub-Media-Type, Deprecation, Sunset",
    "content-type": "application/json",
    "date": "Fri, 13 Oct 2023 15:27:18 GMT",
    "server": "GitHub.com",
    "transfer-encoding": "chunked",
    "vary": "Accept-Encoding, Accept, X-Requested-With",
    "x-github-request-id": "0B60:297D:1890481:32291DB:652961CA"
  },
  "status": 502,
  "statusText": "Bad Gateway"
}

From reading a few sources online, it seems like this is mostly caused by the response taking too long (> 10s) and Github servers timing out. There might be a few things that could help address this:

sorenlouv commented 8 months ago

Thank you for the thorough writeup. I agree, this could be handled much more smoothly.

Respect dateSince in this particular query. Even if set from the CLI or config file, this query still uses null which can result in a large response. I'm not sure if this might impact user experience if some commits got omitted because of using dateSince but I think that's sort of expected

I was not able to reproduce this. I run this command, and I see the commits are being filtered as expected:

backport --author sqren --repo elastic/kibana --maxNumber 100 --date-since "10 Oct 2023"

Can you try re-running the above command and see if you get all commits or only commits after 10th of October.

ian-h-chamberlain commented 8 months ago

Ah, thanks for the example, that helps me reproduce it more easily without using private repos. The filtering via the command you mentioned seems to work correctly, but that wasn't the original problem I ran into.

The issue seems to be occur when there is a merge conflict while backporting, so I was able to trigger by backporting a recent PR of yours to an older branch:

$ backport --dryRun --author sqren --repo elastic/kibana --maxNumber 100 --date-since "10 Oct 2023" --pr 167828
? Select pull request [APM] Add API test for multiple APM Servers (#167828)
? Select branch 5.6
✔ 100% Cloning repository from github.com (one-time operation)

Backporting to 5.6:
✔ Pulling latest changes
✖ Cherry-picking: [APM] Add API test for multiple APM Servers (#167828)
An unhandled error occurred while backporting commit. Please see the logs for details: /Users/ichamberlain/.backport/backport.info.log

Here's the last few lines of the full log, which I think includes the query that's requesting too much data:

{"level":"info","message":"POST https://api.github.com/graphql (name:CommitsByAuthor, status: 502, EXCEPTION THROWN)","metadata":{},"timestamp":"2023-10-16 13:33:12"}
{"level":"error","message":"runSequentially failed Request failed with status code 502 (Github API v4)","metadata":{"githubResponse":{"config":{"adapter":["xhr","http"],"data":"{\"query\":\"query CommitsByAuthor($authorId: ID, $commitPath: String, $dateSince: GitTimestamp, $dateUntil: GitTimestamp, $maxNumber: Int!, $repoName: String!, $repoOwner: String!, $sourceBranch: String!) {\\n  repository(owner: $repoOwner, name: $repoName) {\\n    ref(qualifiedName: $sourceBranch) {\\n      target {\\n        ... on Commit {\\n          history(\\n            first: $maxNumber\\n            author: {id: $authorId}\\n            path: $commitPath\\n            since: $dateSince\\n            until: $dateUntil\\n          ) {\\n            edges {\\n              node {\\n                ...SourceCommitWithTargetPullRequestFragment\\n              }\\n            }\\n          }\\n        }\\n      }\\n    }\\n  }\\n}\\n\\nfragment SourceCommitWithTargetPullRequestFragment on Commit {\\n  repository {\\n    name\\n    owner {\\n      login\\n    }\\n  }\\n  sha: oid\\n  message\\n  committedDate\\n  author {\\n    name\\n    email\\n  }\\n  associatedPullRequests(first: 1) {\\n    edges {\\n      node {\\n        title\\n        url\\n        number\\n        labels(first: 50) {\\n          nodes {\\n            name\\n          }\\n        }\\n        baseRefName\\n        mergeCommit {\\n          ...RemoteConfigHistoryFragment\\n          sha: oid\\n          message\\n        }\\n        timelineItems(last: 20, itemTypes: CROSS_REFERENCED_EVENT) {\\n          edges {\\n            node {\\n              ... on CrossReferencedEvent {\\n                targetPullRequest: source {\\n                  __typename\\n                  ... on PullRequest {\\n                    targetMergeCommit: mergeCommit {\\n                      sha: oid\\n                      message\\n                    }\\n                    repository {\\n                      name\\n                      owner {\\n                        login\\n                      }\\n                    }\\n                    url\\n                    title\\n                    state\\n                    baseRefName\\n                    number\\n                    commits(first: 20) {\\n                      edges {\\n                        node {\\n                          targetCommit: commit {\\n                            message\\n                            sha: oid\\n                          }\\n                        }\\n                      }\\n                    }\\n                  }\\n                }\\n              }\\n            }\\n          }\\n        }\\n      }\\n    }\\n  }\\n}\\n\\nfragment RemoteConfigHistoryFragment on Commit {\\n  remoteConfigHistory: history(first: 1, path: \\\".backportrc.json\\\") {\\n    edges {\\n      remoteConfig: node {\\n        committedDate\\n        file(path: \\\".backportrc.json\\\") {\\n          ... on TreeEntry {\\n            object {\\n              ... on Blob {\\n                text\\n              }\\n            }\\n          }\\n        }\\n      }\\n    }\\n  }\\n}\",\"variables\":{\"repoOwner\":\"elastic\",\"repoName\":\"kibana\",\"sourceBranch\":\"main\",\"maxNumber\":100,\"authorId\":null,\"commitPath\":null,\"dateSince\":null,\"dateUntil\":\"2023-10-13T08:24:23Z\"}}","env":{},"headers":{"Accept":"application/json, text/plain, */*","Accept-Encoding":"gzip, compress, deflate, br","Authorization":"bearer <REDACTED>","Content-Length":"3034","Content-Type":"application/json","User-Agent":"axios/1.5.1"},"maxBodyLength":-1,"maxContentLength":-1,"method":"post","timeout":0,"transformRequest":[null],"transformResponse":[null],"transitional":{"clarifyTimeoutError":false,"forcedJSONParsing":true,"silentJSONParsing":true},"url":"https://api.github.com/graphql","xsrfCookieName":"XSRF-TOKEN","xsrfHeaderName":"X-XSRF-TOKEN"},"data":{"data":null,"errors":[{"message":"Something went wrong while executing your query. This may be the result of a timeout, or it could be a GitHub bug. Please include `C318:5862:1A4A0BC:35D9657:652D73C9` when reporting this issue."}]},"headers":{"access-control-allow-origin":"*","access-control-expose-headers":"ETag, Link, Location, Retry-After, X-GitHub-OTP, X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset, X-OAuth-Scopes, X-Accepted-OAuth-Scopes, X-Poll-Interval, X-GitHub-Media-Type, Deprecation, Sunset","content-type":"application/json","date":"Mon, 16 Oct 2023 17:33:12 GMT","server":"GitHub.com","transfer-encoding":"chunked","vary":"Accept-Encoding, Accept, X-Requested-With","x-github-request-id":"C318:5862:1A4A0BC:35D9657:652D73C9"},"status":502,"statusText":"Bad Gateway"},"name":"GithubV4Exception","stack":"GithubV4Exception: Request failed with status code 502 (Github API v4)"},"timestamp":"2023-10-16 13:33:12"}
{"level":"info","message":"Results","metadata":{"0":{"error":{"githubResponse":{"config":{"adapter":["xhr","http"],"data":"{\"query\":\"query CommitsByAuthor($authorId: ID, $commitPath: String, $dateSince: GitTimestamp, $dateUntil: GitTimestamp, $maxNumber: Int!, $repoName: String!, $repoOwner: String!, $sourceBranch: String!) {\\n  repository(owner: $repoOwner, name: $repoName) {\\n    ref(qualifiedName: $sourceBranch) {\\n      target {\\n        ... on Commit {\\n          history(\\n            first: $maxNumber\\n            author: {id: $authorId}\\n            path: $commitPath\\n            since: $dateSince\\n            until: $dateUntil\\n          ) {\\n            edges {\\n              node {\\n                ...SourceCommitWithTargetPullRequestFragment\\n              }\\n            }\\n          }\\n        }\\n      }\\n    }\\n  }\\n}\\n\\nfragment SourceCommitWithTargetPullRequestFragment on Commit {\\n  repository {\\n    name\\n    owner {\\n      login\\n    }\\n  }\\n  sha: oid\\n  message\\n  committedDate\\n  author {\\n    name\\n    email\\n  }\\n  associatedPullRequests(first: 1) {\\n    edges {\\n      node {\\n        title\\n        url\\n        number\\n        labels(first: 50) {\\n          nodes {\\n            name\\n          }\\n        }\\n        baseRefName\\n        mergeCommit {\\n          ...RemoteConfigHistoryFragment\\n          sha: oid\\n          message\\n        }\\n        timelineItems(last: 20, itemTypes: CROSS_REFERENCED_EVENT) {\\n          edges {\\n            node {\\n              ... on CrossReferencedEvent {\\n                targetPullRequest: source {\\n                  __typename\\n                  ... on PullRequest {\\n                    targetMergeCommit: mergeCommit {\\n                      sha: oid\\n                      message\\n                    }\\n                    repository {\\n                      name\\n                      owner {\\n                        login\\n                      }\\n                    }\\n                    url\\n                    title\\n                    state\\n                    baseRefName\\n                    number\\n                    commits(first: 20) {\\n                      edges {\\n                        node {\\n                          targetCommit: commit {\\n                            message\\n                            sha: oid\\n                          }\\n                        }\\n                      }\\n                    }\\n                  }\\n                }\\n              }\\n            }\\n          }\\n        }\\n      }\\n    }\\n  }\\n}\\n\\nfragment RemoteConfigHistoryFragment on Commit {\\n  remoteConfigHistory: history(first: 1, path: \\\".backportrc.json\\\") {\\n    edges {\\n      remoteConfig: node {\\n        committedDate\\n        file(path: \\\".backportrc.json\\\") {\\n          ... on TreeEntry {\\n            object {\\n              ... on Blob {\\n                text\\n              }\\n            }\\n          }\\n        }\\n      }\\n    }\\n  }\\n}\",\"variables\":{\"repoOwner\":\"elastic\",\"repoName\":\"kibana\",\"sourceBranch\":\"main\",\"maxNumber\":100,\"authorId\":null,\"commitPath\":null,\"dateSince\":null,\"dateUntil\":\"2023-10-13T08:24:23Z\"}}","env":{},"headers":{"Accept":"application/json, text/plain, */*","Accept-Encoding":"gzip, compress, deflate, br","Authorization":"bearer <REDACTED>","Content-Length":"3034","Content-Type":"application/json","User-Agent":"axios/1.5.1"},"maxBodyLength":-1,"maxContentLength":-1,"method":"post","timeout":0,"transformRequest":[null],"transformResponse":[null],"transitional":{"clarifyTimeoutError":false,"forcedJSONParsing":true,"silentJSONParsing":true},"url":"https://api.github.com/graphql","xsrfCookieName":"XSRF-TOKEN","xsrfHeaderName":"X-XSRF-TOKEN"},"data":{"data":null,"errors":[{"message":"Something went wrong while executing your query. This may be the result of a timeout, or it could be a GitHub bug. Please include `C318:5862:1A4A0BC:35D9657:652D73C9` when reporting this issue."}]},"headers":{"access-control-allow-origin":"*","access-control-expose-headers":"ETag, Link, Location, Retry-After, X-GitHub-OTP, X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset, X-OAuth-Scopes, X-Accepted-OAuth-Scopes, X-Poll-Interval, X-GitHub-Media-Type, Deprecation, Sunset","content-type":"application/json","date":"Mon, 16 Oct 2023 17:33:12 GMT","server":"GitHub.com","transfer-encoding":"chunked","vary":"Accept-Encoding, Accept, X-Requested-With","x-github-request-id":"C318:5862:1A4A0BC:35D9657:652D73C9"},"status":502,"statusText":"Bad Gateway"},"name":"GithubV4Exception"},"status":"unhandled-error","targetBranch":"5.6"}},"timestamp":"2023-10-16 13:33:12"}

Setting --maxNumber 20 seems to reduce the size of the query enough for it to work, but the precise number probably depends on the server load, size of the query contents, etc.

Hopefully that helps with the details — if you need more info, let me know and I can provide full logs from a different computer (assuming I can reproduce there as well). Thanks!

sorenlouv commented 8 months ago

I've published some improvements in 9.4.1. Would be great if you can give it a try, thanks!

ian-h-chamberlain commented 8 months ago

Thanks for the quick fix! It looks like your changes helped, I can no longer reproduce with the original command that I first ran into problems on.

sorenlouv commented 8 months ago

You are welcome. FYI: I added this:

https://github.com/sqren/backport/blob/993d2801b032d5321bca5c47bb3d8c853671a500/src/lib/cherrypickAndCreateTargetPullRequest/getCommitsWithoutBackports.ts#L30

When encountering a conflict the tool automatically retrieves related commits to figure out if one of them caused the conflict. In this case it was using the same maxCommit value as specified via the CLI. I've changed it to a hardcoded value of 50