Open YenTheFirst opened 5 years ago
On further investigation, I believe this can be mitigated by caching github API responses in the persistent Check container.
Hi @YenTheFirst and thanks for #143 , sorry I've taken so long to respond.
[...] every instance of a check of this resource will result in a very large number of V3 API calls. These calls seem to specifically be related to checking for modified files [...]
This is correct. When using paths
and ignore_paths
each discovered version (new/latest commit to an open PR) will require at least N V3 API calls where N >= 1 (its one API call per 100th changed file for pagination).
Unfortunately I don't think there are any good ways to remedy this either. Unless Github's GraphQL API allows us to filter the PRs by modified files directly, we'll always need to get a complete list of the changed files in order to do the filtering on our end. Even if we use the PullRequestChangedFile connection for the PullRequest object in the GraphQL API, the cost will be the same, it will just count against a different rate limit (so arguably it's better to keep it split).
That being said, I don't think caching is a good solution either, at least not in the resource itself - perhaps it's possible to write a separate task to do that particular filtering? 🤔
It looks like caching is how the jtarchie version of this resource handled this problem, and adding caching seems to have entirely resolved the issue for us. we're not using webhooks, though.
was there some poor interaction between caching and webhooks that motivated this resource to not use caching?
On Wed, Sep 4, 2019, 05:22 Kristian notifications@github.com wrote:
Hi @YenTheFirst https://github.com/YenTheFirst and thanks for #143 https://github.com/telia-oss/github-pr-resource/pull/143 , sorry I've taken so long to respond.
[...] every instance of a check of this resource will result in a very large number of V3 API calls. These calls seem to specifically be related to checking for modified files [...]
This is correct. When using paths and ignore_paths each discovered version (new/latest commit to an open PR) will require at least N V3 API calls where N >= 1 (its one API call per 100th changed file for pagination).
Unfortunately I don't think there are any good ways to remedy this either. Unless Github's GraphQL API allows us to filter the PRs by modified files directly, we'll always need to get a complete list of the changed files in order to do the filtering on our end. Even if we use the PullRequestChangedFile https://developer.github.com/v4/object/pullrequestchangedfile/ connection for the PullRequest https://developer.github.com/v4/object/pullrequest/ object in the GraphQL API, the cost will be the same, it will just count against a different rate limit (so arguably it's better to keep it split).
That being said, I don't think caching is a good solution either, at least not in the resource itself - perhaps it's possible to write a separate task to do that particular filtering? 🤔
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/telia-oss/github-pr-resource/issues/142?email_source=notifications&email_token=AADMPY26QJONBS7IIR2XEQ3QH6R7TA5CNFSM4IN55BT2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD53MCII#issuecomment-527876385, or mute the thread https://github.com/notifications/unsubscribe-auth/AADMPY57HV63OEMQQGYDBLDQH6R7TANCNFSM4IN55BTQ .
Given (1) a github repo with two files:
and (2) a resource that is configured to emit new versions of PRs that modify rarely_updated_file.txt:
and (3) a large number (30+) of open PRs affecting
frequently_updated_file.txt
, but notrarely_updated_file.txt
,then, every instance of a
check
of this resource will result in a very large number of V3 API calls. These calls seem to specifically be related to checking for modified files, https://github.com/telia-oss/github-pr-resource/blob/master/github.go#L181. Adding a debug line after this response indicates theresponse.Rate
decrementing multiple times percheck
We've run in to this issue when trying to use this resource-type on several pipelines for projects that live in a monorepo, which use
paths:
to limit to PRs that affect a certain pipeline's project. Rarely updated projects end up continually re-polling every unrelated new PR to check for modifications.This seems thorny, and I'm not sure how to work around it, short of emitting occasional "fake" versions that the downstream pipeline would need to check for and ignore. It seems like a problem with Concourse's resource API design?