renovatebot / renovate

Home of the Renovate CLI: Cross-platform Dependency Automation by Mend.io
https://mend.io/renovate
GNU Affero General Public License v3.0
17.44k stars 2.29k forks source link

Permanent caching for GitHub issues and PRs #27641

Closed zharinov closed 6 months ago

zharinov commented 7 months ago

Describe the proposed change(s).

It seems like we have solution for reliable caching:

rarkins commented 7 months ago

This would mean that we use the issues endpoint to fetch both issues and pulls. Some things to consider:

To handle the scenario where there' a huge number of old pulls, and possibly tripping the rate limit, we could do this when initially populating the cache:

zharinov commented 7 months ago

Is that correct that only issue Renovate interested in is own Dependency Dashboard only? I.e. we fetch all the issues only to search for this one?

rarkins commented 7 months ago

We care about any issue we've created, which I think is limited to Dependency Dashboard and config warnings. And we should be filtering based on creator, which means created by Renovate.

zharinov commented 7 months ago

Looks like it's the small fraction of all items, compared to PRs. So probably it's not a big issue (pun intended) to additionally verify, right before return, each findIssue() result by performing GET request (with If-None-Match: <ETag> header).

UPD. And yes, ETags returned with POST/PATCH requests won't work with GET/HEAD

rarkins commented 7 months ago

We have gitIgnoredAuthors for if the git committer changes (including a bot rename), but that's not directly relevant here. The one which is relevant is ignorePrAuthor which is a big different - instead of an explicit list like in gitIgnoredAuthors it the matches anything. Today for GitHub that setting means we don't filter PRs by username and fetch all. I wonder if we should deprecate remove that setting and instead require a list of other usernames to be more efficient (IF it's more efficient - maybe needs a query per username).

zharinov commented 7 months ago

Actually, we can't use /issues endpoint as the single source of truth for the cached PRs. The reason is simple: it doesn't contain all the fields we need, so we still have to reach /pulls endpoint.

zharinov commented 7 months ago

The best we probably could do is to query the latest updated issue during the platform init, and infer the "dirtiness" of the cached issues from it.

zharinov commented 7 months ago

Another option:

rarkins commented 7 months ago

No since param supported here: https://docs.github.com/en/rest/pulls/pulls?apiVersion=2022-11-28#list-pull-requests

So I guess we keep the getPrList() function as-is using /pulls REST API, using If-Modified-Since header and hopefully getting plenty of 304s.

For Issues, we could consider fetching them as part of the initRepo graphql query, I think the most we'd need is maybe 2-4 sorted by recently modified. Normal case is we'd have one dependency dashboard open, and zero or one closed config warning issues.

zharinov commented 7 months ago

I don't think it will be plenty of 304, I've just checked it worked zero times out of 10–20 attempts

rarkins commented 7 months ago

I get a much higher success rate, for example got a 304 just now for a repo I last ran a few days ago

renovate-release commented 6 months ago

:tada: This issue has been resolved in version 37.289.0 :tada:

The release is available on:

Your semantic-release bot :package::rocket: