It seems the releases/tags results can't be ordered by the `updated_at` field, though the default sorting looks good enough to me, and smart pagination is still relevant.
The approach I'm about to try:

- `github-tags`: `ls-remote` (as in `git-refs`), ordered by `TAG_COMMIT_DATE` rather than the `updated_at` field
- `github-releases`: a similar smart-pagination cache
Both caches are meant to reset after some prolonged period, maybe 24 hours or so.
I'd like to see the `github-tags` approach with `ls-remote` initially. Let's test and deploy that (we need to make sure it works with private repos and custom GHE endpoints too).
Do we actually use the `isStable` flag for the `github-tags` datasource? It seems we have an option to fetch `releaseTimestamp` via GraphQL, though we still can't access the `prerelease` flag on which `githubRelease.isStable` is based.
My point is that if we only need `releaseTimestamp`s for tags, then hopefully we can obtain them from the commit info via GraphQL, together with the tag name and commit hash.
Another 3-level design: `ls-remote` for the ref listing, with the remaining lookups batched in parallel (`Promise.all`).

This is based on the possibly wrong premise that we actually don't need timestamps for older items.
We want timestamps for all. And I think `isStable` is important for GitHub Actions, where some actions have sometimes had pre-releases published with stable semver versions.
Every release has a corresponding tag, but not every tag has a corresponding release. This means our current implementation doesn't guarantee the `releaseTimestamp` field for every tag.
We can achieve this using GraphQL:

```graphql
{
  # owner/name are placeholders; the real query would be parameterized
  repository(owner: "<owner>", name: "<repo>") {
    refs(
      refPrefix: "refs/tags/"
      first: 10
      orderBy: { field: TAG_COMMIT_DATE, direction: DESC }
    ) {
      nodes {
        version: name
        target {
          ... on Commit {
            hash: oid
            releaseTimestamp: committedDate
          }
        }
      }
    }
  }
}
```
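For illustration, the nodes from a query like this could map onto release objects roughly as follows (a sketch only; the `toRelease` helper and the exact release shape are assumptions, not Renovate's actual internals):

```typescript
// Shape of one node returned by the query above (field names match the aliases).
interface TagNode {
  version: string; // tag name
  target: {
    // Only present when the ref points directly at a commit (lightweight tag)
    hash?: string;
    releaseTimestamp?: string; // committedDate, ISO 8601
  };
}

// Hypothetical mapping into a datasource release entry.
function toRelease(node: TagNode) {
  return {
    version: node.version,
    gitRef: node.version,
    hash: node.target.hash ?? null,
    releaseTimestamp: node.target.releaseTimestamp ?? null,
  };
}
```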
We still need to obtain `isStable` flag values from the `github-releases` datasource (hopefully we can optimize this too).
Fetching via GraphQL would incur a penalty on the initial fetch for repos with a long list of refs, but it should be okay with a populated cache.
So we couldn't detect tag deletions this way, right? We'd need a periodic full fetch with a cold cache?
Yes, it's a problem for both releases and tags, so I think the remedy will be similar.
Though it could be implemented as a gradual process, i.e. check and reconcile just one page of previously stored results per run. Not sure about this for now.
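A minimal sketch of what that gradual process might look like, assuming results are cached page by page (the `fetchPage` helper and cache shape are invented for illustration):

```typescript
// Hypothetical single-page fetch against the datasource.
declare function fetchPage(page: number): Promise<Map<string, unknown>>;

interface PagedCache {
  pages: Map<string, unknown>[]; // previously stored results, grouped by page
  nextPageToCheck: number; // cursor cycling through pages across runs
}

// Each run revisits just one stored page, picking up upstream deletions
// (e.g. removed tags) a page at a time instead of refetching everything.
async function reconcileOnePage(cache: PagedCache): Promise<void> {
  const page = cache.nextPageToCheck % cache.pages.length;
  cache.pages[page] = await fetchPage(page);
  cache.nextPageToCheck = page + 1;
}
```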
Does GraphQL have any "maximum 10 pages" limitation or can we use it to fetch 100 per page until we have all?
I don't think it would limit us. However, unlike REST, we have to fetch pages sequentially because of cursor-based pagination mechanics.
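A sketch of what that sequential fetching looks like against GitHub's `refs` connection (the `gqlRequest` helper is an assumption, not Renovate's actual HTTP layer; `pageInfo`/`endCursor` are standard GraphQL connection fields):

```typescript
// Hypothetical helper that POSTs a query plus variables to the GraphQL endpoint.
declare function gqlRequest(
  query: string,
  variables: Record<string, unknown>,
): Promise<any>;

const tagsQuery = `
  query ($owner: String!, $name: String!, $cursor: String) {
    repository(owner: $owner, name: $name) {
      refs(
        refPrefix: "refs/tags/"
        first: 100
        after: $cursor
        orderBy: { field: TAG_COMMIT_DATE, direction: DESC }
      ) {
        pageInfo { hasNextPage endCursor }
        nodes { version: name }
      }
    }
  }
`;

export async function fetchAllTags(owner: string, name: string): Promise<string[]> {
  const tags: string[] = [];
  let cursor: string | null = null;
  // Pages must be fetched one after another: each request needs the
  // endCursor returned by the previous response.
  do {
    const res = await gqlRequest(tagsQuery, { owner, name, cursor });
    const refs = res.repository.refs;
    tags.push(...refs.nodes.map((n: { version: string }) => n.version));
    cursor = refs.pageInfo.hasNextPage ? refs.pageInfo.endCursor : null;
  } while (cursor);
  return tags;
}
```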
So using GraphQL, the idea would be the following?

- If no cache: fetch 100 per page sequentially until done.
- If there is a cache and the short-term expiry time (e.g. 30 minutes) hasn't passed: use the cache.
- If there is a cache and the short-term expiry has been hit: fetch 100 per page until some date limit is hit (e.g. one month), merge any new data with the old (including missing tags), and overwrite the existing cache.
- If the long-term cache expiry is hit (e.g. one week): treat it like the "no cache" scenario?
Result being that we'd perform on average ~one page of fetching every 30 minutes, compared to today, when we do up to 10 pages every time the cache expires?
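Roughly, that decision tree could look like this (all names, helpers, and thresholds are illustrative, not actual Renovate code):

```typescript
// Hypothetical helpers: a full cold fetch, and a delta fetch back to a date limit.
declare function fullFetch(): Promise<TagCache>;
declare function fetchSince(since: Date): Promise<Record<string, unknown>>;

const SOFT_TTL_MS = 30 * 60 * 1000; // short-term expiry: 30 minutes
const HARD_TTL_MS = 7 * 24 * 60 * 60 * 1000; // long-term expiry: one week
const DELTA_WINDOW_MS = 30 * 24 * 60 * 60 * 1000; // date limit: ~one month

interface TagCache {
  createdAt: number; // when we last did a full (cold) fetch
  updatedAt: number; // when we last refreshed at all
  items: Record<string, unknown>;
}

async function getTags(cache: TagCache | null): Promise<TagCache> {
  const now = Date.now();
  // No cache, or long-term expiry hit: treat as the cold "no cache" scenario.
  if (!cache || now - cache.createdAt > HARD_TTL_MS) {
    return fullFetch();
  }
  // Within the short-term window: serve straight from cache.
  if (now - cache.updatedAt < SOFT_TTL_MS) {
    return cache;
  }
  // Short-term expiry hit: fetch recent pages only, merge, and overwrite.
  const recent = await fetchSince(new Date(now - DELTA_WINDOW_MS));
  return {
    ...cache,
    updatedAt: now,
    items: { ...cache.items, ...recent },
  };
}
```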
Sounds good to me
Does the same work for releases?
Yes, hopefully it'll share common logic
Be aware that changing such fundamental logic in those datasources leads to non-semver-compliant Renovate releases. After some digging through the recent changes in this project, I found that the change in #15645 broke my existing config. This is due to the implicit switch from GitHub's REST API to the GraphQL API for releases/tags introduced in that PR. Since my config does not have any GitHub token associated with it, it can no longer fetch tags and releases.
Is there any interest in providing a solution to still use tags/releases without a token? Otherwise, I'd just like to inform you about the breaking change here, so that we can look more carefully at future changes like these and let them cause a major version bump.
As a workaround, I successfully used `git-tags` and provided the full git URL.
What would you like Renovate to be able to do?
Use intelligent pagination caching for popular datasources, especially GitHub tags/releases and Docker tags. Then we could maybe remove pagination limits.
If you have any ideas on how this should be implemented, please tell us here.
It should be implemented in a similar manner to how we did caching for GitHub PRs, e.g. if it's possible to sort by a last-modified field, then reuse the existing cache (even if "expired") and only retrieve as much as necessary on subsequent requests.
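As a sketch of that idea: with results sorted by a last-modified field in descending order, the fetch can stop as soon as it reaches items the cache already knows about (the helper and shapes below are assumptions for illustration):

```typescript
interface Item {
  id: string;
  updatedAt: string; // ISO 8601 last-modified timestamp
}

// Hypothetical page fetch, newest-first by the last-modified field.
declare function fetchPageSortedByUpdatedDesc(page: number): Promise<Item[]>;

// Reuse the existing cache even if "expired": only retrieve items newer
// than the freshest timestamp we already have stored.
async function fetchDelta(newestCachedTimestamp: string): Promise<Item[]> {
  const delta: Item[] = [];
  for (let page = 1; ; page += 1) {
    const items = await fetchPageSortedByUpdatedDesc(page);
    if (items.length === 0) {
      return delta; // ran out of pages: everything was new
    }
    for (const item of items) {
      // ISO 8601 strings compare correctly as plain strings.
      if (item.updatedAt <= newestCachedTimestamp) {
        return delta; // caught up with the cache; stop fetching
      }
      delta.push(item);
    }
  }
}
```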
Notes:
Is this a feature you are interested in implementing yourself?
No