jamietanna closed this issue 5 months ago.
If you want Renovate to skip repos using only the API (without cloning), add a renovate.json with enabled=false:
https://docs.renovatebot.com/self-hosted-configuration/#optimizefordisabled
Interesting: would this still require each repo to be "onboarded" with a renovate.json, just with enabled: false? Is there a way to get the Renovate runner to default to this behaviour, to avoid onboarding repos?
This behaviour could probably be tweaked to behave differently when onboarding=false and requireConfig=optional, like you have it, as long as you are OK with using renovate.json as the file name.
Nice, so when I run:
env RENOVATE_REQUIRE_CONFIG=required RENOVATE_ONBOARDING=false renovate --token $GITHUB_TOKEN --autodiscover-filter 'jamietanna/*' --autodiscover --optimize-for-disabled
I get:
DEBUG: GET https://api.github.com/repos/jamietanna/jamietanna/contents/renovate.json = (code=ERR_NON_2XX_3XX_RESPONSE, statusCode=404 retryCount=0, duration=285) (repository=jamietanna/jamietanna)
DEBUG: Resetting npmrc (repository=jamietanna/jamietanna)
DEBUG: detectSemanticCommits() (repository=jamietanna/jamietanna)
DEBUG: Initializing git repository into /tmp/renovate-cache-1667898864/repos/github/jamietanna/jamietanna (repository=jamietanna/jamietanna)
DEBUG: Performing blobless clone (repository=jamietanna/jamietanna)
DEBUG: git clone completed (repository=jamietanna/jamietanna)
"durationMs": 1369
DEBUG: latest repository commit (repository=jamietanna/jamietanna)
"latestCommit": {
"hash": "ffb5b962a9ce6518737fa3a3774ce08679ae7f3e",
"date": "2022-11-08T09:12:33+00:00",
"message": "Automagic update",
"refs": "HEAD -> main, origin/main, origin/HEAD",
"body": "",
"author_name": "README-bot",
"author_email": "actions@users.noreply.github.com"
}
Maybe I'm misunderstanding, but I'd expect that in this case we wouldn't clone after seeing there's no renovate.json. Or are you indicating that there are some things we should tweak in the codebase for this use case?
Renovate clones because it also needs to check the other possible config file names: https://github.com/renovatebot/renovate/blob/d869c946d164965393f1259db5171e2a0303fb27/lib/config/app-strings.ts#L1-L11
So you should enable the repository cache and persist <cacheDir>/renovate/repository/ between runs. Renovate should then know there is no config and not clone again until there is a new commit.
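Here is a minimal sketch of what the repository cache buys in this situation. It assumes a hypothetical per-repo cache entry that records the default-branch commit SHA from the last run and whether any config file was found. RepoCacheEntry, needsClone and recordResult are illustrative names, not Renovate's actual cache format, and the current SHA is assumed to come cheaply from the platform API.

```typescript
// Hypothetical per-repo record; Renovate's real repository cache format differs.
interface RepoCacheEntry {
  defaultBranchSha: string; // default-branch commit SHA seen on the last run
  hasConfig: boolean; // whether a config file was found at that SHA
}

// In-memory stand-in for the cache persisted under <cacheDir>/renovate/repository/.
const repoCache = new Map<string, RepoCacheEntry>();

// Decides whether a clone is needed, given the current default-branch SHA
// (assumed to be fetched from the platform API without cloning).
function needsClone(repo: string, currentSha: string): boolean {
  const entry = repoCache.get(repo);
  if (entry === undefined) {
    return true; // first run: must clone to look for config files
  }
  if (entry.defaultBranchSha !== currentSha) {
    return true; // new commits: a config file may have been added
  }
  return entry.hasConfig; // unchanged: clone only if a config was found last time
}

// Records the outcome of a run so the next run can skip unchanged repos.
function recordResult(repo: string, sha: string, hasConfig: boolean): void {
  repoCache.set(repo, { defaultBranchSha: sha, hasConfig });
}

console.log(needsClone("org/app", "abc123")); // true
recordResult("org/app", "abc123", false);
console.log(needsClone("org/app", "abc123")); // false
console.log(needsClone("org/app", "def456")); // true
```

The last call shows the limitation discussed later in the thread. The cache only helps for repos whose default branch has not moved, because any new commit still forces a clone.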
Gotcha. Do you think it'd be worth an enhancement to do all of those checks via API calls instead of cloning?
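The following sketch shows what such an enhancement could look like: probe each candidate config file name through the platform API, stop at the first hit, and only clone when a file exists. The fileExists callback is hypothetical and stands in for a platform call, for example an HTTP HEAD on the file-contents endpoint. The candidate list is an assumed, illustrative subset; the authoritative list is configFileNames in lib/config/app-strings.ts.

```typescript
// Hypothetical callback: resolves true if `path` exists on the default branch.
type FileExists = (repo: string, path: string) => Promise<boolean>;

// Illustrative subset of candidate names; Renovate's real list lives in
// lib/config/app-strings.ts and should be used instead.
const candidateConfigFiles: string[] = [
  "renovate.json",
  "renovate.json5",
  ".github/renovate.json",
  ".github/renovate.json5",
  ".gitlab/renovate.json",
  ".gitlab/renovate.json5",
  ".renovaterc",
  ".renovaterc.json",
];

// Returns the first config file found via the API, or null if none exists.
// Probes run sequentially, so the most common name is checked first.
async function findConfigFileViaApi(
  repo: string,
  fileExists: FileExists,
): Promise<string | null> {
  for (const path of candidateConfigFiles) {
    if (await fileExists(repo, path)) {
      return path;
    }
  }
  return null;
}

// Usage with a fake, in-memory "platform":
const demoFiles = new Set(["org/app:.github/renovate.json"]);
const demoExists: FileExists = async (repo, path) => demoFiles.has(`${repo}:${path}`);
findConfigFileViaApi("org/app", demoExists).then((f) => console.log(f)); // .github/renovate.json
findConfigFileViaApi("org/lib", demoExists).then((f) => console.log(f)); // null
```

The trade-off is that a repo without any config costs one request per candidate name. That is likely still far cheaper than a clone. A config embedded in package.json would additionally need the file's contents, not just its existence.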
So when running:
env RENOVATE_REPOSITORY_CACHE=enabled LOG_LEVEL=debug RENOVATE_REQUIRE_CONFIG=required RENOVATE_ONBOARDING=false renovate --token $GITHUB_TOKEN --autodiscover-filter 'jamietanna/*' --autodiscover --optimize-for-disabled --base-dir /tmp/renovate-base-new --cache-dir /tmp/renovate-cache-new --dry-run
The result is that baseDir and cacheDir are empty. I guess that's expected: there's no dependency extraction, because the repos are skipped as disabled. But if we made it RENOVATE_REQUIRE_CONFIG=optional, that'd work?
Looking at a significantly larger project, when running:
env RENOVATE_REPOSITORY_CACHE=enabled LOG_LEVEL=debug RENOVATE_REQUIRE_CONFIG=optional RENOVATE_ONBOARDING=false renovate --token $GITHUB_TOKEN --autodiscover-filter '...' --autodiscover --optimize-for-disabled --base-dir /tmp/renovate-base-new --cache-dir /tmp/renovate-cache-new --dry-run
Then /tmp/renovate-cache-new/renovate/repository is only 28K, which is quite nice compared to the size of the repo! I'll try re-running it a few times today and see how it behaves as commits are pushed to the repo.
I meant that we could tweak optimizeForDisabled.
I'm not sure repoCache helps here, because we'd still need to clone any time there's a new commit to make sure there are no config files. I think the optimization can work as long as you're happy to use only renovate.json.
OK, so we can assume the repo is disabled if onboarding is false, requireConfig is required, and we get a 404 for the default config file (renovate.json, optionally changed via onboardingConfigFileName for self-hosted).
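The rule just described can be written as a small pure function. This is a sketch that assumes a simplified config shape. The field names mirror Renovate's options, while ConfigSubset, SkipDecision, defaultConfigFileName and decideWithoutClone are hypothetical names. The fetchedConfig argument is the parsed default config file, or null when the API returned a 404.

```typescript
// Simplified subset of the repository config relevant to this decision.
interface ConfigSubset {
  optimizeForDisabled: boolean;
  onboarding: boolean;
  requireConfig: "required" | "optional" | "ignored";
  onboardingConfigFileName?: string;
}

type SkipDecision = "disabled-by-config" | "no-config" | "continue";

// The single file probed via the API: the configured onboarding file name,
// falling back to renovate.json.
function defaultConfigFileName(config: ConfigSubset): string {
  return config.onboardingConfigFileName ?? "renovate.json";
}

// fetchedConfig is the parsed default config file, or null on a 404.
function decideWithoutClone(
  config: ConfigSubset,
  fetchedConfig: { enabled?: boolean } | null,
): SkipDecision {
  if (!config.optimizeForDisabled) {
    return "continue";
  }
  // Existing behaviour: the file exists and explicitly disables Renovate.
  if (fetchedConfig?.enabled === false) {
    return "disabled-by-config";
  }
  // Proposed: no file, onboarding off and a config is required.
  if (fetchedConfig === null && !config.onboarding && config.requireConfig === "required") {
    return "no-config";
  }
  return "continue";
}

const example: ConfigSubset = { optimizeForDisabled: true, onboarding: false, requireConfig: "required" };
console.log(defaultConfigFileName(example)); // renovate.json
console.log(decideWithoutClone(example, null)); // no-config
console.log(decideWithoutClone(example, { enabled: false })); // disabled-by-config
console.log(decideWithoutClone({ ...example, requireConfig: "optional" }, null)); // continue
```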
Nice, yeah, that'd work.
In our current setup, it wouldn't save us any time, as we're having to pre-filter the organisations' repos and split them into groups of 100, because we're hitting https://github.com/renovatebot/github-action/issues/648 with our GitHub App authentication
I am hitting the same issue. We have hundreds of internal projects and cloning them all takes a very long time.
The relevant part of my config looks like this:
autodiscover: true
optimizeForDisabled: true
requireConfig: 'required'
onboarding: false
The documentation of optimizeForDisabled says:
If the file exists and the config is disabled, Renovate will skip the repo without cloning it. Otherwise, it will continue as normal.
I think this can be improved by also skipping a repo when requireConfig: 'required' is set and the renovate.json is missing (or whatever file name is configured by onboardingConfigFileName), exactly as suggested by @viceice.
I am mainly a Java developer, so I unfortunately cannot provide a PR for this. But anyway, would a PR that implements this proposal even be considered for merging? Or are there any potential issues with this that I am overlooking?
The code in question seems to be https://github.com/renovatebot/renovate/blob/34d26700cf32ff7a32cdf93179773b10db75ec0a/lib/workers/repository/init/apis.ts#L28
The heavy lifting seems to be already done. I guess all that needs to be done is to slightly extend the method to include an additional check, like so:
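async function validateOptimizeForDisabled(
  config: RenovateConfig
): Promise<void> {
  if (config.optimizeForDisabled) {
    // Fetch only the default config file via the platform API (no clone).
    const renovateConfig = await getJsonFile(defaultConfigFile(config));
    // Proposed: no default config file, onboarding off and config required => skip.
    if (
      renovateConfig == null &&
      !config.onboarding &&
      config.requireConfig === 'required'
    ) {
      throw new Error(REPOSITORY_NO_CONFIG);
    }
    // Existing: config file present but explicitly disabled => skip.
    if (renovateConfig?.enabled === false) {
      throw new Error(REPOSITORY_DISABLED_BY_CONFIG);
    }
  }
}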
This seems simple enough, even for me. Should I try to submit a PR for that?
Sure, but it probably needs at least one new test for code coverage.
I don't know how to write unit tests in TypeScript, but I could probably use an existing one as a template.
If anyone else wants to do it, feel free not to wait for me!
So we are adding a special case where all of these are true? In this case the repo must use renovate.json and not one of the other file names. If renovate.json is missing, or includes enabled=false, then the repo will be skipped.
This should either be rushed into v35 or wait for v36, as it's technically a breaking change.
@Twith2Sugars is this one you may be able to help contribute?
@rarkins Could you please explain why this is a breaking change in your eyes? I cannot imagine a case where this is anything other than a performance optimization.
Also, when optimizeForDisabled=true, the code currently already looks for only a single file: either config.onboardingConfigFileName (if configured) or configFileNames[0] (which is surely renovate.json). So nothing would change in that regard.
It's a breaking change because if someone has a repo with a config file like renovate.json5 today, it will work. But after this change, it will be skipped.
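To make the breaking case concrete, this sketch compares which repos would be processed today with which would be processed under the proposed API-only check. Today, any recognised config file name found after cloning counts. Under the proposal, only the default file counts. The accepted-name list is an assumed subset, the file listings are made up, and enabled=false handling is left out for brevity.

```typescript
// Illustrative subset of accepted config file names (see lib/config/app-strings.ts).
const acceptedNames = ["renovate.json", "renovate.json5", ".github/renovate.json", ".renovaterc"];

// Current behaviour: after cloning, any accepted file name enables processing.
function processedToday(repoFiles: string[]): boolean {
  return repoFiles.some((f) => acceptedNames.includes(f));
}

// Proposed behaviour with optimizeForDisabled, onboarding=false and
// requireConfig=required: only the default file is checked via the API.
function processedWithProposal(repoFiles: string[], defaultFile = "renovate.json"): boolean {
  return repoFiles.includes(defaultFile);
}

const repoWithJson5 = ["renovate.json5", "package.json", "src/index.ts"];
console.log(processedToday(repoWithJson5)); // true
console.log(processedWithProposal(repoWithJson5)); // false: this repo would now be skipped
```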
@rarkins Thanks, that's right! I didn't see that. In other words, optimizeForDisabled is currently always just a performance optimization and never changes observable behavior. The implementation proposed here would change that.
In this case, I am totally fine with delaying this until one of the next major releases. Not only because it is breaking, but also to give us more time to think about it and maybe come up with a better solution.
Maybe this can be made less breaking by only skipping a repo if all of the criteria mentioned in https://github.com/renovatebot/renovate/issues/18811#issuecomment-1449402620 are true AND onboardingConfigFileName has been explicitly configured?
I guess the combination of onboarding: false and onboardingConfigFileName: "filename" is extremely rare. Why would someone go out of their way to configure an onboardingConfigFileName if onboarding is disabled? Of course, cases like this could happen if onboarding was originally true but was later set to false, so it's still not bullet-proof.
Anyway: I think it is reasonable to only perform this optimization if onboardingConfigFileName has been explicitly configured, even though we would then be using this setting for something it wasn't originally made for, and its name is slightly confusing in this context.
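Here is a sketch of this opt-in variant. It assumes the shortcut applies only when onboardingConfigFileName has been set explicitly, modelled here as the field being present. ShortcutOptions and fileForNoConfigShortcut are hypothetical names. In Renovate itself, telling an explicit setting apart from a default may need extra plumbing.

```typescript
// Simplified subset of the options involved; onboardingConfigFileName is
// undefined when the user has not configured it explicitly.
interface ShortcutOptions {
  optimizeForDisabled: boolean;
  onboarding: boolean;
  requireConfig: "required" | "optional" | "ignored";
  onboardingConfigFileName?: string;
}

// Returns the single file to probe via the API when the "missing file => skip"
// shortcut may apply, or null when Renovate should fall back to cloning.
function fileForNoConfigShortcut(opts: ShortcutOptions): string | null {
  if (!opts.optimizeForDisabled || opts.onboarding || opts.requireConfig !== "required") {
    return null;
  }
  // Opt-in only: without an explicit file name, keep today's clone-and-search behaviour.
  return opts.onboardingConfigFileName ?? null;
}

const baseOpts: ShortcutOptions = { optimizeForDisabled: true, onboarding: false, requireConfig: "required" };
console.log(fileForNoConfigShortcut(baseOpts)); // null
console.log(fileForNoConfigShortcut({ ...baseOpts, onboardingConfigFileName: "renovate.json" })); // renovate.json
```

With this design, setups that never set the option keep today's behaviour. The cost is overloading an onboarding-related setting, as noted above.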
Either way it's breaking, so I wouldn't go too far out of your way to accommodate it. The combination of configuration options is already specific enough that I think it's OK to say "if you use optimizeForDisabled=true and onboarding=false, you must use renovate.json in repos you wish to enable".
Agreed, except that I wouldn't hardcode renovate.json as the file name. optimizeForDisabled already considers onboardingConfigFileName for the file name (if configured), and I see no reason to change that.
I tried enabling the same configuration as specified in https://github.com/renovatebot/renovate/issues/18811#issuecomment-1448936211 today to speed up our runs. We have thousands of repositories and each run takes hours, so this would be an incredibly welcome (and timely) change.
We are just now rolling out Renovate internally so we only have dozens of repositories with it enabled at this time.
@jamietanna Would you like to provide a PR for v36? It should be done in the next few weeks, of course.
I'd be up for this, but I may not get to it for a week or so, so if anyone else on this issue is interested in claiming it, go ahead!
OK, I'll add it to the planning. We can drop it if it's not ready.
Recent versions of Renovate include an onboarding cache:
I may be missing something, but could this cache be extended to significantly improve the situation for inactive repos when onboarding: false? If we know that a repo does not contain a Renovate config and the revision of the default branch has not changed since the last Renovate run, then there is no need to clone again, right? Unlike the changes suggested above, this wouldn't even be breaking.
It could, in theory, yes.
The cache is for the onboarding branch. In your case there is no onboarding branch, so this wouldn't work as it is.
I've implemented a workaround that works extremely well for us. Since it may be useful for others, I've decided to share it here.
My workaround makes use of the fact that the configuration can be a script file (/usr/src/app/config.js) that exports an async function, as described in https://github.com/renovatebot/renovate/pull/13075. My idea was to disable autodiscovery and instead dynamically construct the repositories array from the projects that contain a renovate.json in their default branch.
TL;DR, this is my config.js:
const gitlabApiEndpoint = 'https://our.company.gitlab/api/v4';
// Renovate internals (not a documented public API; may change between releases).
const { logger } = require('renovate/dist/logger');
const gitlab = require('renovate/dist/util/http/gitlab');
const { default: PQueue } = require('p-queue');

gitlab.setBaseUrl(gitlabApiEndpoint);

const httpOpts = {
  paginate: true,
  token: process.env.RENOVATE_TOKEN,
};
const http = new gitlab.GitlabHttp('gitlab', httpOpts);

// Resolves to the project if renovate.json exists on its default branch, else null.
async function toNullIfRenovateDisabled(project) {
  const pId = project.id;
  // Branch names may contain characters such as '/' that must be URL-encoded.
  const ref = encodeURIComponent(project.default_branch);
  try {
    await http.head(`projects/${pId}/repository/files/renovate.json?ref=${ref}`);
    return project;
  } catch (err) {
    if (err.statusCode === 404 || err.statusCode === 403) {
      return null;
    }
    throw new Error("Couldn't retrieve renovate.json", { cause: err });
  }
}

// Lists non-archived projects the token can write to (min_access_level=30),
// excluding mirrors and projects scheduled for deletion.
async function retrieveProjects() {
  try {
    const response = await http.getJson(
      'projects?per_page=100&archived=false&membership=true&min_access_level=30',
      httpOpts
    );
    return response.body.filter((p) => !p.mirror && !p.marked_for_deletion_on);
  } catch (err) {
    throw new Error("Couldn't retrieve projects", { cause: err });
  }
}

function toRenovateRepositories(projects) {
  return projects.map((p) => p.path_with_namespace);
}

// Checks every project for renovate.json with at most 10 concurrent API requests.
async function getRepositories() {
  const projects = await retrieveProjects();
  const enabledProjects = [];
  const queue = new PQueue({ concurrency: 10 });
  for (const project of projects) {
    queue.add(() =>
      toNullIfRenovateDisabled(project).then((enabled) => {
        if (enabled) {
          enabledProjects.push(enabled);
        }
      })
    );
  }
  await queue.onIdle();
  return toRenovateRepositories(enabledProjects);
}

module.exports = async function () {
  const enabledRepos = await getRepositories();
  logger.info({ repositories: enabledRepos }, 'Detected repositories:');
  return {
    platform: 'gitlab',
    endpoint: gitlabApiEndpoint,
    autodiscover: false,
    requireConfig: 'optional', // performance opt: skip checking for closed PRs
    onboarding: false,
    repositories: enabledRepos,
    repositoryCache: 'enabled', // experimental flag
  };
};
Disclaimer and caveats:
Anyway, this cuts the time of a Renovate run by more than 70% for us, saves tons of traffic, and surely reduces the load on our GitLab instance. I am very happy with this result.
Thanks @ChristianCiach! Based on your example, I wrote one for Azure DevOps:
const azure = require('azure-devops-node-api');
const { default: PQueue } = require('p-queue');

const authHandler = azure.getHandlerFromToken(process.env.RENOVATE_TOKEN, true);
const azureApi = new azure.WebApi(process.env.ENDPOINT, authHandler);

// Resolves to the repo if renovate.json exists in it, else null.
async function toNullIfRenovateDisabled(azureApiGit, repo) {
  const projectName = repo.project.name;
  try {
    // A missing file is expected to come back as null rather than as an error.
    const result = await azureApiGit.getItem(repo.id, 'renovate.json', projectName);
    return result == null ? null : repo;
  } catch (err) {
    console.error('toNullIfRenovateDisabled failed', { repo: repo.name, err });
    throw new Error("Couldn't retrieve renovate.json", { cause: err });
  }
}

// Walks all projects and their repos, checking for renovate.json with at most
// 10 concurrent API requests.
async function getRepositories() {
  const azureApiCore = await azureApi.getCoreApi();
  const azureApiGit = await azureApi.getGitApi(); // create the Git client once
  const projects = await azureApiCore.getProjects();
  const enabledRepos = [];
  const queue = new PQueue({ concurrency: 10 });
  for (const project of projects) {
    const repos = await azureApiGit.getRepositories(project.name);
    for (const repo of repos) {
      queue.add(() =>
        toNullIfRenovateDisabled(azureApiGit, repo).then((enabled) => {
          if (enabled) {
            // Renovate expects Azure DevOps repositories as "project/repo".
            enabledRepos.push(`${enabled.project.name}/${enabled.name}`);
          }
        })
      );
    }
  }
  await queue.onIdle();
  return enabledRepos;
}

// Assumed export, mirroring the GitLab config.js above.
module.exports = async function () {
  return {
    platform: 'azure',
    endpoint: process.env.ENDPOINT,
    autodiscover: false,
    onboarding: false,
    repositories: await getRepositories(),
  };
};
Hi, this issue got quite long with 33 comments, making it hard to know what the "real" feature to implement is. If anyone still wants this, please suggest a feature in Discussions with a concise summary of desired behavior, and once it's settled then we'll create a new Feature Request issue.
What would you like Renovate to be able to do?
Related to https://github.com/renovatebot/renovate/issues/18739 and https://github.com/renovatebot/renovate/issues/18740: when running against ~2000 repos, cloning each repo only to find out that there's no renovate.json or similar can be costly. If we could instead, through some configuration, decide to check only via an API call rather than checking out the full repo, we could save significant bandwidth and time.
If you have any ideas on how this should be implemented, please tell us here.
RENOVATE_PRE_CHECK_DEFAULT_BRANCH: check the default branch via the API for a renovate.json (or other file names).
Is this a feature you are interested in implementing yourself?
Yes