renovatebot / renovate

Home of the Renovate CLI: Cross-platform Dependency Automation by Mend.io
https://mend.io/renovate
GNU Affero General Public License v3.0

Perform an API call to check if Renovate is enabled, rather than cloning the repo #18811

Closed jamietanna closed 5 months ago

jamietanna commented 1 year ago

What would you like Renovate to be able to do?

Related to https://github.com/renovatebot/renovate/issues/18739 and https://github.com/renovatebot/renovate/issues/18740, when running against ~2000 repos, cloning each repo to then find out that there's no renovate.json or similar can be costly.

If we instead - through some configuration - could decide to only check via an API call, rather than checking out the full repo, we could save significant bandwidth and time.

If you have any ideas on how this should be implemented, please tell us here.

Is this a feature you are interested in implementing yourself?

Yes

viceice commented 1 year ago

if you want to skip repos via API, add a renovate.json with enabled=false

https://docs.renovatebot.com/self-hosted-configuration/#optimizefordisabled
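As a rough illustration of the kind of check this option stands for, here is a minimal sketch against the GitHub contents API. It is not Renovate's implementation: the function name `isDisabledViaApi` and the injected `fetchFn` parameter are hypothetical, and it assumes the file is plain JSON.

```javascript
// Sketch only: illustrates the idea behind optimizeForDisabled, not Renovate's code.
// `fetchFn` is injected (hypothetical parameter) so the check can run without network access;
// in Node 18+ the global `fetch` could be passed in.
async function isDisabledViaApi(fetchFn, repo, token, fileName = 'renovate.json') {
  const url = `https://api.github.com/repos/${repo}/contents/${fileName}`;
  const res = await fetchFn(url, {
    headers: {
      Accept: 'application/vnd.github+json',
      Authorization: `Bearer ${token}`,
    },
  });
  if (res.status === 404) {
    // No such file: nothing says "disabled", so the caller carries on as normal.
    return false;
  }
  if (!res.ok) {
    throw new Error(`Unexpected status ${res.status} for ${url}`);
  }
  const body = await res.json();
  // The contents endpoint returns the file body base64-encoded.
  const text = Buffer.from(body.content, 'base64').toString('utf8');
  // Assumption: the file is plain JSON (no comments or trailing commas).
  return JSON.parse(text).enabled === false;
}
```

Only a `true` result lets the caller skip the clone; a missing file still falls through to the normal flow.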

jamietanna commented 1 year ago

Interesting - would this still require each repo to be "onboarded" with a renovate.json, just with enabled: false? Is there a way to get the Renovate Runner to default this behaviour, to avoid onboarding repos?

rarkins commented 1 year ago

This behaviour could probably be tweaked to behave differently when onboarding=false and requireConfig=optional like you have it. As long as you are ok with using renovate.json as the file name

jamietanna commented 1 year ago

Nice, so when I run:

env RENOVATE_REQUIRE_CONFIG=required RENOVATE_ONBOARDING=false renovate --token $GITHUB_TOKEN --autodiscover-filter 'jamietanna/*' --autodiscover --optimize-for-disabled

I get:

DEBUG: GET https://api.github.com/repos/jamietanna/jamietanna/contents/renovate.json = (code=ERR_NON_2XX_3XX_RESPONSE, statusCode=404 retryCount=0, duration=285) (repository=jamietanna/jamietanna)
DEBUG: Resetting npmrc (repository=jamietanna/jamietanna)
DEBUG: detectSemanticCommits() (repository=jamietanna/jamietanna)
DEBUG: Initializing git repository into /tmp/renovate-cache-1667898864/repos/github/jamietanna/jamietanna (repository=jamietanna/jamietanna)
DEBUG: Performing blobless clone (repository=jamietanna/jamietanna)
DEBUG: git clone completed (repository=jamietanna/jamietanna)
       "durationMs": 1369
DEBUG: latest repository commit (repository=jamietanna/jamietanna)
       "latestCommit": {
         "hash": "ffb5b962a9ce6518737fa3a3774ce08679ae7f3e",
         "date": "2022-11-08T09:12:33+00:00",
         "message": "Automagic update",
         "refs": "HEAD -> main, origin/main, origin/HEAD",
         "body": "",
         "author_name": "README-bot",
         "author_email": "actions@users.noreply.github.com"
       }
Full log

```
DEBUG: Using RE2 as regex engine
DEBUG: Parsing configs
DEBUG: Checking for config file in /Users/jamietanna/workspaces/_external/tanna.dev/renovate-graph/config.js
DEBUG: No config file found on disk - skipping
DEBUG: File config
       "config": {}
DEBUG: CLI config
       "config": { "baseDir": "/tmp/renovate-cache-1667898864", "optimizeForDisabled": true, "token": "***********", "autodiscover": true, "autodiscoverFilter": ["jamietanna/jam*"] }
DEBUG: Env config
       "config": {"hostRules": [], "onboarding": false, "requireConfig": "required"}
DEBUG: Combined config
       "config": { "hostRules": [], "onboarding": false, "requireConfig": "required", "baseDir": "/tmp/renovate-cache-1667898864", "optimizeForDisabled": true, "token": "***********", "autodiscover": true, "autodiscoverFilter": ["jamietanna/jam*"] }
DEBUG: Found valid git version: 2.37.1
DEBUG: Using default github endpoint: https://api.github.com/
DEBUG: No throttle
       "host": "api.github.com"
DEBUG: No concurrency limits
       "host": "api.github.com"
DEBUG: GET https://api.github.com/user/emails = (code=ERR_NON_2XX_3XX_RESPONSE, statusCode=404 retryCount=0, duration=243)
DEBUG: Cannot read user/emails endpoint on GitHub to retrieve gitAuthor
DEBUG: Platform config
       "platformConfig": { "hostType": "github", "endpoint": "https://api.github.com/", "isGHApp": false, "isGhe": false, "userDetails": {"username": "jamietanna", "name": "Jamie Tanna"}, "userEmail": null },
       "renovateUsername": "jamietanna"
DEBUG: Adding token authentication for api.github.com to hostRules
DEBUG: Using configured baseDir: /tmp/renovate-cache-1667898864
DEBUG: Using cacheDir: /tmp/renovate-cache-1667898864/cache
DEBUG: Using containerbaseDir: /tmp/renovate-cache-1667898864/cache/containerbase
DEBUG: Initializing Renovate internal cache into /tmp/renovate-cache-1667898864/cache/renovate/renovate-cache-v1
DEBUG: Commits limit = null
DEBUG: Setting global hostRules
DEBUG: Adding token authentication for api.github.com to hostRules
DEBUG: validatePresets()
DEBUG: Autodiscovering GitHub repositories
INFO: Autodiscovered repositories
       "length": 1,
       "repositories": ["jamietanna/jamietanna"]
DEBUG: Reinitializing hostRules for repo
DEBUG: Clearing hostRules
DEBUG: Adding token authentication for api.github.com to hostRules
INFO: Repository started (repository=jamietanna/jamietanna)
       "renovateVersion": "34.2.3"
DEBUG: Using localDir: /tmp/renovate-cache-1667898864/repos/github/jamietanna/jamietanna (repository=jamietanna/jamietanna)
DEBUG: PackageFiles.clear() - Package files deleted (repository=jamietanna/jamietanna)
DEBUG: initRepo("jamietanna/jamietanna") (repository=jamietanna/jamietanna)
DEBUG: No throttle (repository=jamietanna/jamietanna)
       "host": "api.github.com"
DEBUG: No concurrency limits (repository=jamietanna/jamietanna)
       "host": "api.github.com"
DEBUG: jamietanna/jamietanna default branch = main (repository=jamietanna/jamietanna)
DEBUG: Using personal access token for git init (repository=jamietanna/jamietanna)
DEBUG: GET https://api.github.com/repos/jamietanna/jamietanna/contents/renovate.json = (code=ERR_NON_2XX_3XX_RESPONSE, statusCode=404 retryCount=0, duration=285) (repository=jamietanna/jamietanna)
DEBUG: Resetting npmrc (repository=jamietanna/jamietanna)
DEBUG: detectSemanticCommits() (repository=jamietanna/jamietanna)
DEBUG: Initializing git repository into /tmp/renovate-cache-1667898864/repos/github/jamietanna/jamietanna (repository=jamietanna/jamietanna)
DEBUG: Performing blobless clone (repository=jamietanna/jamietanna)
DEBUG: git clone completed (repository=jamietanna/jamietanna)
       "durationMs": 1369
DEBUG: latest repository commit (repository=jamietanna/jamietanna)
       "latestCommit": { "hash": "ffb5b962a9ce6518737fa3a3774ce08679ae7f3e", "date": "2022-11-08T09:12:33+00:00", "message": "Automagic update", "refs": "HEAD -> main, origin/main, origin/HEAD", "body": "", "author_name": "README-bot", "author_email": "actions@users.noreply.github.com" }
DEBUG: getCommitMessages (repository=jamietanna/jamietanna)
DEBUG: Semantic commits detection: unknown (repository=jamietanna/jamietanna)
DEBUG: No semantic commits detected (repository=jamietanna/jamietanna)
DEBUG: checkOnboarding() (repository=jamietanna/jamietanna)
DEBUG: isOnboarded() (repository=jamietanna/jamietanna)
DEBUG: findFile(renovate.json) (repository=jamietanna/jamietanna)
DEBUG: findFile(renovate.json5) (repository=jamietanna/jamietanna)
DEBUG: findFile(.github/renovate.json) (repository=jamietanna/jamietanna)
DEBUG: findFile(.github/renovate.json5) (repository=jamietanna/jamietanna)
DEBUG: findFile(.gitlab/renovate.json) (repository=jamietanna/jamietanna)
DEBUG: findFile(.gitlab/renovate.json5) (repository=jamietanna/jamietanna)
DEBUG: findFile(.renovaterc) (repository=jamietanna/jamietanna)
DEBUG: findFile(.renovaterc.json) (repository=jamietanna/jamietanna)
DEBUG: config file not found (repository=jamietanna/jamietanna)
INFO: Repository is disabled - skipping (repository=jamietanna/jamietanna)
DEBUG: Repository result: disabled-no-config, status: disabled, enabled: false, onboarded: undefined (repository=jamietanna/jamietanna)
DEBUG: Repository timing splits (milliseconds) (repository=jamietanna/jamietanna)
       "splits": {},
       "total": 2772
DEBUG: Package cache statistics (repository=jamietanna/jamietanna)
       "get": {"count": 0},
       "set": {"count": 0}
DEBUG: dns cache (repository=jamietanna/jamietanna)
       "hosts": []
INFO: Repository finished (repository=jamietanna/jamietanna)
       "cloned": true,
       "durationMs": 2772
DEBUG: Renovate exiting
```

Maybe I'm misunderstanding, but I'd expect that in this case we wouldn't clone after seeing there's no renovate.json. Or are you indicating that there are some things we should tweak in the codebase for this use case?

viceice commented 1 year ago

renovate clones, because it needs to check some more possible config files https://github.com/renovatebot/renovate/blob/d869c946d164965393f1259db5171e2a0303fb27/lib/config/app-strings.ts#L1-L11

so you should enable the repo cache and save <cacheDir>/renovate/repository/ between runs. then renovate should know there is no config and won't clone until there is a new commit

jamietanna commented 1 year ago

Gotcha - do we think it'd be worth an enhancement to do all of those checks via API calls, instead of cloning to check?

So when running:

env RENOVATE_REPOSITORY_CACHE=enabled LOG_LEVEL=debug RENOVATE_REQUIRE_CONFIG=required RENOVATE_ONBOARDING=false renovate --token $GITHUB_TOKEN --autodiscover-filter 'jamietanna/*' --autodiscover --optimize-for-disabled --base-dir /tmp/renovate-base-new  --cache-dir /tmp/renovate-cache-new --dry-run 

Afterwards, baseDir and cacheDir are both empty. I guess that's expected, as the repos are skipped as disabled and so no dependency extraction is performed. But if we set RENOVATE_REQUIRE_CONFIG=optional, would that work?

jamietanna commented 1 year ago

Looking at a significantly larger project, when running:

env RENOVATE_REPOSITORY_CACHE=enabled LOG_LEVEL=debug RENOVATE_REQUIRE_CONFIG=optional RENOVATE_ONBOARDING=false renovate --token $GITHUB_TOKEN --autodiscover-filter '...' --autodiscover --optimize-for-disabled --base-dir /tmp/renovate-base-new  --cache-dir /tmp/renovate-cache-new --dry-run

Then the result of /tmp/renovate-cache-new/renovate/repository is only 28K, which is quite nice compared to the size of the repo! I'll try re-running it a few times today and see how it responds with the repo having commits pushed to it

rarkins commented 1 year ago

I meant we could tweak optimizeForDisabled

rarkins commented 1 year ago

I'm not sure repoCache helps here, because we'd still need to clone any time there's a new commit to make sure there are no config files. I think the optimization can work as long as you're happy to use only renovate.json

viceice commented 1 year ago

ok. so we can assume the repo is disabled if onboarding is false, requireConfig is required and we get a 404 for the default config file (renovate.json, optionally changed by the onboarding config file name for self-hosted)

jamietanna commented 1 year ago

Nice, yeah that'd work 👍

In our current setup, it wouldn't save us any time, as we're having to pre-filter the organisations' repos and split them into groups of 100, because we're hitting https://github.com/renovatebot/github-action/issues/648 with our GitHub App authentication
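For readers who need the same pre-filtering, splitting a repository list into fixed-size groups can be sketched as below. The helper name `chunk` is hypothetical, and 100 is simply the group size mentioned above.

```javascript
// Split a list into consecutive groups of at most `size` items.
function chunk(items, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  const groups = [];
  for (let i = 0; i < items.length; i += size) {
    groups.push(items.slice(i, i + size));
  }
  return groups;
}

// Example: 250 placeholder repository names become groups of 100, 100 and 50.
const exampleRepos = Array.from({ length: 250 }, (_, i) => `org/repo-${i}`);
const exampleGroups = chunk(exampleRepos, 100);
```

Each group could then be handed to a separate Renovate run via the repositories option.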

ChristianCiach commented 1 year ago

I am hitting the same issue. We have hundreds of internal projects and cloning them all takes a very long time.

The relevant part of my config looks like this:

autodiscover: true
optimizeForDisabled: true
requireConfig: 'required'
onboarding: false

The documentation of optimizeForDisabled says:

If the file exists and the config is disabled, Renovate will skip the repo without cloning it. Otherwise, it will continue as normal.

I think this can be improved by also skipping a repo when requireConfig: 'required' and the renovate.json is missing (or whatever filename is configured by onboardingConfigFileName), exactly as suggested by @viceice .

I am mainly a Java developer, so I unfortunately cannot provide a PR for this. But anyway, would a PR that implements this proposal even be considered for merging? Or are there any potential issues with this that I am overlooking?

ChristianCiach commented 1 year ago

The code in question seems to be https://github.com/renovatebot/renovate/blob/34d26700cf32ff7a32cdf93179773b10db75ec0a/lib/workers/repository/init/apis.ts#L28

The heavy lifting seems to be already done. I guess all that needs to be done is to slightly extend the method to include an additional check, like so:

async function validateOptimizeForDisabled(
  config: RenovateConfig
): Promise<void> {
  if (config.optimizeForDisabled) {
    const renovateConfig = await getJsonFile(defaultConfigFile(config));
    if (renovateConfig == null && !config.onboarding && config.requireConfig === 'required') {
      throw new Error(REPOSITORY_NO_CONFIG);
    }
    if (renovateConfig?.enabled === false) {
      throw new Error(REPOSITORY_DISABLED_BY_CONFIG);
    }
  }
}

This seems simple enough, even for me. Should I try to submit a PR for that?

viceice commented 1 year ago

sure, but probably needs at least a new test for code coverage.

ChristianCiach commented 1 year ago

I don't know how to write unit tests in TypeScript, but I could probably use an existing one as a template.

If anyone else wants to do it, feel free not to wait for me!

rarkins commented 1 year ago

So we are adding a special case, where all these are true?

In this case the repo must use renovate.json and not one of the other file names. If the renovate.json is missing, or includes enabled=false, then the repo will be skipped.

rarkins commented 1 year ago

This should either be rushed for v35 or wait for v36, as it's technically a breaking change

jamietanna commented 1 year ago

@Twith2Sugars is this one you may be able to help contribute?

ChristianCiach commented 1 year ago

@rarkins Could you please explain why this is a breaking change in your eyes? I cannot imagine an example where this is anything other than a performance optimization.

Also, when optimizeForDisabled=true, the code currently already only looks for a single file: either config.onboardingConfigFileName (if configured) or configFileNames[0] (which is surely renovate.json). So nothing would change in that regard.

rarkins commented 1 year ago

It's a breaking change because if someone has a repo with a config file like renovate.json5 today, it will work. But after this change, it will be skipped
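To make the difference concrete, here is a small sketch comparing the two behaviours for a repo that only has renovate.json5. It assumes optimizeForDisabled=true, onboarding=false and requireConfig=required. The function names are hypothetical, the file list is the one shown by the findFile() calls in the debug log earlier in this thread, and other config locations (such as a key in package.json) are ignored.

```javascript
// File names taken from the findFile() calls in the debug log earlier in the thread.
const configFileNames = [
  'renovate.json',
  'renovate.json5',
  '.github/renovate.json',
  '.github/renovate.json5',
  '.gitlab/renovate.json',
  '.gitlab/renovate.json5',
  '.renovaterc',
  '.renovaterc.json',
];

// Today: a missing default file is inconclusive, so the repo is cloned
// and every known file name is checked.
function currentBehaviour(repoFiles) {
  return configFileNames.some((name) => repoFiles.includes(name)) ? 'processed' : 'skipped';
}

// Proposal: only the default file is checked via the API; a 404 means "skip".
function proposedBehaviour(repoFiles, defaultFile = 'renovate.json') {
  return repoFiles.includes(defaultFile) ? 'processed' : 'skipped';
}

const json5Repo = ['README.md', 'renovate.json5'];
const before = currentBehaviour(json5Repo);
const after = proposedBehaviour(json5Repo);
```

Under these assumptions, `before` is 'processed' while `after` is 'skipped', which is the observable change being called breaking.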

ChristianCiach commented 1 year ago

@rarkins Thanks, that's right! I didn't see that. In other words, currently optimizeForDisabled is indeed always only a performance optimization but will never trigger a change in observable behavior. The implementation proposed here would change that.

In this case, I am totally fine with delaying this until one of the next major releases. Not only because it is breaking, but also to give us more time to think about it and maybe come up with a better solution.

ChristianCiach commented 1 year ago

Maybe this can be made less breaking by only skipping a repo if all of the criteria mentioned in https://github.com/renovatebot/renovate/issues/18811#issuecomment-1449402620 are true AND also only if onboardingConfigFileName has been explicitly configured?

I guess that the combination of onboarding: false and onboardingConfigFileName: "filename" is extremely rare. Why would someone go out of their way to configure an onboardingConfigFileName if onboarding is disabled? Of course, cases like these could happen if onboarding was originally true but has been set to false later, so it's still not bullet-proof.

Anyway: I think it is reasonable to only perform this optimization if onboardingConfigFileName has been explicitly configured (even though we are now using this setting for something that it wasn't originally made for, and the name of the setting is slightly confusing in this context).

rarkins commented 1 year ago

Either way it's breaking, so I wouldn't go too far out of your way to accommodate it. The combination of configuration options is already specific enough that I think it's ok to say "if you use optimizeForDisabled=true and onboarding=false, you must use renovate.json in repos you wish to enable".

ChristianCiach commented 1 year ago

Agreed, with the exception that I wouldn't hardcode renovate.json as the filename. optimizeForDisabled is already considering onboardingConfigFileName for the filename (if configured) and I see no reason to change that.

https://github.com/renovatebot/renovate/blob/34d26700cf32ff7a32cdf93179773b10db75ec0a/lib/workers/repository/init/apis.ts#L16

jakauppila commented 1 year ago

I tried enabling the same configuration as specified in https://github.com/renovatebot/renovate/issues/18811#issuecomment-1448936211 today to speed up our executions, as we have thousands of repositories that take hours to run, so this would be an incredibly welcome (and timely) change.

We are just now rolling out Renovate internally so we only have dozens of repositories with it enabled at this time.

viceice commented 1 year ago

@jamietanna Would you like to provide a PR for v36? It should be done in the next few weeks, of course 🙃

jamietanna commented 1 year ago

I'd be up for this, but may not get to it for a week or so, so if anyone else on this issue is interested in claiming it, go ahead!

viceice commented 1 year ago

OK, will add it to the planning. We can drop it if it's not ready 😉

ChristianCiach commented 1 year ago

Recent versions of Renovate include an onboarding cache:

I may be missing something, but can this cache be extended to severely improve the situation for non-active repos when onboarding: false? If we know that a repo does not contain a renovate config and the revision of the default branch has not changed since the last run of renovate, then there is no need to clone again, right?

This wouldn't even be breaking, in contrast to the changes suggested above.
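The idea can be sketched as a small lookup keyed by repository. This is only an illustration: the cache shape, `needsClone` and `recordResult` are hypothetical and do not reflect Renovate's actual cache format, and the current revision is assumed to come from a cheap API call for the default branch.

```javascript
// Hypothetical cache: Map<repoName, { sha: string, hasConfig: boolean }>

// Decide whether a clone is needed, given the default branch's current revision.
function needsClone(cache, repo, currentSha) {
  const entry = cache.get(repo);
  if (!entry) {
    return true; // never seen before: nothing is known about this repo
  }
  if (entry.sha !== currentSha) {
    return true; // new commits: a config file may have been added
  }
  // Same revision as last run: clone only if a config was found back then.
  return entry.hasConfig;
}

// Remember what the last full check found for this revision.
function recordResult(cache, repo, sha, hasConfig) {
  cache.set(repo, { sha, hasConfig });
}
```

A repo without config is then cloned once per new revision rather than once per run.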

rarkins commented 1 year ago

It could in theory yes

RahulGautamSingh commented 1 year ago

Recent versions of Renovate include an onboarding cache:

I may be missing something, but can this cache be extended to severely improve the situation for non-active repos when onboarding: false? If we know that a repo does not contain a renovate config and the revision of the default branch has not changed since the last run of renovate, then there is no need to clone again, right?

This wouldn't even be breaking, in contrast to the changes suggested above.

The cache is for the onboarding branch. In your case there is no onboarding branch, so this wouldn't work as it is.

ChristianCiach commented 1 year ago

I've implemented a workaround that works extremely well for us. Since it may be useful for others, I've decided to share it here.

My workaround makes use of the fact that the configuration can be a script file (/usr/src/app/config.js) that can export an async function as described in https://github.com/renovatebot/renovate/pull/13075 . My idea was to disable "autodiscovery" and to instead dynamically construct the repositories array with projects that happen to contain a renovate.json in their default branch.

TL;DR, this is my config.js:

const gitlabApiEndpoint = 'https://our.company.gitlab/api/v4'

const {logger} = require('renovate/dist/logger');
const gitlab = require('renovate/dist/util/http/gitlab');
gitlab.setBaseUrl(gitlabApiEndpoint);

const {default: PQueue} = require('p-queue');

const httpOpts = {
  paginate: true,
  token: process.env.RENOVATE_TOKEN
};
const http = new gitlab.GitlabHttp('gitlab', httpOpts);

async function toNullIfRenovateDisabled(project) {
  const pId = project.id;
  const ref = project.default_branch;
  try {
    await http.head(`projects/${pId}/repository/files/renovate.json?ref=${ref}`);
    return project;
  } catch (err) {
    if (err.statusCode === 404 || err.statusCode === 403) {
      return null;
    }
    throw new Error("Couldn't retrieve renovate.json", { cause: err });
  }
}

async function retrieveProjects() {
  try {
    const response = await http.getJson(`projects?per_page=100&archived=false&membership=true&min_access_level=30`, httpOpts);
    return response.body.filter(p => !p.mirror && !p.marked_for_deletion_on);
  } catch (err) {
    throw new Error("Couldn't retrieve projects", { cause: err });
  }
}

function toRenovateRepositories(projects) {
  return projects.map(p => p.path_with_namespace);
}

async function getRepositories() {
  const projects = await retrieveProjects();
  const enabledProjects = [];
  // Check at most 10 projects at a time to limit load on the GitLab API.
  const queue = new PQueue({concurrency: 10});
  for (const p of projects) {
    queue.add(() => toNullIfRenovateDisabled(p).then((enabled) => {
      if (enabled) {
        enabledProjects.push(enabled);
      }
    }));
  }
  // Resolves once every queued check has finished.
  await queue.onIdle();
  return toRenovateRepositories(enabledProjects);
}

module.exports = async function() {
  const enabledRepos = await getRepositories();
  logger.info({ repositories: enabledRepos }, 'Detected repositories:');
  return {
    platform: 'gitlab',
    endpoint: gitlabApiEndpoint,
    autodiscover: false,
    requireConfig: 'optional',  // performance opt: Skip checking for closed PRs
    onboarding: false,
    repositories: enabledRepos,
    repositoryCache: 'enabled'  // experimental flag
  };
}

Disclaimer and caveats:

Anyway, this cuts the time of a Renovate execution by more than 70% for us, saves tons of traffic and surely reduces the load on our GitLab instance. I am very happy with this result.
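The filtering step itself is platform-neutral, so it can also be sketched without p-queue. This is an illustrative variant rather than part of the workaround above: `filterEnabled` and the injected `hasConfig` callback are hypothetical names, and unlike the script above it returns repositories in input order rather than completion order.

```javascript
// Keep only the repos for which `hasConfig(repo)` resolves to a truthy value,
// running at most `concurrency` checks at the same time.
async function filterEnabled(repos, hasConfig, concurrency = 10) {
  const results = new Array(repos.length);
  let next = 0; // index of the next repo to check, shared by all workers

  async function worker() {
    while (next < repos.length) {
      const i = next++;
      results[i] = await hasConfig(repos[i]);
    }
  }

  // Start a fixed number of workers that pull from the shared index.
  const workers = Array.from({ length: Math.min(concurrency, repos.length) }, () => worker());
  await Promise.all(workers);
  return repos.filter((_, i) => results[i]);
}
```

The `hasConfig` callback would wrap whatever per-platform API call reports that the config file exists, for example the HEAD request used in the script above.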

illay1994 commented 1 year ago

Thanks @ChristianCiach. Based on your example, I put together one for Azure DevOps:

const azure = require('azure-devops-node-api');
const {default: PQueue} = require('p-queue');
const fs = require('fs');

const authHandler = azure.getHandlerFromToken(process.env.RENOVATE_TOKEN, true);
const azureApi = new azure.WebApi(process.env.ENDPOINT, authHandler);

async function toNullIfRenovateDisabled(project) {

  const azureApiGit = await azureApi.getGitApi();

  const pId = project.id;
  const projectName = project.project.name;
  try {
    const result = await azureApiGit.getItem(pId, "renovate.json", projectName);

    if (result == null) {
      return null;
    }

    return project;
  } catch (err) {
    console.info({project: project, err: err}, 'toNullIfRenovateDisabled: Functions error');
    throw new Error("Couldn't retrieve renovate.json", { cause: err });
  }
}

async function getRepositories() {

  const azureApiCore = await azureApi.getCoreApi(); 
  const projects = await azureApiCore.getProjects();
  const enabledProjects = [];
  const queue = new PQueue({concurrency: 10});
  const azureApiGit = await azureApi.getGitApi();

  for (const project of projects) {
    const repos = await azureApiGit.getRepositories(project.name);

    for (const repo of repos)
    {
      queue.add(() => toNullIfRenovateDisabled(repo).then((p) => {
        if (p) {
          enabledProjects.push(p.project.name+ "/"+p.name);
        }
      }));
    }
  }
  await queue.onIdle();

  return enabledProjects;
}

rarkins commented 5 months ago

Hi, this issue got quite long with 33 comments, making it hard to know what the "real" feature to implement is. If anyone still wants this, please suggest a feature in Discussions with a concise summary of desired behavior, and once it's settled then we'll create a new Feature Request issue.