renovatebot / renovate

Home of the Renovate CLI: Cross-platform Dependency Automation by Mend.io
https://mend.io/renovate
GNU Affero General Public License v3.0

Improve go private auth #7361

Closed rarkins closed 2 years ago

rarkins commented 4 years ago

What would you like Renovate to be able to do?

Support private modules and not fail go x commands.

Did you already have any implementation ideas?

It seems that ~/.netrc and ~/.gitconfig files can help: https://golang.org/doc/faq#git_https. Not sure if GOPRIVATE is necessary too?

More references:

https://medium.com/cloud-native-the-gathering/go-modules-with-private-git-repositories-dfe795068db4

https://medium.com/@ysamlan/thanks-for-the-writeup-7b46bb5c927a

https://smartystreets.com/blog/2018/09/private-dependencies-in-docker-and-go

https://github.com/renovatebot/renovate/pull/7252

Are there any workarounds or alternative ideas you've tried to avoid needing this feature?

It can work today for github.com, but that may be it.

Is this a feature you'd be interested in implementing yourself?

Maybe

deltamualpha commented 3 years ago

Just hit this myself. Some thoughts:

Shegox commented 3 years ago

I ran into this problem today as well and looked into potential solutions. So far, the following seems to be the case (including edge cases):

Currently it is possible to achieve this in a self-hosted setup by manually setting GOPRIVATE=github.enterprise.com and configuring git config --global url."https://token:$TOKEN@github.enterprise.com/".insteadOf "https://github.enterprise.com/". However, this can cause problems with a later push to the same GitHub instance (e.g. if the Go modules and the renovated repo are on the same GitHub instance and use different tokens).

With all that said, I assume that the most common scenario is the following, and it is potentially the one to focus a first implementation on:

For determining which credentials are needed, I think we have two options (plus a possible third):

  1. Try to figure out the dependencies before running go get and then set the credentials for them. However, I think this might cause issues, especially if sub-dependencies are stored on different registries.
  2. Add a hostType such as go-git to the hostRules to identify a host that is relevant for go requests via git. Then add authentication and set GOPRIVATE based on it, similar to the current github.com implementation that sets credentials via insteadOf and similar to how npm registries are handled, and unset them again after the go command has finished.
  3. (Maybe) When running self-hosted against e.g. a GitHub Enterprise Server, automatically adding the token to the go config could be helpful and would already cover most cases (assuming that people consume Go modules from their own internal GitHub server, where their source code lives as well).

I plan to look a bit further into this and see if I can get something implemented, although I'm not too familiar with the inner workings of Renovate yet. I think the best approach would be 2, with maybe 3 as a reasonable default, but I would appreciate pointers and further input :)


Edit: From a first experiment I found that newer versions of git (2.32) allow the usage of environment variables for configuration, making it easy to pass specific configuration to only a single command (e.g. go get).

  • GIT_CONFIG_COUNT
  • GIT_CONFIG_KEY_<n>
  • GIT_CONFIG_VALUE_<n>

If GIT_CONFIG_COUNT is set to a positive number, all environment pairs GIT_CONFIG_KEY_<n> and GIT_CONFIG_VALUE_<n> up to that number will be added to the process’s runtime configuration. The config pairs are zero-indexed. Any missing key or value is treated as an error. An empty GIT_CONFIG_COUNT is treated the same as GIT_CONFIG_COUNT=0, namely no pairs are processed. These environment variables will override values in configuration files, but will be overridden by any explicit options passed via git -c.

This is useful for cases where you want to spawn multiple git commands with a common configuration but cannot depend on a configuration file, for example when writing scripts.

I tried that with the following test code, and it seems to go in the right direction.

Code diff

```diff
diff --git a/lib/manager/gomod/artifacts.ts b/lib/manager/gomod/artifacts.ts
index 36f05c098..bf6ba3bb7 100644
--- a/lib/manager/gomod/artifacts.ts
+++ b/lib/manager/gomod/artifacts.ts
@@ -8,7 +8,7 @@ import { logger } from '../../logger';
 import { ExecOptions, exec } from '../../util/exec';
 import { ensureCacheDir, readLocalFile, writeLocalFile } from '../../util/fs';
 import { getRepoStatus } from '../../util/git';
-import { find } from '../../util/host-rules';
+import { find, findAll} from '../../util/host-rules';
 import { isValid } from '../../versioning/semver';
 import type {
   PackageDependency,
@@ -17,19 +17,41 @@ import type {
   UpdateArtifactsResult,
 } from '../types';
 
-function getPreCommands(): string[] | null {
+function getGitEnvironment(): NodeJS.ProcessEnv {
+  let gitEnvCounter: number = 0;
+  let gitEnvVariables: NodeJS.ProcessEnv = {};
+
   const credentials = find({
     hostType: PLATFORM_TYPE_GITHUB,
     url: 'https://api.github.com/',
   });
-  let preCommands = null;
+
   if (credentials?.token) {
     const token = quote(credentials.token);
-    preCommands = [
-      `git config --global url.\"https://${token}@github.com/\".insteadOf \"https://github.com/\"`, // eslint-disable-line no-useless-escape
-    ];
+    // gitEnvCounter is zero indexed, thus we first create the variables and then increment the counter
+    gitEnvVariables[`GIT_CONFIG_KEY_${gitEnvCounter}`] = `url.https://${token}@github.com/.insteadOf`;
+    gitEnvVariables[`GIT_CONFIG_VALUE_${gitEnvCounter}`] = `https://github.com/`;
+    gitEnvCounter++;
   }
-  return preCommands;
+
+  // get all credentials we have for go using git
+  const goGitCredentials = findAll({
+    hostType: "go-git",
+  })
+
+  for (const goGitCredential of goGitCredentials) {
+    // Check that both a token exists and a matchHost
+    if (goGitCredential.token && goGitCredential.matchHost) {
+      const token = quote(goGitCredential.token);
+      // gitEnvCounter is zero indexed, thus we first create the variables and then increment the counter
+      gitEnvVariables[`GIT_CONFIG_KEY_${gitEnvCounter}`] = `url.https://${token}@${goGitCredential.matchHost}/.insteadOf`;
+      gitEnvVariables[`GIT_CONFIG_VALUE_${gitEnvCounter}`] = `https://${goGitCredential.matchHost}/`;
+      gitEnvCounter++;
+    }
+  };
+  // set the GIT_CONFIG_COUNT to the number of KEY/Value pairs
+  gitEnvVariables["GIT_CONFIG_COUNT"] = gitEnvCounter.toString();
+  return gitEnvVariables;
 }
 
 function getUpdateImportPathCmds(
@@ -128,13 +150,13 @@ export async function updateArtifacts({
       GONOSUMDB: process.env.GONOSUMDB,
       GOFLAGS: useModcacherw(config.constraints?.go) ? '-modcacherw' : null,
       CGO_ENABLED: getAdminConfig().binarySource === 'docker' ? '0' : null,
+      ... getGitEnvironment(),
     },
     docker: {
       image: 'go',
       tagConstraint: config.constraints?.go,
       tagScheme: 'npm',
       volumes: [goPath],
-      preCommands: getPreCommands(),
     },
   };
```
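
For readers who want to try the GIT_CONFIG_COUNT mechanism outside of Renovate, here is a minimal standalone sketch of the same idea. The `GitCredential` shape and the `buildGitConfigEnv` name are made up for illustration and are not part of Renovate. Using `encodeURIComponent` for the token is an assumption too (the diff above uses a shell `quote` helper instead), and it only suits tokens without a `user:` prefix.

```typescript
// Hypothetical shape for illustration; not Renovate's HostRule type.
interface GitCredential {
  host: string; // e.g. "github.enterprise.com"
  token: string;
}

// Build GIT_CONFIG_* variables that rewrite https://<host>/ to an
// authenticated URL. The result is meant to be passed as the environment
// of a single child process (e.g. `go get`), not written to ~/.gitconfig.
function buildGitConfigEnv(credentials: GitCredential[]): Record<string, string> {
  const env: Record<string, string> = {};
  let count = 0;
  for (const { host, token } of credentials) {
    if (!host || !token) {
      // git treats a missing key or value as an error, so skip incomplete entries.
      continue;
    }
    // Pairs are zero-indexed: write index `count` first, then increment.
    // Assumption: the token is a plain token, made URL-safe here.
    env[`GIT_CONFIG_KEY_${count}`] = `url.https://${encodeURIComponent(token)}@${host}/.insteadOf`;
    env[`GIT_CONFIG_VALUE_${count}`] = `https://${host}/`;
    count += 1;
  }
  // GIT_CONFIG_COUNT must equal the number of key/value pairs.
  env['GIT_CONFIG_COUNT'] = String(count);
  return env;
}

const gitEnv = buildGitConfigEnv([
  { host: 'github.com', token: 'abc123' },
  { host: 'github.enterprise.com', token: '' }, // skipped: no token
]);
console.log(gitEnv);
```

Because the configuration lives only in the child process environment, nothing needs to be unset afterwards, which is the main advantage over `git config --global`.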

renovate-release commented 3 years ago

:tada: This issue has been resolved in version 25.55.0 :tada:

The release is available on:

Your semantic-release bot :package::rocket:

rarkins commented 3 years ago

When we re-raise this feature again:

rarkins commented 3 years ago

By @Shegox in the PR:

@rarkins, one thing I'm uncertain about and would like your input on is how best to determine which packages should be updated from the source (e.g. git) instead of using the default Go module proxy. This is achieved through the comma-separated GOPRIVATE environment variable.

The idea behind the previous implementation is that registryUrls is used to determine for which URLs Renovate should look for authentication and use it, and then a lookup for "git" authentication credentials can be done in the hostRules via getRemoteUrlWithToken(gitUrl, 'git');. I saw the main advantage of this two-part configuration in that a user can say "These are my private packages, use the authentication you have available for them", and Renovate could then use the hostRules to determine where the source is located (e.g. git or some other version control) and set up the authentication as required. The problem is that if go fails to find a package in the go mod proxy, it tries to fetch from source, but still tries to verify the checksum in the public checksum database (and this fails).

If the GOPRIVATE environment variable is set, the proxy is bypassed and no checksum verification is done for matching modules. Thus the normal approach is to set GOPRIVATE=git.example.com, but most likely we don't want to pull all modules from their source. (Details about that can be found here: https://www.goproxy.io/docs/GOPRIVATE-env.html and about the GOSUMDB in this blog.)

My alternative idea is to only allow authentication using hostRules of type go-git and then, depending on the platform, set up custom hostRules for e.g. github.com automatically, thus decoupling the special github.com logic from the update step.

rarkins commented 3 years ago

There will be some servers (e.g. private GHE) which users will want to be looked up directly. In most cases, would it be sufficient for the bot admin to simply configure them in GOPRIVATE?

The trickiest is github.com. The majority of OSS packages live on github.com, and would be best looked up through the public go proxy and checksum database. But there will also be cases - for both the hosted app as well as self-hosted - where there are some private packages on github.com which should be looked up directly. We want to avoid the situation where we look up all github.com packages directly just because one or more is private.

The goproxy website includes this relevant example:

export GOPRIVATE=git.mycompany.com,github.com/my/private
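
To make the effect of such a value concrete, here is a rough sketch of how a comma-separated GOPRIVATE list selects module paths by prefix. It is a simplified approximation and not Go's actual matcher: only `*` and `?` wildcards are handled, and the function names are invented for this example.

```typescript
// Convert one glob to a RegExp. Only * and ? are supported here, and neither
// crosses a "/" boundary. Character classes are NOT supported (simplification).
function globToRegExp(glob: string): RegExp {
  const escaped = glob
    .replace(/[.+^${}()|[\]\\]/g, '\\$&') // escape regex metacharacters except * and ?
    .replace(/\*/g, '[^/]*')
    .replace(/\?/g, '[^/]');
  return new RegExp(`^${escaped}$`);
}

// Returns true if any pattern matches a leading portion of the module path.
function matchesPrivatePatterns(goPrivate: string, modulePath: string): boolean {
  const elements = modulePath.split('/');
  return goPrivate
    .split(',')
    .map((p) => p.trim())
    .filter((p) => p.length > 0)
    .some((pattern) => {
      // Compare the pattern against a prefix with the same number of path elements.
      const depth = pattern.split('/').length;
      if (depth > elements.length) {
        return false;
      }
      const prefix = elements.slice(0, depth).join('/');
      return globToRegExp(pattern).test(prefix);
    });
}

const goPrivate = 'git.mycompany.com,github.com/my/private';
console.log(matchesPrivatePatterns(goPrivate, 'git.mycompany.com/team/lib')); // true
console.log(matchesPrivatePatterns(goPrivate, 'github.com/my/private/sub')); // true
console.log(matchesPrivatePatterns(goPrivate, 'github.com/my/public')); // false
```

With this value, only the one private github.com repository is fetched directly, while every other github.com module still goes through the public proxy and checksum database.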

BTW I wasn't aware of this:

if go fails to find a package in the go mod proxy, it tries to fetch from source, but still tries to verify the checksum in the public checksum database (and this fails)

Is that the behavior when GOPROXY=https://goproxy.io,direct? That sounds like bad behavior independent of any Renovate concerns because pretty much every private dependency will fail by default, or am I missing something?

One possibility as a better solution to hostRules manipulation could be that GOPRIVATE can be user/repo-configurable. We have held back from allowing completely transparent control of env so far because there are variables which attackers could use for bad purposes, but an allow-list approach per-manager could be possible. Do we have a need for GOPROXY or others to be configurable too?
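
As a purely illustrative sketch of what such a per-manager allow-list could look like: the `allowedEnvByManager` table, the `filterUserEnv` function, and the choice of variables are all assumptions made for this example, not an existing Renovate option or API.

```typescript
// Hypothetical per-manager allow-list; names and contents are assumptions.
const allowedEnvByManager: Record<string, string[]> = {
  gomod: ['GOPRIVATE', 'GONOPROXY', 'GONOSUMDB'],
};

// Keep only the user-supplied variables that the given manager allows.
function filterUserEnv(
  manager: string,
  userEnv: Record<string, string>
): Record<string, string> {
  const allowed = allowedEnvByManager[manager] ?? [];
  const result: Record<string, string> = {};
  for (const key of Object.keys(userEnv)) {
    if (allowed.indexOf(key) !== -1) {
      result[key] = userEnv[key];
    }
  }
  return result;
}

const filtered = filterUserEnv('gomod', {
  GOPRIVATE: 'github.com/my/private',
  LD_PRELOAD: '/tmp/evil.so', // dropped: not on the allow-list
});
console.log(filtered);
```

Whether GOPROXY or GOSUMDB should be on such a list is exactly the open question above, so they are left out here.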

Shegox commented 3 years ago

I'm no Go expert either, but from my research so far it is indeed the case that you always need to configure GOPRIVATE (or GONOSUMDB, which is derived from GOPRIVATE) for private packages. And yes, without configuring GOPRIVATE (or GONOSUMDB), private modules fail out of the box, requiring a developer to always configure that when using private modules.

Currently the lookup works like this:

  1. Check GOPROXY to see if a proxy is configured (the default is GOPROXY=https://proxy.golang.org,direct) (https://golang.org/ref/mod#environment-variables)
  2. Check GONOPROXY (derived from GOPRIVATE) to decide whether the proxy should be used.
  3. Query the proxy for the package; if it exists, download it from the proxy.
  4. If it doesn't exist in the proxy, query the source directly by default (that's the direct part of GOPROXY=...,direct) and download from there.
  5. After the package is downloaded, check whether GOSUMDB is configured and whether the package doesn't match GONOSUMDB (derived from GOPRIVATE); if so, query GOSUMDB to verify the downloaded package checksum. If the checksum exists and matches, the file is kept; if not, it is removed and an error like this one is thrown:
    $ go get github.enterprise.com/ORG/repo
    go: downloading github.enterprise.com/ORG/repo v0.0.31
    go get: github.enterprise.com/ORG/repo@v0.0.31: verifying module: github.enterprise.com/ORG/repo@v0.0.31: reading https://sum.golang.org/lookup/github.enterprise.com/ORG/repo@v0.0.31: 410 Gone

    On github.com it would look like this; there the checksum database can't verify the package: https://sum.golang.org/lookup/github.com/!shegox/go-private-test@v0.0.0-20210802110600-1202f336bfdd

    $ go get -v github.com/Shegox/go-private-test
    go: downloading github.com/Shegox/go-private-test v0.0.0-20210802110600-1202f336bfdd
    go get: github.com/Shegox/go-private-test@v0.0.0-20210802110600-1202f336bfdd: verifying module: 
    github.com/Shegox/go-private-test@v0.0.0-20210802110600-1202f336bfdd: reading 
    https://sum.golang.org/lookup/github.com/!shegox/go-private-test@v0.0.0-20210802110600-1202f336bfdd: 410 Gone
      server response:
      not found: github.com/Shegox/go-private-test@v0.0.0-20210802110600-1202f336bfdd: invalid version: git fetch -f origin refs/heads/*:refs/heads/* refs/tags/*:refs/tags/* in 
    /tmp/gopath/pkg/mod/cache/vcs/7c309d90708870251624b9b2f643f35faa33228a3d060adbc23cc0bf61925550: exit status 128:
          fatal: could not read Username for 'https://github.com': terminal prompts disabled
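
The lookup steps above can be condensed into a small decision function. This is a simplified sketch of the behavior as described in this thread, not the go command's real logic: pattern matching is plain prefix matching without wildcards, go.sum caching is ignored, and `planLookup`, `GoEnv`, and `LookupPlan` are names invented for the example.

```typescript
interface GoEnv {
  GOPROXY?: string;
  GOPRIVATE?: string;
  GONOPROXY?: string;
  GONOSUMDB?: string;
  GOSUMDB?: string;
}

interface LookupPlan {
  sources: string[]; // in the order they are tried
  verifyChecksum: boolean; // whether the public checksum database is consulted
}

// Simplified: a pattern matches the module path itself or any path below it.
function matchesAny(patterns: string | undefined, modulePath: string): boolean {
  if (!patterns) {
    return false;
  }
  return patterns.split(',').some((raw) => {
    const p = raw.trim();
    return p.length > 0 && (modulePath === p || modulePath.indexOf(p + '/') === 0);
  });
}

function planLookup(modulePath: string, env: GoEnv): LookupPlan {
  const goProxy = env.GOPROXY ?? 'https://proxy.golang.org,direct';
  // GONOPROXY and GONOSUMDB fall back to GOPRIVATE when they are unset.
  const noProxy = env.GONOPROXY ?? env.GOPRIVATE;
  const noSumDb = env.GONOSUMDB ?? env.GOPRIVATE;

  const sources = matchesAny(noProxy, modulePath)
    ? ['direct']
    : goProxy.split(/[,|]/).map((s) => s.trim()).filter((s) => s.length > 0);
  const verifyChecksum = env.GOSUMDB !== 'off' && !matchesAny(noSumDb, modulePath);
  return { sources, verifyChecksum };
}

const mod = 'github.enterprise.com/ORG/repo';
// Defaults: proxy first, then direct, and the checksum lookup still happens.
// This is the combination that ends in the "410 Gone" error shown above.
console.log(planLookup(mod, {}));
// With GOPRIVATE set: direct fetch only, no checksum database lookup.
console.log(planLookup(mod, { GOPRIVATE: 'github.enterprise.com' }));
```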

GOPRIVATE is just a handy shortcut for configuring both GONOPROXY and GONOSUMDB (and some other things). While technically just GONOSUMDB is enough for our use case, you probably don't even want to query the public golang proxy for your private packages, so it makes sense to configure GONOPROXY as well (https://golang.org/ref/mod#private-modules). The argument I found is that theoretically a module can be listed in the GOSUMDB without being in the GOPROXY for whatever reason (e.g. legal reasons). (https://golang.org/ref/mod#authenticating)

So I think the whole thing is a two-step configuration: first, having the credentials available to query the source (e.g. git), and second, telling go via GOPRIVATE for which modules to use the source.

The source credentials can be configured rather broadly (e.g. for all of github.com). For the value of GOPRIVATE, I think the best solution would indeed be to pass it through as an environment variable in the per-repository configuration. This was my thinking with the registryUrls as well, which I (mis)used for that.

I think passing through other variables like GOSUMDB and GOPROXY might be helpful, although I would argue that most projects use the default ones and fetch their private components directly from the source. GOPRIVATE (and, via it, GONOSUMDB and GONOPROXY) is the most important one and covers the biggest use case. As of today it is indeed possible for the self-hosted version to set GOPRIVATE (and the other ones as well) in the bot configuration, but when running on multiple repositories there might be some special hosts you can't know in advance.

I would now propose a few implementation steps and if okay would create separate PRs for that:

  1. Switch to env authentication for git.
  2. Automatically configure git authentication for a list of hostRules matching git, github, gitlab, ... for go get commands, so that go get theoretically has authentication available.
  3. Expose a way to configure GO... environment variables for the hosted setup on a per-repository basis.

rarkins commented 3 years ago

I agree with these steps:

  1. env is usually our preferred way to pass credentials to child processes anyway, so this is great. Created #11060
  2. Yes, let's populate as many authentications as we can from hostRules. Let's reuse this issue for it.
  3. It seems like this will be the first use case for user-configured env (#11061). Still needs some brainstorming on how to do it

rarguelloF commented 3 years ago

Hi, I'm hitting this issue trying to use renovate in a golang project, where I use a number of private dependencies.

I get the following error:

Command failed: go get -d ./...
go: errors parsing go.mod:
...
    fatal: could not read Username for 'https://github.com': terminal prompts disabled

The first message on this issue says there is a workaround for github.com. Could you give me more details about it if possible? @rarkins

Thanks in advance 🙏

rarkins commented 3 years ago

@rarguelloF please create a new discussion describing how you run Renovate (e.g. self hosted, which docker image, etc) in addition to the above.

Shegox commented 3 years ago

I started the implementation of this in https://github.com/renovatebot/renovate/pull/12230. I have some questions and would like your input on them, @rarkins:

Open Topics/Questions:

  • [ ] Should we add a custom hostType=go for it?
  • [ ] Should we support all the other git platforms (currently GitHub/GitLab) out of the box as well (Bitbucket, Gitea)? Is there a generic hostType git I can use?
  • [ ] Should we use hostRule.matchHost (supports paths) or hostRule.resolvedHost and construct the http(s) url from there?

viceice commented 3 years ago

hostType should match manager name

rarkins commented 3 years ago

I'm actually ok with ignoring hostType and adding any host with a token just in case. Certainly, we could add from hostRules which don't have a hostType

Shegox commented 3 years ago

I would also say that all hostRules with a matchHost and a token can be added; it shouldn't hurt us. Is there a good method of getting all hostRules? I didn't find an export for that and ended up adding a getAll function, but I'm not sure if that makes sense.

https://github.com/renovatebot/renovate/blob/8316eb18b832c56a6834bfbaaf50be61a6e782d4/lib/util/host-rules.ts#L145-L150

rarkins commented 3 years ago

I think getAll followed by your own filter is ok
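
A minimal sketch of what "getAll followed by your own filter" could look like follows. The `HostRuleLike` interface is a hypothetical subset of a host rule that uses the field names mentioned in this thread, and `gitAuthTargets` and `toBaseUrl` are illustrative names, not functions from Renovate.

```typescript
// Hypothetical subset of a host rule; only the fields discussed above.
interface HostRuleLike {
  hostType?: string;
  matchHost?: string;
  token?: string;
}

interface GitAuthTarget {
  baseUrl: string;
  token: string;
}

// Turn matchHost (a bare host or a full URL) into an https base URL ending in "/".
function toBaseUrl(matchHost: string): string {
  const withScheme = /^https?:\/\//.test(matchHost) ? matchHost : `https://${matchHost}`;
  return withScheme.charAt(withScheme.length - 1) === '/' ? withScheme : `${withScheme}/`;
}

// Keep every rule that has both a matchHost and a token.
// hostType is deliberately ignored, as suggested above.
function gitAuthTargets(rules: HostRuleLike[]): GitAuthTarget[] {
  const targets: GitAuthTarget[] = [];
  for (const rule of rules) {
    if (rule.matchHost && rule.token) {
      targets.push({ baseUrl: toBaseUrl(rule.matchHost), token: rule.token });
    }
  }
  return targets;
}

const authTargets = gitAuthTargets([
  { hostType: 'github', matchHost: 'github.enterprise.com', token: 't1' },
  { matchHost: 'https://gitlab.example.com/group', token: 't2' },
  { hostType: 'npm', matchHost: 'registry.example.com' }, // no token: skipped
]);
console.log(authTargets.map((t) => t.baseUrl));
```

Each resulting target could then be turned into one insteadOf key/value pair for the go command's environment.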

renovate-release commented 2 years ago

:tada: This issue has been resolved in version 28.19.0 :tada:

The release is available on:

Your semantic-release bot :package::rocket: