Open dgkf opened 3 years ago
My main worry with making the GitLab remote more complex is our team doesn't use GitLab, so it is possible this will break in the future without us realizing it.
We would definitely need some tests to avoid this.
Thanks @jimhester - I'm happy to add in tests as much as possible. If you think the implementation looks sound, then I can get to work on tests and updating docs. I was just hesitant to invest more time fleshing out the peripheral bits until getting some impressions on the approach.
@jimhester - this PR is ready whenever you have an opportunity to take a look. The only CI errors are ones that also exist on `master`. Overall, the design feels a bit clunky, but I'm struggling to come up with anything better.
To trace through the changes, it is easiest to start with `install_gitlab`'s call out to `gitlab_to_git_remote`, and then look at the uses of `$url` in `install_git.R`, as the url may contain a url-embedded username and token.
Just to highlight a critical design choice:
Design Feedback Request: is it preferred to keep the full url including username and password (`https://dgkf:12345@gitlab.com/...`) when storing `remote_metadata` or printing to console? Keeping it would allow `remotes::update_packages` to update a package which requires authentication.
For now, I chose to scrub the username and password from the url before it is added to a DESCRIPTION file, to prioritize safety of access tokens over the update experience.
The self-hosted GitLab issues are currently a big pain point at my org, so some help in moving this forward would be greatly appreciated.
Thank you for this PR. This is a must when working on private GitLab instances. I approve its improvements.
I tried it in a CI instance with the typical use cases below. The PR solves the problems encountered with the current version of {remotes}. This can be accepted as is.
# `CI_JOB_TOKEN` set up with {git2r}

## install_local()

```r
tempclone <- tempfile(pattern = "conjdown")
dir.create(tempclone)
# Clone with the CI job token, then install from the local copy
git2r::clone(
  url = "https://git.lab.sspcloud.fr/propre-conj/conjdown",
  local_path = tempclone,
  credentials = git2r::cred_user_pass(
    username = "gitlab-ci-token",
    password = Sys.getenv("CI_JOB_TOKEN")
  )
)
remotes::install_local(tempclone)
```

## install_git()

- current {remotes} FAIL
- PR OK

```r
options(remotes.git_credentials = git2r::cred_user_pass("gitlab-ci-token", Sys.getenv("CI_JOB_TOKEN")))
remotes::install_git("https://myprivategitlab.com/user/repos")
remotes::install_gitlab(
  host = "https://myprivategitlab.com",
  repo = "user/repos",
  auth_token = Sys.getenv("CI_JOB_TOKEN")
)
```

message:

```
auth_token does not have scopes 'read-repository' and 'api' for host
'https://myprivategitlab.com" required to install using
gitlab_remote.
Attempting git_remote
```

## `git::`

DESCRIPTION file

```
Imports:
    repos
Remotes:
    git::https://myprivategitlab.com/user/repos
```

```r
options(remotes.git_credentials = git2r::cred_user_pass("gitlab-ci-token", Sys.getenv("CI_JOB_TOKEN")))
remotes::install_deps(dependencies = TRUE)
```

# `GITLAB_PAT`

```r
Sys.setenv(GITLAB_PAT = Sys.getenv("CI_JOB_TOKEN"))
remotes::install_gitlab(
  host = "https://myprivategitlab.com",
  repo = "user/repos"
)
```

message:

```
Using GitLab PAT from envvar GITLAB_PAT
auth_token does not have scopes 'read-repository' and 'api' for host
'https://myprivategitlab.com" required to install using
gitlab_remote.
Attempting git_remote
```

## `gitlab::`

This is a try. I know this is not the aim of this PR, but that could be a future enhancement, maybe.

DESCRIPTION file

```
Imports:
    repos
Remotes:
    gitlab::https://myprivategitlab.com/user/repos
```

```r
Sys.setenv(GITLAB_PAT = Sys.getenv("CI_JOB_TOKEN"))
remotes::install_deps(dependencies = TRUE)
```

Error

```
Error: Unknown remote type: gitlab
Invalid git repo specification: 'https://myprivategitlab.com/user/repos'
Execution halted
```
Do you think it would be a good idea to allow `gitlab_pat()` to also look for the `CI_JOB_TOKEN` environment variable if `GITLAB_PAT` is empty? This may solve a lot of pain when using CI.
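The suggestion above could look roughly like the sketch below. It is only an illustration: `gitlab_token_with_ci_fallback()` is a hypothetical name, not a function in {remotes}, and the lookup order (`GITLAB_PAT` first, then `CI_JOB_TOKEN`) is an assumption based on the wording of the question.

```r
# Hypothetical helper, not part of {remotes}: return the first non-empty
# token found, checking GITLAB_PAT before CI_JOB_TOKEN.
gitlab_token_with_ci_fallback <- function(quiet = TRUE) {
  for (envvar in c("GITLAB_PAT", "CI_JOB_TOKEN")) {
    token <- Sys.getenv(envvar, unset = "")
    if (nzchar(token)) {
      if (!quiet) message("Using GitLab token from envvar ", envvar)
      return(token)
    }
  }
  # Neither variable is set (or both are empty)
  NULL
}
```

With such a lookup, a CI job would not need to run `Sys.setenv(GITLAB_PAT = Sys.getenv("CI_JOB_TOKEN"))` first, although the job token would still be subject to the API scope limits shown in the messages above.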
Thanks for considering this PR, @jimhester.
Just wanted to highlight this bit in the PR thread for your consideration. I tried my best to dig into how `remotes`/`renv` use the `Remotes*` fields in the `DESCRIPTION` file, but wasn't totally sure what the preferred solution would be for access tokens in urls and want to make sure it was brought to your attention in case there are any security concerns with how it's handled.
Design Feedback Request: is it preferred to keep the full url including username and password (`https://dgkf:12345@gitlab.com/...`) when storing `remote_metadata` or printing to console?
- the good: this would allow `remotes::update_packages` to update a package which requires authentication
- the bad: this might put a user at risk of leaking an access token through things like execution logs if printed to console or an renv lockfile if included in a DESCRIPTION file
Currently user-specific url components are stripped to minimize any printing/saving of tokens.
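To make the trade-off concrete, here is a minimal sketch of the two kinds of url cleaning being discussed. The helper names and regular expressions are hypothetical and written for illustration only; the PR's own `git_anon_url()` and `git_censored_url()` may be implemented differently. The path `group/project.git` is a placeholder.

```r
# Hypothetical helpers for illustration; not the PR's implementation.

# Drop "user:password@" (or "user@") from the authority part of a url,
# e.g. before writing it to a DESCRIPTION file.
strip_url_credentials <- function(url) {
  sub("^([A-Za-z][A-Za-z0-9+.-]*://)[^/@]*@", "\\1", url)
}

# Replace only the password with asterisks, keeping the username visible,
# e.g. before printing a git command to the console.
censor_url_password <- function(url) {
  sub("^([A-Za-z][A-Za-z0-9+.-]*://[^/:@]*):[^/@]*@", "\\1:****@", url)
}

url <- "https://dgkf:12345@gitlab.com/group/project.git"
strip_url_credentials(url)  # "https://gitlab.com/group/project.git"
censor_url_password(url)    # "https://dgkf:****@gitlab.com/group/project.git"
```

Urls without embedded credentials pass through both helpers unchanged, including urls with a port such as `https://gitlab.com:8080/...`.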
I guess we should strip them, though it would then break updating packages later.
However, if you still set the `GITLAB_PAT` when you run `update_packages()`, would the update work?
As described in #604, the current `gitlab_remote` makes use of API endpoints that are not available to tokens generated for use within GitLab CI (stored in the `$CI_JOB_TOKEN` env var), throwing errors when these tokens are used.
This PR adds code to first ping the API at a generic endpoint (querying for `/version`). If that request fails and `isTRUE(getOption("remotes.gitlab_git_fallback", TRUE))`, a `git_remote` is returned.
If `git2r` is available, a credentials object is created from the `auth_token`. Otherwise, the token is embedded in the url in the form of `http://gitlab-ci-token:$TOKEN@example.com/namespace/project.git`.
This allows `install_gitlab` to be used within CI jobs on non-public deployments of GitLab without the creation and embedding of personal tokens. Pipeline engineers need only to run `export GITLAB_PAT=$CI_JOB_TOKEN` prior to installing remotes.

## Changelog

- `install_gitlab` will defer to using `install_git` when authentication doesn't provide adequate API access to download a source archive
- Credentials are passed using `git2r::cred_user_pass` if `git2r` is available
- The username is always `gitlab-ci-token`. When providing a PAT, GitLab ignores the username unless one is using a `CI_JOB_TOKEN` token within a CI job, in which case it must be `gitlab-ci-token`. Because of this, passing `gitlab-ci-token` covers both scenarios. Unfortunately I wasn't able to find any documentation to reference for this behavior; it was only narrowed down through testing.
- Otherwise, credentials are embedded in the url (`http://<username>:<password>@host.com/repo.git`)
- The username and password are scrubbed from the url before it is written to `DESCRIPTION` (Design Feedback Request: is it preferred to keep the full url for updates, or to exclude the password so that it isn't leaked through things like `renv`?)
- `git()` updated to take an optional `display_args` argument to provide output using a censored git url, so as not to display access tokens in console output. This is used in `remote_download.xgit_remote` to display git commands without printing passwords to console.
- `parse_git_url()` updated to also extract a username and password, though it might be worth taking on a dependency to handle url parsing since this regex is getting pretty involved
- `git_anon_url()` introduced to strip out username and password components from a url
- `git_censored_url()` introduced to replace the password component with asterisks
- Falls back to a `git_remote` when the GitLab host API requests fail
- Added `git_fallback = getOption("remotes.gitlab_git_fallback", TRUE)` parameter to `install_gitlab`
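
The fallback decision described above can be sketched as follows. This is not the PR's code: `choose_remote_type()` and `embed_token()` are hypothetical names, `api_available` stands in for the result of the request to the `/version` endpoint, and raising an error when the fallback is disabled is an assumption made for the sketch.

```r
# Hypothetical sketch of the fallback decision; not the PR's implementation.
# `api_available` stands in for the outcome of the /version request.
choose_remote_type <- function(api_available,
                               git_fallback = getOption("remotes.gitlab_git_fallback", TRUE)) {
  if (isTRUE(api_available)) {
    "gitlab"
  } else if (isTRUE(git_fallback)) {
    "git"
  } else {
    # Assumption: with the fallback disabled, the failure is surfaced as an error
    stop("GitLab API is not accessible and git fallback is disabled")
  }
}

# Embed a token in a clone url when no credentials object can be created.
# "gitlab-ci-token" is the username the description reports as working for
# both PATs and CI job tokens.
embed_token <- function(host, repo, token,
                        username = "gitlab-ci-token", protocol = "https") {
  paste0(protocol, "://", username, ":", token, "@", host, "/", repo, ".git")
}

choose_remote_type(api_available = FALSE)  # "git" unless the option is set to FALSE
embed_token("example.com", "namespace/project", "TOKEN")
```

Keeping the decision separate from the HTTP request makes the option `remotes.gitlab_git_fallback` easy to test without network access.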
This is an initial pass just to experiment with implementation. Please let me know if this looks like a reasonable approach, and then I can polish this PR with