r-lib / remotes

Install R packages from GitHub, GitLab, Bitbucket, git, svn repositories, URLs
https://remotes.r-lib.org/

Adding `git_remote` fallback for `gitlab_remote` use without full API access (Resolves #604) #608

Open dgkf opened 3 years ago

dgkf commented 3 years ago

As described in #604, the current `gitlab_remote` relies on API endpoints that are not available to the tokens GitLab CI generates for its jobs (stored in the `$CI_JOB_TOKEN` env var), so requests made with these tokens fail with errors.

This PR adds code to first ping the API at a generic endpoint (querying for /version). If that request fails and isTRUE(getOption("remotes.gitlab_git_fallback", TRUE)), a git_remote is returned.

If git2r is available, a credentials object is created from the auth_token. Otherwise, the token is embedded in the url in the form of http://gitlab-ci-token:$TOKEN@example.com/namespace/project.git.

This allows install_gitlab to be used within CI jobs on non-public deployments of GitLab without creating and embedding personal tokens. Pipeline engineers need only run export GITLAB_PAT=$CI_JOB_TOKEN prior to installing remotes.
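To make the fallback concrete, here is a minimal sketch of the two pieces described above: the decision to fall back, and the token-embedded clone URL used when git2r is not available. The helper names `use_git_fallback` and `gitlab_token_url`, and the `scheme`/`host` defaults, are hypothetical and are not taken from the PR's code; the `/version` request itself is left out and represented by the `api_ok` flag.

```r
# Hypothetical helper: decide whether to return a git remote instead of
# a gitlab remote. `api_ok` stands in for the result of the /version ping.
use_git_fallback <- function(api_ok) {
  !api_ok && isTRUE(getOption("remotes.gitlab_git_fallback", TRUE))
}

# Hypothetical helper: embed the CI job token in the clone URL, following
# the gitlab-ci-token:$TOKEN@host/namespace/project.git form described above.
gitlab_token_url <- function(repo, token, host = "example.com", scheme = "https") {
  sprintf("%s://gitlab-ci-token:%s@%s/%s.git", scheme, token, host, repo)
}

# Example: build the URL only when the API ping failed and the option allows it.
if (use_git_fallback(api_ok = FALSE)) {
  url <- gitlab_token_url("namespace/project", token = "TOKEN")
}
```

Keeping the decision separate from the URL construction means the option can be switched off (`options(remotes.gitlab_git_fallback = FALSE)`) without touching how credentials are passed.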

Changelog


This is an initial pass just to experiment with implementation. Please let me know if this looks like a reasonable approach, and then I can polish this PR with

jimhester commented 3 years ago

My main worry with making the GitLab remote more complex is our team doesn't use GitLab, so it is possible this will break in the future without us realizing it.

We would definitely need some tests to avoid this.

dgkf commented 3 years ago

Thanks @jimhester - I'm happy to add in tests as much as possible. If you think the implementation looks sound, then I can get to work on tests and updating docs. I was just hesitant to invest more time fleshing out the peripheral bits until getting some impressions on the approach.

dgkf commented 3 years ago

@jimhester - this PR is ready whenever you have an opportunity to take a look. The only CI errors are ones that also exist on master. Overall, the design feels a bit clunky, but I'm struggling to come up with anything better.

To trace through the changes, it is easiest to start with install_gitlab's call out to gitlab_to_git_remote, and then look at the uses of $url in install_git.R as the url may contain a url-embedded username and token.

Just to highlight a critical design choice:

Design Feedback Request: is it preferred to keep the full url including username and password (https://dgkf:12345@gitlab.com/...) when storing remote_metadata or printing to console?

For now, I chose to scrub the username and password from the url before this is added to a DESCRIPTION file to prioritize safety of access tokens over the update experience.
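As an illustration of the scrubbing step, the following sketch strips a `user:password@` component from an http(s) URL before it would be stored or printed. `scrub_url_credentials` is a hypothetical name and this is not necessarily how the PR implements it; it also assumes any `@` or `/` inside the credentials is percent-encoded.

```r
# Hypothetical helper: drop the userinfo part (user[:password]@) of a URL.
scrub_url_credentials <- function(url) {
  # Keep the scheme (captured group 1) and remove everything up to the first "@"
  # that appears before any "/" of the path.
  sub("^(https?://)[^/@]+@", "\\1", url)
}

scrub_url_credentials("https://dgkf:12345@gitlab.com/namespace/project.git")
scrub_url_credentials("https://gitlab.com/namespace/project.git")
```

URLs without embedded credentials pass through unchanged, so the helper could be applied unconditionally before writing metadata.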


The self-hosted GitLab issues are currently a big pain point at my org, so some help in moving this forward would be greatly appreciated.

statnmap commented 3 years ago

Thank you for this PR. This is a must when working on private GitLab instances. I approve its improvements.

I tried it in a CI job with what I take to be the typical use cases, listed below. The PR solves the problems encountered with the current version of {remotes}. This can be accepted as is.

# Use CI_JOB_TOKEN set up with {git2r}

## Clone and install_local()
```r
# tempclone was not defined in the original comment; an empty temporary directory is assumed
tempclone <- tempfile(pattern = "conjdown-")
dir.create(tempclone)
git2r::clone(
  url = "https://git.lab.sspcloud.fr/propre-conj/conjdown",
  local_path = tempclone,
  credentials = git2r::cred_user_pass(
    username = "gitlab-ci-token",
    password = Sys.getenv("CI_JOB_TOKEN")
  )
)
remotes::install_local(tempclone)
```

## install_git()
- current {remotes} FAIL
- PR OK
```r
options(remotes.git_credentials = git2r::cred_user_pass("gitlab-ci-token", Sys.getenv("CI_JOB_TOKEN")))
remotes::install_git("https://myprivategitlab.com/user/repos")
```

## install_gitlab()

## install from another package DESCRIPTION file with git2r creds and git::

DESCRIPTION file
```
Imports: 
    repos
Remotes:
    git::https://myprivategitlab.com/user/repos
```
```r
options(remotes.git_credentials = git2r::cred_user_pass("gitlab-ci-token", Sys.getenv("CI_JOB_TOKEN")))
remotes::install_deps(dependencies = TRUE)
```

# Set GITLAB_PAT

## install_gitlab()

## install from another package DESCRIPTION file with GITLAB_PAT and gitlab::

This is a try. I know this is not the aim of this PR, but that could be a future enhancement, maybe.

DESCRIPTION file
```
Imports: 
    repos
Remotes:
    gitlab::https://myprivategitlab.com/user/repos
```
```r
Sys.setenv(GITLAB_PAT = Sys.getenv("CI_JOB_TOKEN"))
remotes::install_deps(dependencies = TRUE)
```

Error
```
Error: Unknown remote type: gitlab
  Invalid git repo specification: 'https://myprivategitlab.com/user/repos'
Execution halted
```

statnmap commented 3 years ago

Do you think it would be a good idea to let gitlab_pat() also look for the CI_JOB_TOKEN environment variable if GITLAB_PAT is empty? This may solve a lot of pain when using CI.
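A rough sketch of what such a lookup could look like is below. The function name `gitlab_pat_with_ci_fallback` is hypothetical and this is not the current behaviour of remotes; it simply prefers `GITLAB_PAT` and falls back to `CI_JOB_TOKEN` when the former is unset or empty.

```r
# Hypothetical lookup: prefer GITLAB_PAT, fall back to CI_JOB_TOKEN,
# and return NULL when neither is set to a non-empty value.
gitlab_pat_with_ci_fallback <- function() {
  for (var in c("GITLAB_PAT", "CI_JOB_TOKEN")) {
    token <- Sys.getenv(var, unset = "")
    if (nzchar(token)) {
      return(token)
    }
  }
  NULL
}
```

With this in place, a CI job would not need to copy `CI_JOB_TOKEN` into `GITLAB_PAT` before calling into remotes, though a job token would still be subject to the API restrictions described in #604.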

dgkf commented 3 years ago

Thanks for considering this PR, @jimhester.

Just wanted to highlight this bit in the PR thread for your consideration. I tried my best to dig into how remotes/renv use the Remotes* fields in the DESCRIPTION file, but wasn't totally sure what the preferred solution would be for access tokens in urls, and I wanted to make sure it was brought to your attention in case there are any security concerns with how it's handled.

Design Feedback Request: is it preferred to keep the full url including username and password (https://dgkf:12345@gitlab.com/...) when storing remote_metadata or printing to console?

  • the good: this would allow remotes::update_packages to update a package which requires authentication
  • the bad: this might put a user at risk of leaking an access token through things like execution logs if printed to console or an renv lockfile if included in a DESCRIPTION file

Currently user-specific url components are stripped to minimize any printing/saving of tokens.

jimhester commented 3 years ago

I guess we should strip them, though it would then break updating packages later.

However, if you still set GITLAB_PAT when you run update_packages(), would the update work?