r-lib / pkgdepends

R Package Dependency Resolution
https://r-lib.github.io/pkgdepends/

Private git repositories (Azure Repos): cannot authenticate #319

Open lgaborini opened 1 year ago

lgaborini commented 1 year ago

I can't handle private git repositories (in my case Azure Repos) that require authentication in the URL.

The Remote entry is something like: git::https://dev.azure.com/org/project/_git/package.

Starting, for example, from new_pkg_installation_proposal("git::https://dev.azure.com/org/project/_git/package"), {pkgdepends} retrieves the credentials from the system credential store using {gitcreds}, then builds the URL and should fetch the repository contents over git.

Instead, I get an error from curl:

s_remote <- "git::https://dev.azure.com/org/project/_git/package"
p <- pkgdepends::new_pkg_installation_proposal(s_remote, config = list(library = tempfile()))
#> ℹ Creating library directory: 'C:\Users\LORENZ~1\AppData\Local\Temp\RtmpM93Fz5\file4140718a10d'
p$resolve()
p$get_resolution()$error
#> [[1]]
#> <async_rejected/rlib_error_3_0/rlib_error/error>
#> Error in `value[[3L]](cond)`:
#> ! pkgdepends resolution error for
#> git::https://dev.azure.com/org/project/_git/package.
#> Caused by error: 
#> ! Failed to download 'DESCRIPTION' from git repo at
#> <https://dev.azure.com/org/project/_git/package>.
#> Caused by error in `(function (e) …`:
#> ! URL using bad/illegal format or missing URL

Created on 2023-05-24 with reprex v2.0.2

By digging into the internals, I see that {pkgdepends} retrieves the username (in my org they have the form of user@domain), the password, and builds the resulting URL: https://user@domain:password@dev.azure.com/org/project/_git/package

I believe that the @ in the username must be URL-encoded (%40) when using {pkgdepends}; I'm not sure whether this is an Azure-specific pattern.
I confirmed that the encoded form works by debugging into pkgdepends:::async_git_list_refs_v2.

I get the same behavior from the shell with the system git (notice that user@ gets dropped):

git clone "https://user@domain:password@dev.azure.com/org/project/_git/package"
Cloning into 'rITAtools'...
fatal: unable to access 'https://domain:password@dev.azure.com/org/project/_git/package': URL using bad/illegal format or missing URL

Cloning succeeds with git clone "https://user%40domain:password@dev.azure.com/org/project/_git/package".
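
As a minimal sketch of that encoding step (not the pkgdepends implementation), base R's utils::URLencode() with reserved = TRUE escapes the @ in the username. The helper name encode_userinfo() and the credentials are hypothetical placeholders.

```r
# Percent-encode the user-info part before embedding it in a URL.
# reserved = TRUE escapes reserved characters such as "@" and ":";
# repeated = TRUE forces encoding even if the input already contains
# something that looks like a %XX escape.
encode_userinfo <- function(username, password) {
  paste0(
    utils::URLencode(username, reserved = TRUE, repeated = TRUE),
    ":",
    utils::URLencode(password, reserved = TRUE, repeated = TRUE)
  )
}

userinfo <- encode_userinfo("user@domain", "password")
userinfo
#> [1] "user%40domain:password"

# The "@" that separates user-info from the host is now unambiguous.
url <- paste0("https://", userinfo, "@dev.azure.com/org/project/_git/package")
```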

It would be nice to be able to use the system git when available, as {remotes} already does.
That way, I could set up authentication in .gitconfig with a PAT + HTTPS extra header (as recommended), and it would avoid reimplementing a whole git client...

Thanks!

gaborcsardi commented 1 year ago

First I would check if the PAT works instead of the username and password. E.g. put the PAT in the git credential store. (Or put the PAT in the GITHUB_PAT_DEV_AZURE_COM environment variable if you don't want to change the credential store.)
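
A minimal sketch of the environment-variable route, set for the current R session only. The token value is a placeholder, and whether the PAT is accepted this way is exactly what this check is meant to find out.

```r
# Placeholder token: replace with a real personal access token.
Sys.setenv(GITHUB_PAT_DEV_AZURE_COM = "<your-pat>")

# Confirm the variable is visible in this session before retrying the resolution.
nzchar(Sys.getenv("GITHUB_PAT_DEV_AZURE_COM"))
```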

We can certainly do a better job at authorization and send an Authorization header instead of embedding the password in the URL.

Having a git client in pak certainly has its pros and cons, but this issue is easy to fix with the built-in client as well.

gaborcsardi commented 1 year ago

For the record, we should use CURLOPT_HTTPAUTH with CURLAUTH_ANY, and CURLOPT_USERNAME and CURLOPT_PASSWORD.
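
As a rough illustration of those options, here is a sketch using the {curl} R package, which exposes libcurl options under lowercase names. It is not the pkgdepends code: make_auth_handle() is a hypothetical helper, the credentials are placeholders, and -17 is assumed to be the numeric value of CURLAUTH_ANY (all bits set except CURLAUTH_DIGEST_IE).

```r
# Build a curl handle that carries the credentials as options instead of
# embedding them in the URL. No request is made here.
make_auth_handle <- function(username, password) {
  handle <- curl::new_handle()
  curl::handle_setopt(
    handle,
    httpauth = -17,        # assumed value of CURLAUTH_ANY
    username = username,   # CURLOPT_USERNAME
    password = password    # CURLOPT_PASSWORD
  )
  handle
}

# Only run the example if {curl} is installed.
if (requireNamespace("curl", quietly = TRUE)) {
  h <- make_auth_handle("user@domain", "password")
}
```

Because the username and password are passed as separate options, they are not parsed as part of the URL, so an @ in the username needs no percent-encoding.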

Darxor commented 1 year ago

I can confirm this issue with a self-hosted version of GitLab. I was also using a PAT instead of a password.

But for me the issue was two-fold:

  1. lack of URL escaping in the credentials, because my username/PAT pair contained reserved URL characters (i.e. "@")
  2. GitLab returning error 422 Unprocessable Entity, even after URL-encoding, if the repo's URL lacks ".git" at the end

So to fully fix this issue, git_auth_url() has to be modified as follows:

git_auth_url <- function(remote) {
  url <- remote$url
  auth <- tryCatch(gitcreds_get(url), error = function(err) NULL)
  if (is.null(auth)) {
    url
  } else {
    paste0(
      remote$protocol,
      "://",
      utils::URLencode(auth$username, reserved = TRUE, repeated = TRUE), # changed
      ":",
      utils::URLencode(auth$password, reserved = TRUE, repeated = TRUE),  # changed
      "@",
      sub(paste0("^", remote$protocol, "://"), "", remote$url)
    )
  }
}

The repo URL also has to be changed from https://example.com/path/to/repo to https://example.com/path/to/repo.git

The modification to git_auth_url() should be an easy change (I can open a PR if you want); I'm less sure about making .git mandatory in the URL. Perhaps a mention in the docs would suffice.
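
For illustration, the ".git" workaround could be applied with a small helper like the one below. ensure_git_suffix() is a hypothetical name, not part of pkgdepends, and this has only been described here for GitLab; other hosts may not want the suffix.

```r
# Append ".git" to a repository URL unless it is already there.
ensure_git_suffix <- function(url) {
  url <- sub("/+$", "", url)   # drop any trailing slashes first
  if (grepl("\\.git$", url)) url else paste0(url, ".git")
}

ensure_git_suffix("https://example.com/path/to/repo")
#> [1] "https://example.com/path/to/repo.git"
```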