rstudio / packrat

Packrat is a dependency management system for R
http://rstudio.github.io/packrat/

Packrat and Authenticated Private Repositories #499

Open jeffkeller87 opened 6 years ago

jeffkeller87 commented 6 years ago

I have a number of packages that I'd like to host on an authenticated private repository (specifically using Artifactory). I would also like to be able to use this repository with packrat.

Authentication is done over an HTTPS connection in the form https://user:token@example.com/path/to/repo/. When used with packrat, this URL, which contains private data in user:token, is stored in packrat/packrat.lock, which I'd like to be able to source control for portability/reproducibility purposes.

I am curious what the packrat developers think of this situation. Can this be supported with a new packrat feature? Or is this a foundational issue with the way base R uses only a URL string to link to repositories?

Note: Issues arise in utils::available.packages (when packages are cached locally) if user is an email address, because it adds a second @ character to the URL. Others have experienced this, and @dehowell addressed it in the .Rprofile patch below. Even once that is fixed, the packrat.lock issue described above still applies.


# Set artifactory-cran as the repository for R packages. This is a virtual repository in Artifactory
# that combines:
# artifactory-cran-local - our local, non-public packages
# cran-r-project - packages in the public CRAN
local({
  user = utils::URLencode("<YOUR_EMAIL>", reserved=TRUE)
  # API key generated by Artifactory
  token = "<YOUR_ARTIFACTORY_API_KEY>"
  url = "artifactory.example.com/artifactory/artifactory-cran/"
  options(repos = c("artifactory-cran" = paste0( "https://", user, ":", token, "@", url)))
})

# The built-in URLencode function handles a URL containing an email username
# poorly, which prevents R from downloading the CRAN package index from Artifactory.
# The following block replaces the built-in available.packages function with a wrapped
# version that uses a modified version of URLencode.
local({
  library(utils)
  original <- utils::available.packages

  wrapped <- function(...) {
    # Monkey-patch URLencode for this execution
    URLencode.original <- utils::URLencode
    URLencode.wrapped <- function(URL, reserved=FALSE, repeated=TRUE) {
      URLencode.original(URL, reserved, repeated)
    }
    unlockBinding("URLencode", asNamespace("utils"))
    assign("URLencode", URLencode.wrapped, envir=asNamespace("utils"))

    # Call original available.packages
    result <- original(...)

    # Restore the original URLencode function
    assign("URLencode", URLencode.original, envir=asNamespace("utils"))
    lockBinding("URLencode", asNamespace("utils"))

    result
  }

  unlockBinding("available.packages", as.environment("package:utils"))
  assign("available.packages", wrapped, envir=as.environment("package:utils"))
  lockBinding("available.packages", as.environment("package:utils"))

  unlockBinding("available.packages", asNamespace("utils"))
  assign("available.packages", wrapped, envir=asNamespace("utils"))
  lockBinding("available.packages", asNamespace("utils"))  
})

# Set the download method for compatibility with packrat.
# Artifactory 404s with a JSON payload on /PACKAGES.rds and /PACKAGES.gz.
# Packrat requires curl for downloading the package index, but curl doesn't throw
# an error on a 404, which means that R tries to parse the JSON using the package index
# parser. -f will cause curl to throw an error on the 404 instead.
options(download.file.method = 'curl', download.file.extra = '-f')

kevinushey commented 6 years ago

This isn't a use case that was imagined when Packrat was originally developed. Normally, R packages downloaded from a CRAN-like repository have a field in their DESCRIPTION e.g.

Repository: CRAN

and Packrat uses that as a signal to attempt to download the package with install.packages(), using whatever the current value of getOption("repos") is. In other words, for this to work with the current Packrat architecture, we'd need some mechanism for getting that URL into the repos R option while still shielding it from Packrat's regular serialization methods.
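To see which value a package records in that field, the DESCRIPTION file can be read with base R's read.dcf(). The sketch below writes a made-up DESCRIPTION to a temporary file so that it runs anywhere; it only illustrates the field being discussed and is not a description of how Packrat reads it internally.

```r
# Write a minimal, made-up DESCRIPTION to a temporary file for illustration.
desc_file <- tempfile("DESCRIPTION")
writeLines(c(
  "Package: myPrivatePackage",
  "Version: 0.1.0",
  "Repository: CRAN"
), desc_file)

# read.dcf() returns a character matrix with one column per requested field;
# a field that is absent from the file comes back as NA.
repository <- unname(read.dcf(desc_file, fields = "Repository")[1, "Repository"])
repository
```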

Ultimately fixing this means having some way of:

  1. Safely storing the user + token information in a place accessible to Packrat;

  2. Teaching Packrat how to override or augment the repos option with that information.

I'm not sure what the best way forward is (and unfortunately I likely won't have time to look at this for a while).
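One possible shape for the two points above, as a minimal sketch rather than an existing Packrat feature: keep only a credential-free URL under source control and add the user and token at session start from environment variables. The helper names add_credentials and strip_credentials, and the variable names ARTIFACTORY_USER and ARTIFACTORY_TOKEN, are assumptions made for illustration.

```r
# Hypothetical helpers; none of this is part of packrat.

# Build an authenticated URL from a credential-free https URL.
add_credentials <- function(url, user, token) {
  user <- utils::URLencode(user, reserved = TRUE)  # escapes "@" in email usernames
  paste0("https://", user, ":", token, "@", sub("^https://", "", url))
}

# Remove any "user:token@" part, e.g. before a URL is written to disk.
strip_credentials <- function(url) {
  sub("^(https?://)[^/@]*@", "\\1", url)
}

# Credential-free URL: the only form that should reach source control.
public_url <- "https://example.com/path/to/repo/"

# ARTIFACTORY_USER and ARTIFACTORY_TOKEN are assumed variable names.
user  <- Sys.getenv("ARTIFACTORY_USER")
token <- Sys.getenv("ARTIFACTORY_TOKEN")
if (nzchar(user) && nzchar(token)) {
  options(repos = c("artifactory-cran" = add_credentials(public_url, user, token)))
}
```

This covers only the storage side. Packrat would still need to be taught to apply something like strip_credentials before writing packrat/packrat.lock, and to re-apply the credentials on restore; neither is implemented here.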

jeffkeller87 commented 6 years ago

@kevinushey are you saying that if packages in the private repository have the Repository field specified correctly in their DESCRIPTION, packrat would rely on matching that with an entry in getOption("repos") and not put the full URL in packrat.lock? Or would there still be an issue in packrat.lock?

If so, then I suppose that would mean just not using Artifactory as a joint virtual repository.

Repository: artifactory-cran

and

options(repos = c("CRAN" = "https://cran.rstudio.com/", "artifactory-cran" = "https://user:token@example.com/path/to/repo/"))

kevinushey commented 6 years ago

Packrat uses that Repository: field as a signal that it should attempt to install the package from CRAN. Note that Packrat doesn't attempt to restore from the specific repository stated in the DESCRIPTION file; rather, all repositories are tried (as per the default behavior of install.packages()).

I think the ultimate problem here is just figuring out a way of communicating the secret token into the active set of repositories in a way that doesn't get it inserted into the lockfile (if I understand correctly).

jeffkeller87 commented 6 years ago

I think you are right about the ultimate problem. I was hoping packrat would look at the Repository: field and attempt to restore from that specific repository.

I can't think of a solution that doesn't break at least some of packrat's existing portability. 😢

Looking forward to your ideas when you have time to look deeper.

ras44 commented 5 years ago

@jeffkeller87 I'm not sure if this would apply to your use-case, but have you tried using the packrat::install_local function to manage the installation of your private packages? In short, you would clone your repos to something like ~/local_repo/myPrivatePackage, and then run packrat::install_local('myPrivatePackage') to install the version you have in local_repo to the R project's packrat environment.

jeffkeller87 commented 5 years ago

That's an interesting idea... assuming I mount a drive or copy the repo locally. I couldn't get this to work with a local copy of a repo I know works with install.packages. So I can't test whether restore will work the way I expect. Seems to be the same as this issue. I can't tell what the repos argument is expecting from the documentation, but I get this error for all path combinations:

~/local_repo/
~/local_repo/src/
~/local_repo/src/contrib/
packrat::install_local("my_package", repos = repo_path)
Error in findLocalRepoForPkg(pkg, repos, fatal = fatal) : 
  No package 'my_package' found in local repositories specified

ras44 commented 5 years ago

Could you try the following? In bash:

cp -r my_package ~/local_repo

so that local_repo looks like:

local_repo/
  my_package/
    R/
    ...

Then in R:

packrat::set_opts(local.repos = c("~/local_repo"))
packrat::install_local('my_package')
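Before calling install_local, it may help to confirm that the directory layout matches the one shown above. The helper below is hypothetical and uses only base R; it checks that a package source directory with a DESCRIPTION file sits directly under the local repo, and it makes no claim about how packrat itself searches local.repos.

```r
# Hypothetical helper: TRUE if <repo>/<pkg>/DESCRIPTION exists.
has_package_source <- function(repo, pkg) {
  file.exists(file.path(path.expand(repo), pkg, "DESCRIPTION"))
}

has_package_source("~/local_repo", "my_package")
```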

jeffkeller87 commented 5 years ago

Is this supposed to work when my_package has dependencies? This only installs my_package, even if the dependencies are included in ~/local_repo:

packrat::install_local("my_package", repos = "~/local_repo", dependencies = TRUE)

ras44 commented 5 years ago

It should install any CRAN dependencies, but I'm not sure about dependencies on other packages in the local_repo. @kevinushey might have better insight into this.

If it's not installing private deps, you could work around it by installing the dependencies manually in the same manner. Not ideal, but it should get them into packrat's lib for your project.

jeffkeller87 commented 5 years ago

packrat::install_local uses CMD INSTALL under the hood. Last I checked it did not support cross-repository dependencies. The reason I'd like to use packrat is to avoid installing dependencies manually in the first place.