rstudio / packrat

Packrat is a dependency management system for R
http://rstudio.github.io/packrat/
401 stars 89 forks source link

Allow using downloading privately-hosted packages with vendored `renv` #677

Closed toph-allen closed 1 year ago

toph-allen commented 2 years ago

Intent

Approach

At a high level, reworked the overall workflow for Git providers in the getSourceForPkgRecord function. This involves change to downloader and provider-specific functions.

File-by-file notes

Because this involves a lot of code moving around, I took file-by-file notes on changes. These are ordered logically, from highest to lowest-level changes, with links to the files in GitHub's "Files changed" view. Hopefully this makes it easier to navigate.

Automated Tests

Multiple automated tests have been added for lower-level functions

QA Notes

It's difficult to publicly and reproducibly verify end-to-end downloads of private packages. We'll perform verification offline.

Checklist

Note: Some of this is still TODO

toph-allen commented 1 year ago

Tests started failing with this commit. I have no idea why the tests that are failing started to fail here, because the failures seem unrelated.

toph-allen commented 1 year ago

@aronatkins Exhaustive description of download functions in use. Let me know if this checks out.

Packrat download function contracts

Could you explain the contract of each of the download functions? In particular: Where do they write, who handles deletion of files on error, and how does the caller gauge success (return value versus error, etc).

In general, I've moved away from having success encoded in return values, and towards leaning entirely on R errors (i.e. stop()). Occasionally, callers will intercept this to add additional context.

getSourceForPkgRecord

githubDownload(url = archiveUrl, destfile = srczip), gitlabDownload(url = archiveUrl, destfile = srczip), bitbucketDownload(url = archiveUrl, destfile = srczip)

I'm going sometimes refer to all of these collectively as providerDownload.

renvDownload(url, destfile, type)

githubDownloadHttr(url, destfile, ...), gitlabDownloadHttr(url, destfile, ...), bitbucketDownloadHttr(url, destfile, ...)

Referring to these singularly/collectively as providerDownloadHttr.

downloadWithRetries(url, ..., maxTries = 5L)

I didn't touch much of this, except to add an error at the end of the download function wasn't successful.

download(url, destfile, method = inferAppropriateDownloadMethod(url)

Down into old packrat code I didn't touch.

toph-allen commented 1 year ago

@aronatkins I found a bug in the logic in the githubDownload etc. functions.

Specifically, the conditional logic is specified such that if GitHub is authenticated but neither renvDownload nor githubDownloadHttr are available, downloadWithRetries is not called as a fallback.

I will submit a fix shortly.

toph-allen commented 1 year ago

@aronatkins Changes are in these commits.

I explored using the download() { downloadImpl() } pattern, but found that it would still require a fair amount of boilerplate in the top-level download function, because there is variation on two axes: download method and provider. The downloadRenv() function would have required different logic from githubDownloadHttr(), for example, and passing that data around would have made each function call different in the top-level dispatch functions.

What I wound up doing was creating an error handling closure that returns a function based on type ("github", "gitlab", or "bitbucket") to generate error text. It makes the download function logic much clearer than before, with richer error messages that give explicit information about the expected behavior.

I also expanded tests for each provider to cover the expected scenarios, and added tests for the new error message function.

Gonna kick off a validation run to make sure I haven't introduced new bugs.

toph-allen commented 1 year ago

@aronatkins, @ChaitaC and I just validated this PR together on 2022-10-17.