rstudio / packrat

Packrat is a dependency management system for R
http://rstudio.github.io/packrat/
401 stars 89 forks source link

packrat needs repository record in DESCRIPTION if a local CRAN-like repository is used #528

Open braidm opened 5 years ago

braidm commented 5 years ago

During packrat:initial(), packrat 0.5.0 tries to infer where to find a package's source, and does so in a large if-then-else construction. If you have a package that has been successfully installed from a CRAN-like repository, but does not actually contain a "repository:" record, then internally the source gets marked as "unknown" and packrat::initial() fails when it does a snapshot. This dependency on a repository record in DESCRIPTION doesn't seem to be documented. At the very least it would be nice to make the error message more helpful by indicating this dependency. But an even better solution might be to have packrat check each configured repositories to see if any hold the source (our "repos" option was correctly pointing to our CRAN-like repository so it would have found it had it checked).

kenahoo commented 5 years ago

I hit this problem too, and last week I asked a Stack Overflow question about it:

https://stackoverflow.com/q/56277832/169947

I came here to suggest the same fix (look in options('repos') to see whether any repository contains the package in question) and found your open ticket. Would definitely like to see this implemented if possible!

kenahoo commented 5 years ago

I've been playing around with this, and the workarounds I thought would work apparently won't work:

The only workaround I can think of now would be monkey-patching packrat at runtime to do something different than it does right now.

My usual approach would be to make a local fork of packrat, rename it to something like myPackrat, patch it with the fixes I'd need, and use that instead of packrat. But this would be kind of tough because there are hundreds of places in the source where packrat refers to itself by name. Deciding what to do with each one of them would take a while.

braidm commented 5 years ago

For me, because I had developed the package I was using, it was easy to add "Repository: Local" to the DESCRIPTION file of the package. But for 3rd party packages installed into your local CRAN-like repository, you'd presumably have to un-tar, edit, then re-tar ........

kenahoo commented 5 years ago

Yeah, and that's only feasible if you control the repository, and if you don't care about the MD5 signatures changing when you change the DESCRIPTION file.

I think what's going to work for me is this patching:

    PR <- getNamespace('packrat')
    unlockBinding('inferPackageRecord', PR)
    inferPackageRecordOrig <- PR$inferPackageRecord
    PR$inferPackageRecord <- function(df) {
        record <- inferPackageRecordOrig(df)
        if (record$source == 'unknown') {
            record$source <- 'CustomCRANLikeRepository'
            class(record) <- c("packageRecord", "CustomCRANLikeRepository")
        }
        record
    }
    lockBinding('inferPackageRecord', PR)
    packrat::snapshot(ignore.stale=TRUE)

I haven't put that through all the lifecycle paces yet, but it does get the snapshot to succeed.

kenahoo commented 5 years ago

Could also change that if clause to:

if (record$source == 'unknown' && record$name %in% rownames(available.packages())) {

or a caching version thereof.

kevinushey commented 5 years ago

It seems reasonable to allow packages from "unknown" sources to be restored from CRAN, but I'm not sure if this is a behavior that should be on by default for Packrat, or opt-in with an option of some kind.

kenahoo commented 5 years ago

Just to be clear - we're not talking about restoring unknown packages from CRAN itself, because CRAN packages seem to usually be correctly identified. We're talking about restoring unknown packages from a third-party repository. That's what my workaround above does.

kevinushey commented 5 years ago

Right, by CRAN I mean "CRAN-like repository". (Packrat doesn't treat the 'official' CRAN repositories any differently from third-party repositories)

kenahoo commented 5 years ago

Any thoughts about how best to move this forward? Personally I'd say the proposed behavior should be on by default - I think it would only modify behavior in cases that are currently fatal errors.

kenahoo commented 5 years ago

Hi @kevinushey , what are the prospects for getting this change adopted? It's become an obstacle for several of our projects as we deploy. Please let me know if I can help.

kevinushey commented 5 years ago

@kenahoo would you be willing to give renv a try? It's basically a ground-up rewrite of Packrat that should handle this in the way you expect:

https://github.com/rstudio/renv/blob/ec4d83f45f401681297bc79c5b4f07ee80ef1527/R/snapshot.R#L439-L461

kenahoo commented 5 years ago

I can give it a shot.

The change in this ticket should only be a couple lines big, would you be willing to take a PR for it, or is packrat essentially dead?

kevinushey commented 5 years ago

I think we can still take a PR. We'll continue to maintain Packrat, but active development will be moving to renv.

kenahoo commented 5 years ago

@kevinushey I checked out renv, I'm afraid it's not a viable alternative right now because it can't handle our internal CRAN-like repository: it doesn't respect options(download.file.extra) (just like in https://github.com/r-lib/remotes/issues/290) so it doesn't pass login credentials to the repository, and I get a 401 Unauthorized response. Not sure how viable it would be if that were solved, I didn't look beyond that issue yet. Also, since it's not on CRAN yet we probably couldn't integrate it in a stable way to our production environments very easily. It does look promising, though, and easier for people to understand since it's so similar to Python environments.

In general though, I do feel like I'm chasing the same issues around so many different projects right now, between devtools and remotes and packrat and renv and pak etc., and things getting refactored & rewritten into new projects so often, with backslides in functionality, but issues in the older packages don't get attention because the new packages are shinier.... I've opened lots of tickets & pull requests, but they just don't get the attention they'd need from committers to solve the problems. For that first link, I've been trying for 3.5 years to get the change adopted, porting the changes across time (rebasing) & across repositories every so often, and there doesn't even seem to be any objection to merging, it just seems like a lack of attention.

Sorry for the (certainly misplaced in this thread) vent. :frowning: I do highly appreciate all the packages and I understand there must be a lot of priorities competing for attention in this space. If there's something I can do better to collaborate, please let me know (better than my current strategy of popping up every so often with annoying "hey, did you look at this PR yet?" messages). My goals are presumably the same as everyone else's, to make managing packages & dependencies more manageable in deployments, so that the developers & ops people at our company feel like R is as supportable as Python or Java.

kevinushey commented 5 years ago

@kevinushey I checked out renv, I'm afraid it's not a viable alternative right now because it can't handle our internal CRAN-like repository: it doesn't respect options(download.file.extra) (just like in r-lib/remotes#290) so it doesn't pass login credentials to the repository, and I get a 401 Unauthorized response.

This can definitely be fixed; right now we take somewhat heavy-handed control over the download options. We try to force the use of curl for downloads, since that allows for authenticated downloads regardless of the version of R, and allows for more user-side configuration with proxies, certificate stores, and so on. I'll see what we're missing with handling of download.file.extra.

I can definitely understand your frustration, though. Thanks for bearing with us; we make these changes because we believe they're the right ones but the truth is that we do miss cases like yours. On the Packrat side, we felt the local.repos tooling in Packrat was a bit misguided (+ difficult to maintain), which is why it was left to languish. I'm fairly more confident we've gotten the right solution in renv with "local" packages, though.

kenahoo commented 5 years ago

Thanks @kevinushey - I can open a ticket in renv for supporting download.file.extra. Supporting curl for downloads works well for us, for exactly the reasons you point out.

kenahoo commented 5 years ago

Created https://github.com/rstudio/renv/issues/128 .

chrisknoll commented 3 years ago

Btw, since we're referencing renv vs. packrat, it seems RConnect handles bundling of assets via packrat, while renv is what the future is. This makes it difficult to adopt the 'promoted practices' of renv while having to deal with legacy frameworks like packrat. I'm running into this very issue now about handling local/non-cran packages in our own projects, and havign difficulty pushing to RConnect specifically due to the handling of local repositories. RENV sounds like it has solutions, but RConnect is using packrat. Very frustrating. I hope RConnect can make the transition so we can have one solution for packaged dependency management.

kevinushey commented 3 years ago

@chrisknoll we definitely understand your frustration. Our goal is for RStudio Connect to eventually transition towards using renv, so that this impedance mismatch can finally go away.