rstudio / packrat

Packrat is a dependency management system for R
http://rstudio.github.io/packrat/

Error in `snapshot()` with packages installed from external source #153

Open steromano opened 10 years ago

steromano commented 10 years ago

I have been trying to use packages from my company internal CRAN-like repository in my packrat project. I can install fine, but when I do packrat::snapshot() I get the error

Error: Unable to retrieve package records for the following packages: ...

The problem is that inferPackageRecords sets source = "unknown" for these external packages. Going through the code of getPackageRecords (which is throwing the error), it seems that these packages are picked up by the call to getPackageRecordsExternalSource but then they are dropped because they have unknown source.

To expand further, the error disappears if I explicitly hack the DESCRIPTION file by adding Repository: CRAN, for example, or if I call getPackageRecords with fallback.ok = TRUE (although there doesn't seem to be a way to do that from snapshot). In both cases the source is (incorrectly) set to "CRAN" and no error is thrown.

kevinushey commented 10 years ago

Hi @steromano,

I think setting Repository: CRAN really is the correct solution here. Is your repository truly like a CRAN mirror (serving both source tarballs and binaries) or something else?

One of the assumptions of any CRAN-like repo is that it will populate the Repository field; packrat uses this, alongside the repositories associated with a project, to figure out whether install.packages() and download.packages() would be suitable for downloading the package, and whether available.packages() can supply the information needed to find out if we're up-to-date with that package.

Or is this something like a 'directory full of source packages'? If that's the case, you may want to use packrat's local.repos option -- see `?"packrat-options"`.

I think I'm misunderstanding a bit here so please fill me in a bit more if you can.

Thanks, Kevin

steromano commented 10 years ago

Hi Kevin,

Thanks for your reply. The internal repository looks like CRAN in that it serves both source tarballs and binaries, although packages sitting there aren't currently provided with a Repository argument in their description (they can be accessed by install.packages after adding the internal repo URL to options()$repos). I agree that they definitely should, although it looks a bit awkward to set it to CRAN since that is really a different repository, but maybe I'm misinterpreting what Repository: CRAN is actually expressing.

kevinushey commented 10 years ago

As an alternative: packrat could be set to understand e.g. CRAN-like, or something to that effect, as a value of the Repository field in DESCRIPTION.

Essentially, we just need to tell packrat that using install.packages() and download.packages() will work for this package (assuming the repos option has been appropriately populated).

Can you think of either an appropriate value for Repository: for this case (and perhaps other use cases), or maybe another field in the DESCRIPTION that could be set that packrat could use / understand?

steromano commented 10 years ago

Perhaps packrat could accept any value in the Repository: field, as long as the same name is found in options("repos")?

kevinushey commented 10 years ago

I like that idea -- I'll implement something to that effect.

kevinushey commented 10 years ago

One catch -- would this be problematic if the URL to the repository changed (since now we are tied directly to a URL, rather than a general 'repository'?) Or is that unlikely for such internal CRAN-like repositories?

steromano commented 10 years ago

Could you expand on why this would be problematic? I think I am missing the point.

kevinushey commented 10 years ago

My main concern: if the source for a packrat package is tied to a particular repository URL, and then that repository's URL changes, packrat will no longer know how to link a particular package record to that repository.

So if a package has

Repository: http://path/to/repo/

but later, the actual URL for the repository changes to

http://new/path/to/repo/

then packrat may no longer know how to associate package records with the new repository URL.

Alternatively, packrat could just encode package records to say that a particular package is from a 'cran-like' repository, and then rely on it being found in the repos option -- but in that case, the URL in Repository: field becomes redundant (other than signaling that it's from a CRAN-like repository)

steromano commented 10 years ago

Oh I see. What I actually had in mind was having

Repository: <repository name>

in DESCRIPTION and then <repository name> = http://path/to/repo in options("repos") - and yeah that's basically equivalent to having CRAN-like without explicit reference to a repo name. In any case, I can see the URL changing, but the "name" of the repo should definitely remain the same for the most part.
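A minimal sketch of the lookup being proposed here, assuming the Repository: value is simply matched against the names of the repos option. The helper repository_url() is hypothetical and is not part of packrat; the repository names and URLs are placeholders taken from this thread.

```r
# Hypothetical helper, not part of packrat: map a DESCRIPTION
# 'Repository:' value to a URL via the names of the 'repos' option.
repository_url <- function(repository, repos = getOption("repos")) {
  if (is.null(repository) || is.na(repository) || !nzchar(repository)) {
    return(NA_character_)
  }
  if (repository %in% names(repos)) {
    unname(repos[[repository]])
  } else {
    NA_character_
  }
}

repos <- c(CRAN = "https://cloud.r-project.org", myrepo = "http://path/to/repo")
repository_url("myrepo", repos)   # "http://path/to/repo"
repository_url("unknown", repos)  # NA
```

Because only the name is stored with the package, the URL behind it can change in options("repos") without invalidating the record.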

kevinushey commented 10 years ago

Ahh -- gotcha. Sorry for being dense; I understand now! I'll implement something to this effect.

kevinushey commented 10 years ago

Hi @steromano,

Can you try this out with your local CRAN-like repository now? It should function as you said:

  1. Annotate the Repository: field for packages generated with e.g. myrepo,
  2. Ensure that you have the repository set for myrepo, e.g.

    options(repos = c(getOption('repos'), c(myrepo = "<pathToMyRepo>"))),

and with that, packrat should function as expected.

steromano commented 10 years ago

I believe it should be

source = as.character(df$Repository)

in the package record (instead of source = df$Repository); aside from that, it seems to work fine.
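One plausible reason the coercion matters, shown as a small sketch. The assumption (not stated in the thread) is that df is a data frame of DESCRIPTION fields built with stringsAsFactors = TRUE, which was the default before R 4.0, so df$Repository would be a factor rather than a string.

```r
# Assumption: 'df' holds DESCRIPTION fields and was built with
# stringsAsFactors = TRUE (the default before R 4.0).
df <- data.frame(Repository = "myrepo", stringsAsFactors = TRUE)

is.factor(df$Repository)                          # TRUE
identical(df$Repository, "myrepo")                # FALSE: a factor, not a string
identical(as.character(df$Repository), "myrepo")  # TRUE
```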

wolkym commented 8 years ago

In the StatET project, version rj_1.1 has no Repository: field in its DESCRIPTION at all. Should packrat treat a missing Repository with some default?

P.S. Mentioning the owner of the repo, @wahlbrink.

kadrach commented 8 years ago

Has this ever been resolved? Running into this issue with drat repositories.

kevinushey commented 8 years ago

Does the R package you're hosting on drat have a Repository: field in the DESCRIPTION file? Was the package explicitly installed from that repository?

kadrach commented 8 years ago

These are internal packages, built with devtools, and exposed via a drat repository.

The DESCRIPTION files do not expose a Repository. I have no experience publishing packages on CRAN; is this line not provided by the repository?

The package was explicitly installed from that repository, i.e. with install.packages(..., repos = my.drat.repo).

kevinushey commented 8 years ago

Normally, packages published to CRAN-like repositories will gain a Repository: CRAN field, and this is what Packrat uses to infer that a particular package is from a CRAN-like repository.

Perhaps Packrat should just always attempt to restore packages from any active CRAN-like repositories when they don't have a Repository field set, and we can't ascertain its source through any other means.

kadrach commented 8 years ago

drat does not appear to add a Repository field. I rather like the concept of having to specify exactly which repository to use (rather than any active CRAN-like repositories). Sounds like an option in eddelbuettel/drat is the better alternative? Happy to facilitate.

jsteinhart commented 8 years ago

I am having a similar situation, but in my case I am developing a package which uses packrat and depends on INLA, a third-party package hosted from its own (CRAN-like) repository. Therefore I believe this differs from the scenario described by @steromano and @kadrach since I have no control over the DESCRIPTION file (and therefore maybe this belongs in a separate issue...)

What is the best way to handle this? Request that INLA add "Repository: INLA" to their DESCRIPTION? Or is there another way to work around this in packrat?

mmuurr commented 6 years ago

I've just been bitten by this, too ... again while using another CRAN-like repo with company-specific packages (and no control over the DESCRIPTION field). Perhaps a packrat (opt-in) option to try using the active repos (in the order of options("repos")) to search for the package?

kevinushey commented 6 years ago

In v0.4.9-1, Packrat should be more permissive in terms of what packages are inferred to be from CRAN:

https://github.com/rstudio/packrat/blob/f98789177c02b3066144dfb8c39ea67d681fd73c/R/restore.R#L9-L30

Now, packages without a Repository: field, or with one not equal to source or github, will be inferred as from CRAN. Does that work in your case?

mmuurr commented 6 years ago

@kevinushey I don't think that behavior solves the problem that both @jsteinhart and I have run into.

In these cases there's a CRAN-like repo hosting some private packages, none of which use the Repository: field in their DESCRIPTION. Under normal operation, you'd install such packages by first adding this special repo like so:

> options(repos = c(getOption("repos"), foo = "https://my.local/repo"))
> install.packages("some_special_package")

In this case, when installing some_special_package R can't find the package at CRAN, so it then looks for the package at repo foo, finds it, and proceeds with the installation.

Now when using packrat, the init() step works (mostly) fine if options("repos") is set as done above prior to the initialization. Inspecting the packrat.lock file reveals that both repos have been correctly detected and used while building packrat's src directory:

PackratFormat: 1.4
PackratVersion: 0.4.9.1
RVersion: 3.4.3
Repos: CRAN=https://cloud.r-project.org,
    foo=https://my.local/repo/

But inspecting the some_special_package entry in packrat.lock reveals that packrat thinks the repo source is CRAN:

Package: some_special_package
Source: CRAN
Version: 1.2.1
Hash: e129d5d8a4833c9beb6a894c5bb4df1f

When trying a restore(), we get the following error:

Error: Unable to retrieve package records for the following packages:
- 'some_special_package'

This seems to be (as described earlier in this thread) due to packrat defaulting to the CRAN assumption when a package appears to be CRAN-like, but has no Repository: field explicitly set in its DESCRIPTION.
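To check which packages a lockfile attributes to CRAN, something like the following may help. It is a sketch that assumes the lockfile can be parsed as DCF, as the excerpts above suggest; the text is copied from those excerpts into a temporary file so the example is self-contained, whereas in a real project the path would presumably be packrat/packrat.lock.

```r
# Assumption: the lockfile can be parsed as DCF, as the excerpts suggest.
lock_text <- c(
  "PackratFormat: 1.4",
  "PackratVersion: 0.4.9.1",
  "RVersion: 3.4.3",
  "Repos: CRAN=https://cloud.r-project.org,",
  "    foo=https://my.local/repo/",
  "",
  "Package: some_special_package",
  "Source: CRAN",
  "Version: 1.2.1",
  "Hash: e129d5d8a4833c9beb6a894c5bb4df1f"
)
lock_path <- tempfile(fileext = ".lock")
writeLines(lock_text, lock_path)

# One row per record; the header record has no Package field.
records <- read.dcf(lock_path)
is_pkg <- !is.na(records[, "Package"])

# Packages whose recorded source is CRAN.
cran_pkgs <- records[is_pkg & records[, "Source"] %in% "CRAN", "Package"]
cran_pkgs  # "some_special_package"
```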

There seem to be two relatively easy(?) ways to handle this:

  1. When the packrat repo is init()ed in the first place, Source: CRAN should instead be Source: foo for the example some_special_package. Packrat could detect this at initialization by examining the current options("repos") in order and selecting the first repo containing the installed version of the package, replicating the behavior of install.packages.
  2. An alternative would be to simply flag the package as "CRAN-like" (rather than explicitly setting the Source repo name), then when restore()-ing try to find the first matching version in the current list of options("repos").

options("repos") is getting set correctly by Packrat after the first init(), so it accurately captures the repository 'state' of the project, so both options above seem like they'd just involve walking through the repo list (a la install.packages) rather than committing to finding the package at one or the other of the named repos in that very same list.
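A rough sketch of the repository walk described in option 1, under stated assumptions: find_source_repo() is a hypothetical helper, not packrat's implementation, and the available argument is injectable so the logic can be tried without network access. By default it calls utils::available.packages() once per repository.

```r
# Hypothetical helper, not packrat's implementation.
# Returns the name of the first repository (in the order of 'repos')
# that offers 'pkg' at exactly 'version', or NA if none does.
find_source_repo <- function(pkg, version, repos = getOption("repos"),
                             available = function(url) {
                               utils::available.packages(repos = url)
                             }) {
  for (name in names(repos)) {
    db <- available(repos[[name]])
    if (pkg %in% rownames(db) &&
        identical(unname(db[pkg, "Version"]), version)) {
      return(name)
    }
  }
  NA_character_
}

# Stand-in for available.packages(), for illustration only.
fake_available <- function(url) {
  local <- identical(url, "https://my.local/repo")
  pkg <- if (local) "some_special_package" else "ggplot2"
  ver <- if (local) "1.2.1" else "3.0.0"
  matrix(c(pkg, ver), nrow = 1,
         dimnames = list(pkg, c("Package", "Version")))
}

repos <- c(CRAN = "https://cloud.r-project.org", foo = "https://my.local/repo")
find_source_repo("some_special_package", "1.2.1", repos, fake_available)  # "foo"
```

The same walk could run at restore() time instead (option 2), in which case the lockfile would only need to mark the package as CRAN-like.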

In any case, thanks much for Packrat in general, it's still a very useful project and your work's definitely appreciated!

001ben commented 6 years ago

I'm still running into this. Same scenario with a private repo that has no Repository: PRIVATE_REPO field in its packages. I've found you can get around this by manually adding a Repository: PRIVATE_REPO field to the DESCRIPTION of the installed packages before calling packrat::snapshot(), and then it will snapshot correctly.

Is it possible to overload install.packages or add some helper functions for injecting the name of the CRAN repo that packages were installed from into their DESCRIPTION?
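Until something like that exists, the manual edit can be scripted with base R. This is a sketch only: set_repository_field() is a hypothetical helper, not a packrat function, and it assumes that editing the installed DESCRIPTION file is sufficient, as reported in this thread. The demonstration uses a throwaway library directory so no real package is modified; reinstalling a package would overwrite the change.

```r
# Hypothetical helper, not part of packrat: add or overwrite the
# 'Repository' field in an installed package's DESCRIPTION file.
set_repository_field <- function(pkg, repo_name, lib = .libPaths()[1]) {
  desc_path <- file.path(lib, pkg, "DESCRIPTION")
  if (!file.exists(desc_path)) stop("No DESCRIPTION found at ", desc_path)
  desc <- read.dcf(desc_path)  # one-row character matrix of fields
  if ("Repository" %in% colnames(desc)) {
    desc[1, "Repository"] <- repo_name
  } else {
    desc <- cbind(desc, Repository = repo_name)
  }
  write.dcf(desc, file = desc_path)
  invisible(desc_path)
}

# Demonstration on a temporary library with a minimal fake package.
lib <- tempfile("lib")
dir.create(file.path(lib, "mypkg"), recursive = TRUE)
writeLines(c("Package: mypkg", "Version: 0.1.0"),
           file.path(lib, "mypkg", "DESCRIPTION"))

set_repository_field("mypkg", "PRIVATE_REPO", lib = lib)
read.dcf(file.path(lib, "mypkg", "DESCRIPTION"), fields = "Repository")
```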

cderv commented 6 years ago

There is already a packrat::install function. Maybe it could be used to add this mechanism?

Also, devtools' functions already have this mechanism of adding metadata to the installed DESCRIPTION file. For example, devtools::install_cran will write some information about the repository in the DESCRIPTION. It may need some adjustments to work as intended with packrat, as I think that metadata is only useful for devtools today. The devtools mechanism could be reused in some packrat function.

DiklaGelbard commented 5 years ago

Hi, I am struggling with the same problem. I am working with several packages from a Bitbucket repository. First I ran the command options(repos = c(tg = 'https://tanaylab.bitbucket.io/repo', BiocManager::repositories())). Then I re-installed the packages with install.packages(), and only then ran packrat::init(). Upon completion of the initialization I get the following warning messages:

1: In FUN(X[[i]], ...) :
  Package 'metacell 0.3.32' was installed from sources; Packrat will assume this package is available from a CRAN-like repository during future restores
2: In FUN(X[[i]], ...) :
  Package 'tgconfig 0.0.21' was installed from sources; Packrat will assume this package is available from a CRAN-like repository during future restores

When I tried to publish the app it failed and I got this error:

Error: Unhandled Exception: Child Task 580834913 failed: Error building image: Error fetching tgconfig (0.0.21) source. <CRANPackageSource repo='http://cran.rstudio.org'> unable to satisfy package: tgconfig (0.0.21)

Can you help me figure out how to solve this problem?

kevinushey commented 5 years ago

If the error is with app publication, presumably https://github.com/rstudio/rsconnect/issues would be a better place to file an issue.

I suspect the problem here is that your active repositories are not being communicated during deployment -- not sure if that's a bug in rsconnect or something that could be handled on your side yet though.

DiklaGelbard commented 5 years ago

OK, I will try asking there. Thanks.

lorenzwalthert commented 5 years ago

I had the same problem. Here's how I solved it (thanks to this thread):

  1. Removed packrat from my project and uninstalled the package from the internal CRAN-like repo.
  2. Set options(repos = c(IRAN = path_to_iran, CRAN = "https://cran.rstudio.com/")), where IRAN is the name of the internal CRAN-like repo and path_to_iran the path to it, i.e. something starting with file:/// as described here.
  3. Set the description field Repository: for the internal package to IRAN.
  4. Built and deployed the internal package with miniCRAN to the CRAN-like repository.
  5. Re-installed the internal package.
  6. Re-initialized packrat in my project.

And hooray it worked.
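Step 2 above can be sketched as follows. The directory under tempdir() and the helper as_file_url() are hypothetical stand-ins for a real internal repository; the src/contrib subdirectory is the layout a CRAN-like repository uses for source packages.

```r
# Hypothetical location; replace with the path to your own repository.
path_to_iran <- file.path(tempdir(), "iran")
dir.create(file.path(path_to_iran, "src", "contrib"),
           recursive = TRUE, showWarnings = FALSE)

# Hypothetical helper: build a file:/// URL from a directory path.
as_file_url <- function(path) {
  path <- normalizePath(path, winslash = "/", mustWork = TRUE)
  paste0("file:///", sub("^/", "", path))
}

repos <- c(IRAN = as_file_url(path_to_iran),
           CRAN = "https://cran.rstudio.com/")
options(repos = repos)
names(getOption("repos"))  # "IRAN" "CRAN"
```

The name IRAN is what ties this entry to the Repository: IRAN field set in step 3.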

strazto commented 5 years ago

Thanks for the workaround, @lorenzwalthert , it was enough for me to get things initialised, since my private repos are maintained on Enterprise, and packrat just wouldn't cooperate with fetching their sources.

My problem is that the goal of using packrat was always portability: I want to deploy to the HPC array at work with relative ease. That actually works quite well using a bundled project, but when I try to install any package there, packrat is unable to resolve the hard-coded path to my computer's "localCRAN".

The requirement of local CRAN-like repos for installing packages available locally seems to be really counterintuitive, particularly when I would otherwise just use devtools::install_git with a PAT and repo url, but that's its own issue. (I'll raise it now in case I'm missing something)