rstudio / packrat

Packrat is a dependency management system for R
http://rstudio.github.io/packrat/

Central Packrat versioned library #338

Open shapenaji opened 8 years ago

shapenaji commented 8 years ago

Packrat has been a godsend for stability going forward. With many Shiny apps built on a host of different package versions, it lets me forget about the dangers of updating.

However, each Packrat project adds a lot of overhead (I have an app that takes up 0.8 GB; 0.5 GB of that is Packrat, and the rest is basically cached data).

It seems like it would make more sense to have a central library that holds many versions of each package, and then have each Packrat project store only a list of the versions the application needs and check them out from the central library.

Is this possible to do with current packrat?

kevinushey commented 8 years ago

Packrat indeed has a (somewhat beta) caching system. There are a few gotchas, but it mostly works quite well. To opt into using the global cache, set the use.cache project option, e.g. for new projects:

packrat::init(options = list(use.cache = TRUE))

Or, to move over for existing projects:

packrat::opts$use.cache(TRUE)

When using the Packrat cache, after a call to packrat::restore(), any successfully installed packages will be copied into the global cache, and symlinks to those package installations will be made back to the private library folder.

New Packrat projects that use the cache, and discover required packages within the cache, will then simply form symlinks to packages in that cache -- thereby avoiding the need to download + reinstall multiple copies of that package.

See ?"packrat-options" for some more details; in particular, the use.cache option.
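To make the copy-then-link mechanism described above concrete, here is a rough sketch in base R. It is not Packrat's actual implementation: the function name cache_and_link, the cache layout (cache/<package>/<version>/<package>), and the demo package are all assumptions made for illustration.

```r
# Sketch only, NOT Packrat's real code. cache_and_link() and the
# cache/<pkg>/<version>/<pkg> layout are hypothetical.
cache_and_link <- function(pkg_dir, cache_root) {
  dir.create(cache_root, recursive = TRUE, showWarnings = FALSE)
  cache_root <- normalizePath(cache_root, mustWork = TRUE)
  pkg_dir <- normalizePath(pkg_dir, mustWork = TRUE)

  # Read the package name and version from the installed DESCRIPTION file.
  desc <- read.dcf(file.path(pkg_dir, "DESCRIPTION"),
                   fields = c("Package", "Version"))
  pkg <- unname(desc[1, "Package"])
  ver <- unname(desc[1, "Version"])
  stopifnot(basename(pkg_dir) == pkg)

  cached <- file.path(cache_root, pkg, ver, pkg)
  if (!dir.exists(cached)) {
    # Copy the installed package into the versioned cache slot.
    dir.create(dirname(cached), recursive = TRUE, showWarnings = FALSE)
    ok <- file.copy(pkg_dir, dirname(cached), recursive = TRUE)
    if (!all(ok)) stop("could not copy ", pkg, " into the cache")
  }

  # Replace the private copy with a symlink pointing into the cache.
  # (On Windows, file.symlink() may fail without the right permissions.)
  unlink(pkg_dir, recursive = TRUE)
  file.symlink(cached, pkg_dir)
  invisible(cached)
}

# Demo with a fake "installed" package inside temporary directories.
lib_pkg <- file.path(tempfile("lib"), "demo")
dir.create(lib_pkg, recursive = TRUE)
writeLines(c("Package: demo", "Version: 0.1.0"),
           file.path(lib_pkg, "DESCRIPTION"))
cache <- tempfile("cache")
cached <- cache_and_link(lib_pkg, cache)
```

A second project whose library needs demo 0.1.0 would then only need the symlink step, since the versioned slot already exists in the cache.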

kreflix commented 7 years ago

Hey,

Unfortunately, the cache is not yet enabled on Windows. I'm really looking forward to it (though I love this package).
I have found some partially working workarounds via https://stackoverflow.com/a/44676571. They still don't fit my needs, but might be interesting to you:

For my current workflow it would suffice to have a global (not necessarily versioned) library of my packages, with the private lib symlinking to the packages I need in the respective project, so that all other features and functions (snapshot, clean, status, restore, maybe even init, as well as installing and removing packages) work as if I weren't using symlinks in the lib.

external.packages not what I was looking for

I have been playing with the external.packages option in an existing Packrat project, but I encountered problems: packrat::status() reported "Up to date" even after I removed packages (via remove.packages()) that I use in the same script.

packrat::set_opts(external.packages=installed.packages()[,"Package"])

Then I tried to init with this option set right away, hoping to tell Packrat that I only want external packages. So I tried packrat::init(options = list(external.packages=installed.packages()[,"Package"])) and got an error:

Error in results[sapply(results, function(x) inherits(x, "try-error"))] : 
  invalid subscript type 'list'

So this won't work.

manually symlink packages works only once

The other workaround is to manually symlink all packages from the user's lib into an empty Packrat lib. @willbowditch provides a script, ratpack.R, for this.

source('https://raw.githubusercontent.com/willbowditch/ratpack/master/R/ratpack.R')
import_user_packages()

Using this script's import_user_packages() works, but sometimes produces errors like

packrat::snapshot()
Snapshot written to ".../packrat/packrat.lock"
Error in if (file.exists(dest) && file.mtime(dest) > file.mtime(lib) &&  : 
  missing value where TRUE/FALSE needed

The downside of this workaround is that no source files are stored, because no init was performed. Also, new packages are still installed into the private lib, so during development it can become a bit messy where the packages are stored.

my question / request

As long as caching packages causes problems on Windows, I'd like a global option that sets whether to use the private lib with all its benefits or the system's/user's central lib. When the central lib is chosen, Packrat should install packages there and create a symlink in the project's private lib, so that Packrat's great collaboration and (auto)snapshot features still work.

Inspired by @willbowditch's ratpack.R script, I am wondering whether one could simply move (copy-paste) package folders that were recently installed into the private lib over to the user's lib and symlink them afterwards. Unfortunately, I don't really understand how the ratpack.R script works, otherwise I would try it out myself.
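The move-then-link idea could look roughly like the following in base R. This is only a sketch under assumptions: move_to_central and its arguments are hypothetical names, it is unrelated to how ratpack.R works internally, the central lib is unversioned, and it has not been checked against packrat::snapshot() or packrat::status().

```r
# Sketch only; move_to_central() is a hypothetical helper, not part of
# Packrat or ratpack.R.
move_to_central <- function(private_lib, central_lib) {
  dir.create(central_lib, recursive = TRUE, showWarnings = FALSE)
  private_lib <- normalizePath(private_lib, mustWork = TRUE)
  central_lib <- normalizePath(central_lib, mustWork = TRUE)

  # Junctions on Windows, symlinks elsewhere.
  link <- if (.Platform$OS.type == "windows") Sys.junction else file.symlink

  moved <- character()
  for (pkg in list.files(private_lib)) {
    src <- file.path(private_lib, pkg)
    # Skip plain files and entries that already are symlinks (or whose
    # link status cannot be read). Assumption to verify: Sys.readlink()
    # may not recognise junctions on Windows.
    if (!dir.exists(src) || nzchar(Sys.readlink(src))) next

    dest <- file.path(central_lib, pkg)
    if (!dir.exists(dest)) {
      ok <- file.copy(src, central_lib, recursive = TRUE)
      if (!all(ok)) stop("could not copy ", pkg, " to the central lib")
    }
    # Unversioned: if 'dest' already existed, the private copy is dropped
    # in favour of whatever version the central lib holds.
    unlink(src, recursive = TRUE)
    link(dest, src)
    moved <- c(moved, pkg)
  }
  moved
}

# Demo with a fake package folder inside temporary directories.
private <- tempfile("private")
dir.create(file.path(private, "demo"), recursive = TRUE)
writeLines("Package: demo", file.path(private, "demo", "DESCRIPTION"))
central <- tempfile("central")
moved <- move_to_central(private, central)
```

Copy-then-delete is used instead of file.rename() because a rename can fail when the two libraries live on different drives or file systems.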

@kevinushey Are you still working on that Windows cache issue? Is it worth my effort to learn how to move folders and create symlinks in order to write my own function, or are you going to publish a solution in the near future?

kevinushey commented 7 years ago

Hi @kreflix,

On Windows, Packrat tries to use junction points (through Sys.junction()) rather than symlinks, as (by default) users do not have the permissions required to create symlinks (even though they are supported on newer Windows OSes).
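As an illustration of that platform split (a sketch, not Packrat's internal code), a small helper could pick the link type by OS. The name link_dir is hypothetical; Sys.junction() and file.symlink() are base R functions, and Sys.junction() is only available on Windows.

```r
# Hypothetical helper: create a directory link using a junction point on
# Windows and a symlink everywhere else. Returns TRUE on success.
link_dir <- function(from, to) {
  if (.Platform$OS.type == "windows") {
    # Junctions need no special privileges, but 'from' must be an
    # existing directory and 'to' must not exist yet.
    Sys.junction(from, to)
  } else {
    file.symlink(from, to)
  }
}

# Demo inside temporary directories.
target <- tempfile("target")
dir.create(target)
writeLines("hello", file.path(target, "marker.txt"))
link_path <- tempfile("link")
linked <- link_dir(target, link_path)
```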

This error:

Error in if (file.exists(dest) && file.mtime(dest) > file.mtime(lib) &&  : 
  missing value where TRUE/FALSE needed

actually sounds like an R 3.4.0 buglet (https://www.r-bloggers.com/error-installing-latest-r-version-3-4-0-on-windows/). If that's the case, you might try upgrading to R 3.4.1 if possible to work around it.
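A quick way to check whether the running session is on the suspected version is sketched below; is_r_340 is a hypothetical helper name, and getRversion() and package_version() are base R.

```r
# Hypothetical helper: TRUE when the given R version is exactly 3.4.0.
is_r_340 <- function(version = getRversion()) {
  version == package_version("3.4.0")
}

if (is_r_340()) {
  message("Running R 3.4.0; upgrading to 3.4.1 may avoid the error.")
}
```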

I think the packrat cache should actually work fine on Windows now, so I've enabled it here:

https://github.com/rstudio/packrat/commit/d6c63cb79eced4aadbcf365083b76fa593e0a985

but please file a new issue if you bump into any problems.