r-lib / pak

A fresh approach to package installation
https://pak.r-lib.org

Consider option `upgrade=NA` instead of hard true or false? #612

Open dipterix opened 2 months ago

dipterix commented 2 months ago

Currently, upgrade=FALSE in pkg_install prefers binaries over source packages, while upgrade=TRUE always installs the latest version.

Packages like RcppEigen, fftw, and igraph are hard to compile from source, so many installations fail with upgrade=TRUE. This is especially true on administered systems, where users can't install missing compilers or don't know how to configure them.

I wonder if we could have upgrade=NA, which would always upgrade packages but prefer binaries over source?
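To make the three modes concrete, here is a minimal sketch of one possible reading of them. It is a toy decision rule, not pak's actual resolver: the function name `choose_candidate` and its arguments are hypothetical, and the behaviour for `upgrade = NA` is only the proposal from this issue.

```r
# Hypothetical decision rule; this is NOT how pak resolves packages.
# Versions are strings; NA means "not installed" or "no binary available".
choose_candidate <- function(installed, binary, source, upgrade) {
  ver <- function(x) package_version(x)
  has_installed <- !is.na(installed)
  has_binary <- !is.na(binary)

  # upgrade = FALSE: never touch a package that is already installed.
  if (isFALSE(upgrade) && has_installed) return("keep")

  if (isTRUE(upgrade)) {
    # upgrade = TRUE: the newest version wins, even if it is source-only.
    pick <- if (has_binary && ver(binary) >= ver(source)) "binary" else "source"
  } else {
    # upgrade = FALSE with nothing installed, or the proposed upgrade = NA:
    # take the binary whenever one exists.
    pick <- if (has_binary) "binary" else "source"
  }

  # Do nothing if the installed version is already at least as new.
  target <- if (pick == "binary") binary else source
  if (has_installed && ver(installed) >= ver(target)) "keep" else pick
}

choose_candidate("2.0.1", "2.0.2", "2.0.3", upgrade = TRUE)  # newest, from source
choose_candidate("2.0.1", "2.0.2", "2.0.3", upgrade = NA)    # newest binary
```

Under this reading, `upgrade = NA` differs from `upgrade = TRUE` only when the source version is newer than the newest binary.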

gaborcsardi commented 2 months ago

Thanks for the suggestion!

Currently we have upgrade = FALSE, meaning "install whatever is necessary, the quickest and simplest way possible".

And we have upgrade = TRUE, meaning "install the latest available version for everything".

TBH, I am reluctant to add something like "install the latest available versions, except for packages that are hard to compile".

But I do acknowledge that we should have more control over what exactly is installed.

szhorvat commented 2 months ago

> Packages like ... igraph are kind of hard to compile from source

At this point all you need for igraph is a compiler toolchain, i.e. the same as for any other R package that needs compilation. No extra libraries are necessary.

gaborcsardi commented 2 months ago

> No extra libraries are necessary.

So it won't link to libxml2 etc. if they are not present?

FWIW we have binaries for most R packages for many Linux distros at https://packagemanager.posit.co/client/#/ so no compilation is needed in most cases. Both CRAN and P3M give you binaries if you are on macOS or Windows.

pak also installs system packages automatically, if you are allowed to do that on your system.
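For Linux users, a rough sketch of pointing the current session at P3M's binary repository is shown below. The helper names `p3m_repo` and `use_p3m` are made up for this example, and the `"jammy"` codename (Ubuntu 22.04) is only an assumption. Check the setup page linked above for the exact URL for your distribution.

```r
# Build a P3M repository URL for a Linux distribution codename.
# The "__linux__/<codename>/<snapshot>" layout is taken from P3M's setup
# instructions; "jammy" is just an example value.
p3m_repo <- function(distro = "jammy", snapshot = "latest") {
  sprintf("https://packagemanager.posit.co/cran/__linux__/%s/%s", distro, snapshot)
}

# Point the CRAN entry of the `repos` option at P3M (current session only).
use_p3m <- function(distro = "jammy") {
  options(repos = c(CRAN = p3m_repo(distro)))
  invisible(getOption("repos"))
}

use_p3m("jammy")
getOption("repos")
```

After this, a call such as `pak::pkg_install("igraph")` would resolve against that repository instead of a source-only CRAN mirror.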

szhorvat commented 2 months ago

> So it won't link to libxml2 etc. if they are not present?

Yes, that's right. If libxml2 is missing, it'll skip it and disable GraphML import. If GLPK is missing, it'll use the vendored version.

But this is only since 2.0.3. Version 2.0.0 regressed in terms of dependencies and made them mandatory. 2.0.3 makes them all optional again.

dipterix commented 2 months ago

@gaborcsardi Thanks for letting me know that you don't plan to add this feature (at least in the short term). pak has made installation significantly easier.

It would be even better if you could offer some advice on how I can use the existing infrastructure to implement such a feature.

gaborcsardi commented 2 months ago

I don't know of any good ways, apart from looking up the dependencies yourself, and then downloading and installing the packages manually, in the correct order.

What is your OS, btw?
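The "correct order" part of the manual approach is a topological sort of the dependency graph. Below is a minimal base-R sketch. The function name `install_order` is hypothetical, and the dependency map is an abbreviated, illustrative one rather than igraph's real dependency list. In practice the map could be filled in from `tools::package_dependencies()` or `pak::pkg_deps()`.

```r
# Order packages so that every package comes after its dependencies.
# `deps` is a named list: package -> character vector of direct dependencies.
install_order <- function(deps) {
  done <- character()      # packages already placed in the order
  visiting <- character()  # packages on the current recursion path

  visit <- function(pkg) {
    if (pkg %in% done) return(invisible())
    if (pkg %in% visiting) stop("dependency cycle at ", pkg)
    visiting <<- c(visiting, pkg)
    for (d in deps[[pkg]]) visit(d)   # dependencies first
    visiting <<- setdiff(visiting, pkg)
    done <<- c(done, pkg)             # then the package itself
  }

  for (pkg in names(deps)) visit(pkg)
  done
}

# Illustrative, abbreviated dependency map (not the real one).
deps <- list(
  igraph  = c("Matrix", "rlang", "cli"),
  Matrix  = "lattice",
  rlang   = character(),
  cli     = character(),
  lattice = character()
)

ord <- install_order(deps)
ord
```

Each package in `ord` could then be downloaded and installed in turn, choosing a binary or source build per package.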

dipterix commented 2 months ago

I'm on a Mac with an M2 chip.

Could you offer some suggestions on how I could implement this myself? I'm thinking of adding an extra file in the inst/ folder. Instead of letting pak parse and resolve the upstream dependencies, package maintainers would manually control the "hard-to-install" list. For example,

{
    "system": ["fftw3", "..."],
    "r": [
        {
            "package": "fftwtools",
            "additional_repos": "https://xxx.r-universe.dev/",
            "priority": "binary"
        },
        {
            "package": "shiny",
            "additional_repos": "...",
            "priority": "source"
        }
    ]
}

This file specifies system libraries in addition to those listed in SystemRequirements. It also specifies additional repos (if not CRAN, e.g. r-universe) for packages that need to be at their latest versions. The priority field controls whether binary should be preferred over source.
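As a rough sketch of how such a file could be consumed, the code below turns the `r` entries into per-package argument lists for `install.packages()`. Everything here is hypothetical: the `install_plan` helper, the config written as an R list instead of parsed JSON, and the default CRAN mirror. The repository strings are the placeholders from the example above, not real URLs.

```r
# Hypothetical config, mirroring the example file as an R list.
# The repository values are placeholders copied from the example.
extras <- list(
  system = c("fftw3", "..."),
  r = list(
    list(package = "fftwtools",
         additional_repos = "https://xxx.r-universe.dev/",
         priority = "binary"),
    list(package = "shiny",
         additional_repos = "...",
         priority = "source")
  )
)

# Turn each entry into arguments for install.packages().
# `default_repo` is an assumption, not something the proposal specifies.
install_plan <- function(extras, default_repo = "https://cloud.r-project.org") {
  lapply(extras$r, function(entry) {
    list(
      pkgs  = entry$package,
      repos = c(entry$additional_repos, CRAN = default_repo),
      # Only "binary" and "source" are accepted; anything else is an error.
      type  = match.arg(entry$priority, c("binary", "source"))
    )
  })
}

plan <- install_plan(extras)
str(plan)
```

Nothing is installed here. With real repository URLs, each element could be passed on with `do.call(install.packages, plan[[i]])`. Note that `type = "binary"` is only meaningful on platforms where the repository serves binaries, such as macOS and Windows.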

The interface could be pakExtras::pkg_install(..., extras = c("shallow", "recursive", "none")), where shallow reads only the extra config file in inst/, recursive iterates over all dependencies, and none ignores this extra configuration (in this case, equivalent to pak::pkg_install