ropensci / unconf15

rOpenSci's San Francisco hackathon/unconf 2015
http://unconf.ropensci.org

Working towards binary builds for more architectures #25

Open eddelbuettel opened 9 years ago

eddelbuettel commented 9 years ago

@gaborcsardi has done a wicked job with the CRAN mirroring at the GitHub MetaCRAN repo. There are some lofty plans somewhere to provide more binaries than just for Windoze. Debian has gitbuilder which can use a git repo as backend, and distro-specific files just end up in a branch.

Maybe we can start some experimentation towards using this for Debian and/or Ubuntu. Benefits would be

eddelbuettel commented 9 years ago

As an addendum: it is pretty easy to go from (highly regularized CRAN) sources to binaries. E.g. I more recently started again to make use of the auto-builder / PPA infrastructure at Canonical / Launchpad (see my PPA), as the debian/ directory is typically fairly static (see e.g. the one for RcppArmadillo).

gaborcsardi commented 9 years ago

So the advantage over https://launchpad.net/~marutter/+archive/ubuntu/c2d4u would be more/better testing? Don't we already have that in the official CRAN repo?

eddelbuettel commented 9 years ago

It would be a reboot / possible re-use of c2d4u -- as well as an extension as Michael never aimed for all of CRAN.

We have, thanks to you, a good handle on sources. We have code and build-depends data in other places (c2d4u, Don's debian-r). We have gitbuilder as well.

I think it may be worth a discussion how to put it all on steroids.

gaborcsardi commented 9 years ago

Good points. I guess it is not too difficult to do it for the majority of the packages. Then we'll see if it is worth doing it for the ones that fail, how often they'll break, etc.

I'll definitely be happy to discuss this.

cboettig commented 9 years ago

If I understand correctly, this would work from install.packages() and thus be available to install binaries of R packages on linux without needing root; particularly convenient for people on machines where they have to otherwise build from source or ask a sysadmin to install a binary(?)

On Thu, Mar 5, 2015, 8:28 AM Dirk Eddelbuettel notifications@github.com wrote:

It would be a reboot / possible re-use of c2d4u -- as well as an extension as Michael never aimed for all of CRAN.

We have, thanks to you, a good handle on sources. We have code and build-depends data in other places (c2d4u, Don's debian-r http://debian-r.debian.net).

I think it may be worth a discussion how to put it all on steroids.


gaborcsardi commented 9 years ago

@cboettig Well, that is a good question. I think proper deb/rpm/etc packages are one option, and just plain binary R packages (i.e. a tgz/zip of the installed package) are another option. Both have pros and cons IMO.

eddelbuettel commented 9 years ago

@cboettig: No, I did not have rewriting install.packages() in mind. In your and my parlance, it works at the apt-get level. Like the previous cran2deb projects from which c2d4u and debian-r derived. If there were volunteers with sufficient chops to do rpm, OS X tgz, ... then we could do those too.

Right now it is simply my quest to convince @gaborcsardi that his GitHub mirror is a match made in heaven for gitbuilder :)

gaborcsardi commented 9 years ago

@eddelbuettel What I meant, the options are

  1. A proper Debian repo from which you can install with apt-get.
  2. A proper CRAN-like repo from which you can install with install.packages (assuming install.packages still supports binary packages on Linux, I guess it does).
  3. Both. :)

eddelbuettel commented 9 years ago

@gaborcsardi Well: 1) gives you proper package management at the OS level, covering the depends R does not know about; 2) does not.

But looks like we're having a discussion. Good :)

gaborcsardi commented 9 years ago

@eddelbuettel Sure, that's the advantage of 1). But 1) needs root access and 2) does not. That's the advantage of 2).

You are right that 2) might lead to non-loading packages. But you can wrap install.packages, such that it checks for the proper Debian/Ubuntu packages, and tells the user to tell the admin to install them. If the sysadmin removes the needed system libraries later, the package will still not load, but this is something you can't do much about.

Also, you are right that R does not (properly) know about these deps, but we'll need to know about them to build the packages, anyway. So we need some kind of mapping from SystemRequirements to Ubuntu/etc. packages.

But I am not arguing for either 1) or 2). Or 3). :)
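
A minimal sketch of the wrapper idea above, assuming a hand-maintained mapping from R package names to Debian/Ubuntu package names. The names `sysreq_debs`, `deb_installed`, `missing_sysreqs` and `install_checked` are hypothetical, the mapping entries are illustrative only, and none of this is an existing R or CRAN facility.

```r
# Hypothetical mapping from R package name to the Debian/Ubuntu packages it
# needs. The entries are illustrative, not an authoritative list.
sysreq_debs <- list(
  RProtoBuf = c("libprotobuf-dev", "protobuf-compiler"),
  XML       = "libxml2-dev"
)

# Default check: ask dpkg whether a Debian package is installed.
# Returns FALSE when dpkg-query is unavailable or does not know the package.
deb_installed <- function(deb) {
  if (Sys.which("dpkg-query") == "") return(FALSE)
  status <- suppressWarnings(
    system2("dpkg-query",
            c("-W", shQuote("-f=${Status}"), shQuote(deb)),
            stdout = TRUE, stderr = FALSE)
  )
  length(status) == 1 && grepl("install ok installed", status, fixed = TRUE)
}

# Return the Debian packages that `pkg` needs but that are not installed.
missing_sysreqs <- function(pkg, mapping = sysreq_debs,
                            is_installed = deb_installed) {
  needed <- mapping[[pkg]]
  if (is.null(needed)) return(character(0))
  needed[!vapply(needed, is_installed, logical(1))]
}

# Wrapper around install.packages(): do not install while external
# libraries are absent, and tell the user what to ask the admin for.
install_checked <- function(pkg, ..., installer = utils::install.packages,
                            is_installed = deb_installed) {
  absent <- missing_sysreqs(pkg, is_installed = is_installed)
  if (length(absent) > 0) {
    message("Please ask your admin to run: apt-get install ",
            paste(absent, collapse = " "))
    return(invisible(FALSE))
  }
  installer(pkg, ...)
  invisible(TRUE)
}
```

The check and the installer are passed in as arguments so the logic can be tried on a machine without dpkg, or pointed at rpm or Homebrew instead. As noted in the comment, this cannot help if the libraries are removed after installation.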

gaborcsardi commented 9 years ago

Right now it is simply my quest to convince @gaborcsardi that his GitHub mirror is a match made in heaven for gitbuilder :)

Hah! You can convince me about that. But IMHO it is still the case that most Linux users install from source, and binaries are much more useful for windows and OSX users.

Btw. do you have a link about what gitbuilder actually does? Do we need to put debian/ directories in the packages?

gaborcsardi commented 9 years ago

Btw. do you have a link about what gitbuilder actually does?

Never mind, found it above.

eddelbuettel commented 9 years ago

Point taken on root access, but also consider that there is so much automated and scripted use these days where we consume .deb files anyway: Travis CI, Docker, ...

gaborcsardi commented 9 years ago

@eddelbuettel Agree about Docker. (It would be great to have a poll about how many people actually use Docker.)

[Btw. how about putting polls on r-project.org? Also asking @hadley here. It would be great to know (roughly) how many people install from source on OSX/Windows/Linux, how many use Docker, etc.... source and binary installs can maybe be estimated from the CRAN download logs, but those also contain automated stuff, and it is hard to infer actual users, etc.]

As for Travis, it'd be better not to use debs, actually, because that needs root access (and sudo), which means slow check times and no caching. No question, debs are way easier, and if you need extra software, then you need sudo anyway, but almost all packages don't need extra system software.

hadley commented 9 years ago

[I doubt polls will fly on the homepage. RStudio users are 70% windows, 20% mac and 10% linux - I suspect R users in general weight a bit more heavily towards linux, but I doubt by a huge amount.]

eddelbuettel commented 9 years ago

Disagree on Travis. I now build things I need more often via my PPA and I have the Travis times to prove that it is faster to install r-cran-$foo as a deb than from source. (But that is so obvious that you may have meant something else; this is what I meant.)
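
For readers unfamiliar with the r-cran-$foo naming: Debian and Ubuntu ship CRAN packages under the lower-cased package name with an `r-cran-` prefix. A small sketch of that convention follows; the helper names `deb_name` and `apt_command` are made up for illustration, and the generated command is only printed, not run.

```r
# Map CRAN package names to the Debian/Ubuntu binary package names,
# following the lower-cased "r-cran-" naming convention.
deb_name <- function(pkg, prefix = "r-cran-") {
  paste0(prefix, tolower(pkg))
}

# Build (but do not execute) the apt-get command line for a set of packages.
apt_command <- function(pkgs) {
  paste("apt-get install -y", paste(deb_name(pkgs), collapse = " "))
}

cat(apt_command(c("RcppArmadillo", "RProtoBuf")), "\n")
```

Whether a given r-cran-* package actually exists depends on the distribution release or PPA in use.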

eddelbuettel commented 9 years ago

@hadley Useful numbers, thanks.

But one could also argue that 100% of R developers using Travis use Ubuntu 12.04 there along with a number of .deb packages, just like 100% of Rocker users use Debian testing and .deb packages in their container.

gaborcsardi commented 9 years ago

What I mean is that if we finally manage not to use sudo (and thus apt-get) on Travis, then we can use the new Docker based Travis, and Travis caching. Which will speed things up. Most probably.

eddelbuettel commented 9 years ago

(Re the off-topic Travis tangent: I see. And per @craigcitro, that is coming. But no matter what the base image, when you want to add anything it is faster to add a binary that is prebuilt.)

sjackman commented 9 years ago

How do R packages specify their non-R dependencies? I maintain Linuxbrew, the port of Homebrew to Linux. It could be useful for installing those non-R dependencies. It does not require root access.

eddelbuettel commented 9 years ago

@sjackman: Every approach to this problem I am aware of does a local mapping. What one distribution calls libpostgresql-dev is libpg-dev somewhere, pg$VERSION-dev somewhere else and so on. So there is limited usefulness or portability of one solution to another. Also toolchains differ etc pp so this is hard to solve universally.

And if I may, the "does not require root access" argument is a bit of a straw man. If we were happy to install below $HOME I wouldn't need it either. But we aim for use in installations via the system tools -- as done in Docker, Travis, derived distributions etc pp -- and that is a tad more involved.
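
To make the "local mapping" point concrete: R packages declare external dependencies in the free-form SystemRequirements field of their DESCRIPTION file, so any tool has to translate that text into platform-specific package names. The sketch below assumes a made-up DESCRIPTION and a hypothetical rule table (`rules`, `resolve_sysreqs`); the package names in it are illustrative examples, not a vetted mapping.

```r
# A made-up DESCRIPTION file; SystemRequirements is free-form text.
desc_text <- c(
  "Package: examplepkg",
  "Version: 0.1.0",
  "SystemRequirements: ProtoBuf libraries and compiler (version 2.2.0 or later)"
)
con <- textConnection(desc_text)
desc <- read.dcf(con, fields = c("Package", "SystemRequirements"))
close(con)
sysreq <- desc[[1, "SystemRequirements"]]

# Hypothetical rule table: a regular expression matched against the free-form
# text, plus the name each platform uses for the corresponding package.
rules <- data.frame(
  pattern = c("protobuf", "postgres"),
  deb     = c("libprotobuf-dev", "libpq-dev"),
  rpm     = c("protobuf-devel", "postgresql-devel"),
  brew    = c("protobuf", "postgresql"),
  stringsAsFactors = FALSE
)

# Return the platform-specific package names whose pattern matches the text.
resolve_sysreqs <- function(text, platform, table = rules) {
  hit <- vapply(table$pattern, grepl, logical(1),
                x = text, ignore.case = TRUE)
  table[[platform]][hit]
}

print(resolve_sysreqs(sysreq, "deb"))
print(resolve_sysreqs(sysreq, "brew"))
```

Each platform needs its own column and the patterns have to be curated by hand, which is the portability problem described above.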

sjackman commented 9 years ago

Science/HPC users often don't have root access. No one at my institution does. We install in $HOME.

eddelbuettel commented 9 years ago

Nothing wrong with homebrew, and I appreciate all the work you are doing there.

But I happen to not be motivated by that use case. I am admin of my HPC systems, and have supported HPC use for long enough to know that that is not an isolated case either. Both usages exist, and we simply serve different users, or even just different machine pools of the same users. And by all means if what we do here is of use to you, do feel free to use it. As I said above, I doubt it will be all that easy to generalize. But we can talk more at the unconference.

jeroen commented 9 years ago

My experience too is that linux admins are moving to vm/container solutions to provide their users with an unrestricted yet isolated environment to do their work. Nobody likes old fashioned user-role security with bureaucratic policies on how to beg the admin to install some system software.

So with the future in mind, I also think the "no root access" is not a very important use case.

sjackman commented 9 years ago

When an R package depends on a system library, how is that library installed? Is it up to the user to use the native package manager to install those dependencies?

gaborcsardi commented 9 years ago

@sjackman Yes, except on windows, where static libs are included in the binary. At least in most cases.

eddelbuettel commented 9 years ago

@sjackman: Please define "system library". I would agree on "up to the user" but not on "native package manager" as not all OSs have one worth its salt. See "R Installation and Administration" for what R (Core) has to say about the R context.

In general, you can't assume anything, which is why this is hard. Two orthogonal approaches:

sjackman commented 9 years ago

By "system library" I meant a non-R library that is required by an R package. By "native package manager" I meant apt/yum.

sjackman commented 9 years ago

I'm surprised that on Mac OS I've never seen an install.packages build fail due to a missing system library. Have I just been lucky?

eddelbuettel commented 9 years ago

1) "system" commonly refers to the OS, so to me a "system library" is libc. 2) The term I would use is "external library" to stress that it is not part of / comprised by the R package. 3) Just try something a tad further from the mainstream. RProtoBuf or RQuantLib are examples among those I maintain; RSymphony would be one by Kurt.

sjackman commented 9 years ago

On an APT system, the difference between a system library such as glibc and an external library such as protobuf is pretty small. I like the term external library all the same.

sjackman commented 9 years ago

Awesome. Thanks for the example.

> install.packages("RProtoBuf")
…
checking google/protobuf/stubs/common.h usability... no
checking google/protobuf/stubs/common.h presence... no
checking for google/protobuf/stubs/common.h... no
configure: error: ERROR: ProtoBuf headers required; use '-Iincludedir' in CXXFLAGS for unusual locations.
ERROR: configuration failed for package ‘RProtoBuf’
* removing ‘/usr/local/Cellar/r/3.1.3/R.framework/Versions/3.1/Resources/library/RProtoBuf’
Warning in install.packages :
  installation of package ‘RProtoBuf’ had non-zero exit status

sjackman commented 9 years ago

Every approach to this problem I am aware of does a local mapping. What one distribution calls libpostgresql-dev is libpg-dev somewhere, pg$VERSION-dev somewhere else and so on. So there is limited usefulness or portability of one solution to another. Also toolchains differ etc pp so this is hard to solve universally.

Is there any particular solution to this problem that's popular or widely used?

R could install the necessary external library dependencies such as protobuf transparently to the user if it were integrated with a portable package manager, such as Homebrew. Note Homebrew handles Mac OS and Linux, but not (yet?) Windows.

eddelbuettel commented 9 years ago

On an APT system, the difference between a system library such as glibc and an external library such as protobuf is pretty small. I like the term external library all the same.

I disagree, and strongly for that matter:

In sum, this simply is a hard problem and I do not think there are any easy outs or answers. But I look forward to you trying to convince me otherwise in person in two days ;-)

sjackman commented 9 years ago

I've created a very thin R package over the Homebrew command line client brew. https://github.com/sjackman/homebrewr See also https://github.com/ropensci/unconf/issues/34

> brew_install("protobuf")
==> Downloading https://homebrew.bintray.com/bottles/protobuf-2.6.1.yosemite.bottle.1.tar.gz
==> Pouring protobuf-2.6.1.yosemite.bottle.1.tar.gz
==> Caveats
Editor support and examples have been installed to:
  /usr/local/Cellar/protobuf/2.6.1/share/doc/protobuf
==> Summary
🍺  /usr/local/Cellar/protobuf/2.6.1: 81 files, 7.1M

Examples

brew_install("hello")
brew_remove("hello")
brew_update()
brew_upgrade()

gaborcsardi commented 9 years ago

Following the discussion, here is a first sketch of what the mappings of system requirements could look like: https://github.com/metacran/sysreqs Please comment or fix if your use case is not covered or I did something stupid. Also, happy to give you direct write access to the repo.

If you want to take a look at all SystemRequirements fields for all versions of all CRAN packages (updated regularly), here is a quick way: http://crandb-dev.r-pkg.org:8080/-/sysreqs Output is JSON, so you might need a JSON browser extension, or parse with jsonlite, etc.

eddelbuettel commented 9 years ago

Thanks so much for this.