pola-rs / r-polars

Bring polars to R
https://pola-rs.github.io/r-polars/

Mention r-universe binary for Ubuntu 22.04 ? #99

Closed eddelbuettel closed 1 year ago

eddelbuettel commented 1 year ago

The README correctly points out that r-universe has builds but then moves on to Linux and the same old song of having to build from source. It so happens that r-universe also builds binaries (in the R CMD INSTALL --build sense, without system metadata).

I have found that this combines well with [r2u](https://eddelbuettel.github.io/r2u/) in its jammy flavor. The newest version of littler has a new helper, installRub.r (for "install R-Universe Binary"), that can be used for this. Details below.

```sh
edd@rob:~$ docker run --rm -ti eddelbuettel/r2u:jammy
root@01675e0439b8:/# time installRub.r -u rpolars rpolars
Ign https://r2u.stat.illinois.edu/ubuntu jammy InRelease
Get:1 https://r2u.stat.illinois.edu/ubuntu jammy Release [5713 B]
Get:2 https://r2u.stat.illinois.edu/ubuntu jammy Release.gpg [793 B]
Hit http://archive.ubuntu.com/ubuntu jammy InRelease
Get:3 https://r2u.stat.illinois.edu/ubuntu jammy/main amd64 Packages [2256 kB]
Get:4 http://archive.ubuntu.com/ubuntu jammy-updates InRelease [119 kB]
Get:5 http://security.ubuntu.com/ubuntu jammy-security InRelease [110 kB]
Get:6 https://r2u.stat.illinois.edu/ubuntu jammy/main all Packages [7067 kB]
Hit https://ppa.launchpadcontent.net/edd/misc/ubuntu jammy InRelease
Get:7 http://archive.ubuntu.com/ubuntu jammy-backports InRelease [108 kB]
Get:8 http://security.ubuntu.com/ubuntu jammy-security/multiverse amd64 Packages [23.2 kB]
Get:9 http://archive.ubuntu.com/ubuntu jammy-updates/main amd64 Packages [1251 kB]
Get:10 http://security.ubuntu.com/ubuntu jammy-security/restricted amd64 Packages [907 kB]
Get:11 https://ppa.launchpadcontent.net/marutter/rrutter4.0/ubuntu jammy InRelease [17.5 kB]
Get:12 http://archive.ubuntu.com/ubuntu jammy-updates/universe amd64 Packages [1148 kB]
Get:13 http://security.ubuntu.com/ubuntu jammy-security/universe amd64 Packages [908 kB]
Get:14 http://archive.ubuntu.com/ubuntu jammy-updates/restricted amd64 Packages [962 kB]
Get:15 http://security.ubuntu.com/ubuntu jammy-security/main amd64 Packages [914 kB]
Get:16 http://archive.ubuntu.com/ubuntu jammy-backports/main amd64 Packages [49.0 kB]
Get:17 http://archive.ubuntu.com/ubuntu jammy-backports/universe amd64 Packages [23.3 kB]
Fetched 15.9 MB in 0s (0 B/s)
Available system packages as root...
Installing package into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)
trying URL 'https://rpolars.r-universe.dev/bin/linux/jammy/4.2/src/contrib/rpolars_0.4.6.tar.gz'
Content type 'application/x-gzip' length 20547552 bytes (19.6 MB)
==================================================
downloaded 19.6 MB

* installing *binary* package ‘rpolars’ ...
* DONE (rpolars)

The downloaded source packages are in
        ‘/tmp/downloaded_packages’

real    0m24.446s
user    0m14.553s
sys     0m6.931s
root@01675e0439b8:/# R

R version 4.2.3 (2023-03-15) -- "Shortstop Beagle"
Copyright (C) 2023 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(rpolars)
>
```

Now rpolars is so incredibly skinny that nothing else gets pulled in, but I demonstrated this, e.g., in this tweet for a pre-release of a CRAN release, which shows how the r-universe binary comes from there and r2u chips in all other (binary, .deb) dependencies (and would fully resolve those if needed).

And this of course also works 'directly' if one manually checks the repo path (and implicitly ensures Ubuntu and the right release) -- but note that this is again inside an r2u container with bspm.

```r
> install.packages("rpolars", repos = c("https://rpolars.r-universe.dev/bin/linux/jammy/4.2", "https://cloud.r-project.org"))
Install system packages as root...
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Fetched 0 B in 0s (0 B/s)
Installing package into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)
trying URL 'https://rpolars.r-universe.dev/bin/linux/jammy/4.2/src/contrib/rpolars_0.4.6.tar.gz'
Content type 'application/x-gzip' length 20547552 bytes (19.6 MB)
==================================================
downloaded 19.6 MB

* installing *binary* package ‘rpolars’ ...
* DONE (rpolars)

The downloaded source packages are in
        ‘/tmp/RtmpLHqMm9/downloaded_packages’
>
```

I think this may make a nice addition to the README. And eg

rp <- c("https://rpolars.r-universe.dev/bin/linux/jammy/4.2", "https://cloud.r-project.org")
install.packages(c("rpolars", "arrow"), repos = rp)     

installs both rpolars and arrow from binaries (one from your universe, the other as the current r2u binary along with its ten or so dependencies).
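The repository path used above follows a visible pattern: universe name, Ubuntu codename, and R major.minor version. A minimal shell sketch of assembling it is shown here. The helper name `runiverse_bin_repo` is hypothetical, and extending the pattern beyond the one `rpolars`/`jammy`/`4.2` URL shown in this thread is an assumption, not something the thread confirms.

```sh
#!/bin/sh
# Hypothetical helper (not part of littler, r2u, or r-universe):
# assemble a repo URL of the form seen in this thread,
#   https://<universe>.r-universe.dev/bin/linux/<codename>/<R x.y>
runiverse_bin_repo() {
    printf 'https://%s.r-universe.dev/bin/linux/%s/%s\n' "$1" "$2" "$3"
}

# Read the distribution codename, if the system provides one.
codename=""
if [ -r /etc/os-release ]; then
    codename="$(. /etc/os-release; printf '%s' "${VERSION_CODENAME:-}")"
fi

# The thread only shows a binary for Ubuntu 22.04 (jammy), so anything
# else is reported rather than guessed at.
if [ "$codename" = "jammy" ]; then
    runiverse_bin_repo rpolars "$codename" 4.2
else
    echo "codename '${codename:-unknown}' is not jammy; the binary discussed here targets Ubuntu 22.04" >&2
fi
```

On a jammy system this prints the same URL that is passed as the first element of `repos` in the R snippet above.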

PS: Also no mas if you think this is too far "out there". Happy to make it a blog post so feel free to close if it doesn't suit.

sorhawell commented 1 year ago

Many thanks @eddelbuettel for suggesting this. I'm looking forward to trying this out. Compiling rpolars can take 15-50 min, which is not very attractive. Any easy and robust way to do a binary install on any Linux machine is very welcome. This is not my strong side, and we happily promote/adopt any good advice :)

Currently, I know that our GitHub release workflow compiles the Linux rpolars binary against glibc 2.34. The binary is not compatible with any Linux system that has an older glibc, as reported in #86. It appears we should compile against a fairly old glibc that most distributions out there can support.
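As a rough illustration of the glibc constraint, the sketch below compares the local glibc version with the 2.34 mentioned above. The helper name `version_ge` is made up for this example, and it relies on GNU `sort -V` being available; it only reports the version relationship and does not prove that a given binary will load.

```sh
#!/bin/sh
# version_ge A B: succeeds when version A >= version B.
# Hypothetical helper; relies on the version sort (-V) of GNU sort.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}

required="2.34"   # glibc version named in the comment above

# On glibc systems getconf prints e.g. "glibc 2.35"; elsewhere it prints nothing.
have="$(getconf GNU_LIBC_VERSION 2>/dev/null | awk '{print $2}')"

if [ -z "$have" ]; then
    echo "could not determine the glibc version (not a glibc system?)"
elif version_ge "$have" "$required"; then
    echo "glibc $have >= $required: the version requirement is met"
else
    echo "glibc $have < $required: a binary built against $required is unlikely to load"
fi
```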

eddelbuettel commented 1 year ago

Yes -- I cannot speak for Jeroen here but he clearly labels his (sole) Linux binary as Ubuntu 22.04 (ie current LTS). So there is no implied "wheels" or "build once, run anywhere" notion.

I have been working on / with r2u for about a year now, and it brings actual .deb binaries with full dependencies at the system level to the users -- no other 'just in R or around R' approach can do that. (Win some, lose some: the flipside is that it also only works on Ubuntu 20.04 or 22.04 systems.) But this is eg amazeballs for things like CI or quick tests. Because 24 seconds clearly beats 15 - 50 mins. Myself and others (incl our setup at work) use this to speed up CI on Ubuntu by orders of magnitude too.

And so somewhat recently I realized that combining the full backing of r2u (for all dependencies, reliably) with "on-point" access to early builds / test builds / not-yet-on-CRAN packages etc makes a potent combo. And then Grant mentioned his draft vignette (very nice !!) to me and I realized, wait, I can maybe just run rpolars to try it, and there it was.

But if you think this is too far out for the README no worries. Otherwise I can try to draft a short paragraph to mention this with a one-liner. We can maybe point to a larger (to be written...) document. I may do a quick blog post on the weekend.

sorhawell commented 1 year ago

@eddelbuettel I tried to install rpolars and arrow in gitpod + wsl-jammy + virtualbox-jammy

gitpod worked, seemed very fast and smooth!

however for wsl-jammy and virtualbox-jammy on a windows desktop

I get Could not connect to r2u.stat.illinois.edu:443 (192.17.190.167). - connect (111: Connection refused)

I use the add_cranapt_jammy.sh here named instr2u2.sh

Am I missing something obvious? :)

full wsl log

Windows Subsystem for Linux is now available in the Microsoft Store!
You can upgrade by running 'wsl.exe --update' or by visiting https://aka.ms/wslstorepage
Installing WSL from the Microsoft Store will give you the latest WSL updates, faster.
For more information please visit https://aka.ms/wslstoreinfo

Welcome to Ubuntu 22.04.1 LTS (GNU/Linux 4.4.0-19041-Microsoft x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage

This message is shown once a day. To disable it please create the
/home/ubuntu/.hushlogin file.
ubuntu@DESKTOP-NIQG5GO:~$ su
Password:
su: Authentication failure
ubuntu@DESKTOP-NIQG5GO:~$ su
Password:
su: Authentication failure
ubuntu@DESKTOP-NIQG5GO:~$ su
Password:
su: Authentication failure
ubuntu@DESKTOP-NIQG5GO:~$ sudo
usage: sudo -h | -K | -k | -V
usage: sudo -v [-ABknS] [-g group] [-h host] [-p prompt] [-u user]
usage: sudo -l [-ABknS] [-g group] [-h host] [-p prompt] [-U user] [-u user] [command]
usage: sudo [-ABbEHknPS] [-r role] [-t type] [-C num] [-D directory] [-g group] [-h host] [-p prompt] [-R directory]
            [-T timeout] [-u user] [VAR=value] [-i|-s] [<command>]
usage: sudo -e [-ABknS] [-r role] [-t type] [-C num] [-D directory] [-g group] [-h host] [-p prompt] [-R directory] [-T
            timeout] [-u user] file ...
ubuntu@DESKTOP-NIQG5GO:~$ sudo su
[sudo] password for ubuntu:
root@DESKTOP-NIQG5GO:/home/ubuntu# ls
R  instr2u.sh  instr2u2.sh
root@DESKTOP-NIQG5GO:/home/ubuntu# . instr2u2.sh
74 packages can be upgraded. Run 'apt list --upgradable' to see them.
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
ca-certificates is already the newest version (20211016ubuntu0.22.04.1).
ca-certificates set to manually installed.
gnupg is already the newest version (2.2.27-3ubuntu2.1).
gnupg set to manually installed.
gpg-agent is already the newest version (2.2.27-3ubuntu2.1).
gpg-agent set to manually installed.
0 upgraded, 0 newly installed, 0 to remove and 74 not upgraded.
Warning: apt-key is deprecated. Manage keyring files in trusted.gpg.d instead (see apt-key(8)).
Executing: /tmp/apt-key-gpghome.UEnaZizANW/gpg.1.sh --keyserver keyserver.ubuntu.com --recv-keys A1489FE2AB99A21A
gpg: key A1489FE2AB99A21A: public key "Dirk Eddelbuettel <edd@debian.org>" imported
gpg: Total number processed: 1
gpg:               imported: 1
Warning: apt-key is deprecated. Manage keyring files in trusted.gpg.d instead (see apt-key(8)).
Executing: /tmp/apt-key-gpghome.05qGtrkbhF/gpg.1.sh --keyserver keyserver.ubuntu.com --recv-keys 67C2D66C4B1D4339 51716619E084DAB9
gpg: key 51716619E084DAB9: public key "Michael Rutter <marutter@gmail.com>" imported
gpg: key 67C2D66C4B1D4339: public key "Launchpad PPA for Dirk Eddelbuettel" imported
gpg: Total number processed: 2
gpg:               imported: 2
75 packages can be upgraded. Run 'apt list --upgradable' to see them.
W: https://cloud.r-project.org/bin/linux/ubuntu/focal-cran40/InRelease: Key is stored in legacy trusted.gpg keyring (/etc/apt/trusted.gpg), see the DEPRECATION section in apt-key(8) for details.
W: Failed to fetch https://r2u.stat.illinois.edu/ubuntu/dists/focal/InRelease  Could not connect to r2u.stat.illinois.edu:443 (192.17.190.167). - connect (111: Connection refused)
W: Some index files failed to download. They have been ignored, or old ones used instead.
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
r-base-core is already the newest version (4.2.3-1.2204.0).
r-base-core set to manually installed.
0 upgraded, 0 newly installed, 0 to remove and 75 not upgraded.
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
python3-dbus is already the newest version (1.2.18-3build1).
python3-dbus set to manually installed.
python3-gi is already the newest version (3.42.1-0ubuntu1).
python3-gi set to manually installed.
Suggested packages:
  python3-apt-dbg python-apt-doc
The following packages will be upgraded:
  python3-apt
1 upgraded, 0 newly installed, 0 to remove and 74 not upgraded.
Need to get 164 kB of archives.
After this operation, 0 B of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu jammy-updates/main amd64 python3-apt amd64 2.4.0ubuntu1 [164 kB]
Fetched 164 kB in 1s (324 kB/s)
(Reading database ... 27081 files and directories currently installed.)
Preparing to unpack .../python3-apt_2.4.0ubuntu1_amd64.deb ...
Unpacking python3-apt (2.4.0ubuntu1) over (2.3.0ubuntu2.1) ...
Setting up python3-apt (2.4.0ubuntu1) ...
Installing package into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)
trying URL 'https://cloud.r-project.org/src/contrib/bspm_0.5.1.tar.gz'
Content type 'application/x-gzip' length 25974 bytes (25 KB)
==================================================
downloaded 25 KB

* installing *source* package ‘bspm’ ...
** package ‘bspm’ successfully unpacked and MD5 sums checked
** using staged installation
* installing /usr/share/dbus-1/system-services/org.r_project.linux1.service
* installing /etc/dbus-1/system.d/org.r_project.linux1.conf
** R
** inst
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded from temporary location
** testing if installed package can be loaded from final location
** testing if installed package keeps a record of temporary installation path
* DONE (bspm)

The downloaded source packages are in
        ‘/tmp/RtmpJsEsZY/downloaded_packages’

root@DESKTOP-NIQG5GO:/home/ubuntu# R

R version 4.2.3 (2023-03-15) -- "Shortstop Beagle" Copyright (C) 2023 The R Foundation for Statistical Computing Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY. You are welcome to redistribute it under certain conditions. Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors. Type 'contributors()' for more information and 'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or 'help.start()' for an HTML browser interface to help. Type 'q()' to quit R.

[Previously saved workspace restored]

rp <- c("https://rpolars.r-universe.dev/bin/linux/jammy/4.2", "https://cloud.r-project.org")
install.packages(c("rpolars", "arrow"), repos = rp)
Hit https://cloud.r-project.org/bin/linux/ubuntu focal-cran40/ InRelease
Hit https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/ InRelease
Hit http://archive.ubuntu.com/ubuntu jammy InRelease
Hit http://security.ubuntu.com/ubuntu jammy-security InRelease
Hit http://archive.ubuntu.com/ubuntu jammy-updates InRelease
Hit http://archive.ubuntu.com/ubuntu jammy-backports InRelease
Ign https://r2u.stat.illinois.edu/ubuntu focal InRelease
Ign https://r2u.stat.illinois.edu/ubuntu focal InRelease
Ign https://r2u.stat.illinois.edu/ubuntu focal InRelease
Err https://r2u.stat.illinois.edu/ubuntu focal InRelease
  Could not connect to r2u.stat.illinois.edu:443 (192.17.190.167). - connect (111: Connection refused)
Fetched 0 B in 6s (0 B/s)
Error:

eddelbuettel commented 1 year ago

That's a little long and difficult to read / debug. I have not tried wsl myself but friends have alpha/beta tested r2u from it a year ago. There should be no issue per se.

Eyeballing, I can spot one blunder: your system identifies as

 Welcome to Ubuntu 22.04.1 LTS (GNU/Linux 4.4.0-19041-Microsoft x86_64)

which is 22.04 aka "jammy". But apparently the script you ran is full of focal. We have two scripts: one for focal, one for jammy. It looks like you picked the wrong one. Please try jammy on 22.04, ie this script. (Unless I messed up royally but I see no 'focal' reference in the 'jammy' script.)
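A mismatch like this can be spotted mechanically. The sketch below is an assumption-laden illustration, not part of the r2u scripts: the function name `mismatched_suites` is made up, and it only understands classic one-line `deb ...` entries. It lists apt source lines whose suite does not belong to the given release codename, using a throwaway file modeled on the log above.

```sh
#!/bin/sh
# mismatched_suites CODENAME FILE...
# Hypothetical helper: print one-line-style "deb" entries whose suite is
# neither CODENAME itself nor derived from it (e.g. jammy-updates, jammy-cran40/).
mismatched_suites() {
    want="$1"; shift
    # Entry layout: deb [optional options] URI suite [components...]
    awk -v want="$want" '
        $1 == "deb" {
            i = 2
            if ($i ~ /^\[/) {                 # skip a possibly multi-word [options] group
                while (i <= NF && $i !~ /\]$/) i++
                i++
            }
            suite = $(i + 1)                  # $i is the URI, the suite follows it
            if (suite != want && suite !~ ("^" want "[-/]"))
                print FILENAME ": " $0
        }' "$@"
}

# Demo on a temporary file that mirrors the situation in the log above.
tmp="$(mktemp)"
cat > "$tmp" <<'EOF'
deb [arch=amd64] https://r2u.stat.illinois.edu/ubuntu focal main
deb https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/
EOF
mismatched_suites jammy "$tmp"   # reports only the focal entry
rm -f "$tmp"
```

On a real system one would point it at `/etc/apt/sources.list` and the `.list` files under `/etc/apt/sources.list.d/`, passing the codename the system actually reports.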

You can also 'decompose' the steps in the script, there are only five and the website walks you through. Or if you want to try "right now" stick with the eddelbuettel/r2u:22.04 container (where either tag '22.04' or 'jammy' works).

As for the error the r2u.stat.illinois.edu host provides files via the standard http and https ports -- try https://r2u.stat.illinois.edu/ubuntu/ with or without the s. I'd be happy to hop on a zoom or jit.si call to see and debug. I am on Central time.

eddelbuettel commented 1 year ago

Ok, for good measure I did the following test (given that I updated / double-checked these scripts not so long ago):

It ran fine through all the steps but croaked at the final (very standard) Rscript -e 'install.packages("bspm")' with the same error as you showed: cannot open URL 'https://cloud.r-project.org/src/contrib/bspm_0.5.1.tar.gz'. So maybe something gets in the way of DNS. Hard to tell. I re-ran that step by hand, and then had a working system where install.packages("RcppArmadillo") did what is expected: install the package as a binary along with its sole dependency, Rcpp.

So maybe ... try again? And/or try the steps one by one?

sorhawell commented 1 year ago

I have tried to reset wsl and virtualbox ubuntu-jammy and run the steps. It hangs, and I get various connection refused / no response errors from addresses in the stat.illinois.edu domain.

pinging does not work either. Could the server be down? or have some very restrictive policies?

ping r2u.stat.illinois.edu
PING r2u.stat.illinois.edu (192.17.190.167): 56 data bytes
Request timeout for icmp_seq 0
Request timeout for icmp_seq 1
Request timeout for icmp_seq 2
Request timeout for icmp_seq 3
ping: sendto: No route to host
Request timeout for icmp_seq 4
ping: sendto: Host is down

example of step 2 failing on a fresh ubuntu-jammy via virtual box


root@ubuntu:/home/soren# echo "deb [arch=amd64] https://r2u.stat.illinois.edu/ubuntu jammy main" > /etc/apt/sources.list.d/cranapt.list

apt update -qq

122 packages can be upgraded. Run 'apt list --upgradable' to see them.

W: Failed to fetch https://r2u.stat.illinois.edu/ubuntu/dists/jammy/InRelease  Could not connect to r2u.stat.illinois.edu:443 (192.17.190.167), connection timed out

W: Some index files failed to download. They have been ignored, or old ones used instead.

root@ubuntu:/home/soren#
eddelbuettel commented 1 year ago

Very weird. The machine is clearly up all the time and accessed all the time.

sorhawell commented 1 year ago

I'm not sure either.

The following https link from the https://github.com/eddelbuettel/r2u README is unreachable for me. I also tried to sign up for a VPN in the US.

I have pinged random university domains and all reply except illinois.edu.

4 packets transmitted, 0 packets received, 100.0% packet loss
sorenwelling@Srens-MacBook-Pro ~ % ping yale.edu
PING yale.edu (151.101.194.133): 56 data bytes
64 bytes from 151.101.194.133: icmp_seq=0 ttl=57 time=10.083 ms

sorenwelling@Srens-MacBook-Pro ~ % ping ku.dk
PING ku.dk (130.226.237.173): 56 data bytes
64 bytes from 130.226.237.173: icmp_seq=0 ttl=244 time=10.947 ms

sorenwelling@Srens-MacBook-Pro ~ % ping ucla.edu
PING ucla.edu (3.33.167.235): 56 data bytes
64 bytes from 3.33.167.235: icmp_seq=0 ttl=120 time=9.185 ms

sorenwelling@Srens-MacBook-Pro ~ % ping illinois.edu
PING illinois.edu (192.17.172.3): 56 data bytes
Request timeout for icmp_seq 0
Request timeout for icmp_seq 1
Request timeout for icmp_seq 2
Request timeout for icmp_seq 3
ping: sendto: No route to host
Request timeout for icmp_seq 4

These connection testers say the domain is all right: https://www.isitdownrightnow.com/illinois.edu.html

eddelbuettel commented 1 year ago

Please trust me when I say that r2u.stat.illinois.edu is up and ok -- I have a shell session open there essentially at all times. It has quasi-six-nines uptime: there is an automated system administration job once each month to install blessed updates and that is all. The machine is good. And I grep the access logs. It has served literally millions of .deb files. Lots of daily GitHub Actions run against it, as do other services. When I last counted r2u logs two months ago we were at 2.5 million packages served in the then nine months.

Now, r2u is a VM (but then so are a bazillion other webservices) and something must upset your machine. You can try the initial host -- my machine here is connected via fiber. Some people still hit it because they never updated their configs, so it will work too.

I really do not understand what DNS or other issues could be at stake. To keep our sanity, can you try from somewhere else, ie do you have a shell elsewhere?

Very lastly, are you using WSL or WSL-2? I do have (fairly enthusiastic) usage reports from WSL-2, but not from WSL.

vincentarelbundock commented 1 year ago

FWIW, I was able to install this without compiling anything on a WSL Ubuntu 22.04.

First I installed the latest bspm. Then called bspm::enable() and options(bspm.sudo=TRUE). Finally,

rp <- c("https://rpolars.r-universe.dev/bin/linux/jammy/4.2", "https://cloud.r-project.org")
install.packages(c("rpolars", "arrow"), repos = rp) 


Sorry about formatting. On my phone.

sorhawell commented 1 year ago

@vincentarelbundock thx for trying this

My best guess is that it is my physical windows machine be it via wsl, virtualbox or native windows. It is just not gonna connect to https://r2u.stat.illinois.edu/ or anything related in the illinois.edu domain.

the ping was likely an inconclusive test

via phone or macbook I can connect though. Gotta get a new machine and try again.
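Since ICMP replies can be filtered while HTTPS works fine, a request over the port that apt actually uses is a more telling check than ping. The sketch below is only an illustration: the function names `explain_curl_exit` and `probe` are invented here, and the mapping covers just a handful of documented curl exit codes.

```sh
#!/bin/sh
# explain_curl_exit CODE: translate a few documented curl exit codes
# into the network layer that most likely failed. Hypothetical helper.
explain_curl_exit() {
    case "$1" in
        0)  echo "ok: HTTPS request completed" ;;
        6)  echo "DNS: could not resolve host" ;;
        7)  echo "TCP: failed to connect (refused or unreachable)" ;;
        28) echo "timeout: no answer within the time limit" ;;
        35) echo "TLS: handshake failed" ;;
        60) echo "TLS: certificate could not be verified" ;;
        *)  echo "other curl error (exit code $1)" ;;
    esac
}

# probe URL: send a HEAD request with short timeouts and explain the result.
probe() {
    curl -sS -o /dev/null -I --connect-timeout 5 --max-time 10 "$1" 2>/dev/null
    explain_curl_exit "$?"
}

# Example call (needs network access, so it is left commented out):
# probe https://r2u.stat.illinois.edu/ubuntu/
```

Running `probe` once from the affected machine and once from a working one (phone hotspot, another host) would show which layer differs between the two.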

eddelbuettel commented 1 year ago

So strange -- illinois.edu is a fairly well-known and connected domain. I teach a course there as an adjunct, and have remote students too. But yes, at this point trying from another machine sounds best. In other news, I tweeted and tooted an example with the arrow + rpolars installation in seconds as binaries, and folks appear to like it.

sorhawell commented 1 year ago

@eddelbuettel I would like to include an r2u suggestion in README.Rmd. What do you think of a wording like this?

r2u: Speeding up your workflow? Install rpolars + arrow from binaries and resolve system dependencies on ubuntu 22.04 with r2u (see link for configuration).

rp <- c("https://rpolars.r-universe.dev/bin/linux/jammy/4.2", "https://cloud.r-project.org")
install.packages(c("rpolars", "arrow"), repos = rp)
eddelbuettel commented 1 year ago

Short, concise, to the point. Maybe stress that (sadly) only on Ubuntu (which I'd capitalize too) so maybe

r2u: Speeding up your workflow? On Ubuntu, install rpolars + arrow from binaries and resolve
system dependencies reliably and quickly with r2u (see link for configuration).

rp <- c("https://rpolars.r-universe.dev/bin/linux/jammy/4.2", "https://cloud.r-project.org")
install.packages(c("rpolars", "arrow"), repos = rp)
grantmcdermott commented 1 year ago

Dirk, minor typo in your edit: "... dependencies on with..." (Need to drop the "on".)

In fact, I'd be tempted to switch the "on" with "automatically".

eddelbuettel commented 1 year ago

Yup, thanks -- fixed [and edited to 'reliably and quickly']. Did you test it, or did we lose you for good to unforsaken worlds of other Linux distros?

What is the most potent yet short wording?

grantmcdermott commented 1 year ago

Still Arch for me, but loving r2u for my Docker setups (as you know).

sorhawell commented 1 year ago

I'll merge in and close Monday :)

eitsupi commented 1 year ago

Closed by #122 and #123

sorhawell commented 1 year ago

@eddelbuettel

After reinstalling the Windows OS on two machines and connecting to r2u.stat.illinois.edu via two independent ISPs/routers, it seems my house landline has some DNS/SSL issue with Windows machines. I later noticed some other unrelated https connections that were also failing.

Connecting via hotspot WiFi works fine.

So the error was indeed on my side.

eddelbuettel commented 1 year ago

Thanks for confirming.

We are fortunate in that the server machine is indeed connected to Internet2. And those machines do appear to be well reachable by many via fast and stable connections, so it always seemed like a local (if "inexplicable") issue.

Let me know if I can help with anything else.