Supported packages? - Githubissues

r-wasm / webr

The statistical language R compiled to WebAssembly via Emscripten, for use in web browsers and Node.

https://docs.r-wasm.org/webr/latest/

Other

868 stars 68 forks source link

Supported packages? #11

Closed ekianjo closed 1 year ago

ekianjo commented 2 years ago

hi @georgestagg I played around with the demo and it's fairly impressive to see R running entirely within the browser. I tried loading a few packages, like dplyr, ggplot and a few others, and it seems there is some limitation as to what packages are currently supported. How do packages support work exactly? Are they precompiled and therefore only a specific set is available? What is the work required to add new packages?

georgestagg commented 2 years ago

Yes that's right, the packages that work have been precompiled for WASM. Many R packages require a working C compiler to install from source, and I don't think it's going to be reasonably possible to have a full build toolchain working in the browser, so distributing pre-built WASM binaries is probably the next best thing.

It's not well documented, but there's a repo at https://github.com/georgestagg/webr-ports where I have been putting pre-compiled versions of R packages (you can see a list by looking in the dist directory). The webR REPL pulls the pre-compiled packages from this repo when you run library(...) and loads them into the Emscripten filesystem.

My own process for building R packages has been to install them from source into a new and empty R library path by using the R CMD INSTALL command within the docker environment used to build webR, so that access to the Emscripten/Dragonegg toolchain is available. Any libraries built during the install need to be compiled for WASM rather than natively, so I (ab-)use R's Makevars system to use emcc & em++ as the C compilers, along with passing some required arguments to tell Emscripten to build the libraries as SIDE_MODULEs.

The result is a directory containing a single installed R package, along with any libraries built for WASM. I then use Emscripten's file_packager to build .data and .js files containing the contents of that directory, ready to load into the Emscripten filesystem for webR.

I have written a script to take a URL pointing to an R source package and then built it using the process above, it's at https://github.com/georgestagg/webr-ports/blob/main/scripts/install_from_URL.sh. If you'd like to give it a go, you'll need to:

Build the webR docker environment using the Dockerfile from this repo.
Use docker to get a shell, such as bash, in the environment.
Build a native (rather than WASM) version of R and put it on the path, so that you can run R CMD.
Clone a copy of https://github.com/georgestagg/webr-ports somewhere.
Run the install_from_URL.sh script, passing a URL to an R source package as an argument.
Upload the resulting files now in the dist directory to webspace somewhere, and modify the webR REPL's package URL to point there.

Not all R packages are going to just compile cleanly like that, and for those packages install_from_URL.sh won't work, they'll instead need to be patched to build for WASM. See, for example, https://github.com/georgestagg/webr-ports/tree/main/src/svglite.

Hopefully in future this can all be streamlined, simplified and documented properly, but for the moment it at least seems to work. I've leave the issue open to remind myself that the process needs work.

SugarRayLua commented 2 years ago

@georgestagg,

Would you please clarify further how one can install packages from the R packages repo you discussed above. It was my understanding from reading what you wrote that if I try and type into the REPL in webR to install a package that you’ve put into the repo:

(e.g. install.packages(‘tidyr’))

that webR would default to try and pulling that package from your package repository. However, when I try typing the above command in the REPL, webR just seems to hang.

Fyi.

Thanks!

georgestagg commented 2 years ago

Running something like install.packages(‘tidyr’) should just work.

Tidyr was probably not the best example, though. At the moment loading packages is handled through hard coded lookups for what other packages are required and I had missed tidyr. I've updated the list so that running install.packages(‘tidyr’) should now automatically download it and its prerequisites into the environment:


> library(tidyr)
Downloading webR package: tidyr
Downloading webR package: dplyr
Downloading webR package: generics
Downloading webR package: glue
Downloading webR package: lifecycle
Downloading webR package: rlang
Downloading webR package: magrittr
Downloading webR package: pillar
Downloading webR package: cli
Downloading webR package: crayon
Downloading webR package: ellipsis
Downloading webR package: fansi
Downloading webR package: utf8
Downloading webR package: vctrs
Downloading webR package: R6
Downloading webR package: tibble
Downloading webR package: pkgconfig
Downloading webR package: tidyselect
Downloading webR package: purrr
>

That particular issue should go away once we've patched the internet R module and created a more CRAN like repository. Then we could load packages within webR itself and it will handle the dependencies better.

Saying all that, I'm unsure why it was hanging for you. Sometimes the browser will seem to hang for a little while when loading, this is a consequence of running webR on the main thread and should not become a problem when #17 is complete. The hang should not last too long, though. If it continues to happen, please check the browser console with dev tools (F12, usually) and let me know if there are any console error messages from a crashing webR.

lionel- commented 2 years ago

Regarding hangs, I've noticed that the current dev version of webR hangs if I run library(dplyr) but not library(tibble); library(dplyr). Hitting pause in the debugger shows it's stuck in dlmalloc(). This may be related to the memory issues I've encountered.

That said, the released webR REPL doesn't seem to suffer from this particular hang, so this may be something else.

SugarRayLua commented 2 years ago

Thanks, with the new release, I’m able to load up tidyr fine. I wasn’t aware of the need to clear my cache to reinstall a new release— maybe something helpful to put into the future readme file. I’m also a novice to web page work so didn’t know about developer tools. I enabled that option now on my Mac for Safari; seems a bit more of hassle (need to be hardwire iPad to Mac) to use developer tools on a webpage on iPad. However, I found a very useful free replacement app that seemed to accomplish the same thing “Web Inspector” of the Apple App Store.

Fyi, Mike

P.S. One future package would be interested to see how works in webR: Rgl Rgl has a webgl option and would be interested to see if that worked in webR

SugarRayLua commented 2 years ago

@georgestagg,

I was able to install emscripten as you suggested successfully on my MAC and created a separate folder on my Mac and complied webR in there mimicking the docker environment. I then tried to use your install_from_URL.sh script from the webR-ports site on a test package, and it seemed to initially compile that package well until it hit an error in the shell script: file_packager command not found

From reading the emscripten website, file_packager.py is supposed to be part of the emscripten sdk but wasn't in my emscripten directory for some reason. I therefore copied the raw file_packager.py code from the emscripten website and tried again to compile a test package from a URL with the install_from_URL shell script, but for some reason having that python script in the proper directory didn't fix the problem and let the compiler use the file_packager command. I tried just running the file_packer.py from python3, and the python script itself seemed to throw some errors (despite being copied directly from emscripten's website).

Any thoughts of what I might be doing run with compiling R package on my Mac for use by webR? I think I'm making progress and would appreciate any suggestions at your convenience.

Thanks, Mike

lionel- commented 2 years ago

We'll support building packages through https://github.com/lionel-/webr-repo/, but this may require a few changes to the webR build system before, which are planned soon. Currently busy with other things.

SugarRayLua commented 2 years ago

Okay, thanks for letting me know, @lionel-.' Will look forward to those changes.

Sincerely, Mike

SugarRayLua commented 2 years ago

@georgestagg and @lionel- ,

I was able to get emscripten working on my Mac and was able to use the install_from_URL.sh script to create two .js files in the dist directory on my computer as I believe George described in the steps earlier. The package I chose to try and build for webR was the "gridExtra" package. I had hoped that all I would need to do would be to use the webR "upload file" option to the emscripten file system and upload those two packages into the same library path as the other pre-installed packages (e.g. .libPaths() = "/usr/lib/R/library" referencing the emscripten file system on webR). However, it didn't seem that simple-- webR didn't recognize the gridExtra package despite making a gridExtra named folder in the appropriate /usr/lib/R/library path and putting the two gridExtra .js files that I created into it). I re-read George's instruction above and it seems that instead the "library()" function on webR only looks for webR pre-installed packages or packages stored at the webR-ports repo. It looked like George suggested if I wanted to try out package I tried to build for webR using the install_from_URL.sh script that I would need to go into the webR.js file and change the package URL variable there to point to where I stored the newly created package. Unfortunately, as a novice programmer, I don't yet understand how to actually modify the webR.js file on my computer/browser nor how to put the newly created package at a URL location to point the modified webR.js file to.

From my reading it looks like browser can point to files in addition to to web address. Thus, could I instead put the file path for the new packages I built on my local computer and change the webR.js file to point to that file path instead of a location on the internet (i.e. URL = "file:///Users/name/Desktop/gridExtra" if on desktop or perhaps even URL = "file:///usr/lib/R/library" to emscripten directory on webR itself)? If so, if you could explain to me how to actually change the webR.js file loaded on my machine (desktop or iPad), I would point webR.js to those locations and test out the new package (gridExtra).

My only other concern is that when I look in the directories of the webR-port packages, they seem to have multiple folders in them-- not solely a package.js and data.js package that install_from_URL.sh produces. Is everything needed to have webR use the newly build packages in the newly created package.js and data.js files created from install_from_URL.sh?

If these are things I can learn to do as a novice, I'm glad to help out and do the legwork in trying to build multiple R packages with the install_from_URL.sh script and test them out in webR and let you know how they work and send the ones to you that do work.

Please let me know at your convenience.

Here is a link to the gridExtra files I created on my Mac from the install_from_URL.sh script (untested for the reasons mentioned above):

https://www.dropbox.com/sh/lca00logk8imznl/AABVrb7gE5bvtTVX3iJGDTZoa?dl=0

Thanks, Mike

SugarRayLua commented 2 years ago

@georgestagg & @lionel- ,

I figured out how to create a simple webserver using python's http.server on my Mac and iPad so that I could continue to try and install and test out new R packages for webR. As a novice web developer, I didn't previously realize that my web browser client isn't allowed to fetch new R packages that I've built and stored on my local computer and that I need to fetch those new packages from a server somewhere (which I think is what you meant by your instructions above to point webR.js line to "web space" somewhere.

The neat thing is that once I figured out how to do this, I realized that the built webR distribution is all javascript, html and web assembly so can be easily transferred to other servers to run it and does not need to be rebuilt. Here's the steps I took in case other novice web developers want to try installing webR and try adding new R packages to webR on their system:

Clone and build webR from https://github.com/georgestagg/webR as indicated in the instructions above. Doing so will create the key "dist" distribution directory which contains all one needs to run webR from a server
Clone and build the webR-ports project from https://github.com/georgestagg/webr-ports as indicated in the instructions in this issue thread above. Doing so I believe generates another "dist" distribution directory with all of the R package ports the webR team has built for webR. That distribution is also portable and so I renamed the directory "pkgdist" and copied into the "dist" directory from the first step above so that I can move the entire webR "dist" directory to my server to host both webR and the webR extra packages from one server.
Run the install.fromURL.sh shell script from the build webR-ports project built in the step above (using the URL to the CRAN R package you are interested in building) to attempt to build new R packages to run in webR. @georgestagg indicated that some R packages won't easily port to wasm from that script. However, my first port was successful (package "gridExtra") which was encouraging. The newly built R packages also appear in the webR ports project "dist" directory so I just kept it there and move it en bloc with the other built packages to my web server directory.
Finally, as @georgestagg mentioned in his instructions earlier in this thread, you need to edit some of the lines in the webR.js file located in your "dist" directory to tell webR how to find your newly added package. Specifically, for my build: Line 6: WEB.URL became = "./" for my build to tell I believe webR that the webR files are now local to my server Line 7: PKG.URL became = the web location that the new packages are stored at. For my setup, I'm testing all of the new package ports on my local machine so I've got a server running on my local machine and having my local machines browser fetch the information from my local server (i.e. PKG.URL = "http://localhost:8080/pkgdist/" Lines 17 and up: Need to add the new package to the list of ported R packages available to install (e.g. add: 'NewPackageName' = ['dependencyPackageName1', 'dependencyPackageName2', etc.],

Once I did all of those steps, I just needed to put the webR "dist" directory from step 1 (which also contained the "pkgdist" subdirectory from step 2) into the default directory for my webserver (e.g. for Debian linux: /var/www/html), start my local webserver (e.g. python3 -m http.server 8080), and then connect from my local machines web browser to my local webserver via local host (e.g. in the URL bar of Safari: http://localhost:8080/) and it worked! (I did need additionally need to chmod grant permissions to execute all of the "dist" files when running the server from Linux although didn't need to do that from Terminal on my Mac).

After testing out this process on my Mac (and porting the new 'gridExtra' R package to webR on my Mac running a local server), I transferred the "dist" directory I had built to my iPad which also worked well and ran webR and the new 'gridExtra' R package I ported (I run Debian Linux on my iPad Pro using the sideloaded UTM app [QEMU port] which I used as my server, and then connected to that server with native Mobile Safari using "split screen" mode. I've attached a snapshot of what that looks like running on my iPad.

Notably, I can run R on Linux via UTM on my iPad; however, my interest in webR is that it runs at native speed (or almost-- using wasm) which is much faster than R on Linux through UTM's QEMU emulator.

I'll continue now to try and test and port other R packages to webR and would be happy to let the webR team know which ones ported well.

Hope this is helpful. Have a good weekend :-)[](url)

SugarRayLua commented 2 years ago

PS, Even simpler setup running completely on the iPad:

a-shell app (downloadable from the AppStore) running python3 http.server as server with webR .js and .wasm files installed then accessing the python server from localhost on Safari from the same iPad:

gregvolny commented 2 years ago

Dear All, mainly @georgestagg & @lionel- , Thanks for this very great innovation for all R-Users, However, I'm unable to run any R packages using the latest dist generated in Github (webr-dist). Please take a look in this screen and let me know how to install packages and test some R libs. among others, to connect to Databases (https://dbi.r-dbi.org/) packages.

Can we have persistent storage from the Emscripten virtual file system?

Screen1 Please let me know. Thanks in advance!

georgestagg commented 2 years ago

However, I'm unable to run any R packages using the latest dist generated in Github (webr-dist).

We lost package loading in a recent refactor, sorry about that. It will be coming back soon, certainly before the first versioned release.

Can we have persistent storage from the Emscripten virtual file system?

Not right now, but I believe it is possible to persist storage using Emscripten's support for IDBFS. Perhaps in the future a IDBFS mount point can be setup somewhere in the virtual filesystem. I will open an issue to remind us to look into it.

gregvolny commented 2 years ago

@georgestagg and @lionel- Thanks a lot for this very revolutionary release. Here's a screen of my first test via CSPro (https://www.census.gov/data/software/cspro.html). Screen2 However, since also i'm a new R User, i would like to know if you already have or you will implement some kinds of bidrectional JS-R API where we can call directly R functions inside JS? If yes, can you share with us some docs? Thanks in advance!

gregvolny commented 2 years ago

@georgestagg and @lionel- I'm trying to implement this: https://cran.r-project.org/web/packages/crosstable/vignettes/crosstable.html however, WebR was unable to install the crosstable lib. Screen3 Same also with library(tidyverse) Please do you have some workaround for? Thanks in advance for your help and support!

SugarRayLua commented 2 years ago

@georgestagg, follow-up question on the "install_from_URL.sh" file regarding the Makevars file it creates

I was able to get the original docker file from the original repository working and tried building some more webR packages with the install_from_URL.sh. One in particular I've been trying to build is the R curl package as it is needed by the R jsonlite package to be able to access web APIs. However, I was initially puzzled due to the installation failing with the R curl package configure file giving me an error after decompressing the R curl package that it could not find libcurl or its headers despite my double checking that I appropriately had libcurl and its headers installed. I also verified that there wasn't a problem with the R curl package itself as I successfully installed it on my Mac without emscription by using solely:

R CMD INSTALL httppathtoRCurlzipsourcefile

I then looked further into the the actual R curl package's .configure file and noted that it finds the compiler and flags by these statements:

CC=`${R_HOME}/bin/R CMD config CC`
CFLAGS=`${R_HOME}/bin/R CMD config CFLAGS'
CPPFLAGS=`${R_HOME}/bin/r CMD config CPPFLAGS'

The R curl package .configure file then goes on to test the configuration by the following:

Echo "#include $PKG_TEST_HEADER" | ${CC} ${CPPFLAGS} ${PKG_CFLAGS} ${CFLAGS} -E -xc - >/dev/null 2>&1 || R_CONFIG_ERROR=1;

The next line then goes on to print the following error if $R_CONFIG_ERROR=1:

If [ $R_CONFIG_ERROR ]; then
     echo "--------------------ANTICONFIG ERROR---------------------------------------------"
     echo "Configuration failed because $PKG_CONFIG_NAME was not found. Try installing:"
     echo " * deb: $PKG_DEB_NAME (Debian, Unbuntu, etc)"
     echo " * rpm: $PKG_RPM_NAME (Fedora, CentOS, RHEL)"
     echo "IF $PKG_CONFIG_NAME is already installed, check that 'pkg-config' is in your"
     echo "PATH and PKG_CONFIG_PATH contains a $PKG_CONFIG_NAME.pc file. If pkg-config"
     echo "is unavailable you can set INCLUDE_DIR and LIB_DIR manually via:"
     echo "R CMD INSTALL --configure-vars='INCLUDE_DIR=... LIB_DIR=....'"
     echo "----------------------------------------------------------------------------------------
     exit1;
fi

Since when I ran the install_from_URL.sh script form the webR-ports I cloned into the prior repository's docker image, it appropriately identified the libcurl dir and its headers directory before displaying the above error, I suspect that I'm actually seeing that error not because of the R curl package configure script finding libcurl or its headers but because of a problem with the R curl package configure script finding and/or using the emscripten compiler or its flags which are tested in the R curl package configure scripts test statement that I listed previously. In fact, when I use a standard

R CMD INSTALL

of the R curl source package from my Mac desktop without emscripten or the install_from_URL.sh script, it works, and I'm able to echo display all of the required environmental variables (i.e.:

${CC}, ${CFLAGS}, ${CPPFLAGS}

However, when I echo display those same variables in the docker environment in which I run the install_from_URL.sh, I noticed that the:

${CPPFLAGS}

variable isn't set to anything. When I then open and read the install_from_URL.sh script, I notice that the install_from_URL.sh script itself sets environmental variables as follows:

CC=emcc
CXX=em++
CFLAGS=-std=gnu11 -I${RHOME}/include
CXXFLAGS=-std=gnu+11 -DRCPP_DEMANGLER_ENABLED=0 -D__STRICT_ANSI__ -I${RHOME}/include

But CPPFLAGS does not seem to be set by the install_from_URL.sh script

Also no

${RHOME}

environmental variable seems to be set in the docker image; thus I suspect that CFLAGS would also not get set with the script either and might cause the .configure file in the R curl package to trigger and error and the build to fail. I suspected that

${RHOME}

was intended to be the same as the environmental variable

${R_HOME}

and thus in the docker environment did:

export R_HOME=theRHomeDirectoryDeterminedFromR.home (e.g. /usr/lib/R)

Then added to the install_from_URL.sh script:

RHOME=${R_HOME}

However, that did not work either. Thus, in order to try and compile the R curl package to .wasm with the install_from_URL.sh script:

1a) Do I need to add a line in the install_from_URL.sh to set CPPFLAGS to something?

1b) If so, what would I set CPPFLAGS to?

2a) Do I need to change what CFLAGGS is set for in install_from_URL.sh? In the docker setup the statement CFLAGS=-std=gnu11 -I${RHOME}/include would point to the include path of /usr/lib/R/include which doesn't exist in the R environment I installed in the docker environment. I noticed that R CMD config CFLAGS command on my desktop (non-Emscripten setup) points to a non-R directory file: -I/usr/local/include

I hope this post was an appropriate issue to raise and ask and as a novice Bash user, I did my due diligence in attempting to find my own solution between posting. If you and the other developers plan to continue to use a similar installation script for R packages in the new repository, troubleshooting this issue might help future successful R package compilation to wasm.

Thanks!

georgestagg commented 2 years ago

@SugarRayLua

Firstly, I should mention that install_from_URL.sh is definitely unsupported at this point. We have moved to providing binaries via a CRAN like repo, with build scripts in the webr-repo GitHub repo listed above in a previous comment. Although I will admit that things are currently a little out of date there, we have been busy with other things.

While install_from_URL.sh might still work for some packages, it's a very ad-hoc script and would probably require tweaking in some way or another for any fairly complex R package. Going forward, we'll be updating the individual R packages to support building for WASM (where possible), rather than storing patches and maintaining install_from_URL.sh.

Now, regarding the R curl package. The fundamental issue is that the package links to the libcurl library to work, but you cannot simply link to the libcurl installed on your machine with the Emscripten compiler, because it has not itself been compiled for WASM. In fact, all the system prerequisites for a package need to also have been compiled for WASM. Emscripten provides several of these libraries for us, but as far as I am aware not libcurl.

In fact, it is my understanding that under the current WebAssembly specification the libcurl library can probably never be compiled directly for WASM, because the browser WASM runtime has no direct access to network sockets. There are tricks one can do with Emscripten, but they require communicating over websockets rather than standard HTTP, or external tools such as proxy servers.

One possible solution is to write an entirely new libcurl shim that converts libcurl API calls to browser based Fetch or XHR requests. But as I am sure you appreciate, such a project would be a big undertaking just on its own.

Without a version of libcurl for WASM, the R curl package cannot be compiled.

SugarRayLua commented 2 years ago

Thank you for explaining that, @georgestagg, that all makes sense. As I novice, I misunderstand that basic principle that one must link libraries already compiled to WASM and therefore is not as simple as just linking to libraries already installed on one's machine. It thus seems that the process of compiling R packages to WASM with install_from_URL.sh is not something most novices will be successful at, and I will patiently keep an eye out for new packages on the GitHub repo when you are able to develop and release them.

georgestagg commented 1 year ago

I'm going to close this, since the original question has been answered. For anyone dropping by, you can view the currently supported packages in webR using the command,

available.packages(repos="https://repo.r-wasm.org/", type="source")

Further information about the webR package build process will be added in future to the documentation, once the process is a little more developed and stable.

SugarRayLua commented 1 year ago

@georgestagg, A year ago you mentioned you didn't think it was feasible for curl to come to WebR, partly because WASM has no direct access to network sockets. However, I noticed that baseR's download.file() works fine in WebR (I can download files into WebR fine). Does download.file() not also need access to network sockets?

Appreciate your thoughts.

georgestagg commented 1 year ago

For webR, R itself has been patched to use JavaScript XHR for download.file() and a few other functions. By replacing the calls that would usually use network sockets with Browser API calls, we work around the limitation.

However, the Browser API only provides a small subset of the capabilities of the curl project, so using the same trick largely does not work. At most only a partial implementation of curl could be made.

See https://github.com/posit-dev/r-shinylive/issues/31#issuecomment-1786739311 for a comment I've written with a bit more detail.

SugarRayLua commented 1 year ago

Thanks, @georgestagg, for explaining-- that makes sense. I asked because one of the brilliant developers in our iPad Codea lua community figured out how to make a WebR - JS- Codea lua bridge that works well on the iPad 😊 (see attached screenshot). Such a bridge allows novice users like myself to combine the statistical and plotting power of R with the ease of scripting/other GUI/graphic creation in lua on a mobile device, and I'm attempting to make small useful projects that involve analyzing downloaded datasets.

SugarRayLua commented 1 year ago