r-wasm / webr

The statistical language R compiled to WebAssembly via Emscripten, for use in web browsers and Node.
https://docs.r-wasm.org/webr/latest/

Support `IDBFS` for persisting installed packages #442

Closed JosiahParry closed 2 weeks ago

JosiahParry commented 2 weeks ago

I am working through how to persist installed R packages in a manner that can be supported outside of the Node runtime.

The two options provided are WORKERFS and NODEFS. In my case, as I am using webR via Rust and WASM, I cannot use NODEFS.

The issue with WORKERFS is that when a page is reloaded, the workers die and the persisted data is lost with them.

Alternatives would be to use the browser Filesystem API or IndexedDB. At present, Emscripten supports only IndexedDB via IDBFS.

Being able to support one of these would (I believe) allow me to persist R packages across page reloads.

georgestagg commented 1 week ago

In a543b3e I have added IDBFS to the R WebAssembly build and written wrappers for mounting IDBFS type filesystems. This should now be available with the builds of webR under /latest and will be included in the next tagged release.

Please read the additional IDBFS documentation carefully; it outlines how to use the IDBFS filesystem storage and some caveats associated with the method. Most importantly, be aware of the following limitations in this first implementation:

1) This filesystem type is only available in web browsers, not in Node.

2) The webR PostMessage communication channel must be used. If required, this can be forced by setting channelType: 3 in the webR options during startup.

3) The filesystem data is written to IndexedDB only when requested using the Emscripten FS API function syncfs(). This function must first be called explicitly at mount time to populate the VFS from the persistent database, and then again whenever you want to persist data to the database.

The webR documentation goes into more detail and links to the Emscripten FS API documentation. Please let me know if anything needs to be clarified in the text.
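
As a rough illustration of limitation 2, the sketch below forces the PostMessage channel at startup. The helper names `idbfsWebROptions` and `startWebR` are hypothetical, and the webR class is passed in as a parameter (it is assumed to be the `WebR` class exported by the webr package) so that the snippet does not depend on how the package is loaded.

```javascript
// Numeric channel type given above for the PostMessage channel.
const POST_MESSAGE_CHANNEL = 3;

// Hypothetical helper: merge caller-supplied options with the forced channel type.
// The forced value is applied last, so it overrides any channelType the caller passed.
function idbfsWebROptions(extra = {}) {
  return { ...extra, channelType: POST_MESSAGE_CHANNEL };
}

// Hypothetical helper: construct and initialise webR with the PostMessage channel.
// `WebRClass` is assumed to be the `WebR` class from the webr package.
async function startWebR(WebRClass, extra = {}) {
  const webR = new WebRClass(idbfsWebROptions(extra));
  await webR.init();
  return webR;
}
```

In a page that has already imported `WebR`, this would be called as `const webR = await startWebR(WebR);`.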


For your particular use case the scheme would look something like this, in JavaScript:

// Create a `/data` directory for IDBFS and mount it
await webR.FS.mkdir('/data');
await webR.FS.mount('IDBFS', {}, '/data');

// Populate the `/data` directory from IndexedDB. The first time, this will be empty
await webR.FS.syncfs(true);

// Install R packages to `/data/library`
// NOTE: The `mount = FALSE` argument is very important here
await webR.evalRVoid("webr::install('dplyr', lib = '/data/library', mount = FALSE)")

// Synchronise to write the packages' file data to IndexedDB
await webR.FS.syncfs(false);

Then, after a web browser refresh:

// Create a `/data` directory for IDBFS and mount it
await webR.FS.mkdir('/data');
await webR.FS.mount('IDBFS', {}, '/data');

// Populate the `/data` directory from IndexedDB.
await webR.FS.syncfs(true);

// The previously persisted packages should now be available under `/data/library`
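
The two snippets above share the same mount-and-populate steps, so they could be wrapped up roughly as follows. This is only a sketch: `mountPersistentDir` and `installAndPersist` are hypothetical names, and the package names are interpolated into the R code unescaped, so they are assumed to come from a trusted source.

```javascript
// Hypothetical helper: mount IDBFS at `mountpoint` and populate it from IndexedDB.
// Run once per page load, before touching anything under `mountpoint`.
async function mountPersistentDir(webR, mountpoint = '/data') {
  await webR.FS.mkdir(mountpoint);
  await webR.FS.mount('IDBFS', {}, mountpoint);
  await webR.FS.syncfs(true); // true: read from IndexedDB into the VFS
}

// Hypothetical helper: install packages into `lib`, then write them to IndexedDB.
// Assumes `packages` holds trusted package names (no quoting or escaping is done).
async function installAndPersist(webR, packages, lib = '/data/library') {
  for (const pkg of packages) {
    await webR.evalRVoid(`webr::install('${pkg}', lib = '${lib}', mount = FALSE)`);
  }
  await webR.FS.syncfs(false); // false: write from the VFS out to IndexedDB
}
```

On first load one would call `mountPersistentDir(webR)` followed by `installAndPersist(webR, ['dplyr'])`; after a refresh, `mountPersistentDir(webR)` alone should be enough.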

Once the VFS is synchronised as above, we have in R:

> list.files("/data")
[1] "library"
> .libPaths(c("/data/library", .libPaths()))
> library(dplyr)

Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union
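
The same R steps can also be driven from JavaScript once the VFS has been populated. A minimal sketch, assuming a hypothetical helper `attachFromPersistedLibrary` and a trusted package name and library path (both are pasted into the R code as-is):

```javascript
// Hypothetical helper: prepend the persisted library to R's search path,
// then attach a package from it. Call only after `syncfs(true)` has resolved.
async function attachFromPersistedLibrary(webR, pkg, lib = '/data/library') {
  await webR.evalRVoid(`.libPaths(c("${lib}", .libPaths()))`);
  await webR.evalRVoid(`library(${pkg})`);
}
```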

Finally, I don't know what your Rust bindings look like but do be aware that webR.FS.syncfs() returns a Promise and files won't appear in the VFS populated from IndexedDB until this promise has been resolved.

I'm not sure what this will look like in Rust, but in JavaScript this means either awaiting the result, using the .then(() => {}) function, or just waiting until some future time, such as with setTimeout(() => {}, 1000). It is for exactly this reason that the PostMessage rather than the SharedArrayBuffer communication channel must be used.
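
To make the ordering concrete, here is a small sketch of the first two options. Both helpers are hypothetical; `task` stands for whatever needs the populated VFS, for example a call that loads a package from `/data/library`.

```javascript
// Option 1: await the promise, then run the dependent work.
async function populateThenRun(webR, task) {
  await webR.FS.syncfs(true);
  return task();
}

// Option 2: the same ordering expressed with .then().
function populateThenRunWithThen(webR, task) {
  return webR.FS.syncfs(true).then(() => task());
}
```

In both forms `task` only starts once the promise returned by `syncfs(true)` has resolved, which a fixed `setTimeout` delay cannot guarantee.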