r-wasm / webr

The statistical language R compiled to WebAssembly via Emscripten, for use in web browsers and Node.
https://docs.r-wasm.org/webr/latest/

Prevent fetching R.bin.data & R.bin.wasm etc on each page load #438

Closed JosiahParry closed 3 months ago

JosiahParry commented 3 months ago

I am working on Rust bindings to WebR https://github.com/JosiahParry/webr-js-rs. Since WebR cannot be built for WASI yet, this relies on loading webR from the .minjs.

However, what I've noticed is that the minjs loads the following files on each page load:

These result in 6.6 MB of file downloads per page load:

file_sizes <- c("177KB", "4.1MB", "1.7MB", "597KB", "48.8KB", "631B")
sum(fs::as_fs_bytes(file_sizes))
#> 6.6M

which is quite a lot!

The ideal behavior I would like is to download these files just once and then cache them and use them.

However, I have pretty much no idea about how JavaScript bundling / packaging works.

Is there a way to build webR to accomplish this type of behavior?

JosiahParry commented 3 months ago

Following the trail:

JosiahParry commented 3 months ago

Looking at the request headers, Cache-Control is not set. It would be great if the headers were set so that subsequent requests could just use the cache. Or perhaps there could be a way to set the cache type on load?

I think an issue I may be running into is that webR loses its context on each page because it is continually reloading itself.

https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Cache-Control

georgestagg commented 3 months ago

The webR CDN assets under /latest/ are intentionally served with Cache-Control: no-cache so that the latest commit is always downloaded by the browser.

The longer-term builds under e.g. /v0.3.3/ are served with the HTTP header cache-control: max-age=604800, and so the webR assets should automatically be cached by browsers for 1 week. Please let me know if you do not see these headers under the fixed version CDN URLs.

Assets are also served compressed over the wire where the browser supports it (brotli, gzip), which should hopefully help.
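To make the two header values above concrete, here is a minimal sketch that reads the max-age out of a Cache-Control value and converts it to days. The helper name `maxAgeSeconds` is hypothetical and not part of webR. It handles only the directives mentioned in this thread and is not a full Cache-Control parser.

```typescript
// Hypothetical helper (not part of webR): extract max-age, in seconds,
// from a Cache-Control header value. Returns null when the response must
// be revalidated or not stored (no-cache / no-store), or when no max-age
// directive is present.
function maxAgeSeconds(cacheControl: string | null): number | null {
  if (cacheControl === null) return null;
  const directives = cacheControl
    .toLowerCase()
    .split(",")
    .map((d) => d.trim());
  if (directives.includes("no-cache") || directives.includes("no-store")) {
    return null;
  }
  for (const directive of directives) {
    const match = /^max-age=(\d+)$/.exec(directive);
    if (match) return Number(match[1]);
  }
  return null;
}

const SECONDS_PER_DAY = 24 * 60 * 60;

// Header values quoted in this thread.
const pinned = maxAgeSeconds("max-age=604800"); // fixed-version builds
const latest = maxAgeSeconds("no-cache"); // builds under /latest/

console.log(pinned !== null ? pinned / SECONDS_PER_DAY : null); // 7
console.log(latest); // null
```

In a browser, the input could come from `response.headers.get("cache-control")` on a `fetch` response for one of the asset URLs.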

P.S. This 6 MB download is already a pretty slim distribution of the R WebAssembly binary and the filesystem data required for starting up R and its default packages. We attempt to segment the virtual filesystem so that only the minimal required binaries and filesystem data are downloaded on initial load. You may notice that as you use R, further filesystem data is downloaded on demand. For example, loading a help file, or plotting an image with e.g. png(), will initiate further downloads of assets.

JosiahParry commented 3 months ago

Thank you! I'll give this a look!

JosiahParry commented 3 months ago

Thank you for this! I've been able to swap out /latest/ for /v0.3.3/ and the caching is now present. Also noticed I had this bad boi checked!

image
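The swap described above can be sketched as a small URL rewrite. The helper name `pinWebRVersion` and the host `cdn.example.org` are placeholders for illustration only. Just the `/latest/` and `/v0.3.3/` path segments and the `R.bin.wasm` file name come from this thread.

```typescript
// Hypothetical helper: rewrite an asset URL from the rolling /latest/ path
// to a fixed release path, so the browser can reuse its cached copy.
function pinWebRVersion(url: string, version: string): string {
  const parsed = new URL(url);
  // Replace only the first "/latest/" path segment.
  parsed.pathname = parsed.pathname.replace("/latest/", `/${version}/`);
  return parsed.toString();
}

// Placeholder host, not the real CDN address.
const pinnedUrl = pinWebRVersion(
  "https://cdn.example.org/latest/R.bin.wasm",
  "v0.3.3",
);
console.log(pinnedUrl); // https://cdn.example.org/v0.3.3/R.bin.wasm
```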

What I did notice is that even though these are cached, the R session will restart on a refresh. I will have to explore what this means for mounting data and installing packages.