Open pawelru opened 1 month ago
Currently some R packages for Wasm are very large. This is the result of a design decision introduced in webR in the interest of improving loading times for most (much smaller!) packages, at the cost of storing uncompressed package data. I'm currently thinking about how to improve the situation for packages that do compress particularly well, such as BH. The issue tracking that is at https://github.com/r-wasm/webr/pull/460
For now, it looks like these assets are too large for GitHub, and we don't currently have a mechanism to handle that. Unfortunately, you cannot simply delete the .data file: the app will expect to find it and crash.
When exporting apps using the shinylive R package, you can export the app without bundling packages:
shinylive::export("myapp", "site", wasm_packages = FALSE)
This means the app won't ship with any bundled WebAssembly R package binaries. This would solve the problem in the short term, but unfortunately I can't see a simple way to export with this option set from Quarto documents.
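Before pushing an exported site, it can help to see which files would be rejected. The following is a minimal base R sketch, assuming the export lives in a directory such as `site` and that the relevant limit is 100 MB per file. `find_large_files` is a hypothetical helper written for illustration, not part of shinylive:

```r
# Hypothetical helper: list files under `dir` larger than `limit_mb` megabytes.
find_large_files <- function(dir, limit_mb = 100) {
  files <- list.files(dir, recursive = TRUE, full.names = TRUE)
  sizes <- file.size(files)
  big <- !is.na(sizes) & sizes > limit_mb * 1024^2
  data.frame(file = files[big], size_mb = round(sizes[big] / 1024^2, 1))
}

# Example: find_large_files("site")
```

An empty result suggests the export can be pushed as-is; otherwise the listed paths show which packages are responsible.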
So, there are several things we need to do here:
1) Handle packages too large for GitHub in some way: https://github.com/posit-dev/r-shinylive/issues/112
2) Make wasm_packages = FALSE available from Quarto: https://github.com/quarto-ext/shinylive/issues/60
3) Better handle missing .data files as a Shinylive app starts: https://github.com/posit-dev/shinylive/issues/163
4) (Future) Compress large Wasm package binaries in webR.
Thank you for the very detailed explanation and for turning this into actionable backlog items. I'm looking forward to all of them, especially the one in the shinylive Quarto extension, because that's the interface I'm interacting with.
I have read the source code a little, did some reverse engineering and came up with the following:
packages_path <- sprintf("_site/site_libs/quarto-contrib/shinylive-%s/shinylive/webr/packages", shinylive::assets_version())
# remove package dirs that contain any file larger than 100 MB
for (x in list.dirs(packages_path)) {
x_files <- file.info(list.files(x, full.names = TRUE))
if (any(x_files$size > 100 * 1024^2)) {
print(x)
unlink(x, recursive = TRUE)
}
}
# refresh the `metadata.rds` file
metadata_path <- file.path(packages_path, "metadata.rds")
metadata <- readRDS(metadata_path)
new_metadata <- metadata[intersect(names(metadata), list.dirs(packages_path, full.names = FALSE))]
saveRDS(new_metadata, metadata_path)
This will look into the package directory and delete a package dir if any of its child files exceeds 100 MB. Then it drops the corresponding entries from the metadata.rds file for consistency.
This way I was able to deploy and (looking briefly) everything looks fine. It might be because BH-dependent functionality is not used at all (note it's an indirect dependency). It might be worse for directly dependent packages; this I haven't tested.
Sharing this for whoever encounters a similar issue, until a more elegant solution is available (see above).
Yes, that should work OK as long as there are no entries in metadata.rds without the corresponding .data assets available. That said, keep in mind that metadata.rds is intended to be an internal structure, and so there's no guarantee we won't change it going forward.
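The consistency condition above can be checked after pruning. This is only a sketch: it assumes, as the workaround script does, that metadata.rds is a named list keyed by package name and that each package has its own directory under `packages_path`. Both are internal details that may change, and `missing_data_assets` is a hypothetical name:

```r
# Hypothetical check: which metadata entries lack a .data asset on disk?
# Assumes metadata.rds is a named list keyed by package name (internal detail).
missing_data_assets <- function(packages_path) {
  metadata <- readRDS(file.path(packages_path, "metadata.rds"))
  has_data <- vapply(names(metadata), function(pkg) {
    length(list.files(file.path(packages_path, pkg), pattern = "\\.data$")) > 0
  }, logical(1))
  names(metadata)[!has_data]
}
```

If the function returns any package names, those entries would need to be dropped from the metadata (or their assets restored) before deploying.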
If you find you need BH, you should be able to install it at runtime with install.packages("BH"). Without the bundled asset, webR will download it from the public webR package repo instead.
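One way to apply this inside an app is to install the package only when it is missing. A minimal sketch: `ensure_package` and its `installer` argument are hypothetical names added for illustration, and the behaviour of install.packages() under webR is taken from the description above rather than verified here:

```r
# Sketch: install a package at runtime only if it is not already available.
# `installer` is an illustrative parameter; by default it is install.packages().
ensure_package <- function(pkg, installer = utils::install.packages) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    installer(pkg)
  }
  # Report (invisibly) whether the package can be loaded afterwards.
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# Near the top of the app: ensure_package("BH")
```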
BH is listed as a LinkingTo package, and I think in the specific case of BH, it is used for header files at compile time and is not actually needed at run time. (I don't know if that is true in general for all LinkingTo packages, though.)
I don't understand why the package is so large, though. On CRAN, the source package is about 13MB, the Mac binary package is about 12MB, and the Windows binary package is about 20MB.
@georgestagg Would it be possible to special-case BH so that if it's only in the LinkingTo section, webR won't try to bundle it? It would be good to check with the tidyverse team to see if this is a safe strategy, and if it could be applied in general for LinkingTo packages, or at least to some specific packages.
I don't understand why the package is so large
It contains a copy of Boost, which is gigantic. Since it's just a bunch of text in C++ template files, it compresses really, really well though. GitHub Pages won't compress .data files over the wire, so I want to re-enable compression for webR packages. But it requires some thought to keep things snappy (i.e. avoiding R's built-in decompression routines).
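To give a feel for why text-heavy content shrinks so much, here is a toy illustration in base R. The input is synthetic and far more repetitive than real Boost headers, so the ratio it produces says nothing about BH itself. It also uses R's built-in memCompress() purely for demonstration, which is exactly the kind of routine one would want to avoid at package load time:

```r
# Illustration only: highly repetitive text, standing in for template-heavy
# C++ headers, shrinks dramatically under gzip.
header_like <- paste(
  rep("template <typename T> struct wrapper { T value; };\n", 5000),
  collapse = ""
)
raw_bytes <- charToRaw(header_like)
compressed <- memCompress(raw_bytes, type = "gzip")

# Fraction of the original size that remains after compression.
ratio <- length(compressed) / length(raw_bytes)
```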
I think you're right. IIUC LinkingTo is specifically for packages required at build time but not at run time. The configuration for webR should be tweaked to ignore LinkingTo during package dependency resolution, and similarly for the r-shinylive/renv/pkgdepends logic that resolves app dependencies. I'll check with the r-lib team first, though.
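The idea of skipping LinkingTo-only dependencies can be sketched against a package's DESCRIPTION file. This is not how webR, r-shinylive, renv, or pkgdepends implement dependency resolution; `linking_to_only` is a hypothetical helper, and whether such packages are always safe to skip is still the open question raised above:

```r
# Hypothetical helper: packages named in LinkingTo but not in Depends/Imports.
linking_to_only <- function(description_path) {
  fields <- c("Depends", "Imports", "LinkingTo")
  dcf <- read.dcf(description_path, fields = fields)
  parse_field <- function(x) {
    if (is.na(x)) return(character(0))
    pkgs <- trimws(strsplit(x, ",")[[1]])
    # Drop version constraints such as "(>= 1.0)".
    pkgs <- trimws(sub("\\(.*\\)", "", pkgs))
    pkgs[nzchar(pkgs)]
  }
  runtime <- c(parse_field(dcf[1, "Depends"]), parse_field(dcf[1, "Imports"]))
  setdiff(parse_field(dcf[1, "LinkingTo"]), runtime)
}
```

For an installed package, the path could be obtained with `system.file("DESCRIPTION", package = "somepkg")`, where `somepkg` is a placeholder.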
Would it be possible to special-case BH so that if it's only in the LinkingTo section, webR won't try to bundle it?
In addition to the other work to avoid bundling and downloading packages only in the LinkingTo section, I've also uploaded a special version of BH to the webR public wasm package repo with the include/boost directory removed. I don't believe anyone will be negatively affected -- the directory includes only header files, which can't be used under WebAssembly anyway. The package is now just a few KB in size.
With this, even older shinylive deployments that request BH should benefit by having a much smaller download footprint.
BH will eventually be replaced when a new version is released on CRAN, but by that point the issues linked above will be deployed and it will no longer do the same damage.
Hello.
I have encountered the following problem when trying to deploy by pushing changes to the gh_pages branch:
BH is needed for anytime (link)
anytime is needed for ShinyWidgets (link)
This is an indirect dependency of the app code so I can't really control this.
Can you please advise what can be done with it? Can we safely remove a package and expect it to be downloaded on the client side? How to do this?