quarto-ext / shinylive

Quarto extension to embed Shinylive for Python applications
https://quarto-ext.github.io/shinylive/
MIT License

Cannot deploy due to `BH_1.84.0-0.data` file size #59

Open pawelru opened 1 month ago

pawelru commented 1 month ago

Hello.

I encountered the following problem when trying to deploy by pushing changes to the gh_pages branch:

remote: error: File stable/site_libs/quarto-contrib/shinylive-0.5.0/shinylive/webr/packages/BH/BH_1.84.0-0.data is 121.02 MB; this exceeds GitHub's file size limit of 100.00 MB

BH is needed for anytime, and anytime is needed for shinyWidgets.

This is an indirect dependency of the app code so I can't really control this.

Can you please advise what can be done with it? Can we safely remove a package and expect it to be downloaded on the client side? How to do this?

georgestagg commented 1 month ago

Currently some R packages for Wasm are very large. This is the result of a design decision introduced in webR in the interest of improving loading times for most (much smaller!) packages, at the cost of storing uncompressed package data. I'm currently thinking about how to improve the situation for packages that do compress particularly well, such as BH. The issue tracking that is at https://github.com/r-wasm/webr/pull/460

For now, it looks like these assets are too large for GitHub, and we don't currently have a mechanism to handle that. Unfortunately, you cannot simply delete the .data file: the app will expect to find it and will crash when it is missing.

When exporting apps using the shinylive R package, you can export the app without bundling packages:

shinylive::export("myapp", "site", wasm_packages = FALSE)

This means the app won't ship with any bundled WebAssembly R package binaries. This would solve the problem in the short term, but unfortunately I can't see a simple way to export with this option set from Quarto documents.

So, there are several things we need to do here:

1) Handle packages too large for GitHub in some way: https://github.com/posit-dev/r-shinylive/issues/112

2) Make wasm_packages = FALSE available from Quarto: https://github.com/quarto-ext/shinylive/issues/60

3) Better handle missing .data files as a Shinylive app starts: https://github.com/posit-dev/shinylive/issues/163

4) (Future) Compress large Wasm package binaries in webR.

pawelru commented 1 month ago

Thank you for the very detailed explanation and for turning this into actionable backlog items. I'm looking forward to all of them, especially the one in the shinylive Quarto extension, because that's the interface I'm interacting with.

I have read the source code a little, did some reverse engineering and came up with the following:

packages_path <- sprintf("_site/site_libs/quarto-contrib/shinylive-%s/shinylive/webr/packages", shinylive::assets_version())

# remove the dirs that contain any file larger than 100 MB
for (x in list.dirs(packages_path)) {
    x_files <- file.info(list.files(x, full.names = TRUE))
    if (any(x_files$size > 100 * 1024^2)) {
        print(x)
        unlink(x, recursive = TRUE)
    }
}

# refresh the `metadata.rds` file
metadata_path <- file.path(packages_path, "metadata.rds")
metadata <- readRDS(metadata_path)
new_metadata <- metadata[intersect(names(metadata), list.dirs(packages_path, full.names = FALSE))]
saveRDS(new_metadata, metadata_path)

This looks into the packages directory and deletes a package dir if any of its child files exceeds 100 MB. It then drops the corresponding entries from the metadata.rds file for consistency. This way I was able to deploy and, from a brief look, everything seems fine. That might be because the BH-dependent functionality is not used at all (note that it's an indirect dependency). It might be worse for directly dependent packages, which I haven't tested. I'm sharing this for whoever encounters a similar issue, until a more elegant solution is available (see above).
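Before pushing, it can help to list which files would actually trip the limit. The following is a minimal sketch, assuming the same 100 * 1024^2 byte threshold used in the script above and a rendered site under `_site`. The helper name `find_oversized_files` is hypothetical and not part of shinylive.

```r
# Hypothetical helper: list files under `dir` that are larger than `limit_bytes`.
find_oversized_files <- function(dir, limit_bytes = 100 * 1024^2) {
  files <- list.files(dir, recursive = TRUE, full.names = TRUE)
  sizes <- file.size(files)
  keep <- !is.na(sizes) & sizes > limit_bytes
  data.frame(
    path = files[keep],
    size_mb = round(sizes[keep] / 1024^2, 2),
    stringsAsFactors = FALSE
  )
}

# Example: inspect a rendered Quarto site before pushing (the path is an assumption).
if (dir.exists("_site")) {
  print(find_oversized_files("_site"))
}
```

This only reports files and deletes nothing, so it is safe to run repeatedly while deciding what to remove.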

georgestagg commented 1 month ago

Yes, that should work OK as long as there are no entries in metadata.rds without the corresponding .data assets available. That said, keep in mind that metadata.rds is intended to be an internal structure, so there's no guarantee we won't change it going forward.
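The consistency condition described here can be checked after pruning. This is a rough sketch that assumes metadata.rds holds a named list keyed by package name (as the script above implies) and that each bundled package has a directory containing at least one `.data` file. Both are assumptions about an internal layout that may change, and `find_orphaned_metadata` is a hypothetical name.

```r
# Hypothetical helper: names in metadata.rds that have no matching package
# directory containing a .data file.
# Assumes metadata is a named list keyed by package name (internal, may change).
find_orphaned_metadata <- function(packages_path) {
  metadata <- readRDS(file.path(packages_path, "metadata.rds"))
  has_data <- vapply(names(metadata), function(pkg) {
    pkg_dir <- file.path(packages_path, pkg)
    length(list.files(pkg_dir, pattern = "\\.data$")) > 0
  }, logical(1))
  names(metadata)[!has_data]
}
```

An empty result suggests every remaining metadata entry still has its asset on disk.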

If you find you need BH, you should be able to install it at runtime with install.packages("BH"). Without the bundled asset, webR will download it from the public webR package repo instead.

wch commented 1 month ago

BH is listed as a LinkingTo package, and I think in the specific case of BH, it is used for header files at compile-time and is not actually needed at run time. (I don't know if that is true in general for all LinkingTo packages, though).

I don't understand why the package is so large, though. On CRAN, the source package is about 13MB, the Mac binary package is about 12MB, and the Windows binary package is about 20MB.

@georgestagg Would it be possible to special-case BH so that if it's only in the LinkingTo section, webR won't try to bundle it? It would be good to check with the tidyverse team to see if this is a safe strategy, and if it could be applied in general for LinkingTo packages, or at least to some specific packages.

georgestagg commented 1 month ago

> I don't understand why the package is so large

It contains a copy of Boost, which is gigantic. Since it's just a bunch of text in C++ template files it compresses really, really well though. GitHub Pages won't compress .data files over the wire (😭), so I want to re-enable compression for webR packages. But it requires some thought to keep things snappy (i.e. avoiding R's built-in decompression routines).
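As a rough illustration of why repetitive source text compresses so well, the sketch below gzips a synthetic string with base R's `memCompress()`. The input is made up for the example and is far more repetitive than real headers, so it does not measure BH itself.

```r
# Synthetic, highly repetitive C++-like text (an assumption for illustration only).
text <- paste(
  rep("template <typename T> struct wrapper { T value; };\n", 5000),
  collapse = ""
)
raw_bytes <- charToRaw(text)

# Compress in memory with gzip and compare sizes.
compressed <- memCompress(raw_bytes, type = "gzip")
ratio <- length(compressed) / length(raw_bytes)
cat(sprintf("original: %d bytes, gzip: %d bytes\n",
            length(raw_bytes), length(compressed)))
```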

I think you're right. IIUC LinkingTo is specifically for packages required at build time but not at run time. The configuration for webR should be tweaked to ignore LinkingTo during package dependency resolution, and similarly for the r-shinylive/renv/pkgdepends logic that resolves app dependencies. I'll check with the r-lib team first, though.
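The idea of ignoring LinkingTo during resolution can be sketched in base R as follows. This is not the actual webR, renv, or pkgdepends logic. The helper names and the DESCRIPTION-style fields are hypothetical and written only to illustrate the filtering.

```r
# Parse a DESCRIPTION-style dependency field into bare package names.
parse_dep_field <- function(field) {
  if (is.null(field) || is.na(field) || !nzchar(field)) return(character(0))
  parts <- strsplit(field, ",", fixed = TRUE)[[1]]
  parts <- trimws(sub("\\(.*\\)", "", parts))  # drop version constraints
  setdiff(parts[nzchar(parts)], "R")            # "R (>= x.y)" is not a package
}

# Hypothetical resolver: collect run-time dependencies, skipping LinkingTo.
runtime_deps <- function(desc, fields = c("Depends", "Imports")) {
  unique(unlist(lapply(desc[fields], parse_dep_field), use.names = FALSE))
}

# Illustrative fields only, not copied from a real DESCRIPTION file.
desc <- list(
  Depends   = "R (>= 3.2.0)",
  Imports   = "Rcpp (>= 0.12.9)",
  LinkingTo = "Rcpp, BH"
)
runtime_deps(desc)
```

With these fields, BH appears only under LinkingTo, so it is left out of the result, while Rcpp is kept because it is also listed under Imports.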

georgestagg commented 1 month ago

> Would it be possible to special-case BH so that if it's only in the LinkingTo section, webR won't try to bundle it?

In addition to the other work to avoid bundling and downloading packages that appear only in the LinkingTo section, I've also uploaded a special version of BH to the webR public wasm package repo with the include/boost directory removed. I don't believe anyone will be negatively affected, since the directory includes only header files, which can't be used under WebAssembly anyway. The package is now just a few KB in size.

With this, even older shinylive deployments that request BH should benefit by having a much smaller download footprint.

BH will eventually be replaced when a new version is released on CRAN, but by that point the issues linked above will be deployed and it will no longer do the same damage.