vega / vl-convert

Utilities for converting Vega-Lite specs from the command line and Python
BSD 3-Clause "New" or "Revised" License

Add online and offline html export support #118

Closed jonmmease closed 9 months ago

jonmmease commented 9 months ago

Closes https://github.com/vega/vl-convert/issues/33, cc @joelostblom

This PR adds support for converting Vega and Vega-Lite charts to live HTML documents. There is a "bundle" option that controls whether the JavaScript dependencies should be loaded from a CDN, or whether they should be inlined into the resulting HTML file.

bundle=False

When bundle is False, the generated HTML follows the Vega Embed directions and loads vega, vega-lite, and vega-embed from jsDelivr.
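As a rough illustration of what a CDN-backed page looks like, here is a minimal stdlib-only sketch that builds such an HTML document by hand. It is not VlConvert's actual template: the `cdn_html` helper, the `vis` element id, and the pinned major versions (`vega@5`, `vega-lite@5`, `vega-embed@6`) are assumptions made for the example.

```python
import json

# Assumed major versions; the PR does not state which versions are pinned.
CDN_SCRIPTS = [
    "https://cdn.jsdelivr.net/npm/vega@5",
    "https://cdn.jsdelivr.net/npm/vega-lite@5",
    "https://cdn.jsdelivr.net/npm/vega-embed@6",
]


def cdn_html(spec: dict) -> str:
    """Hypothetical helper: return an HTML page that loads the Vega libraries from jsDelivr."""
    script_tags = "\n".join(f'    <script src="{url}"></script>' for url in CDN_SCRIPTS)
    # Escape "</" so the embedded JSON cannot close the <script> element early.
    spec_json = json.dumps(spec).replace("</", "<\\/")
    return f"""<!DOCTYPE html>
<html>
  <head>
{script_tags}
  </head>
  <body>
    <div id="vis"></div>
    <script>
      vegaEmbed("#vis", {spec_json});
    </script>
  </body>
</html>
"""


spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
    "data": {"values": [{"a": "A", "b": 28}, {"a": "B", "b": 55}]},
    "mark": "bar",
    "encoding": {
        "x": {"field": "a", "type": "nominal"},
        "y": {"field": "b", "type": "quantitative"},
    },
}
html = cdn_html(spec)
```

A page like this stays small because it only carries the spec, but the browser that opens it needs network access to fetch the three scripts.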

bundle=True

When bundle is True, things are a bit more involved. We already inline Vega and several versions of Vega-Lite into the VlConvert executables, so I wanted to avoid including additional copies for the purpose of HTML export. But in order to use the JS deps inlined into VlConvert, they need to be bundled. I found that the deno_emit project provides a Rust crate that uses SWC to bundle JavaScript / TypeScript dependencies, and this ended up working well.

One note: the bundled code isn't fully minified yet, but I have an open PR that will expose SWC's minify option; see https://github.com/denoland/deno_emit/pull/141. Currently, the resulting HTML files start at ~1.6MB, but this will drop to ~1MB when minification is enabled.

The bundling process takes about 1.5s on my machine. Because this is pretty slow, I decided to cache the bundle results, so that subsequent HTML exports that use the same Vega-Lite version will be fast (10-20ms).
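The caching idea can be sketched in a few lines of Python. This only illustrates the pattern (memoize the expensive bundle per Vega-Lite version), not the PR's Rust implementation: `bundle_for_version` is a hypothetical stand-in, the version string is a placeholder, and the sleep merely simulates the slow bundling step.

```python
import time
from functools import lru_cache


@lru_cache(maxsize=None)
def bundle_for_version(vl_version: str) -> str:
    """Hypothetical stand-in for the slow bundling step, memoized per version."""
    time.sleep(0.05)  # simulate expensive work (about 1.5 s in the PR description)
    return f"/* bundle for Vega-Lite {vl_version} */"


# First call for a version pays the bundling cost.
start = time.perf_counter()
first = bundle_for_version("5.16")  # placeholder version string
cold_seconds = time.perf_counter() - start

# Second call with the same version is served from the cache.
start = time.perf_counter()
second = bundle_for_version("5.16")
warm_seconds = time.perf_counter() - start
```

Keying the cache on the version alone works here because the bundle contents depend only on which Vega-Lite version is requested, not on the chart being exported.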

Custom JavaScript bundles

Additionally, a javascript_bundle Python function is added that can be used to create bundles with custom JavaScript logic that references Vega Embed, Vega, and Vega-Lite. The idea is that Vega-based systems like Altair can use this to build additional integrations.

Integrations

The most immediate application of the HTML export is to remove the altair_viewer dependency in Altair's html export when inline=True.

We could also use this to add an "html-offline" Altair renderer, though this could result in large notebooks as every individual Chart would be over 1MB.

Another use-case I have in mind is to use the javascript_bundle function to create offline bundles for Altair's JupyterChart. This is why I added support for the lodash debounce function as well, since this is the only import, in addition to vegaEmbed, that JupyterChart's JavaScript logic uses. The cool thing about this approach is that we can build the offline bundle on the fly (in under 2s) without an internet connection required.

joelostblom commented 9 months ago

This functionality is really neat! And I'm impressed with how fast vl-convert and vegafusion go from discussion in issues to implementation =)

The most immediate application of the HTML export is to remove the altair_viewer dependency in Altair's html export when inline=True. We could also use this to add an "html-offline" Altair renderer, though this could result in large notebooks as every individual Chart would be over 1MB.

Thinking more about this passage and all the different possible use cases for this functionality, this is what I have come up with so far:

  1. I want to make a single chart and share it with offline access. This is already covered by this PR in an effective way.
  2. I want to make a notebook with many offline charts and share it with someone. As you noted, the notebook will become rather large if vega/vega-lite is bundled once per chart. Would it be possible to instead bundle it once per notebook and have all the charts reference the location in the notebook where it is bundled? I'm thinking of something similar to how you can paste an image into the notebook and it gets bundled/embedded as a base64 image string in a notebook "attachment", and then that attachment ID can be referenced from any other cell to render the image there. Another approach might be to look at what panel does when they enable their
  3. I want to make either a notebook or a single chart with offline access, but I don't care about sharing the chart, I just need to be able to work myself offline. In this case, would it be possible to somehow not bundle at all and instead point to the local vl-convert installation and use vega/vega-lite from there instead of from the CDN? This would probably have to be exposed as a different local renderer or at least a parameter in the html-offline renderer you suggest to tell it apart from the above bundling scenarios. Maybe this could even be a default fallback mode for altair when there is no online connectivity to provide a seamless experience when going offline?
joelostblom commented 9 months ago

I thought more about point 3 above and I think the fallback option could be quite useful. That would guarantee that altair notebooks work as long as vl-convert is installed, even without access to the internet (such as on a flight), without changing anything in the notebook itself.

Also related, I know that if I load an altair chart once while I am online, I am then able to create additional charts in the same notebook even if I go offline as long as I don't restart my kernel (or was it reload jupyterlab?). Maybe there is something here with how vega/vega-lite is cached after it is loaded for one chart that could be used for the notebook offline bundling to prevent including the 1MB bundle in each chart (if the "attachment" approach I suggested above does not work).

jonmmease commented 9 months ago

would it be possible to somehow not bundle at all and instead point to the local vl-convert installation and use vega/vega-lite from there instead of from the CDN?

VlConvert doesn't exactly have a local installation of the Vega JS libraries. The minified source code of all of the individual files that make up these libraries is embedded in the VlConvert executable. Some form of bundling is required to make these suitable for use inside a <script> tag of an HTML document. So I think (2) and (3) are really the same challenge.

The trick we used with plotly, for the classic notebook, was to have an initialize step (pio.renderers.default = "notebook") load the bundled Plotly.js library into the notebook using ipython_display.display_html. Then individual chart displays would assume that Plotly was already available globally.

I don't recall why we didn't use this approach for JupyterLab, but it might be possible to do something similar in Altair when calling alt.renderers.enable('html-offline'). One caveat is that this approach does not work for alternative notebook environments that isolate chart cells using iframes (I think Colab does this).
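The load-once pattern described above can be sketched as two HTML-producing steps: an initialization step that emits the bundle a single time, and a per-chart step that assumes vegaEmbed is already a global. This is a stdlib-only illustration with hypothetical names (`init_html`, `chart_html`, the `vis-N` ids) and a placeholder bundle string; it is not the Plotly or Altair implementation.

```python
import json
from itertools import count

_div_ids = count(1)  # gives each chart a unique target element


def init_html(bundle_js: str) -> str:
    """HTML emitted once, when the offline renderer is enabled."""
    # Keep the inlined JavaScript from closing the <script> element early.
    safe_js = bundle_js.replace("</script", "<\\/script")
    return f"<script>\n{safe_js}\n</script>"


def chart_html(spec: dict) -> str:
    """HTML emitted per chart; assumes `vegaEmbed` is already defined globally."""
    div_id = f"vis-{next(_div_ids)}"
    spec_json = json.dumps(spec).replace("</", "<\\/")
    return (
        f'<div id="{div_id}"></div>\n'
        f'<script>vegaEmbed("#{div_id}", {spec_json});</script>'
    )


# Placeholder standing in for the real (roughly 1MB) bundle.
bundle = "window.vegaEmbed = function (sel, spec) {};"
outputs = [init_html(bundle)] + [chart_html({"mark": "bar"}) for _ in range(3)]
```

In a notebook, each string would be passed to something like IPython.display.display_html(..., raw=True). Only the first output carries the bundle, so the notebook grows by the bundle size once rather than once per chart; the pattern breaks down wherever each output is rendered in its own iframe, since the global is then not shared.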

joelostblom commented 9 months ago

Thanks for explaining and elaborating on how vl-convert works. I like the approach you mentioned plotly takes for bundling once in the notebook, and if that works with jupyterlab as well, then I think it could be a viable approach. Isolated environments such as Colab could still rely on approach 1 of bundling with each chart; we just have to make that explicit in the docs.

VlConvert doesn't exactly have a local installation of the Vega JS libraries. The minified source code of all of the individual files that make up these libraries is embedded in the VlConvert executable.

Does this mean that we need to be online in order to make a chart that works offline? Or is everything needed to create the bundle that goes in the offline chart already part of vl-convert? If it is the latter, then could we not somehow use that as a fallback rendering option? Another approach would be a clear warning message to users when they fail to render a chart because they went offline, with instructions on how to enable an offline renderer. That said, my impression from the issue list is that this is not a commonly reported hiccup, so I guess this functionality is more of a nice-to-have.

jonmmease commented 9 months ago

Does this mean that we need to be online in order to make a chart that works offline? Or is everything needed to create the bundle that goes in the offline chart already part of vl-convert?

This PR adds the logic needed to create the bundle using the JS files that are embedded in the executable (this is what the deno_emit Rust crate does), so no internet connection is required.

If it is the latter, then could we not somehow use that as a fallback rendering option?

I'm not sure if we can auto-detect this. For security reasons, I don't think we want Altair attempting network requests to test whether it has internet access. Even if we did, this would only test that the kernel has internet access, not that the browser displaying the chart has internet access. So I would lean toward adding an offline rendering option and having docs and error messages direct people to this.