marimo-team / marimo

A reactive notebook for Python — run reproducible experiments, execute as a script, deploy as an app, and version with git.
https://marimo.io
Apache License 2.0
6.79k stars, 239 forks

Unable to view HTML exported notebook when offline #575

Open ForceBru opened 9 months ago

ForceBru commented 9 months ago

Describe the bug

How to reproduce

  1. Create an empty notebook: marimo edit random_notebook.py. Fill it with random code like print("Hello"), execute & save.
  2. Export the notebook as HTML.
  3. Open the HTML file in the browser. I see the notebook, all fine.
  4. Disconnect from the Internet, clear/disable browser caches, reload the HTML file. The point is to make sure nothing is loaded from caches.
  5. All I see is a white page and errors like "couldn't load resource because no Internet connection" in the console. I suppose it's because the exported notebook can't load stuff like <script type="module" crossorigin="anonymous" src="https://cdn.jsdelivr.net/npm/@marimo-team/frontend@0.1.77/dist/assets/index-z8usByTF.js"> since I'm offline and disabled caches.
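One way to check which resources an exported file pulls from the network is to scan it for remote URLs. Below is a minimal sketch using only Python's standard library. The names `ExternalResourceFinder` and `external_resources` are made up for illustration and are not part of marimo. It only sees URLs written directly in the HTML, so anything the JS loads lazily at runtime will not show up.

```python
from html.parser import HTMLParser


class ExternalResourceFinder(HTMLParser):
    """Collect URLs of scripts, stylesheets and images fetched over the network."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("script", "link", "img"):
            return
        attrs = dict(attrs)
        # <link> points at its resource with href, <script> and <img> with src.
        url = attrs.get("href") if tag == "link" else attrs.get("src")
        # data: URIs and relative paths do not need a network connection.
        if url and url.startswith(("http://", "https://", "//")):
            self.urls.append(url)


def external_resources(html_text):
    """Return the remote URLs referenced by the given HTML text."""
    finder = ExternalResourceFinder()
    finder.feed(html_text)
    return finder.urls
```

Running `external_resources(open("notebook.html").read())` on an export (the file name is just an example) should list the CDN script mentioned above if it is present.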

Why I think this is not good

IMO, exported notebooks should be more accessible than the editable notebook that pops up with marimo edit. One should be able to read it almost anywhere, like a PDF document or an image. Especially given that the exported notebook is a static HTML file with mostly text (my code) - I should be able to view it offline.

Environment

{
  "marimo": "0.1.77",
  "OS": "Darwin",
  "OS Version": "22.6.0",
  "Processor": "i386",
  "Python Version": "3.12.1",
  "Binaries": {
    "Chrome": "--",
    "Node": "--"
  },
  "Requirements": {
    "black": "23.12.1",
    "click": "8.1.7",
    "jedi": "0.19.1",
    "pymdown-extensions": "10.7",
    "tomlkit": "0.12.3",
    "tornado": "6.4",
    "typing_extensions": "4.9.0"
  }
}

Code to reproduce

Code doesn't matter, any code will do.

mscolnick commented 9 months ago

I totally see the use case. A couple of reasons this could be difficult (mostly thinking out loud):

  1. Inlining the JS could get expensive and large. Right now we lazy-load some less-used plugins, so the single JS file you see there may actually end up loading many more (50+ JS files, some very small). It's not trivial to know which ones to include.
  2. We also load fonts from a CDN. These would not work offline either. Some fonts have fallbacks (basic text, markdown), but others, like the LaTeX fonts, don't. Inlining these is not trivial either.

We have PNG export. Would a good PDF export satisfy your needs? Or would you like some interactivity still (e.g. hiding code, scrolling tables)?


Some strawman options that have trade-offs:

  1. PDF export (no interactivity)
  2. Include JS/CSS/fonts in a separate folder (this is what browsers do when you download a page). But now you have a folder to pass around.

Going to relabel this from bug to enhancement. I see it more as "static HTML offline support".

ForceBru commented 9 months ago

> Would a good PDF export satisfy your needs?

Yep, I think having a static export would be useful, especially for shareability, as stated in marimo's goals in the README. Just send someone a PDF and they can view it on a potato.

> Or would you like some interactivity still (e.g. hiding code, scrolling tables)?

Personally, I don't see the need for interactivity in an exported notebook. In an app - sure, in a notebook I'm currently running and editing - definitely. In a raw HTML file? Hiding code could be nice, maybe.

I think a simple (!) HTML file is already fine for static export:

  1. HTML is ubiquitous and even washing machines probably have web browsers by now, so everyone can view it.
  2. HTML is human-readable, so if someone doesn't have a browser for some reason, they can still read the markup, and thus the Python code (if it's included raw and not base64-encoded, like in Pluto).
  3. HTML is structured, so in case GitHub goes up in flames or something, people would be able to parse their marimo notebooks and at least extract the code.
  4. Images (and thus plots) can be embedded directly in an HTML <img> tag using base64. The same goes for inline <svg>.
  5. Of course, basic styles and interactivity can be added as a kind of "runtime" with simple CSS and JS.
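Point 4 can be sketched in a few lines of standard-library Python. The helper name `img_tag` is hypothetical and not something marimo provides. It just shows how image bytes end up inside the HTML as a `data:` URI.

```python
import base64
import html


def img_tag(data: bytes, mime: str = "image/png", alt: str = "") -> str:
    """Return an <img> tag whose image data lives inside the HTML itself."""
    # base64 output is plain ASCII, so it is safe inside an attribute value.
    encoded = base64.b64encode(data).decode("ascii")
    return f'<img alt="{html.escape(alt)}" src="data:{mime};base64,{encoded}">'
```

For example, `img_tag(open("plot.png", "rb").read(), alt="loss curve")` would produce a tag that renders without any external file (the file name here is only an example). The trade-off is size: base64 makes the payload roughly a third larger than the raw bytes.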

As for inlining LaTeX, I found this just now: https://dev.to/uetchy/math-api-latex-math-as-svg-image-m4p. Apparently, it converts LaTeX markup to SVG images, so one can just put them into the HTML. Looks like SVG has been supported by browsers since at least 2012, which should increase shareability/viewability even further (that is, probably everyone can view SVG). That project is open source, so in theory marimo could do something like this during export:

  1. Given a LaTeX equation, convert it to an SVG with this.
  2. Embed the SVG directly into the HTML.
  3. Go to the next equation.
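The loop above could look roughly like this. It is a sketch under loud assumptions: `render_svg` is a hypothetical callable that turns LaTeX source into an `<svg>...</svg>` string (for instance a wrapper around the project linked above), and equations are assumed to be delimited by `$...$` or `$$...$$`. Nothing here is marimo's actual export code.

```python
import re

# Matches $...$ and $$...$$. A real exporter would need proper Markdown parsing,
# since this pattern would also trip over text like "costs $5 and $6".
MATH = re.compile(r"(\$\$?)(.+?)\1", re.DOTALL)


def embed_equations(text, render_svg):
    """Replace every LaTeX equation in `text` with the SVG from `render_svg`.

    `render_svg` is a hypothetical callable: LaTeX source in, SVG markup out.
    """
    # group(2) is the LaTeX between the delimiters; the SVG is inserted verbatim.
    return MATH.sub(lambda match: render_svg(match.group(2)), text)
```

Passing the renderer in as an argument keeps the sketch independent of whichever LaTeX-to-SVG tool ends up being used.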

Unfortunately, that's just a theory...

jeffmelville commented 9 months ago

+1. Offline static HTML is part of my workflow with Jupyter that I'd love to see in Marimo to fully jump ship.

One thing I've been using recently that wouldn't be satisfied with a PDF is embedding interactive plotly plots in the notebook. With some hoop-jumping, those survive the export to static HTML.

mscolnick commented 5 months ago

@jeffmelville, @ForceBru We now have a way to export HTML from the CLI: marimo export html notebook.py -o notebook.html. Could you use some other CLI tool (e.g. [pywebcopy](https://pypi.org/project/pywebcopy/) or plain wget) to make the HTML file work offline?

ForceBru commented 5 months ago

TL;DR:


> use some other CLI tool ... in order to make the html file offline

Not sure how that'd work? The PyWebCopy documentation says that "PyWebCopy can "crawl" an entire website and download everything it sees in an effort to create a reasonable facsimile of the source website". Does it mean I should run PyWebCopy on the HTML generated by marimo? I guess that'd probably work, but why? This seems unnecessarily complicated. However, I likely don't know enough about webdev to understand what's complicated and what isn't. I really want to tinker with marimo and build a prototype HTML export tool that'd export to the simplest HTML possible that'd work without JavaScript on any potato.

Here's my basic plan if I ever have the time:

  1. I assume marimo has a data structure that holds the full contents of the current notebook, so that the HTML generator can traverse it and extract cells and their contents.
  2. Each marimo notebook becomes a table with one column.
  3. Each marimo cell (a cell in that column) also becomes a table with one column and 3 rows:
    1. implied output (like a plot)
    2. code
    3. print output (just raw text)
  4. Implied output is complicated because it can be very rich:
    • If it's a single image, base64-encode it and use <img> with src="data:...". The goal is to make a self-contained file, so images will have to become nasty base64 strings...
    • If it's a list or other structured data that can be collapsed, I'm thinking <details> might be of use.
    • If the structured output isn't massive (need to define "massive"), just put it in <pre> as raw text.
  5. Code is colored text, so put each piece of syntax into a colored <span>. Make sure to not mess up indentation, keep comments and every minute detail of the code.
    • This can become complicated because this essentially needs to translate any Python AST to HTML and not lose any line positioning information.
    • This Python code becomes just text, not inside any <textarea> and not editable.
    • Always keep a copy of the raw code (as in the original .py file) in a collapsible (via <details>) <textarea> tag. The user should be able to easily extract the original code from the exported file.
  6. Raw text output can be put in a <textarea disabled> tag. It's scrollable and thus doesn't take too much space when the output is very long.
  7. Insert some vertical space between cells for aesthetics.

The result would be a really basic HTML table with almost no interactive elements (except for <details>):
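A minimal sketch of the table layout from the plan, assuming each cell is available as a plain dict with hypothetical keys `code`, `output_html` and `stdout`. That structure is invented for illustration; marimo's real internal representation may look nothing like it. Syntax highlighting and the collapsible raw-code copy are left out to keep it short.

```python
import html


def cell_to_html(cell):
    """Render one cell as a one-column table with up to three rows."""
    rows = []
    if cell.get("output_html"):
        # Implied output is assumed to be HTML already (e.g. an <img> or <pre>).
        rows.append(cell["output_html"])
    # Code is shown as escaped plain text, so indentation and comments survive.
    rows.append("<pre>" + html.escape(cell["code"]) + "</pre>")
    if cell.get("stdout"):
        # Printed output goes into a scrollable, non-editable text area.
        rows.append("<textarea disabled>" + html.escape(cell["stdout"]) + "</textarea>")
    body = "".join(f"<tr><td>{row}</td></tr>" for row in rows)
    return f"<table>{body}</table>"


def notebook_to_html(cells):
    """Render the notebook as an outer one-column table, one row per cell."""
    body = "".join(f"<tr><td>{cell_to_html(cell)}</td></tr>" for cell in cells)
    # border-spacing adds vertical space between cells (step 7 of the plan).
    return (
        '<!DOCTYPE html><html><body><table style="border-spacing: 0 1em">'
        + body
        + "</table></body></html>"
    )
```

Calling `notebook_to_html([{"code": 'print("Hello")', "stdout": "Hello\n"}])` returns a single self-contained string that needs no JavaScript to display.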


As usual, that's just a theory... I really should start hacking on some code to translate Python source to highlighted HTML - that shouldn't be too hard. If only marimo could export notebooks to well-structured JSON. In that case I could just ingest the JSON and convert it to HTML or whatever else, really.
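For the highlighting part, Python's own `tokenize` module reports the exact position of every token, so a full AST is not strictly needed. The sketch below colors only keywords, strings and comments and copies everything else (whitespace included) straight from the source. The color palette and the function name `highlight` are arbitrary choices for illustration. Known gaps: it raises on source that does not tokenize, and on Python 3.12+ f-strings are tokenized differently and stay uncolored.

```python
import html
import io
import keyword
import tokenize

COLORS = {"keyword": "#07a", "string": "#690", "comment": "#999"}  # arbitrary palette


def highlight(source):
    """Return `source` as HTML with keywords, strings and comments in <span>s."""
    # Offset of the start of each line, to turn (row, col) into a string index.
    starts = [0]
    for line in io.StringIO(source).readlines():
        starts.append(starts[-1] + len(line))

    out, pos = [], 0
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            kind = "comment"
        elif tok.type == tokenize.STRING:
            kind = "string"
        elif tok.type == tokenize.NAME and keyword.iskeyword(tok.string):
            kind = "keyword"
        else:
            continue  # everything else is copied verbatim below
        begin = starts[tok.start[0] - 1] + tok.start[1]
        end = starts[tok.end[0] - 1] + tok.end[1]
        # Text between highlighted tokens is emitted untouched, only escaped.
        out.append(html.escape(source[pos:begin]))
        out.append(
            f'<span style="color:{COLORS[kind]}">{html.escape(source[begin:end])}</span>'
        )
        pos = end
    out.append(html.escape(source[pos:]))
    return "<pre>" + "".join(out) + "</pre>"
```

Because the output is built by slicing the original string, stripping the tags and unescaping gives back the source character for character, which covers the "don't mess up indentation" requirement.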