nettybun / uvic-localstar

Local-first Starboard Notebooks via compiled Deno binaries

Accessing files within Python using native `open()` #10

Open nettybun opened 3 years ago

nettybun commented 3 years ago

This was one of the original goals of the project. Lots of research leads me to think this is very hard to implement in an efficient way that scales. It's related to the handling of async vs sync blocking code between the layers of JS, Python, and Emscripten.

Emscripten has a filesystem, exported to Python and JS as FS.

Problem

There's no way to connect filesystems in JS (and therefore also the OS) to Python. Currently, the official way of loading files into Python on Starboard is to either:

There's no FS interaction.

You can explore the Python filesystem, which is part of Emscripten, and use open() to read files, but not easily from JS.

Emscripten FS APIs

The naive solution is to use the JS APIs FS.writeFile/FS.mkdir to copy the items we need into Python's memory. This doesn't scale: it copies content eagerly, including files that are never read, and if someone is mounting a large directory like $HOME then... no way.
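
For reference, a minimal sketch of that naive copy, assuming `FS` is the Emscripten FS object (for example `pyodide._module.FS`) and that the file contents are already loaded in JS. The helper name `copyIntoFS` and the shape of `files` are hypothetical, made up for illustration:

```javascript
// Hypothetical helper: eagerly copy a set of files into an Emscripten FS.
// `FS` is assumed to be the Emscripten FS object; `files` maps absolute
// paths to Uint8Array contents that already live in JS memory.
function copyIntoFS(FS, files) {
  let bytesCopied = 0;
  for (const [path, data] of Object.entries(files)) {
    // Create each parent directory in turn, skipping ones that already exist.
    const parts = path.split('/').filter(Boolean);
    parts.pop(); // drop the file name, keep only the directories
    let dir = '';
    for (const part of parts) {
      dir += '/' + part;
      if (!FS.analyzePath(dir).exists) FS.mkdir(dir);
    }
    // Full copy into Emscripten's in-memory filesystem.
    FS.writeFile(path, data);
    bytesCopied += data.length;
  }
  return bytesCopied;
}
```

Every byte is copied up front whether or not Python ever calls open() on the file, which is the cost described above.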

There's FS.createLazyFile, which sounds promising: it implements a LazyUint8Array under the hood whose bytes/length are set by synchronous XHR calls. There's a warning:

Warning: Firefox and Chrome have recently disabled synchronous binary XHRs, which means this cannot work for JavaScript in regular HTML pages (but it works within Web Workers).
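
To make the lazy idea concrete, here is a rough sketch of a byte array that only fetches data when an index is first read. It is an illustration of the concept, not Emscripten's actual LazyUint8Array; the class name `LazyBytes` and the `fetchRange` callback are hypothetical. In a browser, `fetchRange` would be the synchronous XHR, which is exactly the part the warning is about:

```javascript
// Illustration only: not Emscripten's implementation, just the same idea.
// `fetchRange(from, to)` is a hypothetical synchronous callback that returns
// a Uint8Array holding bytes from..to (inclusive) of the remote file.
class LazyBytes {
  constructor(length, chunkSize, fetchRange) {
    this.length = length;
    this.chunkSize = chunkSize;
    this.fetchRange = fetchRange;
    this.chunks = new Map(); // chunk number -> Uint8Array, filled on demand
  }

  get(index) {
    if (index < 0 || index >= this.length) return undefined;
    const chunkNum = Math.floor(index / this.chunkSize);
    let chunk = this.chunks.get(chunkNum);
    if (chunk === undefined) {
      const from = chunkNum * this.chunkSize;
      const to = Math.min(from + this.chunkSize, this.length) - 1;
      chunk = this.fetchRange(from, to); // blocks until the bytes arrive
      this.chunks.set(chunkNum, chunk);
    }
    return chunk[index % this.chunkSize];
  }
}
```

Because `get()` has to return a value immediately, the fetch underneath must be blocking; that is why this approach depends on synchronous requests.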

Python is not running in a worker, unfortunately. It's in the main thread of the Starboard iFrame - this way it can share the JS globals via import js which is a fair tradeoff. However, blocking the iFrame main thread doesn't stop Localstar, only other notebook cells. Let's try sync XHRs.

Synchronous XHRs via createLazyFile

It's blocked by a runtime check for ENVIRONMENT_IS_WORKER: https://github.com/emscripten-core/emscripten/blob/cbc974264e0b0b3f0ce8020fb2f1861376c66545/src/library_fs.js#L1757

I've seen the deprecation notices in the browser console, and knew that enough websites still use sync XHR that browsers haven't removed it yet, so I tried faking the runtime check:

// In JS _before_ loading Pyodide at all
window.importScripts = () => {}
// Load Pyodide by creating a Python cell. Unfortunately the `initializePyodide` function isn't in a JS cell's scope
pyodide._module.FS.createLazyFile('/home', 'LAZY_TEST.txt', 'http://127.0.0.1:8000/Test', true, true)

Note that the server on 8000 needs CORS.

# In Python
file = open('/home/LAZY_TEST.txt')
print(file.readlines())

In the browser network pane the request does go through and reaches the local 127.0.0.1 server:

[image: screenshot of the browser network pane]

Unfortunately it's the FS.createLazyFile implementation itself that has code which doesn't work (in FF86 at least). It throws, hard, and is non-recoverable. Python breaks for the notebook and needs a full page reload. This wouldn't happen in a worker, which could be recreated...

Synchronous XHRs via local implementation?

It might be possible to write/patch a method that mimics FS.createLazyFile and actually works, but browser support is very patchy because this is something you shouldn't be doing... The issue is about blocking the thread, which is related to the larger issue of Python having all async code mocked (i.e. time.sleep(10) returns immediately)...

Workers

You can do sync XHR in a worker. For Python this makes a lot of sense - you want open() to block. Similar to input() and other things. It may make the most sense to try moving Python to a worker, which I'll open a separate issue about, and one for async support.
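
A rough sketch of what a blocking read could look like inside a worker. The function name `readUrlSync` is hypothetical, and the `XHR` parameter exists only so the function can be tried outside a browser with a stand-in; in a real worker it would default to the global XMLHttpRequest:

```javascript
// Sketch of a blocking read intended to run inside a Web Worker.
// `XHR` defaults to the global XMLHttpRequest; it is a parameter only so
// the function can be exercised with a stand-in outside a browser.
function readUrlSync(url, XHR = globalThis.XMLHttpRequest) {
  const xhr = new XHR();
  xhr.open('GET', url, false); // third argument `false` makes the request synchronous
  // Setting responseType on a synchronous request is allowed in workers,
  // but throws in a window (main thread) context.
  xhr.responseType = 'arraybuffer';
  xhr.send(null); // blocks this thread until the response arrives
  if (xhr.status < 200 || xhr.status >= 300) {
    throw new Error(`GET ${url} failed with status ${xhr.status}`);
  }
  return new Uint8Array(xhr.response);
}
```

Blocking here only stalls the worker, so a hung or failed request could be dealt with by terminating and recreating the worker rather than reloading the page.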

Plan

I don't have one, unfortunately. I think the official import js is good for most people right now.

nettybun commented 3 years ago

Apparently BrowserFS' XHR FS works: https://github.com/iodide-project/pyodide/issues/613#issuecomment-584871598

BrowserFS.configure({
  fs: "XmlHttpRequest",
  options: {index: "python/index.json"}
}, ...

Haven't tested.

stefnotch commented 3 years ago

I'm currently also going down the rabbit hole of trying to figure out how to get a custom filesystem to work with Starboard/Pyodide/Emscripten.

So, Starboard can now run Pyodide in a web worker, which means that createLazyFile might work somewhat better now.