pyodide / pyodide

Pyodide is a Python distribution for the browser and Node.js based on WebAssembly
https://pyodide.org/en/stable/
Mozilla Public License 2.0

Optimize size and load time #1365

Open ghost opened 3 years ago

ghost commented 3 years ago

Python in the browser is no threat to JavaScript so long as it takes 5 seconds on the first load! That it takes about 2 seconds on the second load doesn't help much. True, the browser has an advantage here. However, there are steps that can be taken to make Pyodide more useful.

rth commented 3 years ago

Thanks for the feedback @brianmingus2! Yes, we are trying to slim down the minimal pyodide build to make it load faster (https://github.com/iodide-project/pyodide/issues/646) and there is clearly more work that can be done there. @phorward also did some work on this recently, as far as I remember.

Instead of eschewing jQuery, embrace jQuery: in this minimal Python environment, facilitate getting the job done, in Python. Use Brython's jQuery support as an example of how to do jQuery in Python correctly.

I'm not too familiar with Brython, and am not sure what's the motivation for using jQuery more. Could you elaborate?

Start loading dependencies async, as soon as content above the fold has been rendered.

I think we are already loading packages asynchronously and concurrently. What content? Pyodide only exposes a few JS functions; we can't control when they are called on the page.

Provide support for allowing LLVM to do Dead-Code Elimination based on Profile-Guided Optimization, based on profiling user's code run in production. This will produce tiny WASM binaries that are fit for purpose.

Yes, there are certainly more compilation options to explore, at the very least enabling LTO, though it increases the compilation time. As for dead-code elimination based on profiling users' code, users can do that themselves; we can't do it on our side.

ghost commented 3 years ago

You can think of jQuery as a new API for the DOM, which has an esoteric design. You should be able to simultaneously download it along with your minimal python environment, then provide access to the jQuery object. jQuery makes extensive use of chaining FYI.

rth commented 3 years ago

You can think of jQuery as a new API for the DOM, which has an esoteric design. You should be able to simultaneously download it along with your minimal python environment, then provide access to the jQuery object.

You can use jQuery from Python if you load it alongside pyodide. For instance, it is already loaded in the REPL console, where one can do:

>>> from js import jQuery
>>> res = jQuery('.terminal-output span:first()').text()
>>> print(res)
Welcome to the Pyodide terminal emulator 🐍

Please report if something is not working as expected. So that's more or less equivalent to the Brython jQuery example (well at least the last one), unless I'm missing something.

ghost commented 3 years ago

Thanks for pointing this out. My recommendation is a minimal Python environment that supports little more than jQuery and string manipulation, and that is fast enough to render content above the fold in less than 1.5 seconds.

rth commented 3 years ago

Absolutely, a <1.5s load target is a good objective, and more work is needed to get there.

ghost commented 3 years ago

How far would delaying the download of as many assets as possible get us? What about deferring the entire Python standard library, so that no imports are available at startup except JavaScript? That code would need to be factored out of the main WASM blob and turned into imports that are downloaded just in time.
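As a rough illustration of what "JIT downloaded imports" could look like on the Python side, here is a minimal sketch of an import hook that only fetches a module's source the first time it is imported. It is not how Pyodide works; `LazySourceFinder`, `fetch_source`, and `lazy_demo_module` are hypothetical names, and an in-memory dict stands in for the network request a browser would make.

```python
import importlib.abc
import importlib.util
import sys


class LazySourceFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Import hook that fetches module source on first import (hypothetical)."""

    def __init__(self, fetch_source, available):
        self._fetch_source = fetch_source  # callable: module name -> source text
        self._available = set(available)   # names this finder may serve
        self.fetched = []                  # record of what was actually fetched

    def find_spec(self, fullname, path=None, target=None):
        if fullname not in self._available:
            return None  # let the regular import machinery handle it
        return importlib.util.spec_from_loader(fullname, self)

    def create_module(self, spec):
        return None  # use the default module creation

    def exec_module(self, module):
        # The "download" happens here, only when the import is executed.
        source = self._fetch_source(module.__name__)
        self.fetched.append(module.__name__)
        code = compile(source, f"<lazy {module.__name__}>", "exec")
        exec(code, module.__dict__)


# Assumption: a dict stands in for a remote store of deferred modules.
REMOTE_SOURCES = {"lazy_demo_module": "ANSWER = 6 * 7\n"}

finder = LazySourceFinder(REMOTE_SOURCES.__getitem__, REMOTE_SOURCES)
sys.meta_path.append(finder)  # appended last, so real modules resolve first

import lazy_demo_module  # triggers the fetch

print(lazy_demo_module.ANSWER)  # 42
```

Nothing is fetched until the `import` statement runs, so modules that are never imported never cost any bytes. The catch in a browser is that the fetch would have to be synchronous or the import would have to be awaited, which this sketch does not address.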

hoodmane commented 3 years ago

Well, the Python standard library in v3.8.2 has 692 files (excluding tests), which adds up to around 11 megabytes of Python source code. For instance, sqlite3 has around 300 kB of Python source code and is probably not super important.

Looking at sys.modules in a fresh copy of the native Python interpreter, about 2.6 megabytes of standard library code is loaded. That leaves roughly 8 megabytes that isn't needed to get a basic interpreter loaded. Loading asyncio, typing, and similar modules that are likely to be used a lot gets us up to 3.4 MB.

Of course I have no clue what the compressed size of this code is.
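The thread does not say exactly how the sys.modules figure was measured. One way to get a comparable number, as a sketch, is to sum the on-disk size of every standard library `.py` file that is already imported. The helper name `loaded_stdlib_bytes` is made up for this example, and the result will vary with the Python version and with how the interpreter was started.

```python
import os
import sys
import sysconfig


def loaded_stdlib_bytes():
    """Sum the size of stdlib .py files for modules already in sys.modules."""
    stdlib_dir = os.path.realpath(sysconfig.get_paths()["stdlib"])
    seen = set()
    total = 0
    for module in list(sys.modules.values()):
        path = getattr(module, "__file__", None)
        if not path or not path.endswith(".py"):
            continue  # built-in and extension modules have no .py source
        path = os.path.realpath(path)
        if path in seen or not path.startswith(stdlib_dir + os.sep):
            continue  # already counted, or not part of the standard library
        if os.sep + "site-packages" + os.sep in path:
            continue  # third-party code can live under the stdlib directory
        if not os.path.isfile(path):
            continue  # e.g. frozen modules whose source is not on disk
        seen.add(path)
        total += os.path.getsize(path)
    return total


if __name__ == "__main__":
    print(f"{loaded_stdlib_bytes() / 1e6:.2f} MB of stdlib source loaded")
```

Running this as a script right after interpreter startup, and again after `import asyncio, typing`, gives the two data points discussed above.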

ghost commented 3 years ago

If you run lrztar on that code you can estimate its deployment size.

hoodmane commented 3 years ago

The full standard library is 6.9 megabytes after lrztar; just the files loaded at startup come to 680 kilobytes. But the download size for pyodide is less than 6.9 megabytes, so I'm not sure whether I'm measuring the right thing.
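For a cross-check that needs no external tool, the same kind of estimate can be sketched with the standard library alone. This swaps lrztar for two different compressors, LZMA (`lzma`) and DEFLATE (`zlib`), so the numbers will not match lrztar's. DEFLATE is what gzip uses, which is one of the encodings web servers commonly apply to downloads. The function name `archive_sizes` is an assumption for this example.

```python
import io
import lzma
import tarfile
import zlib


def archive_sizes(paths):
    """Tar the given files in memory and report raw and compressed sizes."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for path in paths:
            archive.add(path)
    raw = buffer.getvalue()
    return {
        "tar": len(raw),                         # uncompressed archive
        "xz": len(lzma.compress(raw)),           # LZMA, stand-in for lrztar
        "deflate": len(zlib.compress(raw, 9)),   # same algorithm gzip uses
    }
```

Passing the list of stdlib files loaded at startup, and then the full list of stdlib files, would give figures to set beside the 680 kilobyte and 6.9 megabyte numbers.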

ghost commented 3 years ago

I think lrztar will over-estimate the compression achieved by the compiler. A 500KB download size might let us hit a target of 1.5 seconds, for a website that uses Pyodide to draw the page.

hoodmane commented 3 years ago

Well, I'm also not counting the main Python binary, just the packages, so with that added in the download size would likely be bigger than 500 kB in any case.

ghost commented 3 years ago

I wonder if it would be beneficial to chop up these files into pieces and re-assemble in the client. The browser uses a queue for downloads from the same domain, which serializes at some point, but seems to be fine with massive parallelism. Consider this website I built with Brython: http://cloudmoji.com

ghost commented 3 years ago

Actually, is the browser's JIT already running the code as it downloads, or do we have to coax it to do that?

hoodmane commented 3 years ago

I think the streaming wasm compiler runs as the wasm is downloaded.

ghost commented 3 years ago

What if we compile every file as an extension module with Nuitka and make a dependency graph analysis that allows us to download independent WASM parts in parallel, so the browser can load Python in parallel?

rth commented 3 years ago

I wonder if it would be beneficial to chop up these files into pieces and re-assemble in the client. The browser uses a queue for downloads from the same domain, which serializes at some point, but seems to be fine with massive parallelism.

Streaming WASM compilation is already parallelized, and if assets are cached in the browser the init time is the same, so it's not about load times. I suspect it could also be about WASM file system operations for data/.py files: https://github.com/iodide-project/pyodide/issues/347#issuecomment-758212302

make a dependency graph analysis that allows us to download independent WASM parts

We may have to go in that direction eventually, but this would add a lot of complexity to the project and increase its maintenance cost. So it might be preferable to explore easier approaches first.
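To make the dependency graph idea concrete, here is a small sketch that groups modules into "waves" that could be downloaded in parallel, using `graphlib.TopologicalSorter` from the standard library (Python 3.9+). The function name `download_waves` and the example graph are invented for illustration; nothing here reflects how Pyodide actually splits or loads its binaries.

```python
from graphlib import TopologicalSorter


def download_waves(dependencies):
    """Group nodes into waves; each wave depends only on earlier waves.

    `dependencies` maps a module name to the set of modules it needs.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose deps are done
        waves.append(ready)
        sorter.done(*ready)
    return waves


# Hypothetical graph: "app" needs "dom" and "strings"; "dom" needs "core".
example = {
    "app": {"dom", "strings"},
    "dom": {"core"},
    "strings": set(),
    "core": set(),
}
print(download_waves(example))
# [['core', 'strings'], ['dom'], ['app']]
```

Each inner list could be fetched concurrently, while the outer list has to be processed in order. The complexity mentioned above comes from producing and maintaining such a graph for the real interpreter, which this sketch takes as given.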

rth commented 3 years ago

FWIW, compiling CPython with -Os (instead of -O3) reduces pyodide.asm.wasm from 10 MB to 9 MB (which will be compressed down to 2.8 MB when deployed, so it's not that significant). However, it has no measurable impact on page load, which seems to indicate that wasm compilation may not be the bottleneck (or that it doesn't scale linearly with size). We need to profile page load.

https://github.com/emscripten-core/emscripten/issues/10603 suggests that LTO doesn't help much with the upstream compiler. Emscripten also has a good deployment guide at https://emscripten.org/docs/compiling/Deploying-Pages.html; we should go through that list of optimizations.

ghost commented 3 years ago

What if we target the jQuery Test Suite for an interim Python environment, which would allow us to help emscripten do better dead-code elimination for that environment? Meanwhile, Pyodide fires up, and we copy our objects over to it when it's ready.

ghost commented 3 years ago

I was not able to find a test suite for a subset of Python. In order to define one, we could select a subset of existing performance benchmarks that only use certain parts of the language, then let emscripten slough off the rest.

It should be safe to copy over the namespace. All in theory.