Pre-init Ormolu Live with Wizer

amesgen commented 1 year ago

This uses Wizer to pre-initialize Ormolu Live WASM (see the section "Using wizer to pre-initialize a WASI reactor module" in the ghc-wasm-meta README), including the parsing of the fixity DB.

This indeed improves the time it takes to format code with unusual operators for the first time, i.e. if you enter 1 +++++++ both in the current Ormolu Live and the one in this PR (see https://github.com/tweag/ormolu/pull/991#issuecomment-1421285974) and then type another 1 to make this a valid expression, the version from this PR is instant, while the one from master has a noticeable delay (but is still pretty fast).

OTOH, this slightly increases binary sizes (all sizes are with wasm-opt -Oz and, in parentheses, Netlify brotli compression (which is not --best)):

master: 18.32 MB (3.35 MB)
This PR: 22.11 MB (4.14 MB)

For fairness, one has to consider that the fixity DB is not included in the master version, which weighs 1.29 MB (0.22 MB) in the binary format. So we don't get an improvement, but also no significant regression.

github-actions[bot] commented 1 year ago

🚀 Deployed on https://2dc84f713d7f0f378b7bf718d860888c273569df--ormolu-live.netlify.app

TerrorJack commented 1 year ago

The GHC ticket's MR already produced the wasm32-wasi-ghc bindist at https://gitlab.haskell.org/ghc/ghc/-/jobs/1358898/artifacts/raw/ghc-x86_64-linux-alpine3_12-cross_wasm32-wasi-release+fully_static.tar.xz. So you may want to give it a try and add a zeroFreeList() call after hs_perform_gc(). I expect the wasm-opt shrinked size will be much improved, and if that's not the case, that'll be a valuable input to my MR to further look for places to zero out.

amesgen commented 1 year ago

Results with that bindist (with brotli -5, which seems to roughly correspond to Netlify brotli):

Without -H64m: 29 M (6.1 M)
With -H64m: 62 M (14 M)

So the first case improves noticeably, but the second one only by a much smaller margin.

TerrorJack commented 1 year ago

Thanks. My guess is that -H64m without --nonmoving-gc will not have the same bloat; in any case, the MR needs extra work to zero out the unused parts in the nonmoving allocator as well.

amesgen commented 1 year ago

Stats without --nonmoving-gc:

Without -H64m: 26 M (5.8 M)
With -H64m: 62 M (15 M)

TerrorJack commented 1 year ago

The GHC MR is undrafted and should contain the proper version of rts_zeroMemory now. The artifact is at https://gitlab.haskell.org/ghc/ghc/-/jobs/1361884/artifacts/file/ghc-x86_64-linux-alpine3_12-cross_wasm32-wasi-release+fully_static.tar.xz.

amesgen commented 1 year ago

New results:

With --nonmoving-gc:
- Without -H64m: 25.16 M (5.03 M)
- With -H64m: 22.89 M (4.29 M)
Without --nonmoving-gc:
- Without -H64m: 22.98 M (4.70 M)
- With -H64m: 22.31 M (4.72 M)

:tada:, surprising that -H64m actually makes the binary smaller in the first case.

TerrorJack commented 1 year ago

There's a bit of bad news in the upstream GHC patch: I've marked the MR as draft for now, since I managed to construct a test case that still results in size bloat when nonmoving GC is enabled. So it'll take more time to get there. Will ping again when the upstream patch is finished and lands.

TerrorJack commented 1 year ago

Good news: the GHC patch is in its final form, works just fine for nonmoving GC and awaiting review from other GHC core devs.

TerrorJack commented 1 year ago

The GHC patch has been merged, and the bindists landed in the ghc-wasm-meta update!

The function is now named rts_clearMemory()
When using it with nonmoving GC, it's necessary to call hs_perform_gc() twice, as explained in the latest tutorial
It's strongly recommended to fully evaluate packageToOps, packageToPopularity in initFixityDB, then zero & free the buffer
Due to the cabal update, the cabal list-bin command will now provide the correct executable with .wasm extension, no need to add the extension manually in the build script

amesgen commented 1 year ago

Thanks, updated! The binary sizes seem to be identical to https://github.com/tweag/ormolu/pull/991#issuecomment-1423306991

It's strongly recommended to fully evaluate packageToOps, packageToPopularity in initFixityDB, then zero & free the buffer

Is this about preventing the buffer from being retained? I slightly changed/streamlined how initFixityDB works (no "pointer+len in env var" ugliness necessary anymore); maybe that is already enough. I played around with using deepseq (diff), which reduced the binary size by <100KB, which is way less than the buffer size.

TerrorJack commented 1 year ago

Is this about preventing the buffer from being retained?

That's true for the original implementation. The current one does not retain buffer anymore so that won't be an issue. That being said, it's still recommended to do deepseq, the more work is done at pre-init time, the faster it shall be at runtime :)

amesgen commented 1 year ago

That being said, it's still recommended to do deepseq, the more work is done at pre-init time, the faster it shall be at runtime :)

Done :+1:

I updated the PR description and squashed the commits; this is now ready to be merged.

tweag / ormolu

Pre-init Ormolu Live with Wizer #991