ncruces / go-sqlite3

Go bindings to SQLite using wazero
https://pkg.go.dev/github.com/ncruces/go-sqlite3
MIT License

[suggestion] more optimization passes through wasm-opt #108

Closed by NyaaaWhatsUpDoc 2 months ago

NyaaaWhatsUpDoc commented 2 months ago

tweaking the number of optimization passes you do through wasm-opt can net you a huge 53KB size decrease and some massive (they're negligible, but still numerically noticeable) performance improvements when looking at https://github.com/ncruces/go-sqlite-bench.

i'm playing around with compiling ffmpeg's shared libraries to wasm at the moment and thought i'd try applying my own selection of wasm-opt flags to go-sqlite3, and there was a small difference. if you want to take a look yourself there's a branch here: https://github.com/NyaaaWhatsUpDoc/go-sqlite3/tree/performance/wasm-opt-optimization-passes

i haven't PR'd it as i wasn't sure of what you'd think of these tiny gains in the face of increased build time :smile:

ncruces commented 2 months ago

Hi! Thanks for looking into this. Build time doesn't really bother me (unless the build action that reproduces the build and signs the binaries starts to time out).

I don't think go-sqlite-bench is consistent enough to measure this? speedtest1 is probably better (the memory version, in particular).

About your size gains, a lot of it might be down to -g, but I've found that keeping debug names (not line numbers) is useful when there is a crash (reasonable stack traces are helpful).

I used Oz before and switched to O3 because performance was better with O3, but I'll test again, with your flags too.

NyaaaWhatsUpDoc commented 2 months ago

one thing that's useful regarding this is: https://github.com/WebAssembly/binaryen/wiki/Optimizer-Cookbook

where they specifically mention passing -Oz multiple times after a flatten + rereloop. i think each argument passed is itself an optimization pass, so it is possible (and sometimes beneficial) to pass the same argument multiple times

ncruces commented 2 months ago

I tried these on speedtest1, and I didn't get impressive improvements.

I can reproduce space savings, of around 3%, all due to stripping debug info, but not speed improvements.

I'm running:

go test -bench=speed -benchtime=100x -count 6 -- --memdb

And results are like:

$ benchstat before after 
               │   before    │            after             │
               │   sec/op    │   sec/op     vs base         │
_speedtest1-12   65.11m ± 1%   65.38m ± 2%  ~ (p=0.180 n=6)

Also, it seems flatten and rereloop, besides taking a long time, do require removing debug info.

Tbh, I'm inclined not to do anything here?