walrus roundtrip inflates size of WebAssembly binaries

RReverser commented 4 years ago

Describe the Bug

Parsing and re-emitting WebAssembly even without modifications often produces larger binary than the original.

Steps to Reproduce

Download https://squoosh.app/optipng.4e77b.wasm (this is just a public example, but the problem can be reproduced on pretty much arbitrary real-world Wasm files).

Use the following code:

let mut module = walrus::ModuleConfig::new()
    .generate_producers_section(false)
    .parse_file("optipng.4e77b.wasm")?;
module.producers.clear();
module.emit_wasm_file("optipng.out.wasm")?;

Check input and output sizes.

Expected Behavior

Output is same (or maybe even smaller) than the original.

Actual Behavior

Input file is 286727 bytes long, and output is 290568, making it 3.75KB larger.

Additional Context

wasm-objdump -h dump for input:

     Type start=0x0000000b end=0x00000101 (size=0x000000f6) count: 33
   Import start=0x00000104 end=0x000006d0 (size=0x000005cc) count: 68
 Function start=0x000006d3 end=0x00000976 (size=0x000002a3) count: 673
   Global start=0x00000978 end=0x0000099c (size=0x00000024) count: 7
   Export start=0x0000099f end=0x00000b3d (size=0x0000019e) count: 26
     Elem start=0x00000b40 end=0x00000cce (size=0x0000018e) count: 1
     Code start=0x00000cd2 end=0x0003c473 (size=0x0003b7a1) count: 673

and output:

     Type start=0x0000000e end=0x00000104 (size=0x000000f6) count: 33
   Import start=0x0000010a end=0x000006d6 (size=0x000005cc) count: 68
 Function start=0x000006dc end=0x0000097f (size=0x000002a3) count: 673
   Global start=0x00000985 end=0x000009a9 (size=0x00000024) count: 7
     Elem start=0x00000b53 end=0x00000d1d (size=0x000001ca) count: 1
     Code start=0x00000d23 end=0x0003d372 (size=0x0003c64f) count: 673
     Data start=0x0003d378 end=0x00046f08 (size=0x00009b90) count: 54

As you can see, most sections remained unchanged, but Elem and Code became larger.

Compression sizes differ as well (even if less so) - e.g. with Brotli input is 103410 bytes and output is 103887 bytes.

alexcrichton commented 4 years ago

A curious bug!

I don't think your input/output are for the same file, using the input file in the link you gisted above I get:

input

optipng.4e77b.wasm:     file format wasm 0x1                           

Sections:                                                              

     Type start=0x0000000b end=0x00000101 (size=0x000000f6) count: 33  
   Import start=0x00000104 end=0x000006d0 (size=0x000005cc) count: 68  
 Function start=0x000006d3 end=0x00000976 (size=0x000002a3) count: 673 
   Global start=0x00000978 end=0x0000099c (size=0x00000024) count: 7   
   Export start=0x0000099f end=0x00000b3d (size=0x0000019e) count: 26  
     Elem start=0x00000b40 end=0x00000cce (size=0x0000018e) count: 1   
     Code start=0x00000cd2 end=0x0003c473 (size=0x0003b7a1) count: 673 
     Data start=0x0003c477 end=0x00046007 (size=0x00009b90) count: 54

output

optipng.out.wasm:       file format wasm 0x1

Sections:

     Type start=0x0000000e end=0x00000104 (size=0x000000f6) count: 33
   Import start=0x0000010a end=0x000006d6 (size=0x000005cc) count: 68
 Function start=0x000006dc end=0x0000097f (size=0x000002a3) count: 673
   Global start=0x00000985 end=0x000009a9 (size=0x00000024) count: 7
   Export start=0x000009af end=0x00000b4d (size=0x0000019e) count: 26
     Elem start=0x00000b53 end=0x00000d1d (size=0x000001ca) count: 1
     Code start=0x00000d23 end=0x0003d372 (size=0x0003c64f) count: 673
     Data start=0x0003d378 end=0x00046f08 (size=0x00009b90) count: 54

The differences here are small differences in the elem and code sections. The elem section seems entirely accounted for because walrus will renumber function indices. The code section I presume is similar. Walrus sorts functions biggest first to get engines with streaming translation the chance to start on the big functions first, and it looks like that happens to cause the size to increase here, presumably because small functions are referenced far more often than large functions.

Walrus isn't really a size-optimizing compiler. It's possible that we could add an entirely new emission mode to optimize for this though.

RReverser commented 4 years ago

I don't think your input/output are for the same file, using the input file in the link you gisted above I get:

Sorry, I'm not sure what you meant here. Your output seems to match mine at the first glance? (except I accidentally omitted last line in wasm-objdump for input when copy-pasting).

and it looks like that happens to cause the size to increase here, presumably because small functions are referenced far more often than large functions

Yeah, could be, although surprised it makes such a big difference.

I think size is often important for Wasm users, so adding such mode would be benefitial, but if not, maybe it's worth documenting as explicit non-goal in docs or README?

RReverser commented 4 years ago

I also wonder if it would be possible to have this reordering as an optional pass, so that by default functions would appear in original order and roundtrip would be preserved more closely. This would basically delegate order of most functions to the original Wasm compiler, which might choose to optimise for speed or perf as well.

fitzgen commented 4 years ago

That would require keeping track of the original order, which we don't currently do.

RReverser commented 4 years ago

@fitzgen As far as I understand from the code, functions should be already naturally added in original order to the arena during parsing, and that order is preserved throughout most basic transformations.

The actual reordering seems to be happening just here - https://github.com/rustwasm/walrus/blob/35bbfb0c450263046c8481a04c967243351d77de/src/module/functions/mod.rs#L444 - during the emit phase.

alexcrichton commented 4 years ago

Yes the difference is that the input binary you listed above doesn't look like it has a data section, but the data section is there (and is the same size in both)

I don't really think this warrants any notes in the README or anything like that, it's not like we're not size-averse or we specifically don't optimize for size in walrus.

This is a micro-optimization in one use case where resorting functions causes bloat. If you want to 100% optimize for size that's not really what walrus is built for, and that's why there's tools like wasm-opt -Os which probably account for this.

RReverser commented 4 years ago

doesn't look like it has a data section, but the data section is there (and is the same size in both)

Oh yeah, sorry, as I said, it was just a copy-paste mistake.

I don't really think this warrants any notes in the README or anything like that, it's not like we're not size-averse or we specifically don't optimize for size in walrus.

To me, it's not so much about intentionally optimising for the size, as about roundtrip producing different results. Other tools either usually preserve it by default, or, if they also use a different internal IR - such as Binaryen - clearly document that roundtrip can produce very different Wasm than the input.

I think in this case, as a minimum, it would make sense to follow what Binaryen does with just documenting this constraint. On the other hand, the fix to keep original order (and keep reordering as a separate option) seems so simple that it seems worth just implementing it instead.

RReverser commented 4 years ago

I experimented with this a bit, and, while disabling function reordering decreases size a bit closer to original, it seems that the main overhead remains and is caused by locals reordering instead.

alexcrichton commented 4 years ago

I think it's reasonable to document yeah that walrus will not exactly roundtrip a file when it's read and then serialized, that's not a design goal of this crate.

rustwasm / walrus