llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

wasm: zero initialized arrays get encoded in data section #43018

Open Geertiebear opened 4 years ago

Geertiebear commented 4 years ago
Bugzilla Link 43673
Version 9.0
OS All
Attachments source that causes the bug
CC @Geertiebear,@tlively

Extended Description

Hello,

At the moment, the contents of global arrays are encoded into the data section of a wasm module even when this is not needed. The wasm spec guarantees that memory is zero-initialized. As a result, large zero-initialized arrays are still included in the data section of the module, leading to extremely large binaries.

Attached is an example program that showcases the issue. Compiling the source code results in a 9.5 MB binary consisting mostly of zeroes. Multiplying the "number" variable by 10 increases the binary size to 95 MB. Clearly, adding more zeroes to the source can lead to very large binaries, so this could be seen as an amplification attack...

I would expect clang/llvm to only describe the array in the globals section, and not to paste the whole contents of the array in the data section if the array is zero.
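
For readers without access to the attachment, here is a minimal sketch of the kind of source that should trigger this. It is not the attached file: the macro `NUMBER`, the array `buffer`, and the function `count_nonzero` are hypothetical names chosen for illustration, and the element count is an assumption.

```c
#include <stddef.h>

/* Hypothetical element count. The attached source uses a variable called
   "number"; its actual value is not shown in this report. */
#define NUMBER (1024 * 1024)

/* A zero-initialized global array with static storage duration. */
unsigned char buffer[NUMBER] = {0};

/* Reads the array so that it is referenced and not discarded as unused. */
size_t count_nonzero(void) {
    size_t n = 0;
    for (size_t i = 0; i < NUMBER; ++i) {
        if (buffer[i] != 0) {
            ++n;
        }
    }
    return n;
}
```

Assuming a clang build with the wasm32 target and wasm-ld available, something like `clang --target=wasm32 -nostdlib -Wl,--no-entry -Wl,--export-all -o zeros.wasm zeros.c` should produce a standalone module whose size can then be compared against `NUMBER`.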

Regards, Geert

Geertiebear commented 4 years ago

> Binaryen's wasm-opt removes the zeroes in addition to its other optimizations.

Thanks for the tip, I will check that out.

> This is fixed...

Good to hear, thank you for the fix!

tlively commented 4 years ago

This is fixed for the common case of exported memories in https://reviews.llvm.org/D68965. For imported memories with bulk memory enabled, we could still do slightly better by emitting memory.fill instructions instead of memory.init instructions to initialize .bss segments, so I will leave this bug open until that is implemented as well.
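
The distinction between the two cases can be sketched roughly as follows. This is an illustration of the idea only, not the code from D68965 or from lld: the function names and the `memory_is_fresh` flag are hypothetical. The assumption is that a memory the module defines itself starts out zeroed, so an all-zero segment needs no payload, whereas an imported memory may already hold other data and must be zeroed explicitly.

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true when every byte of the segment is zero. */
bool segment_is_all_zero(const unsigned char *data, size_t len) {
    for (size_t i = 0; i < len; ++i) {
        if (data[i] != 0) {
            return false;
        }
    }
    return true;
}

/* Hypothetical helper: number of payload bytes that would have to be
   written to the data section for this segment. memory_is_fresh stands
   for "the module defines its own memory, which starts out zeroed". */
size_t emitted_payload_size(const unsigned char *data, size_t len,
                            bool memory_is_fresh) {
    if (memory_is_fresh && segment_is_all_zero(data, len)) {
        return 0; /* rely on zero-initialized memory, emit nothing */
    }
    return len;   /* contents must be stored or written explicitly */
}
```

In the imported-memory case, a fill-style initialization would only need to encode the length rather than `len` literal zero bytes, which is the remaining improvement described above.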

tlively commented 4 years ago

Thanks for the report! I agree that this is something we should definitely fix in lld, and I'll take a look at fixing it, but in practice this hasn't been a big problem because Binaryen's wasm-opt removes the zeroes in addition to its other optimizations. You should definitely be running wasm-opt on your binaries.

ParkHanbum commented 3 months ago

I think this issue can be closed now that this commit has been merged: https://github.com/llvm/llvm-project/commit/1eb79e732c47386258e04c4b59a78047c422c0f4