unicode-org / icu4x

Solving i18n for client-side and resource-constrained environments.
https://icu4x.unicode.org
Other
1.32k stars 172 forks source link

Assert correct generation of `codepointtrie_builder/ucptrie_wrap.wasm` in CI #5194

Closed robertbastian closed 1 month ago

robertbastian commented 1 month ago

This is a potential backdoor, and currently we need to trust the contributor who contributes changes to the WASM code (or the WASM sandbox).

Tracking issue for:

sffc commented 1 month ago

Asserting this in CI is going to be brittle because it needs to be built on the exact same compiler version. If you go and rebuild the wasm file today, it probably won't be the same as the one I checked in a few months ago.

Even if we have the CI in place, is it going to satisfy the security bot?

It would be nice if the bot gave a break to WASM binaries since our approach to security is basically saying "WASM is an intermediate artifact and it is executed in a sandbox".

Should we check in the WebAssembly Text (WAT) file instead of the WASM file?

robertbastian commented 1 month ago

The security bot will have to be manually dismissed. I would like to dismiss it with a comment saying that this is the correct compilation though.

However, regardless of the bot, this is a security issue. Every user has to trust the person who submitted the wasm file. That person doesn't even have to be malicious, their system could be compromised.

If we say we're happy to be only as safe as wasmer's sandbox, we can just silence the alert.

sffc commented 1 month ago

The code is in

https://github.com/unicode-org/icu4x/blob/main/components/collections/codepointtrie_builder/src/wasm.rs

We use

https://docs.rs/wasmi/latest/wasmi/struct.Module.html#method.new

which is documented as

Creates a new Wasm Module from the given Wasm bytecode buffer.

Note This parses, validates and translates the buffered Wasm bytecode.

Errors If the Wasm bytecode is malformed or fails to validate. If the Wasm bytecode violates restrictions set in the Config used by the engine. If Wasmi cannot translate the Wasm bytecode.

There is also an "unchecked" version of that function, which we do not use.

After the WASM itself is validated, all of the code lives in wasm.rs. We have many .unwrap()s in that file that check for every invariant. We even use CodePointTrie::try_new to interpret the results of the WASM builder.

So, I think the scope of the vulnerability is creating valid but incorrect CodePointTrie data. A malicious actor would be able to move a character from being general category "symbol" to general category "letter", for example.

robertbastian commented 1 month ago

There's a security audit for 0.31 as well: https://github.com/wasmi-labs/wasmi/blob/main/resources/security-audit-2023-12-20.pdf

sffc commented 1 month ago

Discussion:

Action: make a PR changing the WASM file to a WAT file.

robertbastian commented 1 month ago

The security warning triggered for components/icu/tests/data/tutorial_buffer.wasm after this fix, and that cannot be replaced by WAT as we need a binary WASM file to scan. I've dismissed the warning as test-only.