Closed stepantubanov closed 1 year ago
This is the correct place to file this issue, thanks!
In this case, LLVM is lowering to wasm poorly. I'm working on a change that should fix all()
/any()
first, then I'll investigate to_bitmask
.
This is fixed in https://github.com/llvm/llvm-project/commit/8392bf6000ad039bd0e55383d40a05ddf7b4af13, not sure how long until this makes it to nightly
Resolved via https://github.com/llvm/llvm-project/commit/8392bf6000ad039bd0e55383d40a05ddf7b4af13 and rust-lang/rust#114048
@calebzulawski Awesome, thank you! 🚀
Hi,
Not sure if this belongs in here or in compiler repo. Example code (https://godbolt.org/z/4fqcdGqq6):
Ideally it should've been compiled using
i8x16.bitmask
andi8x16.all_true
instructions.https://github.com/WebAssembly/simd/blob/main/proposals/simd/SIMD.md#all-lanes-true
Instead, it generated some really slow code extracting individual lanes. Full wasm output:
to_bitmask wasm
``` example::to_bitmask: global.get __stack_pointer i32.const 16 i32.sub drop local.get 0 v128.load 0:p2align=0 v128.const 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 i8x16.eq local.tee 1 i8x16.extract_lane_u 0 i32.const 1 i32.and local.get 1 i8x16.extract_lane_u 1 i32.const 1 i32.and i32.const 1 i32.shl i32.or local.get 1 i8x16.extract_lane_u 2 i32.const 1 i32.and i32.const 2 i32.shl i32.or local.get 1 i8x16.extract_lane_u 3 i32.const 1 i32.and i32.const 3 i32.shl i32.or local.get 1 i8x16.extract_lane_u 4 i32.const 1 i32.and i32.const 4 i32.shl i32.or local.get 1 i8x16.extract_lane_u 5 i32.const 1 i32.and i32.const 5 i32.shl i32.or local.get 1 i8x16.extract_lane_u 6 i32.const 1 i32.and i32.const 6 i32.shl i32.or local.get 1 i8x16.extract_lane_u 7 i32.const 1 i32.and i32.const 7 i32.shl i32.or local.get 1 i8x16.extract_lane_u 8 i32.const 1 i32.and i32.const 8 i32.shl i32.or local.get 1 i8x16.extract_lane_u 9 i32.const 1 i32.and i32.const 9 i32.shl i32.or local.get 1 i8x16.extract_lane_u 10 i32.const 1 i32.and i32.const 10 i32.shl i32.or local.get 1 i8x16.extract_lane_u 11 i32.const 1 i32.and i32.const 11 i32.shl i32.or local.get 1 i8x16.extract_lane_u 12 i32.const 1 i32.and i32.const 12 i32.shl i32.or local.get 1 i8x16.extract_lane_u 13 i32.const 1 i32.and i32.const 13 i32.shl i32.or local.get 1 i8x16.extract_lane_u 14 i32.const 1 i32.and i32.const 14 i32.shl i32.or local.get 1 i8x16.extract_lane_u 15 i32.const 15 i32.shl i32.or i32.const 65535 i32.and end_function ```all_zeros wasm
``` example::all_zeros: global.get __stack_pointer i32.const 16 i32.sub drop local.get 0 v128.load 0:p2align=0 v128.const 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 i8x16.ne local.tee 1 i8x16.extract_lane_u 0 i32.const 1 i32.and local.get 1 i8x16.extract_lane_u 1 i32.const 1 i32.and i32.const 1 i32.shl i32.or local.get 1 i8x16.extract_lane_u 2 i32.const 1 i32.and i32.const 2 i32.shl i32.or local.get 1 i8x16.extract_lane_u 3 i32.const 1 i32.and i32.const 3 i32.shl i32.or local.get 1 i8x16.extract_lane_u 4 i32.const 1 i32.and i32.const 4 i32.shl i32.or local.get 1 i8x16.extract_lane_u 5 i32.const 1 i32.and i32.const 5 i32.shl i32.or local.get 1 i8x16.extract_lane_u 6 i32.const 1 i32.and i32.const 6 i32.shl i32.or local.get 1 i8x16.extract_lane_u 7 i32.const 1 i32.and i32.const 7 i32.shl i32.or local.get 1 i8x16.extract_lane_u 8 i32.const 1 i32.and i32.const 8 i32.shl i32.or local.get 1 i8x16.extract_lane_u 9 i32.const 1 i32.and i32.const 9 i32.shl i32.or local.get 1 i8x16.extract_lane_u 10 i32.const 1 i32.and i32.const 10 i32.shl i32.or local.get 1 i8x16.extract_lane_u 11 i32.const 1 i32.and i32.const 11 i32.shl i32.or local.get 1 i8x16.extract_lane_u 12 i32.const 1 i32.and i32.const 12 i32.shl i32.or local.get 1 i8x16.extract_lane_u 13 i32.const 1 i32.and i32.const 13 i32.shl i32.or local.get 1 i8x16.extract_lane_u 14 i32.const 1 i32.and i32.const 14 i32.shl i32.or local.get 1 i8x16.extract_lane_u 15 i32.const 15 i32.shl i32.or i32.const 65535 i32.and i32.eqz end_function ```Meta
rustc --version --verbose
: