llogiq / bytecount

Counting occurrences of a given byte or UTF-8 characters in a slice of memory – fast
Apache License 2.0

Using simd128 leads to unloadable Wasm on older browsers #94

Open alecmocatta opened 1 month ago

alecmocatta commented 1 month ago

I believe I'm seeing the same as this issue with this crate:

Here's wasm2wat showing the functions at issue:

  (func $bytecount::simd::wasm::chunk_count::h678df4b5a0bba3e6 (type 13) (param i32 i32 i32) (result i32)
    (local i32 i32 i32 i32 i32 i32 v128 v128 v128 v128 v128 v128 v128 v128 v128)
    global.get 0
    i32.const 96
    i32.sub
    local.tee 3
    global.set 0
    local.get 2
    i8x16.splat
    local.set 9
    i32.const 4080
    local.set 4
    i32.const 0
    local.tee 2
    local.set 5
    local.get 2
    local.set 2
    block  ;; label = @1
      block  ;; label = @2
        block  ;; label = @3
          block  ;; label = @4
            block  ;; label = @5
              block  ;; label = @6
                block  ;; label = @7
                  block  ;; label = @8
                    loop  ;; label = @9
                      local.get 2
                      local.set 2
                      local.get 5
                      local.set 6
                      local.get 4
                      local.get 1
                      i32.gt_u
                      br_if 1 (;@8;)
                      local.get 2
                      local.set 2
                      i32.const 1
                      local.set 5
                      v128.const i32x4 0x00000000 0x00000000 0x00000000 0x00000000
                      local.tee 10
                      local.set 11
                      local.get 10
                      local.set 12
                      local.get 10
                      local.set 13
                      local.get 10
                      local.set 10
                      loop  ;; label = @10
                        local.get 10
                        local.set 10
                        local.get 13
                        local.set 13
                        local.get 12
                        local.set 12
                        local.get 11
                        local.set 11
                        local.get 5
                        local.set 4
                        local.get 3
                        local.get 0
                        local.get 1
                        local.get 2
                        local.tee 2
                        call $bytecount::simd::wasm::u8x16x4_from_offset::h99092da5ce490a0f
                        local.get 2
                        i32.const 64
                        i32.add
                        local.tee 7
                        local.get 2
                        i32.lt_u
                        br_if 4 (;@6;)
                        local.get 7
                        local.set 2
                        local.get 4
                        i32.const 1
                        i32.add
                        local.set 5
                        local.get 11
                        local.get 3
                        v128.load
                        local.get 9
                        i8x16.eq
                        i8x16.sub
                        local.tee 14
                        local.set 11
                        local.get 12
                        local.get 3
                        v128.load offset=16
                        local.get 9
                        i8x16.eq
                        i8x16.sub
                        local.tee 15
                        local.set 12
                        local.get 13
                        local.get 3
                        v128.load offset=32
                        local.get 9
                        i8x16.eq
                        i8x16.sub
                        local.tee 16
                        local.set 13
                        local.get 10
                        local.get 3
                        v128.load offset=48
                        local.get 9
                        i8x16.eq
                        i8x16.sub
                        local.tee 17
                        local.set 10
                        local.get 4
                        i32.const 255
                        i32.lt_u
                        br_if 0 (;@10;)
                      end
                      local.get 16
                      i16x8.extadd_pairwise_i8x16_u
                      local.get 17
                      i16x8.extadd_pairwise_i8x16_u
                      i16x8.add
                      local.get 15
                      i16x8.extadd_pairwise_i8x16_u
                      i16x8.add
                      local.get 14
                      i16x8.extadd_pairwise_i8x16_u
                      i16x8.add
                      i32x4.extadd_pairwise_i16x8_u
                      local.tee 11
                      i32x4.extract_lane 0
                      local.tee 4
                      local.get 11
                      i32x4.extract_lane 1
                      i32.add
                      local.tee 2
                      local.get 4
                      i32.lt_u
                      br_if 5 (;@4;)
                      local.get 11
                      i32x4.extract_lane 2
                      local.tee 4
                      local.get 11
                      i32x4.extract_lane 3
                      i32.add
                      local.tee 5
                      local.get 4
                      i32.lt_u
                      br_if 6 (;@3;)
                      local.get 2
                      local.get 5
                      i32.add
                      local.tee 4
                      local.get 2
                      i32.lt_u
                      br_if 7 (;@2;)
                      local.get 6
                      local.get 4
                      i32.add
                      local.tee 2
                      local.get 6
                      i32.lt_u
                      br_if 2 (;@7;)
                      local.get 7
                      i32.const 4080
                      i32.add
                      local.tee 6
                      local.set 4
                      local.get 2
                      local.set 5
                      local.get 7
                      local.set 2
                      local.get 6
                      local.get 7
                      i32.ge_u
                      br_if 0 (;@9;)
                    end
                    i32.const 4119884
                    call $core::panicking::panic_const::panic_const_add_overflow::ha5f2ad64652d0d2f
                    unreachable
                  end
                  local.get 1
                  local.get 2
                  i32.lt_u
                  br_if 2 (;@5;)
                  block  ;; label = @8
                    local.get 1
                    local.get 2
                    i32.sub
                    local.tee 4
                    i32.const 64
                    i32.ge_u
                    br_if 0 (;@8;)
                    v128.const i32x4 0x00000000 0x00000000 0x00000000 0x00000000
                    local.tee 11
                    local.set 12
                    local.get 11
                    local.set 13
                    local.get 11
                    local.set 10
                    local.get 11
                    local.set 11
                    local.get 2
                    local.set 2
                    br 7 (;@1;)
                  end
                  local.get 4
                  i32.const 6
                  i32.shr_u
                  local.set 8
                  local.get 2
                  local.set 2
                  local.get 4
                  i32.const 63
                  i32.gt_u
                  local.set 7
                  v128.const i32x4 0x00000000 0x00000000 0x00000000 0x00000000
                  local.tee 10
                  local.set 11
                  local.get 10
                  local.set 12
                  local.get 10
                  local.set 13
                  local.get 10
                  local.set 10
                  block  ;; label = @8
                    loop  ;; label = @9
                      local.get 10
                      local.set 10
                      local.get 13
                      local.set 13
                      local.get 12
                      local.set 12
                      local.get 11
                      local.set 11
                      local.get 7
                      local.set 4
                      local.get 3
                      local.get 0
                      local.get 1
                      local.get 2
                      local.tee 2
                      call $bytecount::simd::wasm::u8x16x4_from_offset::h99092da5ce490a0f
                      local.get 2
                      i32.const 64
                      i32.add
                      local.tee 5
                      local.get 2
                      i32.lt_u
                      br_if 1 (;@8;)
                      local.get 5
                      local.set 2
                      local.get 4
                      i32.const 1
                      i32.add
                      local.set 7
                      local.get 11
                      local.get 3
                      v128.load offset=48
                      local.get 9
                      i8x16.eq
                      i8x16.sub
                      local.tee 14
                      local.set 11
                      local.get 12
                      local.get 3
                      v128.load offset=32
                      local.get 9
                      i8x16.eq
                      i8x16.sub
                      local.tee 15
                      local.set 12
                      local.get 13
                      local.get 3
                      v128.load
                      local.get 9
                      i8x16.eq
                      i8x16.sub
                      local.tee 16
                      local.set 13
                      local.get 10
                      local.get 3
                      v128.load offset=16
                      local.get 9
                      i8x16.eq
                      i8x16.sub
                      local.tee 17
                      local.set 10
                      local.get 4
                      local.get 8
                      i32.lt_u
                      br_if 0 (;@9;)
                    end
                    local.get 17
                    local.set 12
                    local.get 16
                    local.set 13
                    local.get 15
                    local.set 10
                    local.get 14
                    local.set 11
                    local.get 5
                    local.set 2
                    br 7 (;@1;)
                  end
                  i32.const 4120012
                  call $core::panicking::panic_const::panic_const_add_overflow::ha5f2ad64652d0d2f
                  unreachable
                end
                i32.const 4120028
                call $core::panicking::panic_const::panic_const_add_overflow::ha5f2ad64652d0d2f
                unreachable
              end
              i32.const 4120044
              call $core::panicking::panic_const::panic_const_add_overflow::ha5f2ad64652d0d2f
              unreachable
            end
            i32.const 4119900
            call $core::panicking::panic_const::panic_const_sub_overflow::ha660620485d267ca
            unreachable
          end
          i32.const 4119836
          call $core::panicking::panic_const::panic_const_add_overflow::ha5f2ad64652d0d2f
          unreachable
        end
        i32.const 4119852
        call $core::panicking::panic_const::panic_const_add_overflow::ha5f2ad64652d0d2f
        unreachable
      end
      i32.const 4119868
      call $core::panicking::panic_const::panic_const_add_overflow::ha5f2ad64652d0d2f
      unreachable
    end
    local.get 2
    local.set 7
    block  ;; label = @1
      block  ;; label = @2
        block  ;; label = @3
          block  ;; label = @4
            block  ;; label = @5
              block  ;; label = @6
                local.get 13
                i16x8.extadd_pairwise_i8x16_u
                local.get 12
                i16x8.extadd_pairwise_i8x16_u
                i16x8.add
                local.get 10
                i16x8.extadd_pairwise_i8x16_u
                i16x8.add
                local.get 11
                i16x8.extadd_pairwise_i8x16_u
                i16x8.add
                i32x4.extadd_pairwise_i16x8_u
                local.tee 11
                i32x4.extract_lane 0
                local.tee 4
                local.get 11
                i32x4.extract_lane 1
                i32.add
                local.tee 2
                local.get 4
                i32.lt_u
                br_if 0 (;@6;)
                local.get 11
                i32x4.extract_lane 2
                local.tee 4
                local.get 11
                i32x4.extract_lane 3
                i32.add
                local.tee 5
                local.get 4
                i32.lt_u
                br_if 1 (;@5;)
                local.get 2
                local.get 5
                i32.add
                local.tee 4
                local.get 2
                i32.lt_u
                br_if 2 (;@4;)
                block  ;; label = @7
                  block  ;; label = @8
                    block  ;; label = @9
                      local.get 6
                      local.get 4
                      i32.add
                      local.tee 8
                      local.get 6
                      i32.lt_u
                      br_if 0 (;@9;)
                      local.get 1
                      local.get 7
                      i32.lt_u
                      br_if 1 (;@8;)
                      local.get 1
                      local.get 7
                      i32.sub
                      local.tee 2
                      i32.const 16
                      i32.ge_u
                      br_if 2 (;@7;)
                      v128.const i32x4 0x00000000 0x00000000 0x00000000 0x00000000
                      local.set 12
                      br 8 (;@1;)
                    end
                    i32.const 4119916
                    call $core::panicking::panic_const::panic_const_add_overflow::ha5f2ad64652d0d2f
                    unreachable
                  end
                  i32.const 4119932
                  call $core::panicking::panic_const::panic_const_sub_overflow::ha660620485d267ca
                  unreachable
                end
                local.get 2
                i32.const 4
                i32.shr_u
                local.set 6
                local.get 2
                i32.const 15
                i32.gt_u
                local.set 2
                v128.const i32x4 0x00000000 0x00000000 0x00000000 0x00000000
                local.set 11
                i32.const 0
                local.set 5
                block  ;; label = @7
                  loop  ;; label = @8
                    local.get 11
                    local.set 11
                    local.get 2
                    local.set 4
                    local.get 7
                    local.get 5
                    i32.const 4
                    i32.shl
                    i32.add
                    local.tee 2
                    local.get 7
                    i32.lt_u
                    br_if 1 (;@7;)
                    local.get 3
                    local.get 2
                    i32.store offset=72
                    local.get 2
                    i32.const 16
                    i32.add
                    local.tee 5
                    local.get 2
                    i32.lt_u
                    br_if 5 (;@3;)
                    local.get 5
                    local.get 1
                    i32.gt_u
                    br_if 6 (;@2;)
                    local.get 11
                    local.get 0
                    local.get 2
                    i32.add
                    v128.load align=1
                    local.get 9
                    i8x16.eq
                    i8x16.sub
                    local.tee 11
                    local.set 12
                    local.get 4
                    i32.const 1
                    i32.add
                    local.set 2
                    local.get 11
                    local.set 11
                    local.get 4
                    local.set 5
                    local.get 4
                    local.get 6
                    i32.ge_u
                    br_if 7 (;@1;)
                    br 0 (;@8;)
                  end
                end
                i32.const 4119996
                call $core::panicking::panic_const::panic_const_add_overflow::ha5f2ad64652d0d2f
                unreachable
              end
              i32.const 4119836
              call $core::panicking::panic_const::panic_const_add_overflow::ha5f2ad64652d0d2f
              unreachable
            end
            i32.const 4119852
            call $core::panicking::panic_const::panic_const_add_overflow::ha5f2ad64652d0d2f
            unreachable
          end
          i32.const 4119868
          call $core::panicking::panic_const::panic_const_add_overflow::ha5f2ad64652d0d2f
          unreachable
        end
        i32.const 4119620
        call $core::panicking::panic_const::panic_const_add_overflow::ha5f2ad64652d0d2f
        unreachable
      end
      local.get 3
      i32.const 88
      i32.add
      i32.const 65
      i32.store
      local.get 3
      i32.const 2
      i32.store offset=4
      local.get 3
      i32.const 4119648
      i32.store
      local.get 3
      i64.const 2
      i64.store offset=12 align=4
      local.get 3
      i32.const 65
      i32.store offset=80
      local.get 3
      local.get 1
      i32.store offset=92
      local.get 3
      local.get 3
      i32.const 76
      i32.add
      i32.store offset=8
      local.get 3
      local.get 3
      i32.const 92
      i32.add
      i32.store offset=84
      local.get 3
      local.get 3
      i32.const 72
      i32.add
      i32.store offset=76
      local.get 3
      i32.const 4119664
      call $core::panicking::panic_fmt::hfdaf3eddd0a11d4f
      unreachable
    end
    local.get 12
    local.set 11
    block  ;; label = @1
      block  ;; label = @2
        local.get 1
        i32.const 15
        i32.and
        local.tee 2
        br_if 0 (;@2;)
        local.get 11
        local.set 9
        br 1 (;@1;)
      end
      local.get 11
      local.get 2
      i32.const 4119948
      i32.add
      v128.load align=1
      v128.const i32x4 0x00000000 0x00000000 0x00000000 0x00000000
      local.get 0
      local.get 1
      i32.add
      i32.const -16
      i32.add
      v128.load align=1
      local.get 9
      i8x16.eq
      v128.bitselect
      i8x16.sub
      local.set 9
    end
    block  ;; label = @1
      block  ;; label = @2
        block  ;; label = @3
          block  ;; label = @4
            local.get 9
            i16x8.extadd_pairwise_i8x16_u
            i32x4.extadd_pairwise_i16x8_u
            local.tee 9
            i32x4.extract_lane 0
            local.tee 4
            local.get 9
            i32x4.extract_lane 1
            i32.add
            local.tee 2
            local.get 4
            i32.lt_u
            br_if 0 (;@4;)
            local.get 9
            i32x4.extract_lane 2
            local.tee 4
            local.get 9
            i32x4.extract_lane 3
            i32.add
            local.tee 7
            local.get 4
            i32.lt_u
            br_if 1 (;@3;)
            local.get 2
            local.get 7
            i32.add
            local.tee 4
            local.get 2
            i32.lt_u
            br_if 2 (;@2;)
            local.get 8
            local.get 4
            i32.add
            local.tee 2
            local.get 8
            i32.ge_u
            br_if 3 (;@1;)
            i32.const 4119980
            call $core::panicking::panic_const::panic_const_add_overflow::ha5f2ad64652d0d2f
            unreachable
          end
          i32.const 4119788
          call $core::panicking::panic_const::panic_const_add_overflow::ha5f2ad64652d0d2f
          unreachable
        end
        i32.const 4119804
        call $core::panicking::panic_const::panic_const_add_overflow::ha5f2ad64652d0d2f
        unreachable
      end
      i32.const 4119820
      call $core::panicking::panic_const::panic_const_add_overflow::ha5f2ad64652d0d2f
      unreachable
    end
    local.get 3
    i32.const 96
    i32.add
    global.set 0
    local.get 2)
  (func $bytecount::simd::wasm::u8x16x4_from_offset::h99092da5ce490a0f (type 15) (param i32 i32 i32 i32)
    (local i32 i32 i32 v128)
    global.get 0
    i32.const 48
    i32.sub
    local.tee 4
    global.set 0
    local.get 4
    local.get 3
    i32.store
    block  ;; label = @1
      block  ;; label = @2
        block  ;; label = @3
          block  ;; label = @4
            block  ;; label = @5
              local.get 3
              i32.const 64
              i32.add
              local.tee 5
              local.get 3
              i32.lt_u
              br_if 0 (;@5;)
              local.get 5
              local.get 2
              i32.gt_u
              br_if 1 (;@4;)
              local.get 3
              i32.const 16
              i32.add
              local.tee 2
              local.get 3
              i32.lt_u
              br_if 2 (;@3;)
              local.get 3
              i32.const 32
              i32.add
              local.tee 5
              local.get 3
              i32.lt_u
              br_if 3 (;@2;)
              local.get 3
              i32.const 48
              i32.add
              local.tee 6
              local.get 3
              i32.ge_u
              br_if 4 (;@1;)
              i32.const 4119772
              call $core::panicking::panic_const::panic_const_add_overflow::ha5f2ad64652d0d2f
              unreachable
            end
            i32.const 4119680
            call $core::panicking::panic_const::panic_const_add_overflow::ha5f2ad64652d0d2f
            unreachable
          end
          local.get 4
          i32.const 40
          i32.add
          i32.const 65
          i32.store
          local.get 4
          i32.const 2
          i32.store offset=8
          local.get 4
          i32.const 4119708
          i32.store offset=4
          local.get 4
          i64.const 2
          i64.store offset=16 align=4
          local.get 4
          i32.const 65
          i32.store offset=32
          local.get 4
          local.get 2
          i32.store offset=44
          local.get 4
          local.get 4
          i32.const 28
          i32.add
          i32.store offset=12
          local.get 4
          local.get 4
          i32.const 44
          i32.add
          i32.store offset=36
          local.get 4
          local.get 4
          i32.store offset=28
          local.get 4
          i32.const 4
          i32.add
          i32.const 4119724
          call $core::panicking::panic_fmt::hfdaf3eddd0a11d4f
          unreachable
        end
        i32.const 4119740
        call $core::panicking::panic_const::panic_const_add_overflow::ha5f2ad64652d0d2f
        unreachable
      end
      i32.const 4119756
      call $core::panicking::panic_const::panic_const_add_overflow::ha5f2ad64652d0d2f
      unreachable
    end
    local.get 1
    local.get 3
    i32.add
    v128.load align=1
    local.set 7
    local.get 0
    local.get 1
    local.get 2
    i32.add
    v128.load align=1
    v128.store offset=16
    local.get 0
    local.get 7
    v128.store
    local.get 0
    local.get 1
    local.get 6
    i32.add
    v128.load align=1
    v128.store offset=48
    local.get 0
    local.get 1
    local.get 5
    i32.add
    v128.load align=1
    v128.store offset=32
    local.get 4
    i32.const 48
    i32.add
    global.set 0)

llogiq commented 4 weeks ago

Does anyone know a way to implement a fallback, so that we can use intrinsics and fall back to the generic version if that fails? Otherwise we might add a feature to force the generic version on wasm for such browsers. The user would then either have to supply a browser check to select the best version, or live with suboptimal performance on browsers that support SIMD.

alecmocatta commented 3 weeks ago

@llogiq A Wasm binary that includes unsupported intrinsics can fail to parse, even if it never executes them. Unfortunately, this comment is accurate (https://github.com/BurntSushi/memchr/issues/144#issuecomment-1887216278):

The current way of doing feature detection with WASM on browsers is to try to load a small WASM with the specific feature and see whether it fails. See for example this library from Google.

The route memchr took (https://github.com/BurntSushi/memchr/pull/149) is to use intrinsics only under #[cfg(target_feature = "simd128")]. This way you can force the intrinsic or generic version at compile time by building with or without RUSTFLAGS=-Ctarget-feature=+simd128. Alternatively, in .cargo/config.toml:

[target.wasm32-unknown-unknown]
rustflags = ["-Ctarget-feature=+simd128"]

Apps can then build multiple binaries, and use feature detection to serve the optimal one. For the foreseeable future this is the only portable option as far as I know.
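
For illustration, the compile-time gating described above could look roughly like the sketch below. This is not bytecount's or memchr's actual code: the names count and count_generic are placeholders, and the SIMD body uses a simple bitmask-plus-popcount loop rather than the accumulate-and-widen approach visible in the wasm2wat dump. It assumes only the intrinsics in core::arch::wasm32.

```rust
/// Portable fallback: compiles for every target and uses no SIMD intrinsics.
/// (Hypothetical name, chosen for this sketch.)
pub fn count_generic(haystack: &[u8], needle: u8) -> usize {
    haystack.iter().filter(|&&b| b == needle).count()
}

/// SIMD path, compiled only when the build enables `simd128`
/// (for example via `RUSTFLAGS=-Ctarget-feature=+simd128`).
#[cfg(all(target_arch = "wasm32", target_feature = "simd128"))]
pub fn count(haystack: &[u8], needle: u8) -> usize {
    use core::arch::wasm32::{i8x16_bitmask, i8x16_eq, u8x16_splat, v128, v128_load};

    let splat = u8x16_splat(needle);
    let mut total = 0usize;
    let mut chunks = haystack.chunks_exact(16);
    for chunk in &mut chunks {
        // SAFETY: `chunk` is exactly 16 readable bytes, and `v128_load`
        // accepts unaligned addresses.
        let v = unsafe { v128_load(chunk.as_ptr() as *const v128) };
        // One bit per matching lane; the popcount is this chunk's hit count.
        total += i8x16_bitmask(i8x16_eq(v, splat)).count_ones() as usize;
    }
    // Fewer than 16 bytes remain; finish them with the scalar loop.
    total + count_generic(chunks.remainder(), needle)
}

/// Without `simd128` the same name resolves to the generic version, so this
/// code path emits no `simd128` intrinsics.
#[cfg(not(all(target_arch = "wasm32", target_feature = "simd128")))]
pub fn count(haystack: &[u8], needle: u8) -> usize {
    count_generic(haystack, needle)
}
```

Because the choice is made by cfg at compile time, callers use count the same way in both builds. Only the build flags decide whether the emitted Wasm requires SIMD support, which is what makes building two binaries and serving one of them workable.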