Closed by antiagainst 1 week ago
(Replaced by https://github.com/iree-org/iree/pull/17662)
@ commit dddd1e00581dbbd4ae7bf70669c85d9d1ea42d91 (vs. base 2ff4102aba9e878f729840da66a44fe4bd3c8790)

Regressed latencies:

Benchmark Name | Average Latency (ms) | Median Latency (ms) | Latency Standard Deviation (ms) |
---|---|---|---|
matmul\_2562x2561x2561\_f32t\_f32t\_f32t\_tile\_config\_default(linalg) [cuda-sm\_80-linux\_gnu-cuda][ukernel,matmul] cuda(none)[full-inference,default-flags] with default @ a2-highgpu-1g[gpu] | 1.536 (vs. 1.371, 11.99%↑) | 1.536 | 0.000 |
matmul\_123x2561x2561\_f32t\_f32t\_f32t\_tile\_config\_default(linalg) [cuda-sm\_80-linux\_gnu-cuda][ukernel,matmul] cuda(none)[full-inference,default-flags] with default @ a2-highgpu-1g[gpu] | 0.223 (vs. 0.201, 11.23%↑) | 0.223 | 0.000 |
GPT2\_117M\_TF\_1X4XI32(stablehlo) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,no-dt] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] | 141.004 (vs. 128.966, 9.33%↑) | 140.918 | 0.492 |

[Top 3 out of 5 results shown]

Improved latencies:

Benchmark Name | Average Latency (ms) | Median Latency (ms) | Latency Standard Deviation (ms) |
---|---|---|---|
matmul\_3456x1024x2048\_f32t\_tile\_config\_default(linalg) [cuda-sm\_80-linux\_gnu-cuda][ukernel,matmul] cuda(none)[full-inference,default-flags] with default @ a2-highgpu-1g[gpu] | 0.131 (vs. 0.165, 20.42%↓) | 0.131 | 0.000 |
DeepLabV3\_fp32(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 43.611 (vs. 49.177, 11.32%↓) | 43.622 | 0.456 |
Vit\_int8(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 178.447 (vs. 194.392, 8.20%↓) | 179.563 | 4.573 |

[Top 3 out of 11 results shown]

Improved total dispatch sizes:

Benchmark Name | Total Dispatch Size (bytes) |
---|---|
GPT2\_117M\_TF\_1X1XI32(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,dt-only,compile-stats] | 11392 (vs. 12864, 11.44%↓) |
GPT2\_117M\_TF\_1X1XI32(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk,compile-stats] | 11280 (vs. 12336, 8.56%↓) |
GPT2\_117M\_TF\_1X4XI32(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,dt-only,compile-stats] | 18224 (vs. 19328, 5.71%↓) |

[Top 3 out of 6 results shown]

Regressed stream IR dispatch counts:

Benchmark Name | Stream IR Dispatch Count (# of cmd.dispatch ops) |
---|---|
GPT2\_117M\_TF\_1X4XI32(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,dt-only,compile-stats] | 330 (vs. 318, 3.77%↑) |
GPT2\_117M\_TF\_1X4XI32(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk,compile-stats] | 330 (vs. 318, 3.77%↑) |
GPT2\_117M\_TF\_1X4XI32(stablehlo) [armv8.2-a-generic-linux\_android29-llvm\_cpu][default-flags,dt-uk,compile-stats] | 330 (vs. 318, 3.77%↑) |

[Top 3 out of 10 results shown]

Improved stream IR dispatch counts:

Benchmark Name | Stream IR Dispatch Count (# of cmd.dispatch ops) |
---|---|
GPT2\_117M\_TF\_1X1XI32(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,dt-only,compile-stats] | 355 (vs. 367, 3.27%↓) |
GPT2\_117M\_TF\_1X1XI32(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk,compile-stats] | 355 (vs. 367, 3.27%↓) |
GPT2\_117M\_TF\_1X1XI32(stablehlo) [armv8.2-a-generic-linux\_android29-llvm\_cpu][default-flags,dt-uk,compile-stats] | 355 (vs. 367, 3.27%↓) |

[Top 3 out of 6 results shown]
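
The parenthesized percentages in the tables above look like the signed change of the new value relative to the base value. The following is a minimal sketch under that assumption; the function names `relative_change_percent` and `format_change` are hypothetical and are not taken from the IREE benchmark tooling. The integer-valued rows (dispatch sizes and dispatch counts) reproduce exactly with this formula. The latency rows can differ by a few hundredths of a percent when recomputed from the displayed values, presumably because the displayed latencies are rounded.

```python
def relative_change_percent(current: float, base: float) -> float:
    """Signed change of `current` relative to `base`, in percent.

    Assumption: this is the formula behind the "(vs. base, X%)" cells;
    it is inferred from the table values, not from the tool's source.
    """
    if base == 0:
        raise ValueError("base must be non-zero")
    return (current - base) / base * 100.0


def format_change(current, base) -> str:
    """Render a cell in the same style as the tables above (hypothetical helper)."""
    pct = relative_change_percent(current, base)
    arrow = "↑" if pct > 0 else "↓" if pct < 0 else ""
    return f"{current} (vs. {base}, {abs(pct):.2f}%{arrow})"


if __name__ == "__main__":
    # Total dispatch size row: 11392 bytes vs. a base of 12864 bytes.
    print(format_change(11392, 12864))
    # Stream IR dispatch count rows: 330 vs. 318 and 355 vs. 367.
    print(format_change(330, 318))
    print(format_change(355, 367))
```

With this convention, an upward arrow on a latency, size, or dispatch-count metric marks a regression and a downward arrow marks an improvement.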

- Updated to llvm/llvm-project@27ac46e6bea2
  - Switched to LLVM's `MathExtras.h` to replace the MLIR one
  - Updated `applySignatureConversion` usage
- Updated to openxla/stablehlo@dd48ec5
  - `chlo.minimum_broadcast_shapes` op was removed: https://github.com/openxla/stablehlo/pull/2287
  - `chlo.dynamic_reshape` op was removed: https://github.com/openxla/stablehlo/pull/2286
- Updated to llvm/torch-mlir@77d7f64