Closed MaheshRavishankar closed 2 days ago
Oops, sorry. I pushed the branch to the upstream repo instead of my fork and broke the naming convention. Will delete the branch after it lands.
@ commit b8a2701f5d91366e7318dcfdb76cdba464bab8d3 (vs. base 4294a5b0ebaec6dcca483bf16f5918108b09ea0a)
Benchmark Name | Average Latency (ms) | Median Latency (ms) | Latency Standard Deviation (ms) |
---|---|---|---|
MobileBertSquad\_fp16(tflite) [arm-valhall-vulkan\_android31-vulkan\_spirv][experimental-flags,fuse-padding,max-concurrency,demote-f32-to-f16] vulkan(none)[full-inference,default-flags] with default @ pixel-6-pro[gpu] | 108.340 (vs. 96.330, 12.47%↑) | 108.971 | 1.817 |
Benchmark Name | Average Latency (ms) | Median Latency (ms) | Latency Standard Deviation (ms) |
---|---|---|---|
MobileBertSquad\_int8(tflite) [arm-valhall-vulkan\_android31-vulkan\_spirv][default-flags] vulkan(none)[full-inference,default-flags] with default @ pixel-6-pro[gpu] | 104.999 (vs. 112.742, 6.87%↓) | 106.298 | 5.082 |
GPT2\_117M\_TF\_1X4XI32(stablehlo) [armv8.2-a-generic-linux\_android29-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 28.537 (vs. 30.566, 6.64%↓) | 28.628 | 0.687 |
GPT2\_117M\_TF\_1X4XI32(stablehlo) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 129.569 (vs. 138.647, 6.55%↓) | 129.426 | 0.447 |
[Top 3 out of 7 results shown]
No improved or regressed compilation metrics 🏖️
Oh, my bad. I did not notice that auto-merge was enabled.
Under aggressive fusion, drop the restriction that the consumer iteration space must have the same dimensionality as the producer iteration space. This can lead to large vectors if not handled properly, so it is guarded by the `--iree-flow-enable-aggressive-fusion` flag.

Fixes https://github.com/nod-ai/SHARK-Turbine/issues/749
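For illustration only, a minimal Python sketch of the guard described above (this is not IREE's actual C++ implementation; the function name and parameters are hypothetical): by default, fusion is allowed only when producer and consumer iteration spaces have the same rank, and the mismatched-rank case is permitted only when the aggressive-fusion flag is set.

```python
def can_fuse(producer_rank: int, consumer_rank: int,
             aggressive_fusion: bool = False) -> bool:
    """Hypothetical fusion guard: return True if the producer may be
    fused into the consumer."""
    if producer_rank == consumer_rank:
        # Same-dimensionality iteration spaces are always fusable.
        return True
    # Rank mismatches can blow up vector sizes if not handled carefully,
    # so they are only allowed when the user opts in via the
    # --iree-flow-enable-aggressive-fusion flag.
    return aggressive_fusion

# Example: a 2-D producer feeding a 4-D consumer.
print(can_fuse(2, 4))                          # rejected by default
print(can_fuse(2, 4, aggressive_fusion=True))  # allowed with the flag
```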