google / uVkCompute

A micro Vulkan compute pipeline and a collection of benchmarking compute shaders
Apache License 2.0

[matmul] Tweak innerproduct i8->i32 implementation #31

Closed by kuhar 1 year ago

kuhar commented 1 year ago

Perform the intermediate computation (multiplication) on i16 operands to improve register usage.

This improves performance from ~190 to ~245 GFLOPS on Pixel 6.
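
To illustrate the arithmetic described above, here is a minimal scalar sketch in C, not the shader code from this PR. It assumes the change amounts to widening each i8 operand to i16, multiplying in i16, and only then accumulating into i32. The function name `dot_i8_via_i16` is hypothetical. The i16 multiply is lossless because the largest product magnitude of two i8 values is 128 * 128 = 16384, which fits in i16.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of an i8 -> i32 inner product.
 * Assumption: the intermediate product is kept in i16 and only the
 * accumulator is i32. This is a sketch, not the shader from the PR. */
int32_t dot_i8_via_i16(const int8_t *a, const int8_t *b, size_t n) {
    int32_t acc = 0;
    for (size_t i = 0; i < n; ++i) {
        /* Widen the i8 operands to i16 before multiplying. */
        int16_t a16 = (int16_t)a[i];
        int16_t b16 = (int16_t)b[i];
        /* The product magnitude is at most 16384, so it fits in i16. */
        int16_t prod = (int16_t)(a16 * b16);
        /* Accumulate in i32 so long sums cannot overflow i16. */
        acc += (int32_t)prod;
    }
    return acc;
}
```

In C the multiply is promoted to `int` regardless, so this only models the value ranges. Any register saving would come from the GPU compiler handling genuine 16-bit values in the shader.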