google / uVkCompute

A micro Vulkan compute pipeline and a collection of benchmarking compute shaders
Apache License 2.0

Fix memory benchmarks for unexpected gl_SubgroupSize #44

Closed. dneto0 closed this 11 months ago.

dneto0 commented 11 months ago

Some Intel GPUs have flexible subgroup sizes: the device's reported subgroupSize can be 32 while its minSubgroupSize is smaller. In this case, unless you explicitly control the subgroup size at pipeline creation time, gl_SubgroupSize will report 32 even though the actual number of invocations in the subgroup may be 8.

In the memory benchmarks, use a bit count of the subgroup ballot to compute the dynamic (actual) size of the subgroup. The alternative is the much more recent (and less portable) subgroup size control extension, VK_EXT_subgroup_size_control.
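As a rough illustration of the ballot technique, the Python sketch below models a 128-bit subgroup ballot (GLSL's uvec4) on the host and counts its set bits. In a shader the equivalent is likely something like subgroupBallotBitCount(subgroupBallot(true)) from GL_KHR_shader_subgroup_ballot, though the exact code in this PR may differ. The helpers make_ballot and ballot_bit_count are hypothetical, and the model assumes every invocation in the subgroup is active (uniform control flow), because a ballot only has bits set for active invocations.

```python
# Hypothetical host-side model of a subgroup ballot: 4 x 32-bit words (like GLSL uvec4).
def make_ballot(active):
    """Return a ballot with bits 0..active-1 set, as subgroupBallot(true) would
    when `active` invocations are running (capped at 128 bits)."""
    words = [0, 0, 0, 0]
    for i in range(min(active, 128)):
        words[i // 32] |= 1 << (i % 32)
    return words


def ballot_bit_count(words):
    """Count set bits across all four words, mirroring subgroupBallotBitCount."""
    return sum(bin(w & 0xFFFFFFFF).count("1") for w in words)


reported_size = 32          # what gl_SubgroupSize reports
ballot = make_ballot(8)     # the hardware actually ran 8 invocations
dynamic_size = ballot_bit_count(ballot)
print(reported_size, dynamic_size)  # 32 8
```

The bit count reflects how many invocations actually participated, so it stays correct when the hardware picks a smaller subgroup than gl_SubgroupSize suggests.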

Fixes: #43

antiagainst commented 11 months ago

TIL. Thanks David! I overlooked this tricky part before.

Wow, this is surprising to me since gl_SubgroupSize is not even a constant.

I've also posted some useful resources in https://github.com/google/uVkCompute/issues/43#issuecomment-1826850670.

When I used the subgroup tutorial as a reference (https://www.khronos.org/blog/vulkan-subgroup-tutorial), I understood that gl_SubgroupSize was the way to get the actual subgroup size.

That tutorial is quite old. Apparently there have been further developments and new extensions since then to improve things.