dneto0 closed this 11 months ago
TIL. Thanks David! I overlooked this tricky part before.
Wow, this is surprising to me since gl_SubgroupSize is not even a constant.
I've also posted some useful resources in https://github.com/google/uVkCompute/issues/43#issuecomment-1826850670.
When I used the subgroup tutorial (https://www.khronos.org/blog/vulkan-subgroup-tutorial) as a reference, I understood that using gl_SubgroupSize was the way to get the actual subgroup size.
That tutorial is quite early. Apparently there have been further developments and new extensions since then to improve things.
Some Intel GPUs have flexible subgroup sizes. subgroupSize can be 32 but minSubgroupSize can be smaller. In this case, unless you forcibly control the subgroup size at pipeline creation time, gl_SubgroupSize will report 32 but the actual number of invocations in the subgroup may be 8.
In the memory benchmarks, use a bitcount of the ballot to compute the dynamic (actual) size of the subgroup. The alternative is to use the much more recent (and less portable) subgroup size control extension.
Fixes: #43
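For anyone landing here later, the ballot approach from the description can be sketched roughly as below. This is a minimal illustration, not the actual benchmark shader from this change: the `Sizes` buffer, its binding, its field names, and the workgroup size of 64 are all assumptions made up for the example. It relies only on `subgroupBallot` and `subgroupBallotBitCount` from `GL_KHR_shader_subgroup_ballot`.

```glsl
#version 450
#extension GL_KHR_shader_subgroup_basic : require
#extension GL_KHR_shader_subgroup_ballot : require

// Workgroup size chosen arbitrarily for this sketch.
layout(local_size_x = 64) in;

// Hypothetical output buffer, for illustration only.
layout(set = 0, binding = 0) buffer Sizes {
  uint reported_size;  // value of gl_SubgroupSize
  uint actual_size;    // invocations counted via the ballot
};

void main() {
  // Every active invocation votes true, so the set bits in the ballot
  // are exactly the invocations currently active in this subgroup.
  uvec4 ballot = subgroupBallot(true);
  uint count = subgroupBallotBitCount(ballot);

  // Let a single invocation of the first workgroup write both values.
  if (gl_WorkGroupID.x == 0 && gl_SubgroupID == 0 && subgroupElect()) {
    reported_size = gl_SubgroupSize;
    actual_size = count;
  }
}
```

On a device like the Intel case described above, one would expect `reported_size` to read 32 while `actual_size` can be smaller. Note that the ballot counts active invocations, so it should be taken before any divergent control flow.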