dneto0 closed this 11 months ago
One option is to use subgroup size control to force the pipeline to use a particular subgroup size; that will make gl_SubgroupSize have the expected value.
Alternatively, we can compute the stride by doing a bitCount on a ballot. (Doing inclusive max is not baseline functionality.)
Thanks David for pointing out the issue! I overlooked this tricky part.
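The ballot approach can be modelled on the CPU to check the arithmetic. This is only a sketch, not the shader change itself: `popcount32`, `full_ballot`, and `stride_from_ballot` are hypothetical helper names, and the four 32-bit words stand in for the `uvec4` that a GLSL ballot returns. In the shader, the equivalent would presumably be something like `subgroupBallotBitCount(subgroupBallot(true))`. That counts active invocations, so it equals the subgroup size only when every invocation in the subgroup is active.

```c
#include <assert.h>
#include <stdint.h>

/* Count the set bits in one 32-bit word (stand-in for GLSL bitCount). */
static unsigned popcount32(uint32_t v) {
    unsigned n = 0;
    while (v) {
        v &= v - 1; /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Hypothetical model of a ballot in which all `lanes` invocations vote true.
   One bit per invocation, packed into four 32-bit words like a GLSL uvec4. */
static void full_ballot(unsigned lanes, uint32_t ballot[4]) {
    assert(lanes >= 1 && lanes <= 128);
    for (unsigned i = 0; i < 4; i++) {
        unsigned lo = i * 32;
        if (lanes >= lo + 32) {
            ballot[i] = 0xFFFFFFFFu;                 /* word fully covered */
        } else if (lanes > lo) {
            ballot[i] = (1u << (lanes - lo)) - 1u;   /* partially covered */
        } else {
            ballot[i] = 0u;                          /* no lanes in this word */
        }
    }
}

/* The stride is the number of invocations that took part in the ballot. */
static unsigned stride_from_ballot(const uint32_t ballot[4]) {
    return popcount32(ballot[0]) + popcount32(ballot[1]) +
           popcount32(ballot[2]) + popcount32(ballot[3]);
}
```

With a fully active subgroup of 8 invocations, `stride_from_ballot` yields 8 regardless of what the device reports as its nominal subgroup size, which is the property the shader needs.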
Some useful references:
To answer @kuhar's question about a definitive reference, the best info is the SubgroupSize reference in the Vulkan spec. Unfortunately that's not very easy to read.
If the pipeline shader stage is created with VK_PIPELINE_SHADER_STAGE_CREATE_ALLOW_VARYING_SUBGROUP_SIZE_BIT, then gl_SubgroupSize is the "real" size of the subgroup. But it doesn't have to match the subgroupSize physical device property. Instead it's bounded between minSubgroupSize and maxSubgroupSize from VkPhysicalDeviceSubgroupSizeControlProperties or VkPhysicalDeviceVulkan13Properties. (There's more: you can also control the subgroup size at pipeline creation time.)
This Intel device reports:
See discussion at https://gitlab.freedesktop.org/mesa/mesa/-/blob/698344b93c49a9f3a257a0ef4546edf5cd3a9130/src/intel/compiler/brw_compiler.h#L159
But the shader copy_storage_buffer_scalar.glsl uses gl_SubgroupSize to stride across the data, and there it has the value 32. When the actual subgroup size is 8, only 8 invocations each step by 32, so we write only 1/4 of the data, and the test fails its own validation.
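The 1/4 figure can be checked with a small CPU-side model. This is a sketch under an assumed loop shape (each invocation starts at its own lane index and advances by the stride); the actual loop in copy_storage_buffer_scalar.glsl may differ, and `elements_written` is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>

/* Count how many of `total` elements get written when `actual_lanes`
   invocations each start at their lane index and advance by `stride`.
   Assumes stride >= actual_lanes, so no element is written twice. */
static size_t elements_written(size_t total, unsigned actual_lanes,
                               unsigned stride) {
    assert(actual_lanes > 0 && stride >= actual_lanes);
    size_t count = 0;
    for (unsigned lane = 0; lane < actual_lanes; lane++) {
        for (size_t i = lane; i < total; i += stride) {
            count++; /* element i is written by this lane */
        }
    }
    return count;
}
```

For 1024 elements, `elements_written(1024, 8, 32)` gives 256, a quarter of the buffer, while `elements_written(1024, 8, 8)` covers all 1024.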