google / uVkCompute

A micro Vulkan compute pipeline and a collection of benchmarking compute shaders
Apache License 2.0

Inconsistent gl_SubgroupSize across different GPUs and Vulkan versions/extensions #43

Closed dneto0 closed 11 months ago

dneto0 commented 11 months ago

This Intel device reports:

See discussion at https://gitlab.freedesktop.org/mesa/mesa/-/blob/698344b93c49a9f3a257a0ef4546edf5cd3a9130/src/intel/compiler/brw_compiler.h#L159

But the shader copy_storage_buffer_scalar.glsl uses gl_SubgroupSize to stride across the data, and gl_SubgroupSize has the value 32. When the actual subgroup size is 8, we only write 1/4 of the data, and the test fails its own validation.
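To make the 1/4 figure concrete, here is a rough host-side model in C. It is not the actual shader: `simulate_subgroup_copy` and its parameters are hypothetical names, and the loop only assumes that each invocation starts at its own lane index and steps by the stride.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model of one subgroup copying `count` elements.
 * `active_lanes` is how many invocations really run in the subgroup.
 * `stride` is the step each invocation uses (gl_SubgroupSize in the issue).
 * Returns the number of elements written to dst. */
size_t simulate_subgroup_copy(const int *src, int *dst, size_t count,
                              unsigned active_lanes, unsigned stride)
{
    assert(stride > 0);
    size_t written = 0;
    for (unsigned lane = 0; lane < active_lanes; ++lane) {
        /* Each lane starts at its own index and jumps by `stride`. */
        for (size_t i = lane; i < count; i += stride) {
            dst[i] = src[i];
            ++written;
        }
    }
    return written;
}
```

With 128 elements, 8 active lanes and a stride of 32, each lane writes 4 elements, so only 32 of the 128 are copied. With a stride of 8 the same 8 lanes cover all 128.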

dneto0 commented 11 months ago

We can use subgroup size control to force the pipeline to use a particular subgroup size; that will make gl_SubgroupSize have the expected value.

Alternatively, we can compute the stride by doing a bitcount on a ballot. (Doing inclusive max is not baseline functionality.)
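As a sketch of the ballot idea, the C below models it on the host. In GLSL this would presumably be something like `subgroupBallotBitCount(subgroupBallot(true))` from GL_KHR_shader_subgroup_ballot; the `ballot_t` type and the `model_*` functions here are hypothetical stand-ins for the uvec4 ballot and those built-ins, not real API.

```c
#include <assert.h>
#include <stdint.h>

/* Host-side model of a GLSL ballot result: a uvec4 with one bit per invocation. */
typedef struct { uint32_t word[4]; } ballot_t;

/* Models subgroupBallot(true) when `active_lanes` invocations are running. */
ballot_t model_ballot_all_true(unsigned active_lanes)
{
    assert(active_lanes <= 128);
    ballot_t b = {{0, 0, 0, 0}};
    for (unsigned lane = 0; lane < active_lanes; ++lane)
        b.word[lane / 32] |= (uint32_t)1 << (lane % 32);
    return b;
}

/* Models subgroupBallotBitCount(): counts the set bits in the ballot. */
unsigned model_ballot_bit_count(ballot_t b)
{
    unsigned count = 0;
    for (int w = 0; w < 4; ++w) {
        /* Clear the lowest set bit until none remain. */
        for (uint32_t bits = b.word[w]; bits != 0; bits &= bits - 1)
            ++count;
    }
    return count;
}
```

A stride derived this way tracks the invocations that actually took part in the ballot (8 in the failing case) rather than the reported 32. Note that the count only equals the subgroup size if every invocation is active when the ballot is taken.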

antiagainst commented 11 months ago

Thanks David for pointing out the issue! I overlooked this tricky part.

Some useful references:

dneto0 commented 11 months ago

To answer @kuhar's question about a definitive reference, the best info is the SubgroupSize reference in the Vulkan spec. Unfortunately that's not very easy to read.

  1. In the beginning there was Vulkan 1.1 and SPIR-V 1.3. That added the subgroupSize member of VkPhysicalDeviceSubgroupProperties. Things were simple: a GPU had a single subgroup size and gl_SubgroupSize gave it to you in the shader.
  2. Then Intel introduced GPUs that could choose a subgroup size of 8, 16, or 32, which is more flexible than what Vulkan 1.1 anticipated. So VK_EXT_subgroup_size_control was created, and then incorporated into Vulkan 1.3. This is when gl_SubgroupSize gets the unexpected behaviour: it can differ from the size the subgroup actually runs with. That unexpected behaviour is now deprecated. In SPIR-V 1.6 and later, or if you specify VK_PIPELINE_SHADER_STAGE_CREATE_ALLOW_VARYING_SUBGROUP_SIZE_BIT, gl_SubgroupSize is the "real" size of the subgroup. But it doesn't have to match the subgroupSize physical device property. Instead it's bounded between minSubgroupSize and maxSubgroupSize from VkPhysicalDeviceSubgroupSizeControlProperties or VkPhysicalDeviceVulkan13Properties. (There's more: you can control the subgroup size at pipeline creation time...)
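For the pipeline-creation route, here is a small C sketch of how a host might choose the size to request. The helper names are hypothetical and nothing here calls Vulkan. It assumes you have already read minSubgroupSize and maxSubgroupSize from the device, and that the result would go into the requiredSubgroupSize member of a VkPipelineShaderStageRequiredSubgroupSizeCreateInfo chained onto the shader stage. As far as I know the spec requires that value to be a power of two within the min/max bounds, which is what the check encodes.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if `size` is a power of two within [min_size, max_size]. */
int is_valid_required_subgroup_size(uint32_t size, uint32_t min_size,
                                    uint32_t max_size)
{
    int power_of_two = size != 0 && (size & (size - 1)) == 0;
    return power_of_two && size >= min_size && size <= max_size;
}

/* Clamps `preferred` into the device range reported by the driver.
 * Returns 0 if the clamped value is still not requestable
 * (for example, `preferred` was not a power of two). */
uint32_t choose_required_subgroup_size(uint32_t preferred, uint32_t min_size,
                                       uint32_t max_size)
{
    assert(min_size <= max_size);
    uint32_t size = preferred;
    if (size < min_size) size = min_size;
    if (size > max_size) size = max_size;
    return is_valid_required_subgroup_size(size, min_size, max_size) ? size : 0;
}
```

On a device reporting a range of 8 to 32, asking for 32 yields 32 and asking for 64 is clamped to 32. The stage also has to be one the device lists as supporting a required subgroup size, which this sketch does not check.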