Today, variables (usually arrays) placed into groupshared memory are simply marked with 'groupshared', and the compiler decides where to place them in the available groupshared memory.
However, the lifetimes of two different variables or arrays often do not overlap, and in my experience the compiler does not detect this. The total amount of groupshared memory used by a shader is simply the sum of the sizes of all referenced groupshared variables.
This lack of aliasing leads to two distinct problems with undesirable workarounds:
1. More groupshared memory is used than is strictly necessary.
2. It's very difficult to access the same region of groupshared memory through two different types. For example, you might want to store an array of uint4s, but then index into that array as if it were an array of float16_t2.
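To make the second problem concrete, here is a minimal sketch of the kind of workaround needed today: the region is declared once as uint4, and the float16_t2 view is emulated by hand with index arithmetic and f16tof32. The names g_Output, gs_Scratch, UnpackHalf2, and CSMain are made up for illustration, and the sizes are arbitrary.

```hlsl
// Sketch of a manual workaround; all names are hypothetical.
RWStructuredBuffer<float2> g_Output : register(u0);

// One 4 KB array (256 * 16 bytes) serves as the only declaration of the region.
groupshared uint4 gs_Scratch[256];

// Decode one 32-bit word as two packed half-precision floats.
float2 UnpackHalf2(uint word)
{
    return float2(f16tof32(word), f16tof32(word >> 16));
}

[numthreads(256, 1, 1)]
void CSMain(uint gi : SV_GroupIndex)
{
    // Store through the uint4 view: each component holds two packed halfs.
    uint packed = f32tof16(1.0f) | (f32tof16((float)gi) << 16);
    gs_Scratch[gi] = uint4(packed, packed, packed, packed);
    GroupMemoryBarrierWithGroupSync();

    // Read as if the region were float16_t2[1024]:
    // element i lives in component (i % 4) of gs_Scratch[i / 4].
    uint i = ((gi + 1) % 256) * 4 + 2;
    uint4 v = gs_Scratch[i / 4];
    g_Output[gi] = UnpackHalf2(v[i % 4]);
}
```

The index math and bit unpacking are exactly the boilerplate that aliasing support would remove.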
I can see two potential solutions, with no strong feelings yet on which one I prefer.
1. Allow a constant-buffer style "offset(X)" semantic on "groupshared" variables to explicitly set their relative offset in the groupshared allocation, thereby allowing those variables to alias one another.
2. Expose groupshared memory like a RWByteAddressBuffer that just happens to be backed by groupshared memory. The contents start off undefined, and a shader can Load from and Store to it at whatever byte address it sees fit for the lifetime of the shader. This takes care of aliasing because a shader author can Store to and Load from the same byte address using different types. This paradigm already exists for UAVs, so it would be fairly natural to think of a RWByteAddressBuffer spanning a user-declared amount of groupshared memory.
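As a rough sketch of what the two options might look like in source, assuming syntax that does not exist today: the ": offset(N)" spelling, the GroupSharedBytes object name, and the Example function are all placeholders, and only the Load/Store4 method names are borrowed from the existing RWByteAddressBuffer.

```hlsl
// HYPOTHETICAL syntax: neither option compiles with current HLSL.

// Option 1: explicit byte offsets let two declarations alias the same 4 KB.
groupshared uint4      gs_AsUint4[256]  : offset(0); // 256 * 16 bytes = 4096
groupshared float16_t2 gs_AsHalf2[1024] : offset(0); // 1024 * 4 bytes = 4096

// Option 2: a byte-addressable view of groupshared memory.
// GroupSharedBytes is a placeholder name for the proposed object.
void Example(uint gi)
{
    // Write 16 bytes per thread, then read back a single 32-bit word
    // from inside another thread's 16-byte slot.
    GroupSharedBytes.Store4(gi * 16, uint4(gi, gi, gi, gi));
    GroupMemoryBarrierWithGroupSync();
    uint word = GroupSharedBytes.Load(((gi + 1) % 256) * 16 + 4);
}
```

Option 1 keeps typed array indexing, while option 2 pushes all address arithmetic onto the shader author in exchange for complete freedom over layout.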
Both approaches would probably need to be paired with a new entry point attribute akin to:
[groupsharedmemoryrequired(4096)] // 4KB of groupshared memory required
This is needed because the compiler would no longer be summing the sizes of all groupshared variables to determine how much groupshared memory a shader needs.
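For illustration, here is how the attribute might combine with the offset approach to reuse memory across non-overlapping lifetimes. This is a sketch under the assumption that both proposed features exist; the ": offset(0)" spelling and the names g_Output, gs_PhaseA, gs_PhaseB, and CSMain are hypothetical.

```hlsl
// HYPOTHETICAL: offset(...) and groupsharedmemoryrequired(...) are proposed
// features, not valid HLSL today.
RWStructuredBuffer<float4> g_Output : register(u0);

groupshared float4 gs_PhaseA[256] : offset(0); // 4096 bytes, live in phase A only
groupshared uint4  gs_PhaseB[256] : offset(0); // 4096 bytes, live in phase B only

[groupsharedmemoryrequired(4096)] // not 8192: the two arrays overlap
[numthreads(256, 1, 1)]
void CSMain(uint gi : SV_GroupIndex)
{
    // Phase A: exchange float4 data between threads.
    gs_PhaseA[gi] = float4((float)gi, 0.0f, 0.0f, 0.0f);
    GroupMemoryBarrierWithGroupSync();
    float4 a = gs_PhaseA[255 - gi];
    GroupMemoryBarrierWithGroupSync(); // all reads of gs_PhaseA finish before reuse

    // Phase B: the same bytes now hold uint4 data.
    gs_PhaseB[gi] = uint4(gi, gi, gi, gi);
    GroupMemoryBarrierWithGroupSync();
    uint4 b = gs_PhaseB[255 - gi];

    g_Output[gi] = a + (float4)b;
}
```

With today's compiler behavior as described above, the same two declarations without offsets would be counted as 8 KB.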
Thanks,
Adam