Closed min-lee closed 7 months ago
The logic as written is correct, if conservative. While the number of thread groups to launch is too large to fit in a single Dispatch call, we increase the thread group size. Otherwise, if the work fits in a single Dispatch call with the base thread group sizes, we just use those sizes. The base sizes could probably be made larger, though.
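For readers following along, here is a minimal sketch of the behavior described in this comment, for one dimension. It is not the code from the repository: `FitInOneDispatch`, `DispatchShape`, and the doubling step are assumptions made for illustration, and the sketch ignores both the per-group thread limit and the requirement that the local size evenly divide the global size. The constant mirrors the value of D3D12_CS_DISPATCH_MAX_THREAD_GROUPS_PER_DIMENSION (65535).

```cpp
#include <cassert>
#include <cstdint>

// Same value as D3D12_CS_DISPATCH_MAX_THREAD_GROUPS_PER_DIMENSION in d3d12.h.
constexpr std::uint64_t kMaxGroupsPerDimension = 65535;

// Hypothetical result type: the shape of one dimension of a Dispatch call.
struct DispatchShape {
    std::uint32_t localSize;   // threads per group in this dimension
    std::uint32_t groupCount;  // thread groups in this dimension
};

// Hypothetical helper. Starts from the base local size and doubles it only
// while the group count would exceed the per-dimension Dispatch limit.
inline DispatchShape FitInOneDispatch(std::uint64_t globalSize,
                                      std::uint32_t baseLocalSize) {
    assert(globalSize > 0 && baseLocalSize > 0);
    std::uint64_t local = baseLocalSize;
    std::uint64_t groups = (globalSize + local - 1) / local;  // round up
    while (groups > kMaxGroupsPerDimension) {
        local *= 2;  // assumed growth step; the real code may grow differently
        groups = (globalSize + local - 1) / local;
    }
    return {static_cast<std::uint32_t>(local),
            static_cast<std::uint32_t>(groups)};
}
```

With a base size of 64, a global size of 1024 keeps the base size and yields 16 groups; the local size only grows once the global size passes 65535 * 64 work items.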
Reopening. This issue doesn't describe the problem well, but there is a real problem: for work sizes that are prime or odd in some dimension, the local size gets shrunk down to 1 in that dimension. We should either pick a better local size that still evenly divides the global size, or pick a size that is reasonable for performance and move the leftover threads to a separate dispatch with a smaller thread group size.
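The two options in this comment could look roughly like the sketch below, for a single dimension. All names here (`LargestDividingLocalSize`, `SplitDispatch`, `SplitByPreferredSize`) are hypothetical and not taken from the repository, and the second option assumes the kernel can be handed a global offset for the tail dispatch.

```cpp
#include <cassert>
#include <cstdint>

// Option 1 (hypothetical helper): the largest local size no greater than
// maxLocal that evenly divides globalSize. A prime global size larger than
// maxLocal still ends up at 1, which is why option 2 may be needed.
inline std::uint32_t LargestDividingLocalSize(std::uint64_t globalSize,
                                              std::uint32_t maxLocal) {
    assert(globalSize > 0 && maxLocal > 0);
    for (std::uint32_t candidate = maxLocal; candidate > 1; --candidate) {
        if (globalSize % candidate == 0) {
            return candidate;
        }
    }
    return 1;
}

// Option 2 (hypothetical): a main dispatch at the preferred local size plus
// one tail thread group that covers the leftover work items.
struct SplitDispatch {
    std::uint64_t mainGroups;       // groups of preferredLocal threads
    std::uint64_t remainderOffset;  // global offset where the tail begins
    std::uint32_t remainderLocal;   // tail group size, 0 if nothing is left
};

inline SplitDispatch SplitByPreferredSize(std::uint64_t globalSize,
                                          std::uint32_t preferredLocal) {
    assert(preferredLocal > 0);
    SplitDispatch split{};
    split.mainGroups = globalSize / preferredLocal;
    split.remainderOffset = split.mainGroups * preferredLocal;
    // The remainder is always smaller than preferredLocal, so it fits in a
    // single thread group.
    split.remainderLocal =
        static_cast<std::uint32_t>(globalSize - split.remainderOffset);
    return split;
}
```

For a global size of 1021 (prime) and a cap of 64, option 1 can only return 1, while option 2 gives 15 groups of 64 threads followed by one group of 61 threads starting at offset 960.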
I found that this code in clEnqueueNDRangeKernel() doesn't work. It is meant to grow the local size, but it never does, because the condition DispatchDimensions[i] > D3D12_CS_DISPATCH_MAX_THREAD_GROUPS_PER_DIMENSION is almost always false.
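To put a number on "almost always false": D3D12_CS_DISPATCH_MAX_THREAD_GROUPS_PER_DIMENSION is 65535, so the condition only holds for very large global sizes. The sketch below assumes that DispatchDimensions[i] is the global size divided by the local size, rounded up; the helper name and the example local sizes are hypothetical.

```cpp
#include <cstdint>

// Same value as D3D12_CS_DISPATCH_MAX_THREAD_GROUPS_PER_DIMENSION in d3d12.h.
constexpr std::uint64_t kMaxThreadGroupsPerDimension = 65535;

// Hypothetical helper: the smallest global size in one dimension for which
// ceil(globalSize / localSize) exceeds the Dispatch limit.
constexpr std::uint64_t SmallestGlobalSizeThatTriggersGrowth(
    std::uint32_t localSize) {
    return kMaxThreadGroupsPerDimension * localSize + 1;
}

// Even with a local size of 1, the global size must exceed 65535.
static_assert(SmallestGlobalSizeThatTriggersGrowth(1) == 65536, "");
// With a local size of 64, it takes more than four million work items.
static_assert(SmallestGlobalSizeThatTriggersGrowth(64) == 4194241, "");
```

Any global size below that threshold leaves the local size untouched in that dimension.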