microsoft / OpenCLOn12

The OpenCL-on-D3D12 mapping layer
MIT License

Rework local size computation #56

Closed: jenatali closed this 7 months ago

jenatali commented 7 months ago

See https://github.com/microsoft/OpenCLOn12/issues/50

The previous algorithm did not gracefully handle odd or prime global sizes: it would start with a default size, shrink it down to 1, and never expand it again.
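For contrast, here is a hypothetical reconstruction of that failure mode. It assumes the old code halved a default size until the size evenly divided the global size. The real code may have differed, and `ShrinkOnlyLocalSize` is an illustrative name. With a prime global size such as 97, every candidate fails until the size reaches 1, and nothing ever grows it back.

```cpp
#include <cstdint>

// Hypothetical shrink-only scheme: start from a default local size and halve
// it until it evenly divides the global size. The size is never increased.
uint32_t ShrinkOnlyLocalSize(uint32_t global, uint32_t defaultSize) {
    uint32_t local = defaultSize;
    while (local > 1 && global % local != 0) {
        local /= 2;
    }
    return local;
}
```

With a default of 64, a global size of 96 gives 32, but a global size of 97 collapses to a single thread per group.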

The new algorithm attempts to prime-factorize the global size and multiply those factors into the local size, one at a time. This continues until the local size hits a sweet spot for the device's wave size: >= min and < max (unless max == min). If it fails to find a sweet spot, it falls back to using one of the global dimensions as the local dimension.
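The following minimal C++ sketch shows the factor-and-grow loop under stated assumptions. It is not the OpenCLOn12 code. `ComputeLocalSize`, `kPrimes`, `InSweetSpot`, and `Fits` are hypothetical names. `minThreads`/`maxThreads` are the sweet-spot bounds, which a caller might derive from the device's wave size. `maxGroupSize` is a hypothetical limit applied only to the fallback. Three details are choices made for this sketch:

- Factors are folded in round-robin across dimensions.
- The prime list is bounded, so large prime factors are never found.
- The fallback is taken only when one whole global dimension yields more threads than the factorized result.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical static list of small primes; 2 comes first so power-of-2
// factors are folded in before odd ones.
static constexpr std::array<uint32_t, 10> kPrimes = {2, 3, 5, 7, 11, 13, 17, 19, 23, 29};

// Sweet spot: >= min and < max, unless max == min (then exactly max).
static bool InSweetSpot(uint64_t total, uint32_t minThreads, uint32_t maxThreads) {
    if (minThreads == maxThreads) return total == maxThreads;
    return total >= minThreads && total < maxThreads;
}

// Whether growing the local size to `total` threads stays within the bound.
static bool Fits(uint64_t total, uint32_t minThreads, uint32_t maxThreads) {
    return minThreads == maxThreads ? total <= maxThreads : total < maxThreads;
}

std::array<uint32_t, 3> ComputeLocalSize(const std::array<uint32_t, 3>& global,
                                         uint32_t minThreads, uint32_t maxThreads,
                                         uint32_t maxGroupSize) {
    std::array<uint32_t, 3> local = {1, 1, 1};
    std::array<uint32_t, 3> remaining = global;  // part of global not yet folded in
    uint64_t total = 1;                           // product of the local dims

    bool grew = true;
    while (grew && !InSweetSpot(total, minThreads, maxThreads)) {
        grew = false;
        // One pass: fold at most one prime factor into each dimension.
        for (std::size_t d = 0; d < 3 && !InSweetSpot(total, minThreads, maxThreads); ++d) {
            for (uint32_t p : kPrimes) {
                if (remaining[d] % p == 0 && Fits(total * p, minThreads, maxThreads)) {
                    local[d] *= p;
                    remaining[d] /= p;
                    total *= p;
                    grew = true;
                    break;
                }
            }
        }
    }
    if (InSweetSpot(total, minThreads, maxThreads)) return local;

    // Fallback: use the largest global dimension that fits in a group.
    std::size_t best = 3;  // 3 means "none found"
    for (std::size_t d = 0; d < 3; ++d) {
        if (global[d] > 1 && global[d] <= maxGroupSize &&
            (best == 3 || global[d] > global[best])) {
            best = d;
        }
    }
    // Keep the factorized result unless the fallback gives more threads.
    if (best == 3 || global[best] <= total) return local;
    std::array<uint32_t, 3> fallback = {1, 1, 1};
    fallback[best] = global[best];
    return fallback;
}
```

For example, with bounds of 32 and 64 and a group limit of 1024, a global size of {1024, 1, 1} stops at {32, 1, 1}. A prime global size of {97, 1, 1} yields no small factors and falls back to {97, 1, 1}.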

This ended up being more similar to #51 than I expected when reading that change, though with a few key differences: it doesn't allocate vectors of factors (it just uses an iterator into a static list), it targets the wave size instead of the max thread group size, and it prefers power-of-2 factorization instead of starting at the end of the list of factors.
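The next sketch illustrates two of those differences for a single dimension. `GrowToTarget`, `PowerOfTwoFirst`, `LargestFirst`, and `kPrimes` are hypothetical names. The sketch walks an iterator over a fixed array instead of building a vector of factors. It then compares searching that array from the front (2 first) with searching it from the back. Reading "starting at the end of the list of factors" as "largest prime first" is an assumption about #51.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>

// Hypothetical static prime list; no per-call allocation is needed.
static constexpr std::array<uint32_t, 10> kPrimes = {2, 3, 5, 7, 11, 13, 17, 19, 23, 29};

// Grow a 1D local size out of `global`, one prime factor at a time, until it
// reaches at least `target` threads. The factor search is a find_if over the
// iterator range [first, last), so the search order is set by the iterators.
template <typename It>
uint32_t GrowToTarget(uint32_t global, uint32_t target, It first, It last) {
    uint32_t local = 1;
    while (local < target) {
        It it = std::find_if(first, last, [&](uint32_t p) { return global % p == 0; });
        if (it == last) break;  // no remaining small prime factor
        local *= *it;
        global /= *it;
    }
    return local;
}

// Front-to-back: powers of 2 are consumed first.
uint32_t PowerOfTwoFirst(uint32_t global, uint32_t target) {
    return GrowToTarget(global, target, kPrimes.begin(), kPrimes.end());
}

// Back-to-front: the largest dividing prime in the list is consumed first.
uint32_t LargestFirst(uint32_t global, uint32_t target) {
    return GrowToTarget(global, target, kPrimes.rbegin(), kPrimes.rend());
}
```

Take a global size of 96 and a target of 32 threads. The front-first search stops at exactly 32, one 32-wide wave. The back-first search takes the factor 3 first and overshoots to 48.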