Problem
Inefficient threadgroup sizes and thread allocations lead to increased overhead and underutilization of Metal’s GPU capabilities, resulting in suboptimal MSM performance.
Details
Fine-tune threadgroup sizes and thread allocations to align with Metal's GPU architecture. Aim to minimize overhead and maximize parallelism by determining the optimal threadgroup configuration for each stage of the MSM process.
We are currently hitting the GPU Hang Error on mobile devices with the current Metal MSM implementation, and we suspect the cause is related to the maximum threadgroup memory allocation. There
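To make the suspected threadgroup memory constraint concrete, here is a minimal sketch in Rust of how a threadgroup size could be capped by a memory budget. It is not the project's implementation. The `DeviceLimits` struct, the `pick_threadgroup_size` function, and the `bytes_per_thread` parameter are hypothetical names introduced for illustration. On a real device, the three limits would be read from Metal (the pipeline state's `maxTotalThreadsPerThreadgroup` and `threadExecutionWidth`, and the device's `maxThreadgroupMemoryLength`).

```rust
/// Hypothetical container for limits that would be queried from Metal at runtime.
#[derive(Debug, Clone, Copy)]
struct DeviceLimits {
    /// Upper bound on threads in one threadgroup for the given pipeline.
    max_threads_per_threadgroup: usize,
    /// SIMD-group width; threadgroup sizes are rounded down to a multiple of it.
    thread_execution_width: usize,
    /// Total threadgroup memory available to one threadgroup, in bytes.
    max_threadgroup_memory_bytes: usize,
}

/// Returns the largest threadgroup size that
///   1. does not exceed the pipeline's thread limit,
///   2. keeps `size * bytes_per_thread` within the threadgroup memory budget, and
///   3. is a multiple of the execution width.
/// Returns `None` when not even one full SIMD group fits.
fn pick_threadgroup_size(limits: DeviceLimits, bytes_per_thread: usize) -> Option<usize> {
    if limits.thread_execution_width == 0 {
        return None;
    }
    // A kernel that uses no threadgroup memory is limited only by the thread count.
    let by_memory = if bytes_per_thread == 0 {
        limits.max_threads_per_threadgroup
    } else {
        limits.max_threadgroup_memory_bytes / bytes_per_thread
    };
    let cap = by_memory.min(limits.max_threads_per_threadgroup);
    // Round down to a whole number of SIMD groups.
    let rounded = cap / limits.thread_execution_width * limits.thread_execution_width;
    if rounded == 0 { None } else { Some(rounded) }
}
```

For example, with assumed limits of 1024 threads, an execution width of 32, and 32768 bytes of threadgroup memory, a kernel that needs 96 bytes per thread would be capped at 320 threads per threadgroup rather than 1024. If a stage currently requests more than this, that would be consistent with the memory-related suspicion above, though it does not by itself confirm the cause of the hang.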
Acceptance criteria
Successfully run MSMs of size 2^20 to 2^22 without encountering the GPU Hang Error on an iOS device.
Determine optimal threadgroup sizes for the various stages of the MSM process within Metal.
Implement dynamic thread dispatching strategies to balance the load across Metal’s GPU cores.
Test multiple configurations to identify the most performant setup on target iOS devices.
Achieve measurable performance improvements compared to the baseline thread allocation strategy.
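One possible shape for the dynamic dispatching criterion is sketched below in Rust. It assumes that splitting a large MSM stage into several smaller dispatches is an acceptable strategy, which this issue does not confirm. `Dispatch`, `plan_dispatches`, and `max_items_per_dispatch` are hypothetical names, and the numbers in the usage note are illustrative rather than measured.

```rust
/// One planned compute dispatch covering a contiguous range of work items.
#[derive(Debug, PartialEq, Eq)]
struct Dispatch {
    /// Index of the first work item handled by this dispatch.
    offset: usize,
    /// Number of work items handled by this dispatch.
    count: usize,
    /// Threadgroups to launch: ceil(count / threadgroup_size).
    threadgroups: usize,
}

/// Splits `total_items` into dispatches of at most `max_items_per_dispatch`
/// items each and computes how many threadgroups each dispatch needs.
fn plan_dispatches(
    total_items: usize,
    threadgroup_size: usize,
    max_items_per_dispatch: usize,
) -> Vec<Dispatch> {
    assert!(threadgroup_size > 0, "threadgroup_size must be non-zero");
    assert!(max_items_per_dispatch > 0, "max_items_per_dispatch must be non-zero");

    let mut plan = Vec::new();
    let mut offset = 0;
    while offset < total_items {
        // The last dispatch may be smaller than the cap.
        let count = (total_items - offset).min(max_items_per_dispatch);
        // Ceiling division so a partial threadgroup still covers the tail.
        let threadgroups = (count + threadgroup_size - 1) / threadgroup_size;
        plan.push(Dispatch { offset, count, threadgroups });
        offset += count;
    }
    plan
}
```

Calling `plan_dispatches(1 << 20, 256, 1 << 18)` yields four dispatches of 262144 items and 1024 threadgroups each. When the item count is not a multiple of the threadgroup size, the kernel would need a bounds check against `count` for the final, partially filled threadgroup. Testing several values of `threadgroup_size` and `max_items_per_dispatch` per stage is one way to carry out the configuration comparison listed above.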
Reference