zkmopro / gpu-acceleration


Optimize Workgroup and Thread Allocation for Metal’s GPU Architecture #10

Open moven0831 opened 1 week ago


Problem

Inefficient threadgroup sizes and thread allocations increase overhead and underutilize Metal's GPU capabilities, resulting in suboptimal multi-scalar multiplication (MSM) performance.

Details

Fine-tune threadgroup sizes and thread allocations to match Metal's GPU architecture. The aim is to minimize overhead and maximize parallelism by determining the optimal threadgroup configuration for each stage of the MSM process.
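A minimal sketch of how a per-stage threadgroup size could be derived, written in Python for clarity rather than as Metal host code. It assumes a 1-D dispatch with one thread per work item; `exec_width` and `max_threads` stand in for the values reported by `MTLComputePipelineState.threadExecutionWidth` and `maxTotalThreadsPerThreadgroup`, while `plan_dispatch`, `DispatchPlan`, the stage names, and the optional `preferred` cap are hypothetical, not part of the repository.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DispatchPlan:
    threads_per_threadgroup: int  # 1-D threadgroup width
    threadgroups: int             # number of threadgroups in the grid
    idle_threads: int             # padding threads past the last work item


def plan_dispatch(n_items: int, exec_width: int, max_threads: int,
                  preferred: Optional[int] = None) -> DispatchPlan:
    """Choose a 1-D threadgroup size for a kernel with one thread per item.

    exec_width  -- stand-in for threadExecutionWidth (SIMD-group width)
    max_threads -- stand-in for maxTotalThreadsPerThreadgroup
    preferred   -- hypothetical per-stage cap, e.g. found by benchmarking
    """
    if n_items <= 0 or exec_width <= 0 or max_threads < exec_width:
        raise ValueError("invalid dispatch parameters")
    cap = max_threads if preferred is None else min(preferred, max_threads)
    # Use whole SIMD groups; never go below one SIMD group
    tg = max(exec_width, (cap // exec_width) * exec_width)
    # Avoid a threadgroup far larger than the work: round n_items up to a SIMD multiple
    needed = -(-n_items // exec_width) * exec_width
    tg = min(tg, needed)
    groups = -(-n_items // tg)  # ceiling division
    return DispatchPlan(tg, groups, groups * tg - n_items)


# Hypothetical per-stage caps; real values would come from profiling
for stage, n, pref in [("stage_a", 1 << 16, None), ("stage_b", 1000, 256)]:
    print(stage, plan_dispatch(n, exec_width=32, max_threads=1024, preferred=pref))
```

For example, `plan_dispatch(1000, 32, 1024, preferred=256)` yields 4 threadgroups of 256 threads with 24 idle threads. When dispatching with `dispatchThreadgroups(_:threadsPerThreadgroup:)`, the kernel would need to bounds-check its thread index against the item count so those padded threads do no work. The `preferred` cap is where a benchmarked, stage-specific value could be plugged in.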

We are encountering a GPU hang error on mobile devices with the current Metal MSM implementation, and we suspect the cause may be related to the maximum threadgroup memory allocation.
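One way to test the threadgroup-memory hypothesis is to compute, before encoding, how much threadgroup memory each configuration would request and compare it with the device limit. The sketch below is a hypothetical Python helper, not code from the repository. `mem_limit` stands in for `MTLDevice.maxThreadgroupMemoryLength` (query it at runtime rather than hard-coding it), `shared_bytes` for any fixed allocation such as `MTLComputePipelineState.staticThreadgroupMemoryLength`, and the 96-byte per-thread figure is an assumption (three 32-byte field elements, e.g. one Jacobian point).

```python
from typing import Optional

TG_MEM_ALIGN = 16  # threadgroup memory lengths must be multiples of 16 bytes


def round_up(x: int, align: int) -> int:
    """Round x up to the next multiple of align."""
    return -(-x // align) * align


def threadgroup_bytes(threads: int, bytes_per_thread: int,
                      shared_bytes: int = 0) -> int:
    """Threadgroup memory a configuration would request, 16-byte aligned."""
    return round_up(threads * bytes_per_thread + shared_bytes, TG_MEM_ALIGN)


def max_threads_for_budget(bytes_per_thread: int, exec_width: int,
                           max_threads: int, mem_limit: int,
                           shared_bytes: int = 0) -> Optional[int]:
    """Largest SIMD-aligned threadgroup whose threadgroup memory fits mem_limit.

    Returns None when even a single SIMD group would exceed the limit.
    """
    tg = (max_threads // exec_width) * exec_width
    while tg >= exec_width:
        if threadgroup_bytes(tg, bytes_per_thread, shared_bytes) <= mem_limit:
            return tg
        tg -= exec_width
    return None


ASSUMED_LIMIT = 32 * 1024  # placeholder; query maxThreadgroupMemoryLength instead
POINT_BYTES = 96           # hypothetical: 3 x 32-byte field elements per thread
print(threadgroup_bytes(1024, POINT_BYTES), "bytes requested at 1024 threads")
print(max_threads_for_budget(POINT_BYTES, 32, 1024, ASSUMED_LIMIT), "threads fit")
```

Under the assumed 32 KiB limit, a 1024-thread threadgroup at 96 bytes per thread would request 98304 bytes, three times the budget, whereas `max_threads_for_budget(96, 32, 1024, 32768)` returns 320. Logging these numbers per MSM stage on the affected devices would help confirm or rule out this cause.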

Acceptance criteria

Reference