Open yjl0101 opened 2 months ago

Hi, I noticed that the block size in gpu-latency is 64 rather than 32 (the warp size) or a single thread. Was there any particular consideration behind choosing 64? Looking forward to your reply :)

te42kyfo replied:

A single thread would work just the same; using a full warp just feels better, and a full warp (wavefront) is 64 threads on CDNA hardware. On NVIDIA, this actually runs two warps instead of one, but from what I remember the interference is low. So there's no particularly strong reasoning behind it.

yjl0101:

@te42kyfo Aha, 64 for AMD GPUs, got it. Thanks a lot!
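For context, a minimal sketch of what such a latency benchmark launch looks like — this is hypothetical illustration code, not the actual gpu-latency source. The point is the launch configuration `<<<1, 64>>>`: one block of 64 threads is a single wavefront on AMD CDNA hardware and two 32-wide warps on NVIDIA.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Pointer-chasing kernel: each load depends on the previous one, so the
// loop time is dominated by memory latency rather than bandwidth.
__global__ void chase(const int* __restrict__ buf, int steps, int* out) {
    int idx = 0;
    for (int i = 0; i < steps; ++i)
        idx = buf[idx];
    if (threadIdx.x == 0)
        *out = idx;  // write the result so the loop is not optimized away
}

int main() {
    const int N = 1 << 20, steps = 10000;
    int *buf, *out;
    cudaMallocManaged(&buf, N * sizeof(int));
    cudaMallocManaged(&out, sizeof(int));
    for (int i = 0; i < N; ++i)
        buf[i] = (i + 1) % N;  // simple sequential chain for illustration

    // One block of 64 threads: a full wavefront on CDNA, two warps on NVIDIA.
    // A single thread would measure the same latency.
    chase<<<1, 64>>>(buf, steps, out);
    cudaDeviceSynchronize();
    printf("%d\n", *out);

    cudaFree(buf);
    cudaFree(out);
    return 0;
}
```

In a real latency benchmark the chain would be randomized to defeat prefetching, and the loop would be timed; both are omitted here for brevity.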