te42kyfo / gpu-benches

collection of benchmarks to measure basic GPU capabilities

Why is the block size in gpu-latency 64? #10

Open yjl0101 opened 2 months ago

yjl0101 commented 2 months ago

Hi, I noticed that the block size in gpu-latency is 64 rather than 32 (the warp size) or a single thread. Is there a particular reason for choosing 64? Looking forward to your reply :)

te42kyfo commented 2 months ago

A single thread would work just the same. Using a full warp just feels better, and a full warp is 64 threads on CDNA hardware. On NVIDIA, this actually runs two warps instead of just one, but from what I remember, the interference is low.

So no particularly strong reasoning.
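
To illustrate the point being discussed, here is a minimal pointer-chasing latency sketch, not the repository's actual gpu-latency kernel. The array size, stride, and step count are assumptions for illustration. It launches a single block of 64 threads, which is one full wavefront on CDNA and two warps on NVIDIA; since all threads walk the same dependent chain, the extra warp adds little interference.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Pointer-chasing kernel: every load depends on the previous one, so the
// elapsed time per step approximates the memory access latency.
__global__ void chase(const int* __restrict__ next, int start, int steps,
                      int* out) {
  int idx = start;
  for (int i = 0; i < steps; ++i) idx = next[idx];
  if (threadIdx.x == 0) *out = idx;  // prevent dead-code elimination
}

int main() {
  const int n = 1 << 22;       // array size (assumption): larger than the caches
  const int steps = 1 << 16;   // number of dependent loads (assumption)
  const int stride = 4001;     // odd stride, coprime with n, to defeat prefetching

  std::vector<int> h(n);
  for (int i = 0; i < n; ++i) h[i] = (i + stride) % n;

  int *d_next, *d_out;
  cudaMalloc(&d_next, n * sizeof(int));
  cudaMalloc(&d_out, sizeof(int));
  cudaMemcpy(d_next, h.data(), n * sizeof(int), cudaMemcpyHostToDevice);

  cudaEvent_t t0, t1;
  cudaEventCreate(&t0);
  cudaEventCreate(&t1);

  // One block of 64 threads: a single wavefront on CDNA, two warps on NVIDIA.
  chase<<<1, 64>>>(d_next, 0, steps, d_out);  // warm-up
  cudaEventRecord(t0);
  chase<<<1, 64>>>(d_next, 0, steps, d_out);
  cudaEventRecord(t1);
  cudaEventSynchronize(t1);

  float ms = 0.0f;
  cudaEventElapsedTime(&ms, t0, t1);
  printf("latency per dependent load: %.1f ns\n", ms * 1e6f / steps);

  cudaFree(d_next);
  cudaFree(d_out);
  return 0;
}
```

Because the chain is serial, adding more threads in the block does not speed it up; the block size only decides how many warps or wavefronts redundantly walk the same chain.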

yjl0101 commented 2 months ago

@te42kyfo Aha, 64 is the wavefront size on AMD GPUs. Got it, thanks a lot!