gpgpu-sim / gpgpu-sim_distribution

GPGPU-Sim provides a detailed simulation model of contemporary NVIDIA GPUs running CUDA and/or OpenCL workloads. It includes support for features such as TensorCores and CUDA Dynamic Parallelism as well as a performance visualization tool, AerialVision, and an integrated energy model, GPUWattch.

Independent Warp Scheduling in Volta+'s SIMT model #292

Open quadpixels opened 6 months ago

quadpixels commented 6 months ago

Volta introduced Independent Thread Scheduling, as described in the Volta whitepaper:

Volta maintains per-thread scheduling resources such as program counter (PC) and call stack (S), while earlier architectures maintained these resources per warp.
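
To make the quoted distinction concrete, here is a minimal Python sketch of the two kinds of scheduling state. Everything in it (`PerWarpState`, `PerThreadState`, the branch helpers, the 4-lane warp) is a hypothetical illustration, not GPGPU-Sim code or NVIDIA's actual hardware design. It only shows that after a divergent branch a shared PC needs an active mask, while per-thread PCs can simply hold different values.

```python
from dataclasses import dataclass, field
from typing import List

WARP_SIZE = 4  # tiny warp for readability; real NVIDIA warps have 32 threads


@dataclass
class PerWarpState:
    """Pre-Volta style (hypothetical model): one PC and call stack per warp."""
    pc: int = 0
    call_stack: List[int] = field(default_factory=list)
    # Lanes that execute the instruction at the shared PC.
    active_mask: List[bool] = field(default_factory=lambda: [True] * WARP_SIZE)


@dataclass
class PerThreadState:
    """Volta style (hypothetical model): each thread has its own PC and stack."""
    pc: int = 0
    call_stack: List[int] = field(default_factory=list)


def take_branch_per_warp(warp, taken, target):
    # Shared PC: follow the taken path and mask off the other lanes.
    # (A real design also records the not-taken path for later; omitted here.)
    warp.pc = target
    warp.active_mask = list(taken)


def take_branch_per_thread(threads, taken, target, fallthrough):
    # Per-thread PC: every thread just records where it is going next.
    for thread, is_taken in zip(threads, taken):
        thread.pc = target if is_taken else fallthrough


taken = [True, False, True, False]  # even lanes take the branch

warp = PerWarpState()
take_branch_per_warp(warp, taken, target=100)

threads = [PerThreadState() for _ in range(WARP_SIZE)]
take_branch_per_thread(threads, taken, target=100, fallthrough=1)
thread_pcs = [t.pc for t in threads]

print(warp.pc, warp.active_mask)
print(thread_pcs)
```

The sketch says nothing about timing; it only illustrates where the state lives in each model.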

In my experiments with a few highly divergent workloads, when I run only one thread, gpgpu-sim's results are close to the results on real hardware. However, when multiple threads are run, a large difference appears. The more divergent the workload, the bigger the difference.

Reducing the warp size mitigates the difference but does not completely eliminate it.
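
As a rough illustration of why warp size matters for divergent code, here is a toy Python model. It assumes a loop whose trip count differs per thread and a warp that runs in lockstep until its slowest lane finishes. The function names and the trip counts are made up for illustration; this is not how GPGPU-Sim or Volta hardware is implemented, and it does not by itself explain the simulator-versus-hardware gap. It only shows that the cost attributable to divergence disappears with a single thread and shrinks as warps get smaller.

```python
def lane_slots_issued(trip_counts, warp_size):
    """Lane-slots occupied when each warp loops until its slowest lane is done."""
    total = 0
    for start in range(0, len(trip_counts), warp_size):
        warp = trip_counts[start:start + warp_size]
        # Every lane of the warp is held for max(warp) iterations.
        total += max(warp) * len(warp)
    return total


def simd_efficiency(trip_counts, warp_size):
    """Useful lane-iterations divided by lane-slots occupied (1.0 = no waste)."""
    return sum(trip_counts) / lane_slots_issued(trip_counts, warp_size)


# Hypothetical per-thread loop trip counts for a divergent workload.
trips = [1, 8, 2, 7, 3, 6, 4, 5]

for size in (8, 4, 2, 1):
    print(size, round(simd_efficiency(trips, size), 3))

# A single thread never waits on other lanes, whatever the warp size.
print(simd_efficiency([5], 8))
```

In this toy model the efficiency rises monotonically as the warp size drops from 8 to 1, which is consistent with the observation that smaller warps reduce the effect of divergence without removing it until each thread runs alone.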

Independent Thread Scheduling doesn't seem to be the same thing as the sub-core model introduced in gpgpu-sim 4.x.

Am I missing something or is Independent Thread Scheduling not supported? Or do you have the same observation?

Thanks!

quadpixels commented 6 months ago

vulkan-sim has ITS (Independent Thread Scheduling) support.