Independent Thread Scheduling in Volta+'s SIMT model

Volta introduced Independent Thread Scheduling, as described in the Volta whitepaper:

Volta maintains per-thread scheduling resources such as program counter (PC) and call stack (S), while earlier architectures maintained these resources per warp.

In my experiments with a few highly divergent workloads, gpgpu-sim's results are close to those on real hardware when I run only one thread. However, when multiple threads are run, a large difference appears, and the more divergent the workload, the bigger the difference.

Reducing the warp size mitigates the difference but does not completely eliminate it.
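
To illustrate why a smaller warp size narrows the gap, here is a toy Python model of lockstep (per-warp PC) execution of a divergent loop. It is only a sketch under my own assumptions: it is not gpgpu-sim code, it does not describe how Volta hardware behaves, and the function names and the trip-count pattern are hypothetical. It counts lane slots when every warp has to keep iterating until its slowest thread is done.

```python
def lockstep_issue_slots(trip_counts, warp_size):
    """Lane slots consumed when each warp loops until its slowest thread exits."""
    total = 0
    for start in range(0, len(trip_counts), warp_size):
        warp = trip_counts[start:start + warp_size]
        # Under lockstep execution every lane occupies a slot for each
        # iteration the warp issues, even after that lane has left the loop.
        total += max(warp) * len(warp)
    return total


def simd_efficiency(trip_counts, warp_size):
    """Fraction of consumed lane slots that did useful work."""
    return sum(trip_counts) / lockstep_issue_slots(trip_counts, warp_size)


# Hypothetical divergent workload: thread i runs (i % 32) + 1 loop iterations.
trips = [(i % 32) + 1 for i in range(64)]

if __name__ == "__main__":
    for warp_size in (32, 8, 1):
        print(warp_size, round(simd_efficiency(trips, warp_size), 3))
```

In this model the wasted slots shrink as the warp gets smaller and disappear at a warp size of 1, which matches the direction of what I see. It says nothing about the part of the difference that remains.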

Independent Thread Scheduling doesn't seem to be the same thing as the sub-core model introduced in gpgpu-sim 4.x.

Am I missing something, or is Independent Thread Scheduling not supported? Has anyone else observed the same thing?

Thanks!
