ulysseB / telamon

A framework to find good combinations of optimizations for computational kernels on GPUs.
https://ulysseb.github.io/telamon/telamon
Apache License 2.0

[cuda] Assume Maxwell and Pascal are dual issue #283

Closed Elarnon closed 5 years ago

Elarnon commented 5 years ago

The CUDA C Programming Guide says that recent architectures (compute capability 5.x and later) can only issue a single instruction per cycle, but other NVIDIA documentation[1] says that dual issue is still possible. The latter is consistent with the generated SASS assembly as well as with practical measurements: with single issue, the bounds from the performance model are far too high for some kernels, and changing this setting drops them to reasonable levels.
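To illustrate why the issue width matters so much for the bounds, here is a minimal sketch of an issue-throughput bound. It is not Telamon's actual performance model: the function name `issue_bound_cycles` and its parameters are hypothetical, and it assumes the only bottleneck is how many warp instructions the schedulers of one SM can issue per cycle.

```rust
/// Hypothetical, simplified issue-throughput bound (not Telamon's real model).
///
/// Returns a lower bound on the number of cycles needed to issue `num_insts`
/// warp instructions on one SM that has `num_schedulers` warp schedulers,
/// each able to issue up to `issue_width` instructions per cycle.
pub fn issue_bound_cycles(num_insts: u64, num_schedulers: u64, issue_width: u64) -> u64 {
    assert!(num_schedulers > 0 && issue_width > 0);
    // Maximum number of instructions the whole SM can issue in one cycle.
    let per_cycle = num_schedulers * issue_width;
    // Ceiling division: a partially filled cycle still costs a full cycle.
    (num_insts + per_cycle - 1) / per_cycle
}
```

Under these assumptions, with 4 schedulers and 4096 instructions, `issue_bound_cycles(4096, 4, 1)` gives 1024 cycles while `issue_bound_cycles(4096, 4, 2)` gives 512: for an issue-bound kernel, switching from single to dual issue halves the bound.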

As noted in the NVIDIA blog post[1], dual issue apparently comes with more limitations on those architectures, such as only being able to issue one load and one arithmetic operation at the same time. Examining the SASS suggests there might be additional restrictions, e.g. it looks like 128-bit loads can't be dual-issued in some cases. This is not an issue, since the performance model is optimistic anyway, so ignoring these restrictions only makes it more optimistic.
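The argument that ignoring the pairing restrictions is safe can be sketched as follows. Both functions are hypothetical and deliberately simplistic (a single scheduler, only loads and arithmetic instructions, no latencies); they are not taken from Telamon. The point is only that the unrestricted count never exceeds the restricted one, so it remains a valid optimistic bound.

```rust
/// Hypothetical model: any two instructions can be issued together,
/// so two instructions retire per cycle regardless of their kind.
pub fn unrestricted_dual_issue_cycles(loads: u64, arith: u64) -> u64 {
    // Ceiling of (loads + arith) / 2.
    (loads + arith + 1) / 2
}

/// Hypothetical model of the restriction described above: at most one load
/// and one arithmetic instruction can be issued in the same cycle.
pub fn paired_dual_issue_cycles(loads: u64, arith: u64) -> u64 {
    // The more numerous kind of instruction dictates the cycle count.
    loads.max(arith)
}
```

For a balanced mix the two models agree (3 loads and 3 arithmetic instructions take 3 cycles in both), while for 10 loads and 2 arithmetic instructions the unrestricted model gives 6 cycles against 10 for the paired one. The unrestricted value is the lower of the two, which is what an optimistic model needs.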

Hence, this patch changes the performance model to ignore the Programming Guide and assume dual-issue for those architectures.

[1]: https://devblogs.nvidia.com/5-things-you-should-know-about-new-maxwell-gpu-architecture/