ulysseB / telamon

A framework to find good combinations of optimizations for computational kernels on GPUs.
https://ulysseb.github.io/telamon/telamon
Apache License 2.0

Tentatively handle memory replays for Maxwell+ architectures #307

Closed: Elarnon closed this 4 years ago

Elarnon commented 4 years ago

The instruction replay behavior changed in Maxwell compared to earlier designs: replays are now handled by the individual functional units rather than by the warp scheduler [1]. As a result, for compute capability 5.x and later, we need to update the memory model to account for this. Otherwise, the model charges replays to the issue stage and ends up with an issue pressure far higher than what actually happens on the hardware.

[1]: https://stackoverflow.com/questions/35566178/how-to-explain-instruction-replay-in-cuda
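A minimal sketch of the intended change, assuming a simplified model in which each replay costs one extra pass. Before Maxwell, every pass is re-issued by the scheduler. From compute capability 5 on, the instruction is issued once and the extra passes only occupy the memory unit. The `Pressure` struct and `memory_instruction_pressure` function are hypothetical illustrations, not telamon's actual API.

```rust
/// Hypothetical per-instruction pressure (not telamon's real types).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Pressure {
    /// Issue slots consumed at the warp scheduler.
    issue: f64,
    /// Passes through the memory (load/store) unit.
    mem_unit: f64,
}

/// `sm_major` is the major compute capability; `replays` is the number of
/// extra passes beyond the first (e.g. from bank conflicts or uncoalesced
/// accesses). Assumes one replay = one extra pass, which is a simplification.
fn memory_instruction_pressure(sm_major: u32, replays: u32) -> Pressure {
    let passes = 1.0 + replays as f64;
    if sm_major >= 5 {
        // Maxwell and later: the unit handles replays itself, so the
        // scheduler issues the instruction only once.
        Pressure { issue: 1.0, mem_unit: passes }
    } else {
        // Kepler and earlier: each replay is re-issued by the scheduler,
        // so it also consumes an issue slot.
        Pressure { issue: passes, mem_unit: passes }
    }
}

fn main() {
    let kepler = memory_instruction_pressure(3, 3);
    let maxwell = memory_instruction_pressure(5, 3);
    println!("Kepler:  {:?}", kepler);
    println!("Maxwell: {:?}", maxwell);
}
```

Keeping the memory-unit pressure identical across architectures while only moving the replay cost off the issue stage isolates the behavior change described in [1].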