ulysseB / telamon

A framework to find good combinations of optimizations for computational kernels on GPUs.
https://ulysseb.github.io/telamon/telamon
Apache License 2.0

[cuda] Correctly handle memory replays for Maxwell and later #284

Closed by Elarnon 4 years ago

Elarnon commented 5 years ago

Instruction replay behavior changed in Maxwell compared to earlier designs: replays are now handled by the individual units rather than by the scheduler [1]. For compute capabilities 5 and later, we therefore need to update the memory model to account for this; otherwise, we compute an issue pressure that is far higher than what the hardware actually exhibits.
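
To make the difference concrete, here is a minimal sketch of the two accounting schemes. It is not Telamon's memory model: the `Pressure` type, the `memory_pressure` function, and the cost of one issue slot per issue and one memory-unit cycle per transaction are all hypothetical simplifications chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Pressure:
    """Hypothetical per-resource pressure, in cycles (not a Telamon type)."""
    issue: float
    mem_unit: float


def memory_pressure(num_transactions: int, compute_capability: int) -> Pressure:
    """Pressure of one warp-level memory instruction that needs
    `num_transactions` passes through the memory unit.

    `compute_capability` is the major version only (e.g. 3 for Kepler,
    5 for Maxwell).
    """
    if num_transactions < 1:
        raise ValueError("an access needs at least one transaction")

    replays = num_transactions - 1
    if compute_capability >= 5:
        # Assumption taken from the issue text: the memory unit handles
        # replays itself, so the scheduler issues the instruction once.
        issue = 1.0
    else:
        # Earlier designs: every replay goes back through the scheduler
        # and consumes an extra issue slot.
        issue = 1.0 + replays

    # In both cases the memory unit still processes every transaction.
    return Pressure(issue=issue, mem_unit=float(num_transactions))
```

Under these assumptions, an access that splits into 4 transactions costs 4 issue slots before Maxwell but only 1 on compute capability 5 and later, while the pressure on the memory unit is the same in both cases. Applying the older accounting to newer hardware is what inflates the issue pressure.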

Elarnon commented 4 years ago

Superseded by #307