Closed (muellch closed this 3 years ago)
Skipping common subexpression elimination (CSE) during lowering seems to give very similar performance, because the PTX compiler's own CSE compensates.
The open question is which variant produces faster code.
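
To make the two variants concrete, here is a minimal sketch of what a lowering-time CSE pass does on a toy expression tree. It is not the project's actual lowering code: the tuple-based expression format and the names `count_ops` and `eliminate_common_subexpressions` are assumptions made up for illustration.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical toy IR: a variable name, or a tuple (op, lhs, rhs).
Expr = Union[str, Tuple]


def count_ops(expr: Expr) -> int:
    """Count operator nodes when every subtree is evaluated separately (no CSE)."""
    if isinstance(expr, str):
        return 0
    _, lhs, rhs = expr
    return 1 + count_ops(lhs) + count_ops(rhs)


def eliminate_common_subexpressions(expr: Expr) -> Tuple[List[Tuple], str]:
    """Return (assignments, result_name) so each distinct subtree is computed once."""
    assignments: List[Tuple] = []
    seen: Dict[Tuple, str] = {}

    def visit(node: Expr) -> str:
        if isinstance(node, str):
            return node
        op, lhs, rhs = node
        # Key on the already-reduced operands so identical subtrees collide.
        key = (op, visit(lhs), visit(rhs))
        if key not in seen:
            seen[key] = f"t{len(seen)}"
            assignments.append((seen[key], key))
        return seen[key]

    return assignments, visit(expr)


# (a + b) * (a + b) + (a + b)
s = ("+", "a", "b")
expr = ("+", ("*", s, s), s)
assignments, result = eliminate_common_subexpressions(expr)

print("operations without CSE:", count_ops(expr))
print("operations with CSE:   ", len(assignments))
for name, (op, lhs, rhs) in assignments:
    print(f"{name} = {lhs} {op} {rhs}")
```

With CSE skipped, the generated kernel would contain the repeated `a + b` and leave the deduplication to the PTX compiler. The sketch only shows the difference in emitted code; which variant runs faster would still have to be measured.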