oneapi-src / Velocity-Bench

Other
44 stars 15 forks source link

[ethminer] Remove unreachable default case in switch statements to prevent potential performance regression. #53

Closed mmoadeli closed 7 months ago

mmoadeli commented 8 months ago

Addressing the issue of Lower unreachable to exit to allow ptxas to accurately reconstruct the CFG could potentially result in performance regression through adding a basic block with an exit to the end of the CFG. An example demonstrating noticeable regression resulting from this change can be found in ethminer, where the code is built using DPC++ for the A100 CUDA backend, using below commands.

CC="clang" CXX="clang++ -fsycl -fsycl-unnamed-lambda" cmake .. -DUSE_SM=80 -DUSE_NVIDIA_BACKEND=ON -DETHASHSYCL=ON -G Ninja
ninja

The performance regression in the sample comes from a switch statement with an unreachable default which is lowered to exit and moved to the end of the CFG.

To address this issue, a pull request was submitted to upstream-llvm. However, it has not been accepted after multiple variations due to concerns about potential side-effects. One suggestion was to modify the source code, which the current pull request offers.

Running ethminer built using DPC++ for the CUDA backend with the command ethminer -S -M 1 yielded the following measurements before the patch:

Total Execution Time: 61.409 s
Total Number of Hashes: 9316597760
Overall Hash Rate: 151.714 MH/s

With the patch applied, the performance improved as follows. It's worth noting that similar measurements will be observed when using a revision of DPC++ built before introduction of Lower unreachable to exit to allow ptxas to accurately reconstruct the CFG:

Total Execution Time: 61.4018 s
Total Number of Hashes: 9462349824
Overall Hash Rate: 154.145 MH/s

This improvement indicates around 2% increase in overall hash rate, highlighting the significance of the change for the application's performance.

jgtong commented 8 months ago

@mmoadeli , Thank you for this modification. I will test your changes before doing a merge into the Velocity-Bench repo.

jgtong commented 7 months ago

@mmoadeli , Update, I can confirm your modifications indeed gave a performance improvement on NVIDIA's H100 (205MH/s -> 208MH/s)

I will also need to check on other platforms before merging.

jgtong commented 7 months ago

@mmoadeli : I have confirmed that your changes do not impact the performance on other platforms. Thank you very much!

mmoadeli commented 7 months ago

@mmoadeli : I have confirmed that your changes do not impact the performance on other platforms. Thank you very much!

Thank you @jgtong