ifdefelse / ProgPOW

A Programmatic Proof-of-Work for Ethash. Forked from https://github.com/ethereum-mining/ethminer
GNU General Public License v3.0

Profiling on 1080 Ti #38

Open jeff-ruby opened 5 years ago

jeff-ruby commented 5 years ago

There is an Nsight profiling result published on the web: https://medium.com/@ifdefelse/understanding-progpow-performance-and-tuning-d72713898db3

I have also tried profiling on a 1080 Ti with the same codebase from this GitHub repository, and I have some questions.

The result shows that ‘Issued Warp Per Scheduler’ is only 0.77, which suggests poor latency hiding. That looks low compared with 0.94 on the 1060 and 0.88 on the 1070.

Also, ‘Warp State Statistics’ shows that the bottleneck is ‘Stall Short Scoreboard’, which is related to shared memory operations.

Below is my shared memory profiling result:

Instructions, Requests, %Peak, Bank Conflicts

201326592, 706282140, 76.35, 504955548

Compared with the 1060 and 1070, the instruction count is the same, but there are more requests and bank conflicts. I guess this might be the reason for the high latency in my experiment.

However, I don’t know why the requests and bank conflicts are about 257274 higher than on the 1060/1070. Could anyone help with that?

ifdefelse commented 5 years ago

The number of requests and bank conflicts is data dependent and changes for every hash. A random value from a register is used as the load address, so the number of conflicts across a warp is random.
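To illustrate why the count moves with the input, here is a minimal Python sketch (not the ProgPoW kernel) that models one warp-wide shared memory load with random word offsets. It assumes 32 threads per warp, 32 four-byte banks, broadcast when several threads read the same word, and a hypothetical cache size of 4096 words. The names `passes_for_warp` and `simulate` are made up for this example.

```python
import random
from collections import defaultdict

WARP_SIZE = 32      # threads per warp (assumption for this sketch)
NUM_BANKS = 32      # 4-byte-wide shared memory banks (assumption)
CACHE_WORDS = 4096  # hypothetical cache size in 32-bit words


def passes_for_warp(word_offsets):
    """Passes needed to serve one warp-wide load.

    Threads reading the same word are served together (broadcast), so only
    distinct words that map to the same bank force extra passes.
    """
    words_per_bank = defaultdict(set)
    for off in word_offsets:
        words_per_bank[off % NUM_BANKS].add(off)
    return max(len(words) for words in words_per_bank.values())


def simulate(num_loads, seed):
    """Total bank conflicts (extra passes) over num_loads random loads."""
    rng = random.Random(seed)
    conflicts = 0
    for _ in range(num_loads):
        offsets = [rng.randrange(CACHE_WORDS) for _ in range(WARP_SIZE)]
        conflicts += passes_for_warp(offsets) - 1
    return conflicts


if __name__ == "__main__":
    loads = 10_000
    # Different seeds stand in for different block/hash inputs.
    for seed in (1, 2, 3):
        c = simulate(loads, seed)
        print(f"seed={seed} conflicts={c} per_load={c / loads:.3f}")
```

In this model the same seed always gives the same total, while different seeds give slightly different totals, which is the kind of variation to expect between different blocks or hashes.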

If you run the same block and hash on a 1060/1070/1080 Ti you should get the same results. Since you are running a different block/hash, you're seeing a 0.05% difference, which is negligible and actually less variation than I would expect.
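As a quick check of that figure, the short Python sketch below works only from the numbers posted in this thread. It treats the reported gap of about 257274 as applying to the bank conflict count, and it also notes that in this profile the requests equal the instructions plus the bank conflicts. The variable names are illustrative.

```python
# Numbers copied from the shared memory profile posted above.
instructions = 201_326_592
requests = 706_282_140
bank_conflicts = 504_955_548
delta = 257_274  # reported gap versus the 1060/1070 run

# In this profile, every request beyond one per instruction is a conflict.
assert requests == instructions + bank_conflicts

conflicts_per_instruction = bank_conflicts / instructions
relative_to_conflicts = delta / bank_conflicts
relative_to_requests = delta / requests

print(f"conflicts per instruction: {conflicts_per_instruction:.2f}")  # 2.51
print(f"gap vs bank conflicts: {relative_to_conflicts * 100:.3f}%")   # 0.051%
print(f"gap vs requests: {relative_to_requests * 100:.3f}%")          # 0.036%
```

Whichever counter is used as the baseline, the gap stays well under 0.1%.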