ifdefelse / ProgPOW

A Programmatic Proof-of-Work for Ethash. Forked from https://github.com/ethereum-mining/ethminer
GNU General Public License v3.0

Performance analysis of DATASET_PARENTS=512 #50

Open chfast opened 4 years ago

chfast commented 4 years ago

The ProgPoW software audit recommends increasing the DATASET_PARENTS Ethash cache parameter from 256 to 512. This directly affects verification performance: the time for a single verification roughly doubles (whereas the ProgPoW verification slowdown is only 30-50% over Ethash).

The DATASET_PARENTS increase makes verification "even more" memory-hard and lowers the instructions-per-cycle ratio to about 1 (the maximum being 4).
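
For readers unfamiliar with the parameter, here is a minimal sketch of how one 64-byte dataset item is derived from the cache, modeled on the Ethash specification's dataset-item routine. It is an illustration only: `hash_words` uses NIST SHA3-512 as a stand-in for the original Keccak-512 that Ethash actually uses, and `make_toy_cache` is a hypothetical helper, so the values do not match real Ethash. It does show that the number of data-dependent cache reads per item grows linearly with DATASET_PARENTS.

```python
import hashlib
import struct

FNV_PRIME = 0x01000193
WORDS_PER_ITEM = 16  # one 64-byte item viewed as sixteen 32-bit words


def fnv(a, b):
    """FNV-style mix from the Ethash spec: multiply, XOR, keep 32 bits."""
    return ((a * FNV_PRIME) ^ b) & 0xFFFFFFFF


def hash_words(words):
    # ASSUMPTION: NIST SHA3-512 stands in for Ethash's original Keccak-512,
    # so the outputs here do NOT match real Ethash values.
    digest = hashlib.sha3_512(struct.pack("<16I", *words)).digest()
    return list(struct.unpack("<16I", digest))


def make_toy_cache(n_items):
    # Hypothetical toy cache; the real cache is derived from the epoch seed.
    return [hash_words([i] * WORDS_PER_ITEM) for i in range(n_items)]


def calc_dataset_item(cache, i, dataset_parents):
    """Return (item, cache_reads) for dataset item i."""
    n = len(cache)
    mix = list(cache[i % n])
    mix[0] = (mix[0] ^ i) & 0xFFFFFFFF
    mix = hash_words(mix)
    reads = 1  # the initial cache[i % n] read
    for j in range(dataset_parents):
        # The next index depends on the current mix, so the reads form a
        # serial, data-dependent chain that is hard to prefetch.
        cache_index = fnv(i ^ j, mix[j % WORDS_PER_ITEM]) % n
        mix = [fnv(a, b) for a, b in zip(mix, cache[cache_index])]
        reads += 1
    return hash_words(mix), reads


if __name__ == "__main__":
    cache = make_toy_cache(1024)
    for parents in (256, 512):
        _, reads = calc_dataset_item(cache, 42, parents)
        print(f"DATASET_PARENTS={parents}: {reads} cache reads per item")
```

Because each read depends on the result of the previous one, doubling the parent count doubles the length of the dependent load chain for every item a light verifier has to regenerate.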

ProgPoW verification, DATASET_PARENTS = 256, epoch 0:

cset shield -- perf stat -B -e cache-references,cache-misses,cycles,instructions test/ethash-bench --benchmark_filter=progpow_hash/0
cset: **> 1 tasks are not movable, impossible to move
cset: --> last message, executed args into cpuset "/user", new pid is: 10825
2019-09-10 14:19:50
Running test/ethash-bench
Run on (8 X 4400 MHz CPU s)
CPU Caches:
  L1 Data 32K (x4)
  L1 Instruction 32K (x4)
  L2 Unified 256K (x4)
  L3 Unified 8192K (x1)
------------------------------------------------------
Benchmark               Time           CPU Iterations
------------------------------------------------------
progpow_hash/0       1960 us       1960 us        347

 Performance counter stats for 'test/ethash-bench --benchmark_filter=progpow_hash/0':

        65 642 783      cache-references                                            
        39 184 374      cache-misses              #   59,693 % of all cache refs    
     5 636 657 996      cycles                                                      
     7 104 679 821      instructions              #    1,26  insn per cycle         

       1,314309256 seconds time elapsed

       1,296116000 seconds user
       0,000000000 seconds sys

ProgPoW verification, DATASET_PARENTS = 512, epoch 0:

cset shield -- perf stat -B -e cache-references,cache-misses,cycles,instructions test/ethash-bench --benchmark_filter=progpow_hash/0
cset: **> 1 tasks are not movable, impossible to move
cset: --> last message, executed args into cpuset "/user", new pid is: 10697
2019-09-10 14:19:26
Running test/ethash-bench
Run on (8 X 4400 MHz CPU s)
CPU Caches:
  L1 Data 32K (x4)
  L1 Instruction 32K (x4)
  L2 Unified 256K (x4)
  L3 Unified 8192K (x1)
------------------------------------------------------
Benchmark               Time           CPU Iterations
------------------------------------------------------
progpow_hash/0       3695 us       3694 us        195

 Performance counter stats for 'test/ethash-bench --benchmark_filter=progpow_hash/0':

        87 073 601      cache-references                                            
        48 426 695      cache-misses              #   55,616 % of all cache refs    
     6 589 826 522      cycles                                                      
     6 898 095 482      instructions              #    1,05  insn per cycle         

       1,534862112 seconds time elapsed

       1,512262000 seconds user
       0,004011000 seconds sys
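
The derived figures can be recomputed from the two runs above with a short script. This is only a sketch: the `PerfRun` class and its field names are made up for illustration, and the numbers are copied from the benchmark and perf stat output. Note that the perf counters cover the whole process and the two runs executed different iteration counts (347 vs. 195), so the per-hash time from the benchmark table is the figure to compare directly.

```python
from dataclasses import dataclass


@dataclass
class PerfRun:
    """Numbers from one benchmark run (hypothetical container type)."""
    time_us: float      # per-hash time reported by the benchmark
    cache_refs: int
    cache_misses: int
    cycles: int
    instructions: int

    @property
    def ipc(self):
        # Instructions per cycle over the whole process.
        return self.instructions / self.cycles

    @property
    def miss_rate(self):
        # Fraction of cache references that missed.
        return self.cache_misses / self.cache_refs


# Values copied from the two outputs above.
parents_256 = PerfRun(1960, 65_642_783, 39_184_374, 5_636_657_996, 7_104_679_821)
parents_512 = PerfRun(3695, 87_073_601, 48_426_695, 6_589_826_522, 6_898_095_482)

slowdown = parents_512.time_us / parents_256.time_us

if __name__ == "__main__":
    print(f"per-hash slowdown: {slowdown:.2f}x")
    for name, run in (("256", parents_256), ("512", parents_512)):
        print(f"DATASET_PARENTS={name}: IPC {run.ipc:.2f}, "
              f"miss rate {run.miss_rate:.1%}")
```

The per-hash time grows by a factor of roughly 1.9, which is where "the time for a single verification doubles" comes from, while IPC drops from about 1.26 to about 1.05.
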

solardiz commented 4 years ago

How about increasing the size of the DAG cache instead, above Ethereum's current curve, at the time of the switch to ProgPoW? Sizes of a few hundred MB should be acceptable for light verification now, and wouldn't result in significantly slower verification (right?)

chfast commented 4 years ago

> Sizes of a few hundred MB should be acceptable for light verification now, and wouldn't result in significantly slower verification (right?)

That does not seem to be the case. From my observations, verification time depends strictly on the Cache access time, and therefore on the CPU's L3 cache size. The larger the part of the Cache that does not fit into L3, the slower verification will be.

solardiz commented 4 years ago

I used the word "significantly" specifically to account for the potential slight slowdown from the lower L3 cache hit rate. The DAG cache is already in excess of typical CPUs' L3 cache sizes (although those are increasing as well). In my experience (not with Ethash/ProgPoW, though), while L3 cache is a lot faster than RAM in synthetic benchmarks designed to fit in the cache, it provides little speedup for non-trivial algorithms - e.g., for yescrypt on a typical server platform there's little reduction in bandwidth when going from 16 MiB to higher sizes (even when I tweak it to reduce the amount of computation so that it could potentially use more bandwidth with the lower sizes). I've even seen cases where L3 cache hurt performance, compared to reading non-cached data from RAM, when the data happened to be cached in a CPU in a different socket.
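
To put numbers on the cache-size discussion, the following sketch computes the size of the Ethash light cache (the "DAG cache" above) for a given epoch and compares it with the 8192K L3 of the benchmark machine. The constants and the prime-search rule are taken from the Ethash specification; `max_l3_resident_fraction` is a hypothetical helper that only gives an optimistic upper bound, assuming the entire L3 could hold cache data and nothing else.

```python
CACHE_BYTES_INIT = 2**24    # bytes in the cache at genesis
CACHE_BYTES_GROWTH = 2**17  # cache growth per epoch
HASH_BYTES = 64             # size of one cache item

L3_BYTES = 8192 * 1024      # "L3 Unified 8192K" from the benchmark output


def is_prime(n):
    """Trial division; fast enough for the values used here."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def cache_size(epoch):
    """Ethash cache size in bytes: item count reduced to a prime."""
    size = CACHE_BYTES_INIT + CACHE_BYTES_GROWTH * epoch - HASH_BYTES
    while not is_prime(size // HASH_BYTES):
        size -= 2 * HASH_BYTES
    return size


def max_l3_resident_fraction(epoch, l3_bytes=L3_BYTES):
    # Upper bound only: real L3 is shared with code, stack, and other data.
    return min(1.0, l3_bytes / cache_size(epoch))


if __name__ == "__main__":
    for epoch in (0, 100, 300):
        size = cache_size(epoch)
        frac = max_l3_resident_fraction(epoch)
        print(f"epoch {epoch}: cache {size / 2**20:.1f} MiB, "
              f"at most {frac:.0%} fits in L3")
```

Even at epoch 0 the cache is about 16 MiB, so at best half of it can sit in an 8 MiB L3, and that share shrinks as the epoch number grows.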