poanetwork / vdf

An implementation of Verifiable Delay Functions in Rust
Apache License 2.0
174 stars 53 forks source link

Low performance on AMD Ryzen #18

Open rtsisyk opened 5 years ago

rtsisyk commented 5 years ago

Both Pietrzak and Wesolowski functions are about 10-20x times slower on AMD CPUs:

Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz:

time vdf-cli -t pietrzak aa 100 005271e8f9ab2eb8a2906e851dfcb5542e4173f016b85e29d481a108dc82ed3b3f97937b7aa824801138d1771dea8dae2f6397e76a80613afda30f2c30a34b040baaafe76d5707d68689193e5d211833b372a6a4591abb88e2e7f2f5a5ec818b5707b86b8b2c495ca1581c179168509e3593f9a16879620a4dc4e907df452e8dd0ffc4f199825f54ec70472cc061f22eb54c48d6aa5af3ea375a392ac77294e2d955dde1d102ae2ace494293492d31cff21944a8bcb4608993065c9a00292e8d3f4604e7465b4eeefb494f5bea102db343bb61c5a15c7bdf288206885c130fa1f2d86bf5e4634fdc4216bc16ef7dac970b0ee46d69416f9a9acee651d158ac64915b                                                                                                    
Proof is valid                                                                                                                       

real    0m0.944s
user    0m0.944s
sys     0m0.000s

AMD Ryzen 7 2700X Eight-Core Processor:

time vdf-cli -t pietrzak aa 100 005271e8f9ab2eb8a2906e851dfcb5542e4173f016b85e29d481a108dc82ed3b3f97937b7aa824801138d1771dea8dae2f6397e76a80613afda30f2c30a34b040baaafe76d5707d68689193e5d211833b372a6a4591abb88e2e7f2f5a5ec818b5707b86b8b2c495ca1581c179168509e3593f9a16879620a4dc4e907df452e8dd0ffc4f199825f54ec70472cc061f22eb54c48d6aa5af3ea375a392ac77294e2d955dde1d102ae2ace494293492d31cff21944a8bcb4608993065c9a00292e8d3f4604e7465b4eeefb494f5bea102db343bb61c5a15c7bdf288206885c130fa1f2d86bf5e4634fdc4216bc16ef7dac970b0ee46d69416f9a9acee651d158ac64915b                                                                                                    
Proof is valid

real    0m12,991s
user    0m12,988s
sys 0m0,000s

cpuinfo:

processor   : 0
vendor_id   : AuthenticAMD
cpu family  : 23
model       : 8
model name  : AMD Ryzen 7 2700X Eight-Core Processor
stepping    : 2
microcode   : 0x800820b
cpu MHz     : 1822.593
cache size  : 512 KB
physical id : 0
siblings    : 16
core id     : 0
cpu cores   : 8
apicid      : 0
initial apicid  : 0
fpu     : yes
fpu_exception   : yes
cpuid level : 13
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb hw_pstate sme ssbd sev ibpb vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 xsaves clzero irperf xsaveerptr arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif overflow_recov succor smca
bugs        : sysret_ss_attrs null_seg spectre_v1 spectre_v2 spec_store_bypass
bogomips    : 7385.43
TLB size    : 2560 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 43 bits physical, 48 bits virtual
power management: ts ttp tm hwpstate cpb eff_freq_ro [13] [14]
afck commented 5 years ago

Thank you, we weren't aware of this! (At least I wasn't.)

The GMP benchmarks look very different. Were your GMP libraries compiled with similar optimization settings on both systems?