kriskwiatkowski / nobs

Implementation of cryptographic primitives in Go

PERF: sidh-p503: Split sub and add into 2 uops instead of 3 #8

Closed kriskwiatkowski closed 5 years ago

kriskwiatkowski commented 5 years ago

The performance improvement comes from how Skylake splits instructions with memory operands into uops. "add mem, reg" (memory source, register destination) splits into 2 uops: one arithmetic uop and one that loads the value from memory. With the operand order reversed, "add reg, mem" (memory destination) splits into 3 uops: one for the arithmetic operation, one for the load, and an additional one for storing the result back. Using separate instructions for loads and stores helps parallelize execution, because a load or store can run in parallel with an arithmetic instruction when possible.

For details, see: https://www.agner.org/optimize/instruction_tables.pdf
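The load / compute / store split described above can be sketched in plain Go. This is only an analogy and not the assembly changed in this PR: the Go compiler picks its own instructions, so the code does not reproduce the uop counts. The names `limbs`, `addInPlace` and `addLoadStore` are hypothetical, and the 8-word size is an assumption based on 503 bits fitting in eight 64-bit words.

```go
package main

import (
	"fmt"
	"math/bits"
)

// limbs is an assumed word count: 8 * 64 = 512 bits, enough for a 503-bit value.
const limbs = 8

// addInPlace updates z limb by limb, reading and writing z[i] at every step.
// It is the analogue of arithmetic with a memory destination.
func addInPlace(z, x *[limbs]uint64) uint64 {
	var carry uint64
	for i := range z {
		z[i], carry = bits.Add64(z[i], x[i], carry)
	}
	return carry
}

// addLoadStore separates the three phases: load both operands into locals,
// run the carry chain on the locals, then store the result once.
func addLoadStore(z, x *[limbs]uint64) uint64 {
	a, b := *z, *x // explicit loads
	var carry uint64
	for i := range a {
		a[i], carry = bits.Add64(a[i], b[i], carry) // arithmetic only
	}
	*z = a // explicit store
	return carry
}

func main() {
	a := [limbs]uint64{^uint64(0), ^uint64(0)} // two low limbs are all ones
	b := a
	x := [limbs]uint64{1}
	addInPlace(&a, &x)
	addLoadStore(&b, &x)
	fmt.Println(a == b, a[2]) // prints "true 1"
}
```

Both functions compute the same sum; the second only makes the separation between memory traffic and arithmetic explicit, which is the idea the assembly change relies on.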

New: BenchmarkFp503StrongReduce-4 300000000 5.57 ns/op
Old: BenchmarkFp503StrongReduce-4 200000000 8.60 ns/op
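To put the two ns/op figures in relative terms, a small sketch like the following can be used. The helpers `speedup` and `timeSavedPercent` are hypothetical names introduced here for illustration; only the two timings come from the benchmark output.

```go
package main

import "fmt"

// speedup returns how many times faster the new timing is than the old one.
func speedup(oldNs, newNs float64) float64 {
	return oldNs / newNs
}

// timeSavedPercent returns the reduction in time per operation, in percent.
func timeSavedPercent(oldNs, newNs float64) float64 {
	return (oldNs - newNs) / oldNs * 100
}

func main() {
	const oldNs, newNs = 8.60, 5.57 // ns/op from the benchmark lines above
	fmt.Printf("speedup: %.2fx\n", speedup(oldNs, newNs))
	fmt.Printf("time saved: %.1f%%\n", timeSavedPercent(oldNs, newNs))
}
```

With these inputs the speedup is roughly 1.54x, or about 35% less time per call to the benchmarked function.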

This change improves only one function, but more functions can be improved in the same way.

codecov-io commented 5 years ago

Codecov Report

Merging #8 into master will not change coverage. The diff coverage is n/a.

@@           Coverage Diff           @@
##           master       #8   +/-   ##
=======================================
  Coverage   94.44%   94.44%           
=======================================
  Files          19       19           
  Lines        1800     1800           
=======================================
  Hits         1700     1700           
  Misses         67       67           
  Partials       33       33

Last update e9ddb6f...272e3d8.