kriskwiatkowski closed this 5 years ago
Merging #8 into master will not change coverage. The diff coverage is n/a.
@@           Coverage Diff           @@
##           master       #8   +/-   ##
=======================================
  Coverage   94.44%   94.44%
=======================================
  Files          19       19
  Lines        1800     1800
=======================================
  Hits         1700     1700
  Misses         67       67
  Partials       33       33
Powered by Codecov. Last update e9ddb6f...272e3d8.
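The performance improvement comes from how Skylake decodes arithmetic instructions with a memory operand (operands are written in Go assembler order: source first, destination second). "add mem, reg" (reg += mem) splits into 2 uops: one arithmetic uop and one that loads the value from memory. Reversing the operand order to "add reg, mem" (mem += reg) makes it a read-modify-write instruction that splits into more uops: one for the load, one for the arithmetic op, and an additional store that writes the result back to memory. Using separate instructions for loading and storing, and keeping the arithmetic on registers, helps parallelize execution: loads/stores and arithmetic instructions can run in parallel where possible.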
For details, see Agner Fog's instruction tables: https://www.agner.org/optimize/instruction_tables.pdf
Benchmark results (iterations, time per operation):
Old: BenchmarkFp503StrongReduce-4   200000000   8.60 ns/op
New: BenchmarkFp503StrongReduce-4   300000000   5.57 ns/op
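The size of the improvement can be checked with simple arithmetic: the speedup factor is old ns/op divided by new ns/op, and the relative saving in time per call is 1 - new/old. A minimal Go sketch applied to the ns/op figures above (the `speedup` helper is hypothetical and not part of the repository):

```go
package main

import "fmt"

// speedup is a hypothetical helper: it returns how many times faster the new
// version is (oldNsPerOp / newNsPerOp) and the relative reduction in time per
// operation, in percent.
func speedup(oldNsPerOp, newNsPerOp float64) (factor, reductionPct float64) {
	factor = oldNsPerOp / newNsPerOp
	reductionPct = (1 - newNsPerOp/oldNsPerOp) * 100
	return factor, reductionPct
}

func main() {
	// ns/op values reported for BenchmarkFp503StrongReduce-4 (old, new).
	factor, pct := speedup(8.60, 5.57)
	fmt.Printf("%.2fx faster, %.1f%% less time per op\n", factor, pct)
}
```

Running it prints `1.54x faster, 35.2% less time per op`. The iteration counts (200000000 vs 300000000) are chosen automatically by Go's benchmark harness, so ns/op is the figure to compare.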
This PR only improves one function, but more functions could be optimized the same way.