hdevalence / ed25519consensus

Go Ed25519 suitable for use in consensus-critical contexts.
BSD 3-Clause "New" or "Revised" License
50 stars 11 forks source link

Add Support for Preallocating Entry Slice in Batch Verifier #16

Closed patrick-ogrady closed 8 months ago

patrick-ogrady commented 10 months ago

Before:

goos: darwin
goarch: arm64
pkg: github.com/hdevalence/ed25519consensus
BenchmarkCreateBatch/1-12    5439469           214.8 ns/op        96 B/op          2 allocs/op
BenchmarkCreateBatch/8-12     704911          1714 ns/op        1104 B/op         12 allocs/op
BenchmarkCreateBatch/64-12     90453         13074 ns/op        9680 B/op         71 allocs/op
BenchmarkCreateBatch/1024-12                5697        204843 ns/op      155089 B/op       1035 allocs/op
BenchmarkCreateBatch/4096-12                1401        859972 ns/op      892369 B/op       4111 allocs/op
BenchmarkCreateBatch/16384-12                316       3727873 ns/op     5094872 B/op      16405 allocs/op
PASS
ok      github.com/hdevalence/ed25519consensus  10.901s

After (~71% reduction in bytes allocated, ~13% reduction in runtime on batch of 16384):

goos: darwin
goarch: arm64
pkg: github.com/hdevalence/ed25519consensus
BenchmarkCreatePreallocatedBatch/1-12    5431094           213.9 ns/op        96 B/op          2 allocs/op
BenchmarkCreatePreallocatedBatch/8-12     762730          1585 ns/op         704 B/op          9 allocs/op
BenchmarkCreatePreallocatedBatch/64-12     95875         12557 ns/op        6144 B/op         65 allocs/op
BenchmarkCreatePreallocatedBatch/1024-12                5967        199347 ns/op       90112 B/op       1025 allocs/op
BenchmarkCreatePreallocatedBatch/4096-12                1501        805202 ns/op      360449 B/op       4097 allocs/op
BenchmarkCreatePreallocatedBatch/16384-12                374       3237073 ns/op     1441795 B/op      16385 allocs/op
PASS
ok      github.com/hdevalence/ed25519consensus  10.878s
codecov-commenter commented 10 months ago

Codecov Report

Attention: 9 lines in your changes are missing coverage. Please review.

Comparison is base (7232f88) 77.52% compared to head (a53962b) 70.00%.

:exclamation: Current head a53962b differs from pull request most recent head 78b9a57. Consider uploading reports for the commit 78b9a57 to get more accurate results

Files Patch % Lines
batch.go 64.00% 5 Missing and 4 partials :warning:

:exclamation: Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #16 +/- ## ========================================== - Coverage 77.52% 70.00% -7.53% ========================================== Files 2 2 Lines 89 90 +1 ========================================== - Hits 69 63 -6 - Misses 10 15 +5 - Partials 10 12 +2 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

FiloSottile commented 9 months ago

Thank you for your patience! I pushed a few further changes that cut the allocations down and let the Add inputs avoid escaping to the heap as well. Let me know if they look good and I'll merge and cut a release.

go: go1.21.5
goos: darwin
goarch: arm64
pkg: github.com/hdevalence/ed25519consensus
                          │ v0.1.0-4-g31ef22e │      v0.1.0-4-g31ef22e-dirty       │
                          │      sec/op       │   sec/op     vs base               │
Batch/1-8                         69.62µ ± 1%   70.02µ ± 0%  +0.56% (p=0.011 n=10)
Batch/8-8                         243.3µ ± 1%   243.9µ ± 0%       ~ (p=0.631 n=10)
Batch/64-8                        1.579m ± 1%   1.578m ± 0%       ~ (p=0.739 n=10)
Batch/1024-8                      24.60m ± 4%   24.59m ± 3%       ~ (p=0.684 n=10)
Batch/4096-8                      101.1m ± 0%   101.2m ± 2%       ~ (p=0.436 n=10)
Batch/16384-8                     480.2m ± 2%   479.3m ± 4%       ~ (p=0.529 n=10)
PreallocatedBatch/1-8             69.95µ ± 0%   69.88µ ± 0%       ~ (p=0.239 n=10)
PreallocatedBatch/8-8             243.0µ ± 1%   242.4µ ± 1%       ~ (p=0.631 n=10)
PreallocatedBatch/64-8            1.607m ± 6%   1.577m ± 3%  -1.81% (p=0.015 n=10)
PreallocatedBatch/1024-8          24.61m ± 2%   24.56m ± 0%  -0.23% (p=0.002 n=10)
PreallocatedBatch/4096-8          101.1m ± 0%   101.2m ± 3%       ~ (p=0.739 n=10)
PreallocatedBatch/16384-8         479.8m ± 6%   479.3m ± 2%       ~ (p=0.436 n=10)
Verification-8                    48.40µ ± 2%   48.26µ ± 1%       ~ (p=0.436 n=10)
geomean                           3.913m        3.907m       -0.16%

                          │ v0.1.0-4-g31ef22e │      v0.1.0-4-g31ef22e-dirty       │
                          │      sec/sig      │   sec/sig    vs base               │
Batch/1-8                         69.62µ ± 1%   70.01µ ± 0%  +0.56% (p=0.011 n=10)
Batch/8-8                         30.42µ ± 1%   30.48µ ± 0%       ~ (p=0.631 n=10)
Batch/64-8                        24.66µ ± 1%   24.66µ ± 0%       ~ (p=0.697 n=10)
Batch/1024-8                      24.02µ ± 4%   24.01µ ± 3%       ~ (p=0.684 n=10)
Batch/4096-8                      24.69µ ± 0%   24.70µ ± 2%       ~ (p=0.448 n=10)
Batch/16384-8                     29.31µ ± 2%   29.25µ ± 4%       ~ (p=0.529 n=10)
PreallocatedBatch/1-8             69.95µ ± 0%   69.88µ ± 0%       ~ (p=0.247 n=10)
PreallocatedBatch/8-8             30.37µ ± 1%   30.30µ ± 1%       ~ (p=0.631 n=10)
PreallocatedBatch/64-8            25.10µ ± 6%   24.65µ ± 3%  -1.81% (p=0.014 n=10)
PreallocatedBatch/1024-8          24.03µ ± 2%   23.98µ ± 0%  -0.22% (p=0.002 n=10)
PreallocatedBatch/4096-8          24.68µ ± 0%   24.72µ ± 3%       ~ (p=0.739 n=10)
PreallocatedBatch/16384-8         29.29µ ± 6%   29.25µ ± 2%       ~ (p=0.424 n=10)
geomean                           31.17µ        31.12µ       -0.15%

                          │ v0.1.0-4-g31ef22e │        v0.1.0-4-g31ef22e-dirty         │
                          │       B/op        │     B/op      vs base                  │
Batch/1-8                      5.516Ki ± 0%     5.594Ki ± 0%   +1.42% (p=0.000 n=10)
Batch/8-8                      31.17Ki ± 0%     32.45Ki ± 0%   +4.11% (p=0.000 n=10)
Batch/64-8                     247.7Ki ± 0%     257.4Ki ± 0%   +3.90% (p=0.000 n=10)
Batch/1024-8                   3.636Mi ± 0%     4.006Mi ± 0%  +10.18% (p=0.000 n=10)
Batch/4096-8                   14.71Mi ± 0%     16.28Mi ± 0%  +10.69% (p=0.000 n=10)
Batch/16384-8                  60.16Mi ± 0%     68.72Mi ± 0%  +14.24% (p=0.000 n=10)
PreallocatedBatch/1-8          5.516Ki ± 0%     5.594Ki ± 0%   +1.42% (p=0.000 n=10)
PreallocatedBatch/8-8          30.78Ki ± 0%     31.25Ki ± 0%   +1.52% (p=0.000 n=10)
PreallocatedBatch/64-8         244.3Ki ± 0%     246.9Ki ± 0%   +1.09% (p=0.000 n=10)
PreallocatedBatch/1024-8       3.574Mi ± 0%     3.621Mi ± 0%   +1.31% (p=0.000 n=10)
PreallocatedBatch/4096-8       14.20Mi ± 0%     14.37Mi ± 0%   +1.16% (p=0.000 n=10)
PreallocatedBatch/16384-8      56.67Mi ± 0%     57.31Mi ± 0%   +1.13% (p=0.000 n=10)
Verification-8                   0.000 ± 0%       0.000 ± 0%        ~ (p=1.000 n=10) ¹
geomean                                     ²                  +3.92%                ²
¹ all samples are equal
² summaries must be >0 to compute geomean

                          │ v0.1.0-4-g31ef22e │       v0.1.0-4-g31ef22e-dirty        │
                          │     allocs/op     │ allocs/op   vs base                  │
Batch/1-8                        9.000 ± 0%     8.000 ± 0%  -11.11% (p=0.000 n=10)
Batch/8-8                        26.00 ± 0%     11.00 ± 0%  -57.69% (p=0.000 n=10)
Batch/64-8                      141.00 ± 0%     14.00 ± 0%  -90.07% (p=0.000 n=10)
Batch/1024-8                   2065.00 ± 0%     19.00 ± 0%  -99.08% (p=0.000 n=10)
Batch/4096-8                   8213.00 ± 0%     23.00 ± 0%  -99.72% (p=0.000 n=10)
Batch/16384-8                 32795.00 ± 0%     29.00 ± 0%  -99.91% (p=0.000 n=10)
PreallocatedBatch/1-8            9.000 ± 0%     8.000 ± 0%  -11.11% (p=0.000 n=10)
PreallocatedBatch/8-8           23.000 ± 0%     8.000 ± 0%  -65.22% (p=0.000 n=10)
PreallocatedBatch/64-8         135.000 ± 0%     8.000 ± 0%  -94.07% (p=0.000 n=10)
PreallocatedBatch/1024-8      2055.000 ± 0%     8.000 ± 0%  -99.61% (p=0.000 n=10)
PreallocatedBatch/4096-8      8199.000 ± 0%     8.000 ± 0%  -99.90% (p=0.000 n=10)
PreallocatedBatch/16384-8    32775.000 ± 0%     8.000 ± 0%  -99.98% (p=0.000 n=10)
Verification-8                   0.000 ± 0%     0.000 ± 0%        ~ (p=1.000 n=10) ¹
geomean                                     ²               -97.02%                ²
¹ all samples are equal
² summaries must be >0 to compute geomean
patrick-ogrady commented 9 months ago

Thank you for your patience! I pushed a few further changes that cut the allocations down and let the Add inputs avoid escaping to the heap as well. Let me know if they look good and I'll merge and cut a release.

go: go1.21.5
goos: darwin
goarch: arm64
pkg: github.com/hdevalence/ed25519consensus
                          │ v0.1.0-4-g31ef22e │      v0.1.0-4-g31ef22e-dirty       │
                          │      sec/op       │   sec/op     vs base               │
Batch/1-8                         69.62µ ± 1%   70.02µ ± 0%  +0.56% (p=0.011 n=10)
Batch/8-8                         243.3µ ± 1%   243.9µ ± 0%       ~ (p=0.631 n=10)
Batch/64-8                        1.579m ± 1%   1.578m ± 0%       ~ (p=0.739 n=10)
Batch/1024-8                      24.60m ± 4%   24.59m ± 3%       ~ (p=0.684 n=10)
Batch/4096-8                      101.1m ± 0%   101.2m ± 2%       ~ (p=0.436 n=10)
Batch/16384-8                     480.2m ± 2%   479.3m ± 4%       ~ (p=0.529 n=10)
PreallocatedBatch/1-8             69.95µ ± 0%   69.88µ ± 0%       ~ (p=0.239 n=10)
PreallocatedBatch/8-8             243.0µ ± 1%   242.4µ ± 1%       ~ (p=0.631 n=10)
PreallocatedBatch/64-8            1.607m ± 6%   1.577m ± 3%  -1.81% (p=0.015 n=10)
PreallocatedBatch/1024-8          24.61m ± 2%   24.56m ± 0%  -0.23% (p=0.002 n=10)
PreallocatedBatch/4096-8          101.1m ± 0%   101.2m ± 3%       ~ (p=0.739 n=10)
PreallocatedBatch/16384-8         479.8m ± 6%   479.3m ± 2%       ~ (p=0.436 n=10)
Verification-8                    48.40µ ± 2%   48.26µ ± 1%       ~ (p=0.436 n=10)
geomean                           3.913m        3.907m       -0.16%

                          │ v0.1.0-4-g31ef22e │      v0.1.0-4-g31ef22e-dirty       │
                          │      sec/sig      │   sec/sig    vs base               │
Batch/1-8                         69.62µ ± 1%   70.01µ ± 0%  +0.56% (p=0.011 n=10)
Batch/8-8                         30.42µ ± 1%   30.48µ ± 0%       ~ (p=0.631 n=10)
Batch/64-8                        24.66µ ± 1%   24.66µ ± 0%       ~ (p=0.697 n=10)
Batch/1024-8                      24.02µ ± 4%   24.01µ ± 3%       ~ (p=0.684 n=10)
Batch/4096-8                      24.69µ ± 0%   24.70µ ± 2%       ~ (p=0.448 n=10)
Batch/16384-8                     29.31µ ± 2%   29.25µ ± 4%       ~ (p=0.529 n=10)
PreallocatedBatch/1-8             69.95µ ± 0%   69.88µ ± 0%       ~ (p=0.247 n=10)
PreallocatedBatch/8-8             30.37µ ± 1%   30.30µ ± 1%       ~ (p=0.631 n=10)
PreallocatedBatch/64-8            25.10µ ± 6%   24.65µ ± 3%  -1.81% (p=0.014 n=10)
PreallocatedBatch/1024-8          24.03µ ± 2%   23.98µ ± 0%  -0.22% (p=0.002 n=10)
PreallocatedBatch/4096-8          24.68µ ± 0%   24.72µ ± 3%       ~ (p=0.739 n=10)
PreallocatedBatch/16384-8         29.29µ ± 6%   29.25µ ± 2%       ~ (p=0.424 n=10)
geomean                           31.17µ        31.12µ       -0.15%

                          │ v0.1.0-4-g31ef22e │        v0.1.0-4-g31ef22e-dirty         │
                          │       B/op        │     B/op      vs base                  │
Batch/1-8                      5.516Ki ± 0%     5.594Ki ± 0%   +1.42% (p=0.000 n=10)
Batch/8-8                      31.17Ki ± 0%     32.45Ki ± 0%   +4.11% (p=0.000 n=10)
Batch/64-8                     247.7Ki ± 0%     257.4Ki ± 0%   +3.90% (p=0.000 n=10)
Batch/1024-8                   3.636Mi ± 0%     4.006Mi ± 0%  +10.18% (p=0.000 n=10)
Batch/4096-8                   14.71Mi ± 0%     16.28Mi ± 0%  +10.69% (p=0.000 n=10)
Batch/16384-8                  60.16Mi ± 0%     68.72Mi ± 0%  +14.24% (p=0.000 n=10)
PreallocatedBatch/1-8          5.516Ki ± 0%     5.594Ki ± 0%   +1.42% (p=0.000 n=10)
PreallocatedBatch/8-8          30.78Ki ± 0%     31.25Ki ± 0%   +1.52% (p=0.000 n=10)
PreallocatedBatch/64-8         244.3Ki ± 0%     246.9Ki ± 0%   +1.09% (p=0.000 n=10)
PreallocatedBatch/1024-8       3.574Mi ± 0%     3.621Mi ± 0%   +1.31% (p=0.000 n=10)
PreallocatedBatch/4096-8       14.20Mi ± 0%     14.37Mi ± 0%   +1.16% (p=0.000 n=10)
PreallocatedBatch/16384-8      56.67Mi ± 0%     57.31Mi ± 0%   +1.13% (p=0.000 n=10)
Verification-8                   0.000 ± 0%       0.000 ± 0%        ~ (p=1.000 n=10) ¹
geomean                                     ²                  +3.92%                ²
¹ all samples are equal
² summaries must be >0 to compute geomean

                          │ v0.1.0-4-g31ef22e │       v0.1.0-4-g31ef22e-dirty        │
                          │     allocs/op     │ allocs/op   vs base                  │
Batch/1-8                        9.000 ± 0%     8.000 ± 0%  -11.11% (p=0.000 n=10)
Batch/8-8                        26.00 ± 0%     11.00 ± 0%  -57.69% (p=0.000 n=10)
Batch/64-8                      141.00 ± 0%     14.00 ± 0%  -90.07% (p=0.000 n=10)
Batch/1024-8                   2065.00 ± 0%     19.00 ± 0%  -99.08% (p=0.000 n=10)
Batch/4096-8                   8213.00 ± 0%     23.00 ± 0%  -99.72% (p=0.000 n=10)
Batch/16384-8                 32795.00 ± 0%     29.00 ± 0%  -99.91% (p=0.000 n=10)
PreallocatedBatch/1-8            9.000 ± 0%     8.000 ± 0%  -11.11% (p=0.000 n=10)
PreallocatedBatch/8-8           23.000 ± 0%     8.000 ± 0%  -65.22% (p=0.000 n=10)
PreallocatedBatch/64-8         135.000 ± 0%     8.000 ± 0%  -94.07% (p=0.000 n=10)
PreallocatedBatch/1024-8      2055.000 ± 0%     8.000 ± 0%  -99.61% (p=0.000 n=10)
PreallocatedBatch/4096-8      8199.000 ± 0%     8.000 ± 0%  -99.90% (p=0.000 n=10)
PreallocatedBatch/16384-8    32775.000 ± 0%     8.000 ± 0%  -99.98% (p=0.000 n=10)
Verification-8                   0.000 ± 0%     0.000 ± 0%        ~ (p=1.000 n=10) ¹
geomean                                     ²               -97.02%                ²
¹ all samples are equal
² summaries must be >0 to compute geomean

Awesome work on the further optimization (especially avoiding the heap on Add) ❤️ . I tried to tread as lightly as I could on my pass but maybe I should've been more ambitious lol...thanks for taking the time!

Only nit, I preferred to see the batch creation time separate from the batch verification time (just to get a better sense of where time was spent but eh).

FiloSottile commented 8 months ago

Only nit, I preferred to see the batch creation time separate from the batch verification time (just to get a better sense of where time was spent but eh).

I generally prefer to make benchmarks for whole operations that consumers are likely to perform discretely. That avoids over-indexing on a sub-benchmark once it's fast enough and it doesn't move the overall needle much. Different parts that contribute to the running time can be visualized with flame graphs. In this case, I don't think any user will do batch creation without batch verification, or reuse the batch for multiple verifications, so I think an overall benchmark is the right metric.