Open mkannwischer opened 2 days ago
I've performed my own benchmarks on c7i using
./scripts/tests bench -c PMU --cflags="-mavx2 -mbmi2 -mpopcnt -maes -flto"
Here are my measurements for keypair cycles of mlkem-512 with the C backend:
[32085, 32225, 32194, 32499, 32580, 32543, 32535, 32576, 32715, 32581]
The max is 32715 and the min is 32085.
That's within 2% of each other, which looks okay. But in #424, we frequently saw differences of 10% and more.
I've tried perf (replaced PERF_COUNT_HW_CPU_CYCLES with PERF_COUNT_HW_REF_CPU_CYCLES):
./scripts/tests bench -c PERF --cflags="-mavx2 -mbmi2 -mpopcnt -maes -flto" -r
[32273, 32087, 32370, 32130, 32079, 32328, 32480, 32093, 32408, 32482]
The max is 32482 and the min is 32079.
So that seems to be a viable alternative (slightly less variance here, but that's likely luck).
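To make the "within 2%" comparison reproducible, here is a minimal sketch that computes the relative spread, (max - min) / min, for the two local runs quoted above. The helper name `relative_spread` is made up for illustration and is not part of the mlkem-native scripts.

```python
def relative_spread(samples):
    """Return (max - min) / min as a percentage."""
    lo, hi = min(samples), max(samples)
    return 100.0 * (hi - lo) / lo


# Keypair cycle counts for mlkem-512 (C backend), copied from the runs above.
pmu = [32085, 32225, 32194, 32499, 32580, 32543, 32535, 32576, 32715, 32581]
perf = [32273, 32087, 32370, 32130, 32079, 32328, 32480, 32093, 32408, 32482]

for name, samples in (("PMU", pmu), ("PERF", perf)):
    print(f"{name}: min={min(samples)} max={max(samples)} "
          f"spread={relative_spread(samples):.2f}%")
```

On these numbers the spread comes out at about 1.96% for PMU and 1.26% for PERF. With only ten samples each, that gap is too small to conclude that one counter is better.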
One theory is that the cycle counts increase when the hypervisor interrupts the VM, which skews the rdtsc benchmarks. This could be mitigated by using perf with PERF_COUNT_HW_REF_CPU_CYCLES.
But as I do not see the variance right now, this is hard for me to test.
I've run more experiments in CI:
I'm comparing the performance we are seeing on #424 (with PMU) vs. #430 (PERF w/ PERF_COUNT_HW_REF_CPU_CYCLES) on c7i, and I restarted the CI benchmarking job multiple times.
Here are my measurements for keypair cycles of mlkem-512 with the C backend:
PMU: [34765, 34953, 36790, 34479, 34632]
PERF: [34542, 32884, 35137, 34554, 34647]
Seems equally bad :( and different :( x86 makes me sad.
The runs are here:
PMU: https://github.com/pq-code-package/mlkem-native/actions/runs/11909823923/job/33237075854?pr=424
PERF: https://github.com/pq-code-package/mlkem-native/actions/runs/11924957674/job/33236975067?pr=430
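As a rough check of the "equally bad" impression, the following sketch computes the relative standard deviation (sample standard deviation divided by the mean) of the five CI restarts per counter. It uses only Python's standard `statistics` module. The helper name `rel_stddev` is an illustrative assumption, and five samples are far too few for a firm conclusion.

```python
import statistics


def rel_stddev(samples):
    """Sample standard deviation as a percentage of the mean."""
    return 100.0 * statistics.stdev(samples) / statistics.mean(samples)


# Keypair cycle counts for mlkem-512 (C backend) from the CI restarts above.
ci_pmu = [34765, 34953, 36790, 34479, 34632]
ci_perf = [34542, 32884, 35137, 34554, 34647]

for name, samples in (("PMU", ci_pmu), ("PERF", ci_perf)):
    print(f"{name}: mean={statistics.mean(samples):.1f} "
          f"rel. stddev={rel_stddev(samples):.2f}%")
```

Both counters end up with a relative standard deviation between 2 and 3 percent on this data, which is consistent with neither of them being clearly more stable in CI.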
I've performed some more benchmarking on different instance types. I launched 5 c7i.large instances and 5 c7i.metal-24xl instances.
Here are the results with gcc 13.2.0
The command I am running is ./scripts/tests bench -c PMU --cflags="-march=native -mtune=native -mavx2 -mbmi2 -mpopcnt -maes -flto"
TL;DR: the benchmarks are stable on c7i.metal-24xl, so the variance indeed seems to be due to virtualization. Unfortunately, this is bad news for us as the metal boxes are too pricey to run CI on them.
INFO > Benchmark Compile (native, no_opt) CFLAGS=-march=native -mtune=native -mavx2 -mbmi2 -mpopcnt -maes -flto make CROSS_PREFIX= bench CYCLES=PMU OPT=0 AUTO=1
INFO > Benchmark ML-KEM-512 (native, no_opt)
keypair cycles = 29180
encaps cycles = 35883
decaps cycles = 46855
percentile 1 10 20 30 40 50 60 70 80 90 99
keypair percentiles: 28062 28568 28778 28900 29042 29180 29341 29474 29633 29851 31793
encaps percentiles: 34523 35155 35432 35605 35749 35883 36010 36164 36310 36561 38702
decaps percentiles: 45009 45990 46241 46441 46683 46855 47022 47152 47335 47614 49568
INFO > Benchmark ML-KEM-768 (native, no_opt)
keypair cycles = 50437
encaps cycles = 59270
decaps cycles = 75476
percentile 1 10 20 30 40 50 60 70 80 90 99
keypair percentiles: 48572 49540 49860 50056 50258 50437 50636 50812 51066 51628 53697
encaps percentiles: 57658 58319 58622 58910 59110 59270 59432 59599 59883 60303 61784
decaps percentiles: 73540 74444 74794 75052 75246 75476 75675 75871 76101 76638 78485
INFO > Benchmark ML-KEM-1024 (native, no_opt)
keypair cycles = 74709
encaps cycles = 87368
decaps cycles = 110252
percentile 1 10 20 30 40 50 60 70 80 90 99
keypair percentiles: 72708 73585 74012 74282 74549 74709 74953 75194 75543 76667 79245
encaps percentiles: 85314 86227 86683 86924 87174 87368 87603 87885 88272 89040 91139
decaps percentiles: 107911 108946 109472 109773 109995 110252 110559 110796 111234 111993 114092
INFO > Benchmark Compile (native, opt) CFLAGS=-march=native -mtune=native -mavx2 -mbmi2 -mpopcnt -maes -flto make CROSS_PREFIX= bench CYCLES=PMU OPT=1 AUTO=1
INFO > Benchmark ML-KEM-512 (native, opt)
keypair cycles = 10842
encaps cycles = 15347
decaps cycles = 20985
percentile 1 10 20 30 40 50 60 70 80 90 99
keypair percentiles: 10390 10625 10676 10724 10780 10842 10909 10995 11143 11549 12589
encaps percentiles: 14843 15139 15203 15255 15296 15347 15405 15493 15564 15661 16355
decaps percentiles: 19803 20421 20593 20723 20853 20985 21110 21255 21473 22049 22833
INFO > Benchmark ML-KEM-768 (native, opt)
keypair cycles = 18850
encaps cycles = 21061
decaps cycles = 28488
percentile 1 10 20 30 40 50 60 70 80 90 99
keypair percentiles: 18010 18454 18613 18702 18782 18850 18933 19033 19186 19467 20221
encaps percentiles: 20006 20615 20822 20946 21003 21061 21139 21229 21313 21437 22128
decaps percentiles: 27454 27997 28193 28344 28434 28488 28584 28693 28778 28955 29679
INFO > Benchmark ML-KEM-1024 (native, opt)
keypair cycles = 25269
encaps cycles = 29265
decaps cycles = 39824
percentile 1 10 20 30 40 50 60 70 80 90 99
keypair percentiles: 24343 24807 24980 25104 25199 25269 25392 25481 25601 25823 26809
encaps percentiles: 28304 28720 28912 29050 29188 29265 29351 29458 29639 30002 30809
decaps percentiles: 38493 39118 39337 39530 39692 39824 39978 40102 40255 40538 41623
INFO > Benchmark Compile (native, no_opt) CFLAGS=-march=native -mtune=native -mavx2 -mbmi2 -mpopcnt -maes -flto make CROSS_PREFIX= bench CYCLES=PMU AUTO=1 OPT=0
INFO > Benchmark ML-KEM-512 (native, no_opt)
keypair cycles = 30133
encaps cycles = 37254
decaps cycles = 48576
percentile 1 10 20 30 40 50 60 70 80 90 99
keypair percentiles: 29601 29792 29913 29996 30050 30133 30219 30357 30487 30798 32956
encaps percentiles: 36581 36864 36985 37075 37165 37254 37350 37432 37540 37725 39978
decaps percentiles: 47884 48158 48289 48374 48470 48576 48666 48801 48921 49109 51028
INFO > Benchmark ML-KEM-768 (native, no_opt)
keypair cycles = 51937
encaps cycles = 61048
decaps cycles = 77828
percentile 1 10 20 30 40 50 60 70 80 90 99
keypair percentiles: 50970 51317 51536 51664 51800 51937 52043 52182 52387 52843 55128
encaps percentiles: 60324 60609 60742 60850 60952 61048 61131 61233 61383 61756 63631
decaps percentiles: 76944 77330 77486 77602 77715 77828 77960 78057 78217 78579 80506
INFO > Benchmark ML-KEM-1024 (native, no_opt)
keypair cycles = 76954
encaps cycles = 89878
decaps cycles = 113724
percentile 1 10 20 30 40 50 60 70 80 90 99
keypair percentiles: 76163 76454 76604 76737 76852 76954 77085 77216 77484 79416 82060
encaps percentiles: 88991 89347 89536 89629 89766 89878 89989 90168 90375 92217 95286
decaps percentiles: 112601 113102 113282 113449 113603 113724 113903 114047 114265 115853 119377
INFO > Benchmark Compile (native, opt) CFLAGS=-march=native -mtune=native -mavx2 -mbmi2 -mpopcnt -maes -flto make CROSS_PREFIX= bench CYCLES=PMU AUTO=1 OPT=1
INFO > Benchmark ML-KEM-512 (native, opt)
keypair cycles = 11067
encaps cycles = 15730
decaps cycles = 23104
percentile 1 10 20 30 40 50 60 70 80 90 99
keypair percentiles: 10937 10983 11008 11027 11046 11067 11090 11112 11158 11824 12871
encaps percentiles: 15598 15638 15664 15690 15705 15730 15760 15814 15910 16391 17358
decaps percentiles: 21033 21315 22083 22858 23030 23104 23137 23183 23239 23326 23519
INFO > Benchmark ML-KEM-768 (native, opt)
keypair cycles = 19560
encaps cycles = 21618
decaps cycles = 29442
percentile 1 10 20 30 40 50 60 70 80 90 99
keypair percentiles: 19093 19156 19213 19281 19406 19560 19699 19776 19869 20036 20884
encaps percentiles: 21477 21526 21557 21580 21597 21618 21641 21668 21745 22413 22823
decaps percentiles: 29249 29321 29363 29391 29418 29442 29467 29507 29548 30124 30492
INFO > Benchmark ML-KEM-1024 (native, opt)
keypair cycles = 26036
encaps cycles = 31046
decaps cycles = 41734
percentile 1 10 20 30 40 50 60 70 80 90 99
keypair percentiles: 25852 25908 25946 25974 26015 26036 26064 26098 26150 26797 28057
encaps percentiles: 29883 30468 30796 30963 31012 31046 31073 31103 31144 31251 31846
decaps percentiles: 40696 41413 41485 41548 41628 41734 41909 42050 42176 42353 43250
INFO > Benchmark Compile (native, no_opt) CFLAGS=-march=native -mtune=native -mavx2 -mbmi2 -mpopcnt -maes -flto make CROSS_PREFIX= bench CYCLES=PMU AUTO=1 OPT=0
INFO > Benchmark ML-KEM-512 (native, no_opt)
keypair cycles = 25602
encaps cycles = 31490
decaps cycles = 40917
percentile 1 10 20 30 40 50 60 70 80 90 99
keypair percentiles: 25079 25286 25393 25460 25516 25602 25690 25764 25848 26044 28043
encaps percentiles: 30802 31101 31218 31320 31403 31490 31586 31679 31805 31976 33611
decaps percentiles: 40170 40478 40612 40739 40839 40917 41013 41092 41214 41430 43156
INFO > Benchmark ML-KEM-768 (native, no_opt)
keypair cycles = 44000
encaps cycles = 51980
decaps cycles = 65976
percentile 1 10 20 30 40 50 60 70 80 90 99
keypair percentiles: 43000 43456 43640 43763 43900 44000 44122 44259 44405 44816 46827
encaps percentiles: 51139 51510 51674 51782 51887 51980 52091 52206 52391 52819 54201
decaps percentiles: 65081 65476 65638 65735 65839 65976 66127 66255 66422 66793 68442
INFO > Benchmark ML-KEM-1024 (native, no_opt)
keypair cycles = 65377
encaps cycles = 76315
decaps cycles = 96838
percentile 1 10 20 30 40 50 60 70 80 90 99
keypair percentiles: 64363 64806 64960 65095 65258 65377 65477 65653 65896 67175 69122
encaps percentiles: 75266 75688 75905 76033 76158 76315 76440 76625 76889 78088 79770
decaps percentiles: 95725 96099 96358 96551 96670 96838 97024 97201 97477 98530 99878
INFO > Benchmark Compile (native, opt) CFLAGS=-march=native -mtune=native -mavx2 -mbmi2 -mpopcnt -maes -flto make CROSS_PREFIX= bench CYCLES=PMU OPT=1 AUTO=1
INFO > Benchmark ML-KEM-512 (native, opt)
keypair cycles = 9412
encaps cycles = 13317
decaps cycles = 18035
percentile 1 10 20 30 40 50 60 70 80 90 99
keypair percentiles: 9205 9257 9287 9319 9361 9412 9460 9505 9563 9771 10417
encaps percentiles: 13102 13151 13186 13224 13257 13317 13379 13445 13521 13634 14007
decaps percentiles: 17675 17781 17875 17940 17992 18035 18088 18166 18255 18493 19746
INFO > Benchmark ML-KEM-768 (native, opt)
keypair cycles = 16341
encaps cycles = 18289
decaps cycles = 24886
percentile 1 10 20 30 40 50 60 70 80 90 99
keypair percentiles: 16095 16175 16208 16249 16294 16341 16399 16484 16635 16923 17459
encaps percentiles: 18066 18119 18158 18189 18230 18289 18360 18410 18461 18643 19127
decaps percentiles: 24590 24673 24724 24769 24834 24886 24926 24964 25009 25288 25737
INFO > Benchmark ML-KEM-1024 (native, opt)
keypair cycles = 22182
encaps cycles = 25456
decaps cycles = 34674
percentile 1 10 20 30 40 50 60 70 80 90 99
keypair percentiles: 21815 21937 21992 22061 22132 22182 22235 22295 22372 22699 23285
encaps percentiles: 25154 25231 25289 25361 25415 25456 25484 25529 25582 25991 26617
decaps percentiles: 34246 34371 34463 34557 34613 34674 34713 34788 34887 35277 36129
INFO > Benchmark Compile (native, no_opt) CFLAGS=-march=native -mtune=native -mavx2 -mbmi2 -mpopcnt -maes -flto make CROSS_PREFIX= bench CYCLES=PMU OPT=0 AUTO=1
INFO > Benchmark ML-KEM-512 (native, no_opt)
keypair cycles = 30138
encaps cycles = 37001
decaps cycles = 48407
percentile 1 10 20 30 40 50 60 70 80 90 99
keypair percentiles: 29577 29801 29902 29985 30059 30138 30221 30343 30469 30891 32987
encaps percentiles: 36340 36596 36734 36828 36923 37001 37070 37152 37275 37455 39863
decaps percentiles: 47718 47978 48139 48246 48323 48407 48501 48580 48709 48886 50878
INFO > Benchmark ML-KEM-768 (native, no_opt)
keypair cycles = 51842
encaps cycles = 61086
decaps cycles = 77840
percentile 1 10 20 30 40 50 60 70 80 90 99
keypair percentiles: 50898 51246 51424 51601 51721 51842 51960 52105 52327 52754 54764
encaps percentiles: 60252 60597 60771 60879 61003 61086 61186 61351 61526 61841 63710
decaps percentiles: 77047 77368 77531 77642 77744 77840 77954 78075 78236 78671 80447
INFO > Benchmark ML-KEM-1024 (native, no_opt)
keypair cycles = 76986
encaps cycles = 89866
decaps cycles = 113576
percentile 1 10 20 30 40 50 60 70 80 90 99
keypair percentiles: 76177 76517 76669 76781 76875 76986 77114 77243 77478 79337 82026
encaps percentiles: 88896 89337 89501 89617 89731 89866 89985 90115 90401 92024 94605
decaps percentiles: 112289 112946 113115 113292 113420 113576 113718 113861 114104 115719 118236
INFO > Benchmark Compile (native, opt) CFLAGS=-march=native -mtune=native -mavx2 -mbmi2 -mpopcnt -maes -flto make CROSS_PREFIX= bench CYCLES=PMU OPT=1 AUTO=1
INFO > Benchmark ML-KEM-512 (native, opt)
keypair cycles = 11060
encaps cycles = 15711
decaps cycles = 23103
percentile 1 10 20 30 40 50 60 70 80 90 99
keypair percentiles: 10940 10986 11007 11027 11042 11060 11082 11104 11143 11808 12922
encaps percentiles: 15589 15641 15662 15680 15695 15711 15735 15766 15818 15912 17149
decaps percentiles: 21064 21813 22677 22996 23068 23103 23139 23169 23224 23298 23498
INFO > Benchmark ML-KEM-768 (native, opt)
keypair cycles = 19650
encaps cycles = 21712
decaps cycles = 29439
percentile 1 10 20 30 40 50 60 70 80 90 99
keypair percentiles: 19192 19253 19298 19355 19477 19650 19775 19877 20006 20229 20779
encaps percentiles: 21560 21623 21651 21673 21693 21712 21730 21756 21819 22529 22937
decaps percentiles: 29254 29333 29367 29398 29421 29439 29455 29481 29524 30064 30543
INFO > Benchmark ML-KEM-1024 (native, opt)
keypair cycles = 26009
encaps cycles = 31036
decaps cycles = 41680
percentile 1 10 20 30 40 50 60 70 80 90 99
keypair percentiles: 25835 25887 25919 25944 25977 26009 26041 26080 26191 26742 27920
encaps percentiles: 29883 30443 30763 30947 31007 31036 31056 31078 31111 31299 31892
decaps percentiles: 40681 41394 41456 41510 41567 41680 41821 42002 42146 42344 43062
INFO > Benchmark Compile (native, no_opt) CFLAGS=-march=native -mtune=native -mavx2 -mbmi2 -mpopcnt -maes -flto make CROSS_PREFIX= bench CYCLES=PMU OPT=0 AUTO=1
INFO > Benchmark ML-KEM-512 (native, no_opt)
keypair cycles = 30182
encaps cycles = 36950
decaps cycles = 48262
percentile 1 10 20 30 40 50 60 70 80 90 99
keypair percentiles: 29644 29832 29928 30013 30095 30182 30266 30381 30533 30805 32946
encaps percentiles: 36355 36594 36706 36802 36866 36950 37023 37105 37229 37403 39877
decaps percentiles: 47561 47881 48014 48102 48181 48262 48342 48416 48538 48757 50876
INFO > Benchmark ML-KEM-768 (native, no_opt)
keypair cycles = 51856
encaps cycles = 61116
decaps cycles = 77642
percentile 1 10 20 30 40 50 60 70 80 90 99
keypair percentiles: 50821 51227 51432 51602 51717 51856 51995 52139 52367 52892 55058
encaps percentiles: 60366 60665 60824 60923 61026 61116 61216 61326 61466 61778 63607
decaps percentiles: 76844 77188 77349 77452 77525 77642 77763 77898 78089 78560 80331
INFO > Benchmark ML-KEM-1024 (native, no_opt)
keypair cycles = 77012
encaps cycles = 90031
decaps cycles = 113730
percentile 1 10 20 30 40 50 60 70 80 90 99
keypair percentiles: 76202 76520 76672 76785 76884 77012 77126 77253 77479 79426 82237
encaps percentiles: 88975 89427 89588 89765 89906 90031 90159 90288 90539 92187 94408
decaps percentiles: 112733 113113 113328 113467 113613 113730 113870 114073 114305 115928 117329
INFO > Benchmark Compile (native, opt) CFLAGS=-march=native -mtune=native -mavx2 -mbmi2 -mpopcnt -maes -flto make CROSS_PREFIX= bench CYCLES=PMU AUTO=1 OPT=1
INFO > Benchmark ML-KEM-512 (native, opt)
keypair cycles = 11068
encaps cycles = 15740
decaps cycles = 22999
percentile 1 10 20 30 40 50 60 70 80 90 99
keypair percentiles: 10943 10987 11017 11035 11051 11068 11087 11112 11176 11897 12834
encaps percentiles: 15597 15658 15679 15695 15712 15740 15782 15824 15891 16262 17190
decaps percentiles: 21104 21478 22041 22733 22934 22999 23037 23084 23142 23232 23450
INFO > Benchmark ML-KEM-768 (native, opt)
keypair cycles = 19527
encaps cycles = 21607
decaps cycles = 29407
percentile 1 10 20 30 40 50 60 70 80 90 99
keypair percentiles: 19042 19116 19168 19227 19343 19527 19687 19795 19928 20111 20850
encaps percentiles: 21442 21527 21548 21568 21589 21607 21626 21653 21796 22389 22794
decaps percentiles: 29236 29315 29348 29370 29391 29407 29430 29463 29513 30075 30433
INFO > Benchmark ML-KEM-1024 (native, opt)
keypair cycles = 25847
encaps cycles = 30589
decaps cycles = 41184
percentile 1 10 20 30 40 50 60 70 80 90 99
keypair percentiles: 25208 25360 25467 25536 25620 25847 25950 26007 26098 26475 28041
encaps percentiles: 29296 29543 30184 30377 30474 30589 30759 30927 30985 31054 31682
decaps percentiles: 39769 40094 40587 40736 40876 41184 41385 41498 41775 42074 42829
INFO > Benchmark Compile (native, no_opt) CFLAGS=-march=native -mtune=native -mavx2 -mbmi2 -mpopcnt -maes -flto make CROSS_PREFIX= bench CYCLES=PMU AUTO=1 OPT=0
INFO > Benchmark ML-KEM-512 (native, no_opt)
keypair cycles = 25544
encaps cycles = 31876
decaps cycles = 42131
percentile 1 10 20 30 40 50 60 70 80 90 99
keypair percentiles: 25062 25236 25315 25394 25473 25544 25618 25682 25777 25926 27851
encaps percentiles: 31175 31441 31569 31699 31795 31876 31956 32102 32199 32376 34200
decaps percentiles: 41310 41653 41785 41915 42022 42131 42231 42346 42482 42654 44399
INFO > Benchmark ML-KEM-768 (native, no_opt)
keypair cycles = 44778
encaps cycles = 52425
decaps cycles = 67387
percentile 1 10 20 30 40 50 60 70 80 90 99
keypair percentiles: 43752 44184 44370 44548 44666 44778 44921 45066 45254 45795 47209
encaps percentiles: 51722 51976 52111 52242 52344 52425 52517 52659 52840 53195 54294
decaps percentiles: 66436 66838 67037 67168 67284 67387 67514 67681 67888 68327 69744
INFO > Benchmark ML-KEM-1024 (native, no_opt)
keypair cycles = 67494
encaps cycles = 77511
decaps cycles = 97775
percentile 1 10 20 30 40 50 60 70 80 90 99
keypair percentiles: 65964 66660 66935 67111 67325 67494 67686 67951 68233 68870 70712
encaps percentiles: 76417 76870 77086 77229 77363 77511 77655 77809 78050 78973 80377
decaps percentiles: 96560 97106 97341 97506 97645 97775 97962 98156 98440 99486 100991
INFO > Benchmark Compile (native, opt) CFLAGS=-march=native -mtune=native -mavx2 -mbmi2 -mpopcnt -maes -flto make CROSS_PREFIX= bench CYCLES=PMU AUTO=1 OPT=1
INFO > Benchmark ML-KEM-512 (native, opt)
keypair cycles = 9279
encaps cycles = 13139
decaps cycles = 18002
percentile 1 10 20 30 40 50 60 70 80 90 99
keypair percentiles: 9172 9208 9228 9242 9260 9279 9302 9339 9379 9431 9922
encaps percentiles: 13066 13095 13109 13122 13131 13139 13151 13177 13220 13303 13805
decaps percentiles: 17579 17783 17842 17889 17934 18002 18060 18092 18129 18215 18665
INFO > Benchmark ML-KEM-768 (native, opt)
keypair cycles = 16136
encaps cycles = 18190
decaps cycles = 24569
percentile 1 10 20 30 40 50 60 70 80 90 99
keypair percentiles: 16003 16043 16081 16102 16120 16136 16154 16174 16195 16354 16829
encaps percentiles: 18107 18137 18153 18164 18175 18190 18205 18220 18246 18424 18897
decaps percentiles: 24431 24474 24500 24522 24540 24569 24602 24640 24713 24899 25318
INFO > Benchmark ML-KEM-1024 (native, opt)
keypair cycles = 21865
encaps cycles = 25073
decaps cycles = 34310
percentile 1 10 20 30 40 50 60 70 80 90 99
keypair percentiles: 21733 21790 21811 21832 21849 21865 21887 21908 21951 22487 23126
encaps percentiles: 24949 24999 25023 25041 25057 25073 25091 25115 25148 25687 25882
INFO > Benchmark Compile (native, no_opt) CFLAGS=-march=native -mtune=native -mavx2 -mbmi2 -mpopcnt -maes -flto make CROSS_PREFIX= bench CYCLES=PMU OPT=0 AUTO=1
INFO > Benchmark ML-KEM-512 (native, no_opt)
keypair cycles = 25517
encaps cycles = 31850
decaps cycles = 42098
percentile 1 10 20 30 40 50 60 70 80 90 99
keypair percentiles: 25027 25268 25335 25413 25470 25517 25581 25651 25749 25891 27939
encaps percentiles: 31204 31434 31591 31692 31780 31850 31934 32034 32131 32334 34031
decaps percentiles: 41348 41644 41794 41897 42003 42098 42199 42342 42493 42734 44438
INFO > Benchmark ML-KEM-768 (native, no_opt)
keypair cycles = 44832
encaps cycles = 52463
decaps cycles = 67362
percentile 1 10 20 30 40 50 60 70 80 90 99
keypair percentiles: 43816 44184 44403 44522 44682 44832 44961 45119 45355 45884 47006
encaps percentiles: 51671 51943 52135 52243 52348 52463 52569 52683 52846 53290 54547
decaps percentiles: 66255 66863 67021 67155 67257 67362 67524 67693 67864 68280 69972
INFO > Benchmark ML-KEM-1024 (native, no_opt)
keypair cycles = 67406
encaps cycles = 77514
decaps cycles = 97736
percentile 1 10 20 30 40 50 60 70 80 90 99
keypair percentiles: 65929 66533 66784 67033 67219 67406 67610 67785 68099 68534 70019
encaps percentiles: 76428 76887 77093 77278 77390 77514 77650 77832 78076 78837 80412
decaps percentiles: 96590 97034 97292 97470 97625 97736 97873 98063 98401 99321 100872
INFO > Benchmark Compile (native, opt) CFLAGS=-march=native -mtune=native -mavx2 -mbmi2 -mpopcnt -maes -flto make CROSS_PREFIX= bench CYCLES=PMU AUTO=1 OPT=1
INFO > Benchmark ML-KEM-512 (native, opt)
keypair cycles = 9347
encaps cycles = 13136
decaps cycles = 17878
percentile 1 10 20 30 40 50 60 70 80 90 99
keypair percentiles: 9262 9289 9300 9314 9327 9347 9377 9419 9466 9514 10071
encaps percentiles: 13062 13097 13108 13117 13128 13136 13147 13170 13216 13278 13816
decaps percentiles: 17628 17698 17758 17828 17854 17878 17897 17919 17955 18041 18460
INFO > Benchmark ML-KEM-768 (native, opt)
keypair cycles = 16103
encaps cycles = 18033
decaps cycles = 24565
percentile 1 10 20 30 40 50 60 70 80 90 99
keypair percentiles: 15969 16017 16049 16071 16089 16103 16117 16132 16155 16263 16805
encaps percentiles: 17956 17986 18001 18014 18022 18033 18045 18058 18079 18188 18756
decaps percentiles: 24453 24495 24522 24540 24552 24565 24588 24626 24697 24899 25443
INFO > Benchmark ML-KEM-1024 (native, opt)
keypair cycles = 21833
encaps cycles = 25082
decaps cycles = 34359
percentile 1 10 20 30 40 50 60 70 80 90 99
keypair percentiles: 21696 21749 21780 21798 21815 21833 21854 21879 21913 22427 23071
encaps percentiles: 24960 25010 25030 25051 25063 25082 25104 25123 25153 25687 26040
decaps percentiles: 34132 34222 34268 34301 34332 34359 34387 34420 34479 34896 35310
ubuntu@ip-172-31-3-107:~/mlkem-native$ ./scripts/tests bench -c PMU --cflags="-march=native -mtune=native -mavx2 -mbmi2 -mpopcnt -maes -flto"
INFO > Benchmark Compile (native, no_opt) CFLAGS=-march=native -mtune=native -mavx2 -mbmi2 -mpopcnt -maes -flto make CROSS_PREFIX= bench CYCLES=PMU OPT=0 AUTO=1
INFO > Benchmark ML-KEM-512 (native, no_opt)
keypair cycles = 25473
encaps cycles = 31806
decaps cycles = 42095
percentile 1 10 20 30 40 50 60 70 80 90 99
keypair percentiles: 25080 25213 25286 25355 25418 25473 25542 25629 25717 25850 27780
encaps percentiles: 31048 31377 31509 31600 31712 31806 31908 32005 32110 32278 33910
decaps percentiles: 41263 41546 41778 41895 42000 42095 42208 42301 42428 42725 44496
INFO > Benchmark ML-KEM-768 (native, no_opt)
keypair cycles = 44788
encaps cycles = 52447
decaps cycles = 67421
percentile 1 10 20 30 40 50 60 70 80 90 99
keypair percentiles: 43699 44184 44352 44535 44668 44788 44940 45087 45287 45792 46955
encaps percentiles: 51654 51982 52113 52246 52355 52447 52555 52673 52831 53220 54492
decaps percentiles: 66497 66920 67068 67214 67308 67421 67533 67653 67836 68299 70112
INFO > Benchmark ML-KEM-1024 (native, no_opt)
keypair cycles = 67260
encaps cycles = 77443
decaps cycles = 97734
percentile 1 10 20 30 40 50 60 70 80 90 99
keypair percentiles: 65885 66475 66761 66974 67105 67260 67435 67675 67957 68649 70269
encaps percentiles: 76130 76783 77003 77154 77310 77443 77623 77820 78065 78817 80469
decaps percentiles: 96435 97010 97281 97441 97599 97734 97892 98064 98390 99227 101106
INFO > Benchmark Compile (native, opt) CFLAGS=-march=native -mtune=native -mavx2 -mbmi2 -mpopcnt -maes -flto make CROSS_PREFIX= bench CYCLES=PMU OPT=1 AUTO=1
INFO > Benchmark ML-KEM-512 (native, opt)
keypair cycles = 9311
encaps cycles = 13114
decaps cycles = 17842
percentile 1 10 20 30 40 50 60 70 80 90 99
keypair percentiles: 9228 9253 9269 9282 9294 9311 9341 9379 9419 9480 9981
encaps percentiles: 13040 13068 13081 13091 13103 13114 13129 13150 13212 13277 13766
decaps percentiles: 17590 17668 17734 17792 17823 17842 17865 17891 17933 18044 18340
INFO > Benchmark ML-KEM-768 (native, opt)
keypair cycles = 16125
encaps cycles = 18102
decaps cycles = 24600
percentile 1 10 20 30 40 50 60 70 80 90 99
keypair percentiles: 15972 16030 16063 16086 16106 16125 16143 16169 16192 16306 16820
encaps percentiles: 18004 18047 18061 18074 18087 18102 18121 18148 18175 18273 18835
decaps percentiles: 24470 24508 24526 24548 24567 24600 24667 24711 24752 24945 25419
INFO > Benchmark ML-KEM-1024 (native, opt)
keypair cycles = 21995
encaps cycles = 25042
decaps cycles = 34213
percentile 1 10 20 30 40 50 60 70 80 90 99
keypair percentiles: 21852 21922 21944 21964 21976 21995 22010 22039 22088 22594 23294
encaps percentiles: 24931 24980 24994 25008 25024 25042 25059 25079 25112 25652 25875
decaps percentiles: 34050 34114 34139 34165 34183 34213 34242 34274 34333 34803 35120
INFO > Benchmark Compile (native, no_opt) CFLAGS=-march=native -mtune=native -mavx2 -mbmi2 -mpopcnt -maes -flto make CROSS_PREFIX= bench CYCLES=PMU AUTO=1 OPT=0
INFO > Benchmark ML-KEM-512 (native, no_opt)
keypair cycles = 25528
encaps cycles = 31838
decaps cycles = 42054
percentile 1 10 20 30 40 50 60 70 80 90 99
keypair percentiles: 25063 25211 25309 25393 25466 25528 25588 25660 25724 25895 27868
encaps percentiles: 31110 31391 31549 31643 31746 31838 31950 32033 32171 32365 33740
decaps percentiles: 41211 41533 41698 41834 41959 42054 42162 42289 42405 42732 44064
INFO > Benchmark ML-KEM-768 (native, no_opt)
keypair cycles = 44771
encaps cycles = 52462
decaps cycles = 67505
percentile 1 10 20 30 40 50 60 70 80 90 99
keypair percentiles: 43799 44194 44373 44529 44662 44771 44910 45079 45333 45798 47167
encaps percentiles: 51654 51971 52142 52250 52372 52462 52566 52692 52880 53239 54556
decaps percentiles: 66562 66953 67135 67284 67395 67505 67638 67772 67992 68496 70032
INFO > Benchmark ML-KEM-1024 (native, no_opt)
keypair cycles = 67344
encaps cycles = 77456
decaps cycles = 97752
percentile 1 10 20 30 40 50 60 70 80 90 99
keypair percentiles: 65832 66502 66753 66987 67183 67344 67520 67723 68043 68606 70291
encaps percentiles: 76420 76813 77027 77193 77373 77456 77618 77781 78059 79033 80517
decaps percentiles: 96407 97021 97267 97457 97627 97752 97878 98023 98315 99250 100999
INFO > Benchmark Compile (native, opt) CFLAGS=-march=native -mtune=native -mavx2 -mbmi2 -mpopcnt -maes -flto make CROSS_PREFIX= bench CYCLES=PMU AUTO=1 OPT=1
INFO > Benchmark ML-KEM-512 (native, opt)
keypair cycles = 9342
encaps cycles = 13136
decaps cycles = 17874
percentile 1 10 20 30 40 50 60 70 80 90 99
keypair percentiles: 9257 9284 9296 9309 9325 9342 9380 9418 9456 9518 10017
encaps percentiles: 13059 13087 13101 13111 13125 13136 13149 13165 13204 13267 13806
decaps percentiles: 17600 17696 17775 17823 17850 17874 17893 17911 17951 18027 18456
INFO > Benchmark ML-KEM-768 (native, opt)
keypair cycles = 16166
encaps cycles = 18175
decaps cycles = 24730
percentile 1 10 20 30 40 50 60 70 80 90 99
keypair percentiles: 16007 16084 16108 16135 16152 16166 16182 16199 16218 16311 16877
encaps percentiles: 18031 18109 18131 18149 18159 18175 18187 18206 18222 18351 18876
decaps percentiles: 24466 24623 24661 24684 24707 24730 24760 24799 24861 25012 25535
INFO > Benchmark ML-KEM-1024 (native, opt)
keypair cycles = 21881
encaps cycles = 25301
decaps cycles = 34390
percentile 1 10 20 30 40 50 60 70 80 90 99
keypair percentiles: 21710 21776 21803 21827 21853 21881 21907 21931 21967 22469 23111
encaps percentiles: 25164 25226 25251 25270 25287 25301 25323 25343 25389 25911 26060
decaps percentiles: 34090 34207 34277 34318 34356 34390 34420 34451 34508 34970 35397
INFO > Benchmark Compile (native, no_opt) CFLAGS=-march=native -mtune=native -mavx2 -mbmi2 -mpopcnt -maes -flto make CROSS_PREFIX= bench CYCLES=PMU OPT=0 AUTO=1
INFO > Benchmark ML-KEM-512 (native, no_opt)
keypair cycles = 25504
encaps cycles = 31843
decaps cycles = 42080
percentile 1 10 20 30 40 50 60 70 80 90 99
keypair percentiles: 25008 25223 25313 25390 25452 25504 25563 25636 25717 25866 27813
encaps percentiles: 31093 31417 31549 31624 31760 31843 31961 32067 32187 32341 34082
decaps percentiles: 41185 41569 41717 41848 41955 42080 42192 42326 42506 42808 43976
INFO > Benchmark ML-KEM-768 (native, no_opt)
keypair cycles = 44793
encaps cycles = 52424
decaps cycles = 67443
percentile 1 10 20 30 40 50 60 70 80 90 99
keypair percentiles: 43673 44181 44400 44544 44666 44793 44926 45081 45295 45751 47216
encaps percentiles: 51600 51978 52146 52237 52317 52424 52515 52641 52769 53113 54465
decaps percentiles: 66441 66892 67083 67203 67337 67443 67571 67708 67858 68302 69802
INFO > Benchmark ML-KEM-1024 (native, no_opt)
keypair cycles = 67355
encaps cycles = 77356
decaps cycles = 97552
percentile 1 10 20 30 40 50 60 70 80 90 99
keypair percentiles: 66000 66561 66833 67014 67193 67355 67576 67812 68172 68771 70545
encaps percentiles: 76249 76729 76921 77069 77227 77356 77493 77658 77880 78871 80308
decaps percentiles: 96496 96877 97076 97267 97404 97552 97722 97877 98145 99153 101011
INFO > Benchmark Compile (native, opt) CFLAGS=-march=native -mtune=native -mavx2 -mbmi2 -mpopcnt -maes -flto make CROSS_PREFIX= bench CYCLES=PMU OPT=1 AUTO=1
INFO > Benchmark ML-KEM-512 (native, opt)
keypair cycles = 9272
encaps cycles = 13127
decaps cycles = 17910
percentile 1 10 20 30 40 50 60 70 80 90 99
keypair percentiles: 9174 9202 9223 9238 9254 9272 9305 9335 9388 9426 10063
encaps percentiles: 13040 13077 13091 13104 13114 13127 13141 13162 13202 13278 13781
decaps percentiles: 17629 17723 17804 17860 17888 17910 17934 17968 18008 18103 18537
INFO > Benchmark ML-KEM-768 (native, opt)
keypair cycles = 16161
encaps cycles = 18105
decaps cycles = 24648
percentile 1 10 20 30 40 50 60 70 80 90 99
keypair percentiles: 16005 16075 16105 16130 16146 16161 16174 16194 16218 16304 16881
encaps percentiles: 18011 18053 18068 18082 18093 18105 18121 18142 18170 18349 18871
decaps percentiles: 24463 24530 24568 24594 24617 24648 24690 24723 24794 25059 25471
INFO > Benchmark ML-KEM-1024 (native, opt)
keypair cycles = 21877
encaps cycles = 25078
decaps cycles = 34464
percentile 1 10 20 30 40 50 60 70 80 90 99
keypair percentiles: 21709 21766 21808 21829 21853 21877 21902 21923 21976 22449 23114
encaps percentiles: 24960 25003 25026 25050 25063 25078 25096 25116 25144 25701 25840
decaps percentiles: 34247 34324 34379 34410 34438 34464 34500 34543 34596 35049 35397
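The logs above are easier to compare once the per-run medians are pulled out. Below is a minimal sketch of a parser for this output format, followed by the spread of the ML-KEM-512 no_opt keypair medians across the ten runs. Two assumptions are worth flagging: the function name `collect_medians` is made up for illustration, and the split into "first five runs are c7i.large, last five are c7i.metal-24xl" is inferred from the order of the logs, which are not labeled. The "cycles = N" value matches the 50th percentile in each block, so it is treated as the median.

```python
import re
from collections import defaultdict

# Matches e.g. "INFO > Benchmark ML-KEM-512 (native, no_opt)" but not the
# "Benchmark Compile ..." lines.
HEADER = re.compile(r"INFO\s*>\s*Benchmark (ML-KEM-\d+) \(native, (no_opt|opt)\)")
# Matches e.g. "keypair cycles = 29180" but not the "percentiles:" lines.
CYCLES = re.compile(r"(keypair|encaps|decaps) cycles = (\d+)")


def collect_medians(log_text):
    """Map (scheme, variant, operation) to a list of medians, one per run."""
    medians = defaultdict(list)
    current = None
    for line in log_text.splitlines():
        header = HEADER.search(line)
        if header:
            current = header.groups()
            continue
        cycles = CYCLES.search(line)
        if cycles and current:
            op, value = cycles.groups()
            medians[current + (op,)].append(int(value))
    return medians


def relative_spread(samples):
    """Return (max - min) / min as a percentage."""
    return 100.0 * (max(samples) - min(samples)) / min(samples)


# Excerpt from the first two runs above.
SAMPLE = """\
INFO > Benchmark ML-KEM-512 (native, no_opt)
keypair cycles = 29180
encaps cycles = 35883
decaps cycles = 46855
INFO > Benchmark ML-KEM-512 (native, no_opt)
keypair cycles = 30133
encaps cycles = 37254
decaps cycles = 48576
"""
parsed = collect_medians(SAMPLE)
print(parsed[("ML-KEM-512", "no_opt", "keypair")])

# ML-KEM-512 no_opt keypair medians of all ten runs above, in log order.
# ASSUMPTION: runs 1-5 are c7i.large and runs 6-10 are c7i.metal-24xl.
keypair_512 = [29180, 30133, 25602, 30138, 30182,
               25544, 25517, 25473, 25528, 25504]
for label, group in (("first five", keypair_512[:5]),
                     ("last five", keypair_512[5:])):
    print(f"{label}: spread={relative_spread(group):.2f}%")
```

Under that grouping assumption, the first five runs differ by roughly 18% between the fastest and slowest instance, while the last five stay within about 0.3%. This matches the TL;DR that the bare-metal numbers are stable.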
We see a lot of variance in the benchmarks on x86_64, especially for the C backend on Intel instances: https://pq-code-package.github.io/mlkem-native/dev/bench/
This needs to be resolved soon as it gets in the way of warning about performance regressions as proposed in #202 / #424.