pq-code-package / mlkem-native

High-assurance, high-performance ML-KEM implementation for mobile, pc, and server targets
https://pq-code-package.github.io/mlkem-native/dev/bench/
Apache License 2.0

Unstable benchmarking in EC2 on x86_64 #429

Open mkannwischer opened 2 days ago

mkannwischer commented 2 days ago

We see a lot of variance in the benchmarks on x86_64, especially for the C backend on Intel instances: https://pq-code-package.github.io/mlkem-native/dev/bench/

This needs to be resolved soon, as it hinders warning about performance regressions as proposed in #202 / #424.

mkannwischer commented 2 days ago

I've performed my own benchmarks on c7i using

 ./scripts/tests bench -c PMU  --cflags="-mavx2 -mbmi2 -mpopcnt -maes -flto"

Here are my measurements for keypair cycles of mlkem-512 with the C backend [32085, 32225, 32194, 32499, 32580, 32543, 32535, 32576, 32715, 32581].

max is 32715 min is 32085

That's within 2% of each other, which looks okay. But in #424, we frequently saw differences of 10% and more.

mkannwischer commented 2 days ago

I've tried perf (replaced PERF_COUNT_HW_CPU_CYCLES with PERF_COUNT_HW_REF_CPU_CYCLES):

 ./scripts/tests bench -c PERF  --cflags="-mavx2 -mbmi2 -mpopcnt -maes -flto"  -r

[32273, 32087, 32370, 32130, 32079, 32328, 32480, 32093, 32408, 32482]

max is 32482 min is 32079

So that seems to be a viable alternative (slightly less variance here, but that's likely luck).

One theory is that the cycle counts increase when the hypervisor interrupts the VM, which messes with the rdtsc benchmarks. This could be mitigated by using perf with PERF_COUNT_HW_REF_CPU_CYCLES. But as I do not see the variance right now, this is hard for me to test.

mkannwischer commented 2 days ago

I've run more experiments in CI:

I'm comparing the performance we are seeing on #424 (with PMU) vs. #430 (PERF w/ PERF_COUNT_HW_REF_CPU_CYCLES) on c7i, and I restarted the benchmarking in CI multiple times.

Here are my measurements for keypair cycles of mlkem-512 with the C backend:

PMU: [34765, 34953, 36790, 34479, 34632]

PERF: [34542, 32884, 35137, 34554, 34647]

Seems equally bad :( and different :( x86 makes me sad.

The runs are here:

PMU: https://github.com/pq-code-package/mlkem-native/actions/runs/11909823923/job/33237075854?pr=424

PERF: https://github.com/pq-code-package/mlkem-native/actions/runs/11924957674/job/33236975067?pr=430
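
For reference, the spread of the four series quoted so far can be checked with a short script. This is only a sketch: the dictionary keys and the `relative_spread` helper are names made up for illustration, and "spread" is taken to mean (max - min) / min, which is one of several reasonable definitions.

```python
# Cycle counts copied from the comments above (ML-KEM-512 keypair, C backend).
# The labels are illustrative only.
samples = {
    "local PMU": [32085, 32225, 32194, 32499, 32580, 32543, 32535, 32576, 32715, 32581],
    "local PERF": [32273, 32087, 32370, 32130, 32079, 32328, 32480, 32093, 32408, 32482],
    "CI PMU": [34765, 34953, 36790, 34479, 34632],
    "CI PERF": [34542, 32884, 35137, 34554, 34647],
}


def relative_spread(values):
    """Return (max - min) / min: how far the slowest run is above the fastest."""
    lo, hi = min(values), max(values)
    return (hi - lo) / lo


for name, values in samples.items():
    print(f"{name:10s} min={min(values)} max={max(values)} "
          f"spread={relative_spread(values):.2%}")
```

With these numbers the local runs come out at about 2.0% (PMU) and 1.3% (PERF), while both CI series land close to 7%, which matches the impression that switching the counter alone does not help in CI.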

mkannwischer commented 1 day ago

I've performed some more benchmarking on different instance types. I launched 5 c7i.large instances and 5 c7i.metal-24xl instances. Here are the results with gcc 13.2.0. The command I am running is

 ./scripts/tests bench -c PMU --cflags="-march=native -mtune=native -mavx2 -mbmi2 -mpopcnt -maes -flto"

TL;DR: the benchmarks are stable on c7i.metal-24xl, so the variance indeed seems to be due to virtualization. Unfortunately, this is bad news for us, as the metal boxes are too pricey to run CI on them.
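
To put a number on the TL;DR, the median ML-KEM-512 keypair cycle counts (C backend, no_opt) from the logs below can be compared across instances of the same type. The sketch below uses the five c7i.large values and the four c7i.metal-24xl values that appear in the logs. The 5% alert threshold and the helper names are assumptions for illustration only; the regression-warning proposals referenced in this issue may use a different rule.

```python
# Median ML-KEM-512 keypair cycles (C backend, no_opt), one value per run,
# copied from the "keypair cycles = ..." lines in the logs below.
medians = {
    "c7i.large": [29180, 30133, 25602, 30138, 30182],
    "c7i.metal-24xl": [25544, 25517, 25473, 25528],
}

# Hypothetical alert threshold; not taken from the project.
THRESHOLD = 0.05


def worst_case_change(values):
    """Largest relative increase between any two runs: (max - min) / min."""
    return (max(values) - min(values)) / min(values)


def false_alarm(values, threshold=THRESHOLD):
    """True if run-to-run noise alone could exceed the alert threshold."""
    return worst_case_change(values) > threshold


for instance, values in medians.items():
    change = worst_case_change(values)
    print(f"{instance:15s} worst-case change {change:.2%}, "
          f"false alarm possible: {false_alarm(values)}")
```

With these inputs the c7i.large runs differ by roughly 17.9%, driven mostly by the one run at 25602 cycles, while the c7i.metal-24xl runs stay within about 0.3% of each other. Under the assumed 5% threshold, only the virtualized instances would risk spurious regression warnings.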

c7i.large

INFO  > Benchmark          Compile     (native, no_opt) CFLAGS=-march=native -mtune=native -mavx2 -mbmi2 -mpopcnt -maes -flto make CROSS_PREFIX= bench CYCLES=PMU OPT=0 AUTO=1
INFO  > Benchmark          ML-KEM-512  (native, no_opt) 
   keypair cycles = 29180
    encaps cycles = 35883
    decaps cycles = 46855

           percentile      1     10     20     30     40     50     60     70     80     90     99
   keypair percentiles:  28062  28568  28778  28900  29042  29180  29341  29474  29633  29851  31793
    encaps percentiles:  34523  35155  35432  35605  35749  35883  36010  36164  36310  36561  38702
    decaps percentiles:  45009  45990  46241  46441  46683  46855  47022  47152  47335  47614  49568

INFO  > Benchmark          ML-KEM-768  (native, no_opt) 
   keypair cycles = 50437
    encaps cycles = 59270
    decaps cycles = 75476

           percentile      1     10     20     30     40     50     60     70     80     90     99
   keypair percentiles:  48572  49540  49860  50056  50258  50437  50636  50812  51066  51628  53697
    encaps percentiles:  57658  58319  58622  58910  59110  59270  59432  59599  59883  60303  61784
    decaps percentiles:  73540  74444  74794  75052  75246  75476  75675  75871  76101  76638  78485

INFO  > Benchmark          ML-KEM-1024 (native, no_opt) 
   keypair cycles = 74709
    encaps cycles = 87368
    decaps cycles = 110252

           percentile      1     10     20     30     40     50     60     70     80     90     99
   keypair percentiles:  72708  73585  74012  74282  74549  74709  74953  75194  75543  76667  79245
    encaps percentiles:  85314  86227  86683  86924  87174  87368  87603  87885  88272  89040  91139
    decaps percentiles: 107911 108946 109472 109773 109995 110252 110559 110796 111234 111993 114092

INFO  > Benchmark          Compile     (native,    opt) CFLAGS=-march=native -mtune=native -mavx2 -mbmi2 -mpopcnt -maes -flto make CROSS_PREFIX= bench CYCLES=PMU OPT=1 AUTO=1
INFO  > Benchmark          ML-KEM-512  (native,    opt) 
   keypair cycles = 10842
    encaps cycles = 15347
    decaps cycles = 20985

           percentile      1     10     20     30     40     50     60     70     80     90     99
   keypair percentiles:  10390  10625  10676  10724  10780  10842  10909  10995  11143  11549  12589
    encaps percentiles:  14843  15139  15203  15255  15296  15347  15405  15493  15564  15661  16355
    decaps percentiles:  19803  20421  20593  20723  20853  20985  21110  21255  21473  22049  22833

INFO  > Benchmark          ML-KEM-768  (native,    opt) 
   keypair cycles = 18850
    encaps cycles = 21061
    decaps cycles = 28488

           percentile      1     10     20     30     40     50     60     70     80     90     99
   keypair percentiles:  18010  18454  18613  18702  18782  18850  18933  19033  19186  19467  20221
    encaps percentiles:  20006  20615  20822  20946  21003  21061  21139  21229  21313  21437  22128
    decaps percentiles:  27454  27997  28193  28344  28434  28488  28584  28693  28778  28955  29679

INFO  > Benchmark          ML-KEM-1024 (native,    opt) 
   keypair cycles = 25269
    encaps cycles = 29265
    decaps cycles = 39824

           percentile      1     10     20     30     40     50     60     70     80     90     99
   keypair percentiles:  24343  24807  24980  25104  25199  25269  25392  25481  25601  25823  26809
    encaps percentiles:  28304  28720  28912  29050  29188  29265  29351  29458  29639  30002  30809
    decaps percentiles:  38493  39118  39337  39530  39692  39824  39978  40102  40255  40538  41623
INFO  > Benchmark          Compile     (native, no_opt) CFLAGS=-march=native -mtune=native -mavx2 -mbmi2 -mpopcnt -maes -flto make CROSS_PREFIX= bench CYCLES=PMU AUTO=1 OPT=0
INFO  > Benchmark          ML-KEM-512  (native, no_opt) 
   keypair cycles = 30133
    encaps cycles = 37254
    decaps cycles = 48576

           percentile      1     10     20     30     40     50     60     70     80     90     99
   keypair percentiles:  29601  29792  29913  29996  30050  30133  30219  30357  30487  30798  32956
    encaps percentiles:  36581  36864  36985  37075  37165  37254  37350  37432  37540  37725  39978
    decaps percentiles:  47884  48158  48289  48374  48470  48576  48666  48801  48921  49109  51028

INFO  > Benchmark          ML-KEM-768  (native, no_opt) 
   keypair cycles = 51937
    encaps cycles = 61048
    decaps cycles = 77828

           percentile      1     10     20     30     40     50     60     70     80     90     99
   keypair percentiles:  50970  51317  51536  51664  51800  51937  52043  52182  52387  52843  55128
    encaps percentiles:  60324  60609  60742  60850  60952  61048  61131  61233  61383  61756  63631
    decaps percentiles:  76944  77330  77486  77602  77715  77828  77960  78057  78217  78579  80506

INFO  > Benchmark          ML-KEM-1024 (native, no_opt) 
   keypair cycles = 76954
    encaps cycles = 89878
    decaps cycles = 113724

           percentile      1     10     20     30     40     50     60     70     80     90     99
   keypair percentiles:  76163  76454  76604  76737  76852  76954  77085  77216  77484  79416  82060
    encaps percentiles:  88991  89347  89536  89629  89766  89878  89989  90168  90375  92217  95286
    decaps percentiles: 112601 113102 113282 113449 113603 113724 113903 114047 114265 115853 119377

INFO  > Benchmark          Compile     (native,    opt) CFLAGS=-march=native -mtune=native -mavx2 -mbmi2 -mpopcnt -maes -flto make CROSS_PREFIX= bench CYCLES=PMU AUTO=1 OPT=1
INFO  > Benchmark          ML-KEM-512  (native,    opt) 
   keypair cycles = 11067
    encaps cycles = 15730
    decaps cycles = 23104

           percentile      1     10     20     30     40     50     60     70     80     90     99
   keypair percentiles:  10937  10983  11008  11027  11046  11067  11090  11112  11158  11824  12871
    encaps percentiles:  15598  15638  15664  15690  15705  15730  15760  15814  15910  16391  17358
    decaps percentiles:  21033  21315  22083  22858  23030  23104  23137  23183  23239  23326  23519

INFO  > Benchmark          ML-KEM-768  (native,    opt) 
   keypair cycles = 19560
    encaps cycles = 21618
    decaps cycles = 29442

           percentile      1     10     20     30     40     50     60     70     80     90     99
   keypair percentiles:  19093  19156  19213  19281  19406  19560  19699  19776  19869  20036  20884
    encaps percentiles:  21477  21526  21557  21580  21597  21618  21641  21668  21745  22413  22823
    decaps percentiles:  29249  29321  29363  29391  29418  29442  29467  29507  29548  30124  30492

INFO  > Benchmark          ML-KEM-1024 (native,    opt) 
   keypair cycles = 26036
    encaps cycles = 31046
    decaps cycles = 41734

           percentile      1     10     20     30     40     50     60     70     80     90     99
   keypair percentiles:  25852  25908  25946  25974  26015  26036  26064  26098  26150  26797  28057
    encaps percentiles:  29883  30468  30796  30963  31012  31046  31073  31103  31144  31251  31846
    decaps percentiles:  40696  41413  41485  41548  41628  41734  41909  42050  42176  42353  43250
INFO  > Benchmark          Compile     (native, no_opt) CFLAGS=-march=native -mtune=native -mavx2 -mbmi2 -mpopcnt -maes -flto make CROSS_PREFIX= bench CYCLES=PMU AUTO=1 OPT=0
INFO  > Benchmark          ML-KEM-512  (native, no_opt) 
   keypair cycles = 25602
    encaps cycles = 31490
    decaps cycles = 40917

           percentile      1     10     20     30     40     50     60     70     80     90     99
   keypair percentiles:  25079  25286  25393  25460  25516  25602  25690  25764  25848  26044  28043
    encaps percentiles:  30802  31101  31218  31320  31403  31490  31586  31679  31805  31976  33611
    decaps percentiles:  40170  40478  40612  40739  40839  40917  41013  41092  41214  41430  43156

INFO  > Benchmark          ML-KEM-768  (native, no_opt) 
   keypair cycles = 44000
    encaps cycles = 51980
    decaps cycles = 65976

           percentile      1     10     20     30     40     50     60     70     80     90     99
   keypair percentiles:  43000  43456  43640  43763  43900  44000  44122  44259  44405  44816  46827
    encaps percentiles:  51139  51510  51674  51782  51887  51980  52091  52206  52391  52819  54201
    decaps percentiles:  65081  65476  65638  65735  65839  65976  66127  66255  66422  66793  68442

INFO  > Benchmark          ML-KEM-1024 (native, no_opt) 
   keypair cycles = 65377
    encaps cycles = 76315
    decaps cycles = 96838

           percentile      1     10     20     30     40     50     60     70     80     90     99
   keypair percentiles:  64363  64806  64960  65095  65258  65377  65477  65653  65896  67175  69122
    encaps percentiles:  75266  75688  75905  76033  76158  76315  76440  76625  76889  78088  79770
    decaps percentiles:  95725  96099  96358  96551  96670  96838  97024  97201  97477  98530  99878

INFO  > Benchmark          Compile     (native,    opt) CFLAGS=-march=native -mtune=native -mavx2 -mbmi2 -mpopcnt -maes -flto make CROSS_PREFIX= bench CYCLES=PMU OPT=1 AUTO=1
INFO  > Benchmark          ML-KEM-512  (native,    opt) 
   keypair cycles = 9412
    encaps cycles = 13317
    decaps cycles = 18035

           percentile      1     10     20     30     40     50     60     70     80     90     99
   keypair percentiles:   9205   9257   9287   9319   9361   9412   9460   9505   9563   9771  10417
    encaps percentiles:  13102  13151  13186  13224  13257  13317  13379  13445  13521  13634  14007
    decaps percentiles:  17675  17781  17875  17940  17992  18035  18088  18166  18255  18493  19746

INFO  > Benchmark          ML-KEM-768  (native,    opt) 
   keypair cycles = 16341
    encaps cycles = 18289
    decaps cycles = 24886

           percentile      1     10     20     30     40     50     60     70     80     90     99
   keypair percentiles:  16095  16175  16208  16249  16294  16341  16399  16484  16635  16923  17459
    encaps percentiles:  18066  18119  18158  18189  18230  18289  18360  18410  18461  18643  19127
    decaps percentiles:  24590  24673  24724  24769  24834  24886  24926  24964  25009  25288  25737

INFO  > Benchmark          ML-KEM-1024 (native,    opt) 
   keypair cycles = 22182
    encaps cycles = 25456
    decaps cycles = 34674

           percentile      1     10     20     30     40     50     60     70     80     90     99
   keypair percentiles:  21815  21937  21992  22061  22132  22182  22235  22295  22372  22699  23285
    encaps percentiles:  25154  25231  25289  25361  25415  25456  25484  25529  25582  25991  26617
    decaps percentiles:  34246  34371  34463  34557  34613  34674  34713  34788  34887  35277  36129
INFO  > Benchmark          Compile     (native, no_opt) CFLAGS=-march=native -mtune=native -mavx2 -mbmi2 -mpopcnt -maes -flto make CROSS_PREFIX= bench CYCLES=PMU OPT=0 AUTO=1
INFO  > Benchmark          ML-KEM-512  (native, no_opt) 
   keypair cycles = 30138
    encaps cycles = 37001
    decaps cycles = 48407

           percentile      1     10     20     30     40     50     60     70     80     90     99
   keypair percentiles:  29577  29801  29902  29985  30059  30138  30221  30343  30469  30891  32987
    encaps percentiles:  36340  36596  36734  36828  36923  37001  37070  37152  37275  37455  39863
    decaps percentiles:  47718  47978  48139  48246  48323  48407  48501  48580  48709  48886  50878

INFO  > Benchmark          ML-KEM-768  (native, no_opt) 
   keypair cycles = 51842
    encaps cycles = 61086
    decaps cycles = 77840

           percentile      1     10     20     30     40     50     60     70     80     90     99
   keypair percentiles:  50898  51246  51424  51601  51721  51842  51960  52105  52327  52754  54764
    encaps percentiles:  60252  60597  60771  60879  61003  61086  61186  61351  61526  61841  63710
    decaps percentiles:  77047  77368  77531  77642  77744  77840  77954  78075  78236  78671  80447

INFO  > Benchmark          ML-KEM-1024 (native, no_opt) 
   keypair cycles = 76986
    encaps cycles = 89866
    decaps cycles = 113576

           percentile      1     10     20     30     40     50     60     70     80     90     99
   keypair percentiles:  76177  76517  76669  76781  76875  76986  77114  77243  77478  79337  82026
    encaps percentiles:  88896  89337  89501  89617  89731  89866  89985  90115  90401  92024  94605
    decaps percentiles: 112289 112946 113115 113292 113420 113576 113718 113861 114104 115719 118236

INFO  > Benchmark          Compile     (native,    opt) CFLAGS=-march=native -mtune=native -mavx2 -mbmi2 -mpopcnt -maes -flto make CROSS_PREFIX= bench CYCLES=PMU OPT=1 AUTO=1
INFO  > Benchmark          ML-KEM-512  (native,    opt) 
   keypair cycles = 11060
    encaps cycles = 15711
    decaps cycles = 23103

           percentile      1     10     20     30     40     50     60     70     80     90     99
   keypair percentiles:  10940  10986  11007  11027  11042  11060  11082  11104  11143  11808  12922
    encaps percentiles:  15589  15641  15662  15680  15695  15711  15735  15766  15818  15912  17149
    decaps percentiles:  21064  21813  22677  22996  23068  23103  23139  23169  23224  23298  23498

INFO  > Benchmark          ML-KEM-768  (native,    opt) 
   keypair cycles = 19650
    encaps cycles = 21712
    decaps cycles = 29439

           percentile      1     10     20     30     40     50     60     70     80     90     99
   keypair percentiles:  19192  19253  19298  19355  19477  19650  19775  19877  20006  20229  20779
    encaps percentiles:  21560  21623  21651  21673  21693  21712  21730  21756  21819  22529  22937
    decaps percentiles:  29254  29333  29367  29398  29421  29439  29455  29481  29524  30064  30543

INFO  > Benchmark          ML-KEM-1024 (native,    opt) 
   keypair cycles = 26009
    encaps cycles = 31036
    decaps cycles = 41680

           percentile      1     10     20     30     40     50     60     70     80     90     99
   keypair percentiles:  25835  25887  25919  25944  25977  26009  26041  26080  26191  26742  27920
    encaps percentiles:  29883  30443  30763  30947  31007  31036  31056  31078  31111  31299  31892
    decaps percentiles:  40681  41394  41456  41510  41567  41680  41821  42002  42146  42344  43062
INFO  > Benchmark          Compile     (native, no_opt) CFLAGS=-march=native -mtune=native -mavx2 -mbmi2 -mpopcnt -maes -flto make CROSS_PREFIX= bench CYCLES=PMU OPT=0 AUTO=1
INFO  > Benchmark          ML-KEM-512  (native, no_opt) 
   keypair cycles = 30182
    encaps cycles = 36950
    decaps cycles = 48262

           percentile      1     10     20     30     40     50     60     70     80     90     99
   keypair percentiles:  29644  29832  29928  30013  30095  30182  30266  30381  30533  30805  32946
    encaps percentiles:  36355  36594  36706  36802  36866  36950  37023  37105  37229  37403  39877
    decaps percentiles:  47561  47881  48014  48102  48181  48262  48342  48416  48538  48757  50876

INFO  > Benchmark          ML-KEM-768  (native, no_opt) 
   keypair cycles = 51856
    encaps cycles = 61116
    decaps cycles = 77642

           percentile      1     10     20     30     40     50     60     70     80     90     99
   keypair percentiles:  50821  51227  51432  51602  51717  51856  51995  52139  52367  52892  55058
    encaps percentiles:  60366  60665  60824  60923  61026  61116  61216  61326  61466  61778  63607
    decaps percentiles:  76844  77188  77349  77452  77525  77642  77763  77898  78089  78560  80331

INFO  > Benchmark          ML-KEM-1024 (native, no_opt) 
   keypair cycles = 77012
    encaps cycles = 90031
    decaps cycles = 113730

           percentile      1     10     20     30     40     50     60     70     80     90     99
   keypair percentiles:  76202  76520  76672  76785  76884  77012  77126  77253  77479  79426  82237
    encaps percentiles:  88975  89427  89588  89765  89906  90031  90159  90288  90539  92187  94408
    decaps percentiles: 112733 113113 113328 113467 113613 113730 113870 114073 114305 115928 117329

INFO  > Benchmark          Compile     (native,    opt) CFLAGS=-march=native -mtune=native -mavx2 -mbmi2 -mpopcnt -maes -flto make CROSS_PREFIX= bench CYCLES=PMU AUTO=1 OPT=1
INFO  > Benchmark          ML-KEM-512  (native,    opt) 
   keypair cycles = 11068
    encaps cycles = 15740
    decaps cycles = 22999

           percentile      1     10     20     30     40     50     60     70     80     90     99
   keypair percentiles:  10943  10987  11017  11035  11051  11068  11087  11112  11176  11897  12834
    encaps percentiles:  15597  15658  15679  15695  15712  15740  15782  15824  15891  16262  17190
    decaps percentiles:  21104  21478  22041  22733  22934  22999  23037  23084  23142  23232  23450

INFO  > Benchmark          ML-KEM-768  (native,    opt) 
   keypair cycles = 19527
    encaps cycles = 21607
    decaps cycles = 29407

           percentile      1     10     20     30     40     50     60     70     80     90     99
   keypair percentiles:  19042  19116  19168  19227  19343  19527  19687  19795  19928  20111  20850
    encaps percentiles:  21442  21527  21548  21568  21589  21607  21626  21653  21796  22389  22794
    decaps percentiles:  29236  29315  29348  29370  29391  29407  29430  29463  29513  30075  30433

INFO  > Benchmark          ML-KEM-1024 (native,    opt) 
   keypair cycles = 25847
    encaps cycles = 30589
    decaps cycles = 41184

           percentile      1     10     20     30     40     50     60     70     80     90     99
   keypair percentiles:  25208  25360  25467  25536  25620  25847  25950  26007  26098  26475  28041
    encaps percentiles:  29296  29543  30184  30377  30474  30589  30759  30927  30985  31054  31682
    decaps percentiles:  39769  40094  40587  40736  40876  41184  41385  41498  41775  42074  42829

c7i.metal-24xl

INFO  > Benchmark          Compile     (native, no_opt) CFLAGS=-march=native -mtune=native -mavx2 -mbmi2 -mpopcnt -maes -flto make CROSS_PREFIX= bench CYCLES=PMU AUTO=1 OPT=0
INFO  > Benchmark          ML-KEM-512  (native, no_opt) 
   keypair cycles = 25544
    encaps cycles = 31876
    decaps cycles = 42131

           percentile      1     10     20     30     40     50     60     70     80     90     99
   keypair percentiles:  25062  25236  25315  25394  25473  25544  25618  25682  25777  25926  27851
    encaps percentiles:  31175  31441  31569  31699  31795  31876  31956  32102  32199  32376  34200
    decaps percentiles:  41310  41653  41785  41915  42022  42131  42231  42346  42482  42654  44399

INFO  > Benchmark          ML-KEM-768  (native, no_opt) 
   keypair cycles = 44778
    encaps cycles = 52425
    decaps cycles = 67387

           percentile      1     10     20     30     40     50     60     70     80     90     99
   keypair percentiles:  43752  44184  44370  44548  44666  44778  44921  45066  45254  45795  47209
    encaps percentiles:  51722  51976  52111  52242  52344  52425  52517  52659  52840  53195  54294
    decaps percentiles:  66436  66838  67037  67168  67284  67387  67514  67681  67888  68327  69744

INFO  > Benchmark          ML-KEM-1024 (native, no_opt) 
   keypair cycles = 67494
    encaps cycles = 77511
    decaps cycles = 97775

           percentile      1     10     20     30     40     50     60     70     80     90     99
   keypair percentiles:  65964  66660  66935  67111  67325  67494  67686  67951  68233  68870  70712
    encaps percentiles:  76417  76870  77086  77229  77363  77511  77655  77809  78050  78973  80377
    decaps percentiles:  96560  97106  97341  97506  97645  97775  97962  98156  98440  99486 100991

INFO  > Benchmark          Compile     (native,    opt) CFLAGS=-march=native -mtune=native -mavx2 -mbmi2 -mpopcnt -maes -flto make CROSS_PREFIX= bench CYCLES=PMU AUTO=1 OPT=1
INFO  > Benchmark          ML-KEM-512  (native,    opt) 
   keypair cycles = 9279
    encaps cycles = 13139
    decaps cycles = 18002

           percentile      1     10     20     30     40     50     60     70     80     90     99
   keypair percentiles:   9172   9208   9228   9242   9260   9279   9302   9339   9379   9431   9922
    encaps percentiles:  13066  13095  13109  13122  13131  13139  13151  13177  13220  13303  13805
    decaps percentiles:  17579  17783  17842  17889  17934  18002  18060  18092  18129  18215  18665

INFO  > Benchmark          ML-KEM-768  (native,    opt) 
   keypair cycles = 16136
    encaps cycles = 18190
    decaps cycles = 24569

           percentile      1     10     20     30     40     50     60     70     80     90     99
   keypair percentiles:  16003  16043  16081  16102  16120  16136  16154  16174  16195  16354  16829
    encaps percentiles:  18107  18137  18153  18164  18175  18190  18205  18220  18246  18424  18897
    decaps percentiles:  24431  24474  24500  24522  24540  24569  24602  24640  24713  24899  25318

INFO  > Benchmark          ML-KEM-1024 (native,    opt) 
   keypair cycles = 21865
    encaps cycles = 25073
    decaps cycles = 34310

           percentile      1     10     20     30     40     50     60     70     80     90     99
   keypair percentiles:  21733  21790  21811  21832  21849  21865  21887  21908  21951  22487  23126
    encaps percentiles:  24949  24999  25023  25041  25057  25073  25091  25115  25148  25687  25882
INFO  > Benchmark          Compile     (native, no_opt) CFLAGS=-march=native -mtune=native -mavx2 -mbmi2 -mpopcnt -maes -flto make CROSS_PREFIX= bench CYCLES=PMU OPT=0 AUTO=1
INFO  > Benchmark          ML-KEM-512  (native, no_opt) 
   keypair cycles = 25517
    encaps cycles = 31850
    decaps cycles = 42098

           percentile      1     10     20     30     40     50     60     70     80     90     99
   keypair percentiles:  25027  25268  25335  25413  25470  25517  25581  25651  25749  25891  27939
    encaps percentiles:  31204  31434  31591  31692  31780  31850  31934  32034  32131  32334  34031
    decaps percentiles:  41348  41644  41794  41897  42003  42098  42199  42342  42493  42734  44438

INFO  > Benchmark          ML-KEM-768  (native, no_opt) 
   keypair cycles = 44832
    encaps cycles = 52463
    decaps cycles = 67362

           percentile      1     10     20     30     40     50     60     70     80     90     99
   keypair percentiles:  43816  44184  44403  44522  44682  44832  44961  45119  45355  45884  47006
    encaps percentiles:  51671  51943  52135  52243  52348  52463  52569  52683  52846  53290  54547
    decaps percentiles:  66255  66863  67021  67155  67257  67362  67524  67693  67864  68280  69972

INFO  > Benchmark          ML-KEM-1024 (native, no_opt) 
   keypair cycles = 67406
    encaps cycles = 77514
    decaps cycles = 97736

           percentile      1     10     20     30     40     50     60     70     80     90     99
   keypair percentiles:  65929  66533  66784  67033  67219  67406  67610  67785  68099  68534  70019
    encaps percentiles:  76428  76887  77093  77278  77390  77514  77650  77832  78076  78837  80412
    decaps percentiles:  96590  97034  97292  97470  97625  97736  97873  98063  98401  99321 100872

INFO  > Benchmark          Compile     (native,    opt) CFLAGS=-march=native -mtune=native -mavx2 -mbmi2 -mpopcnt -maes -flto make CROSS_PREFIX= bench CYCLES=PMU AUTO=1 OPT=1
INFO  > Benchmark          ML-KEM-512  (native,    opt) 
   keypair cycles = 9347
    encaps cycles = 13136
    decaps cycles = 17878

           percentile      1     10     20     30     40     50     60     70     80     90     99
   keypair percentiles:   9262   9289   9300   9314   9327   9347   9377   9419   9466   9514  10071
    encaps percentiles:  13062  13097  13108  13117  13128  13136  13147  13170  13216  13278  13816
    decaps percentiles:  17628  17698  17758  17828  17854  17878  17897  17919  17955  18041  18460

INFO  > Benchmark          ML-KEM-768  (native,    opt) 
   keypair cycles = 16103
    encaps cycles = 18033
    decaps cycles = 24565

           percentile      1     10     20     30     40     50     60     70     80     90     99
   keypair percentiles:  15969  16017  16049  16071  16089  16103  16117  16132  16155  16263  16805
    encaps percentiles:  17956  17986  18001  18014  18022  18033  18045  18058  18079  18188  18756
    decaps percentiles:  24453  24495  24522  24540  24552  24565  24588  24626  24697  24899  25443

INFO  > Benchmark          ML-KEM-1024 (native,    opt) 
   keypair cycles = 21833
    encaps cycles = 25082
    decaps cycles = 34359

           percentile      1     10     20     30     40     50     60     70     80     90     99
   keypair percentiles:  21696  21749  21780  21798  21815  21833  21854  21879  21913  22427  23071
    encaps percentiles:  24960  25010  25030  25051  25063  25082  25104  25123  25153  25687  26040
    decaps percentiles:  34132  34222  34268  34301  34332  34359  34387  34420  34479  34896  35310
ubuntu@ip-172-31-3-107:~/mlkem-native$ ./scripts/tests bench -c PMU  --cflags="-march=native -mtune=native -mavx2 -mbmi2 -mpopcnt -maes -flto"
INFO  > Benchmark          Compile     (native, no_opt) CFLAGS=-march=native -mtune=native -mavx2 -mbmi2 -mpopcnt -maes -flto make CROSS_PREFIX= bench CYCLES=PMU OPT=0 AUTO=1
INFO  > Benchmark          ML-KEM-512  (native, no_opt) 
   keypair cycles = 25473
    encaps cycles = 31806
    decaps cycles = 42095

           percentile      1     10     20     30     40     50     60     70     80     90     99
   keypair percentiles:  25080  25213  25286  25355  25418  25473  25542  25629  25717  25850  27780
    encaps percentiles:  31048  31377  31509  31600  31712  31806  31908  32005  32110  32278  33910
    decaps percentiles:  41263  41546  41778  41895  42000  42095  42208  42301  42428  42725  44496

INFO  > Benchmark          ML-KEM-768  (native, no_opt) 
   keypair cycles = 44788
    encaps cycles = 52447
    decaps cycles = 67421

           percentile      1     10     20     30     40     50     60     70     80     90     99
   keypair percentiles:  43699  44184  44352  44535  44668  44788  44940  45087  45287  45792  46955
    encaps percentiles:  51654  51982  52113  52246  52355  52447  52555  52673  52831  53220  54492
    decaps percentiles:  66497  66920  67068  67214  67308  67421  67533  67653  67836  68299  70112

INFO  > Benchmark          ML-KEM-1024 (native, no_opt) 
   keypair cycles = 67260
    encaps cycles = 77443
    decaps cycles = 97734

           percentile      1     10     20     30     40     50     60     70     80     90     99
   keypair percentiles:  65885  66475  66761  66974  67105  67260  67435  67675  67957  68649  70269
    encaps percentiles:  76130  76783  77003  77154  77310  77443  77623  77820  78065  78817  80469
    decaps percentiles:  96435  97010  97281  97441  97599  97734  97892  98064  98390  99227 101106

INFO  > Benchmark          Compile     (native,    opt) CFLAGS=-march=native -mtune=native -mavx2 -mbmi2 -mpopcnt -maes -flto make CROSS_PREFIX= bench CYCLES=PMU OPT=1 AUTO=1
INFO  > Benchmark          ML-KEM-512  (native,    opt) 
   keypair cycles = 9311
    encaps cycles = 13114
    decaps cycles = 17842

           percentile      1     10     20     30     40     50     60     70     80     90     99
   keypair percentiles:   9228   9253   9269   9282   9294   9311   9341   9379   9419   9480   9981
    encaps percentiles:  13040  13068  13081  13091  13103  13114  13129  13150  13212  13277  13766
    decaps percentiles:  17590  17668  17734  17792  17823  17842  17865  17891  17933  18044  18340

INFO  > Benchmark          ML-KEM-768  (native,    opt) 
   keypair cycles = 16125
    encaps cycles = 18102
    decaps cycles = 24600

           percentile      1     10     20     30     40     50     60     70     80     90     99
   keypair percentiles:  15972  16030  16063  16086  16106  16125  16143  16169  16192  16306  16820
    encaps percentiles:  18004  18047  18061  18074  18087  18102  18121  18148  18175  18273  18835
    decaps percentiles:  24470  24508  24526  24548  24567  24600  24667  24711  24752  24945  25419

INFO  > Benchmark          ML-KEM-1024 (native,    opt) 
   keypair cycles = 21995
    encaps cycles = 25042
    decaps cycles = 34213

           percentile      1     10     20     30     40     50     60     70     80     90     99
   keypair percentiles:  21852  21922  21944  21964  21976  21995  22010  22039  22088  22594  23294
    encaps percentiles:  24931  24980  24994  25008  25024  25042  25059  25079  25112  25652  25875
    decaps percentiles:  34050  34114  34139  34165  34183  34213  34242  34274  34333  34803  35120
INFO  > Benchmark          Compile     (native, no_opt) CFLAGS=-march=native -mtune=native -mavx2 -mbmi2 -mpopcnt -maes -flto make CROSS_PREFIX= bench CYCLES=PMU AUTO=1 OPT=0
INFO  > Benchmark          ML-KEM-512  (native, no_opt) 
   keypair cycles = 25528
    encaps cycles = 31838
    decaps cycles = 42054

           percentile      1     10     20     30     40     50     60     70     80     90     99
   keypair percentiles:  25063  25211  25309  25393  25466  25528  25588  25660  25724  25895  27868
    encaps percentiles:  31110  31391  31549  31643  31746  31838  31950  32033  32171  32365  33740
    decaps percentiles:  41211  41533  41698  41834  41959  42054  42162  42289  42405  42732  44064

INFO  > Benchmark          ML-KEM-768  (native, no_opt) 
   keypair cycles = 44771
    encaps cycles = 52462
    decaps cycles = 67505

           percentile      1     10     20     30     40     50     60     70     80     90     99
   keypair percentiles:  43799  44194  44373  44529  44662  44771  44910  45079  45333  45798  47167
    encaps percentiles:  51654  51971  52142  52250  52372  52462  52566  52692  52880  53239  54556
    decaps percentiles:  66562  66953  67135  67284  67395  67505  67638  67772  67992  68496  70032

INFO  > Benchmark          ML-KEM-1024 (native, no_opt) 
   keypair cycles = 67344
    encaps cycles = 77456
    decaps cycles = 97752

           percentile      1     10     20     30     40     50     60     70     80     90     99
   keypair percentiles:  65832  66502  66753  66987  67183  67344  67520  67723  68043  68606  70291
    encaps percentiles:  76420  76813  77027  77193  77373  77456  77618  77781  78059  79033  80517
    decaps percentiles:  96407  97021  97267  97457  97627  97752  97878  98023  98315  99250 100999

INFO  > Benchmark          Compile     (native,    opt) CFLAGS=-march=native -mtune=native -mavx2 -mbmi2 -mpopcnt -maes -flto make CROSS_PREFIX= bench CYCLES=PMU AUTO=1 OPT=1
INFO  > Benchmark          ML-KEM-512  (native,    opt) 
   keypair cycles = 9342
    encaps cycles = 13136
    decaps cycles = 17874

           percentile      1     10     20     30     40     50     60     70     80     90     99
   keypair percentiles:   9257   9284   9296   9309   9325   9342   9380   9418   9456   9518  10017
    encaps percentiles:  13059  13087  13101  13111  13125  13136  13149  13165  13204  13267  13806
    decaps percentiles:  17600  17696  17775  17823  17850  17874  17893  17911  17951  18027  18456

INFO  > Benchmark          ML-KEM-768  (native,    opt) 
   keypair cycles = 16166
    encaps cycles = 18175
    decaps cycles = 24730

           percentile      1     10     20     30     40     50     60     70     80     90     99
   keypair percentiles:  16007  16084  16108  16135  16152  16166  16182  16199  16218  16311  16877
    encaps percentiles:  18031  18109  18131  18149  18159  18175  18187  18206  18222  18351  18876
    decaps percentiles:  24466  24623  24661  24684  24707  24730  24760  24799  24861  25012  25535

INFO  > Benchmark          ML-KEM-1024 (native,    opt) 
   keypair cycles = 21881
    encaps cycles = 25301
    decaps cycles = 34390

           percentile      1     10     20     30     40     50     60     70     80     90     99
   keypair percentiles:  21710  21776  21803  21827  21853  21881  21907  21931  21967  22469  23111
    encaps percentiles:  25164  25226  25251  25270  25287  25301  25323  25343  25389  25911  26060
    decaps percentiles:  34090  34207  34277  34318  34356  34390  34420  34451  34508  34970  35397

INFO  > Benchmark          Compile     (native, no_opt) CFLAGS=-march=native -mtune=native -mavx2 -mbmi2 -mpopcnt -maes -flto make CROSS_PREFIX= bench CYCLES=PMU OPT=0 AUTO=1
INFO  > Benchmark          ML-KEM-512  (native, no_opt) 
   keypair cycles = 25504
    encaps cycles = 31843
    decaps cycles = 42080

           percentile      1     10     20     30     40     50     60     70     80     90     99
   keypair percentiles:  25008  25223  25313  25390  25452  25504  25563  25636  25717  25866  27813
    encaps percentiles:  31093  31417  31549  31624  31760  31843  31961  32067  32187  32341  34082
    decaps percentiles:  41185  41569  41717  41848  41955  42080  42192  42326  42506  42808  43976

INFO  > Benchmark          ML-KEM-768  (native, no_opt) 
   keypair cycles = 44793
    encaps cycles = 52424
    decaps cycles = 67443

           percentile      1     10     20     30     40     50     60     70     80     90     99
   keypair percentiles:  43673  44181  44400  44544  44666  44793  44926  45081  45295  45751  47216
    encaps percentiles:  51600  51978  52146  52237  52317  52424  52515  52641  52769  53113  54465
    decaps percentiles:  66441  66892  67083  67203  67337  67443  67571  67708  67858  68302  69802

INFO  > Benchmark          ML-KEM-1024 (native, no_opt) 
   keypair cycles = 67355
    encaps cycles = 77356
    decaps cycles = 97552

           percentile      1     10     20     30     40     50     60     70     80     90     99
   keypair percentiles:  66000  66561  66833  67014  67193  67355  67576  67812  68172  68771  70545
    encaps percentiles:  76249  76729  76921  77069  77227  77356  77493  77658  77880  78871  80308
    decaps percentiles:  96496  96877  97076  97267  97404  97552  97722  97877  98145  99153 101011

INFO  > Benchmark          Compile     (native,    opt) CFLAGS=-march=native -mtune=native -mavx2 -mbmi2 -mpopcnt -maes -flto make CROSS_PREFIX= bench CYCLES=PMU OPT=1 AUTO=1
INFO  > Benchmark          ML-KEM-512  (native,    opt) 
   keypair cycles = 9272
    encaps cycles = 13127
    decaps cycles = 17910

           percentile      1     10     20     30     40     50     60     70     80     90     99
   keypair percentiles:   9174   9202   9223   9238   9254   9272   9305   9335   9388   9426  10063
    encaps percentiles:  13040  13077  13091  13104  13114  13127  13141  13162  13202  13278  13781
    decaps percentiles:  17629  17723  17804  17860  17888  17910  17934  17968  18008  18103  18537

INFO  > Benchmark          ML-KEM-768  (native,    opt) 
   keypair cycles = 16161
    encaps cycles = 18105
    decaps cycles = 24648

           percentile      1     10     20     30     40     50     60     70     80     90     99
   keypair percentiles:  16005  16075  16105  16130  16146  16161  16174  16194  16218  16304  16881
    encaps percentiles:  18011  18053  18068  18082  18093  18105  18121  18142  18170  18349  18871
    decaps percentiles:  24463  24530  24568  24594  24617  24648  24690  24723  24794  25059  25471

INFO  > Benchmark          ML-KEM-1024 (native,    opt) 
   keypair cycles = 21877
    encaps cycles = 25078
    decaps cycles = 34464

           percentile      1     10     20     30     40     50     60     70     80     90     99
   keypair percentiles:  21709  21766  21808  21829  21853  21877  21902  21923  21976  22449  23114
    encaps percentiles:  24960  25003  25026  25050  25063  25078  25096  25116  25144  25701  25840
    decaps percentiles:  34247  34324  34379  34410  34438  34464  34500  34543  34596  35049  35397