Open hanno-becker opened 2 weeks ago
Building and running in nix environment. Using perf for cycle counting.
keypair cycles=104327
encaps cycles=132266
decaps cycles=172621
CRYPTO_SECRETKEYBYTES: 1632
CRYPTO_PUBLICKEYBYTES: 800
CRYPTO_CIPHERTEXTBYTES: 768
keypair cycles=179978
encaps cycles=215560
decaps cycles=269163
CRYPTO_SECRETKEYBYTES: 2400
CRYPTO_PUBLICKEYBYTES: 1184
CRYPTO_CIPHERTEXTBYTES: 1088
keypair cycles=276589
encaps cycles=317553
decaps cycles=382802
CRYPTO_SECRETKEYBYTES: 3168
CRYPTO_PUBLICKEYBYTES: 1568
CRYPTO_CIPHERTEXTBYTES: 1568
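The log does not label its three result groups, but the printed sizes match the standard ML-KEM (Kyber) parameter sets. The sketch below recomputes the sizes from the usual parameters (`k`, `du`, `dv`) as a sanity check. It suggests the groups are ML-KEM-512, ML-KEM-768, and ML-KEM-1024 in that order. That mapping is an inference from the sizes, not something stated in the log, and the names `Params` and `sizes` are made up for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Params:
    name: str
    k: int   # module rank
    du: int  # bits per coefficient for the compressed vector u
    dv: int  # bits per coefficient for the compressed polynomial v


# Standard ML-KEM parameter sets.
PARAMS = [
    Params("ML-KEM-512", k=2, du=10, dv=4),
    Params("ML-KEM-768", k=3, du=10, dv=4),
    Params("ML-KEM-1024", k=4, du=11, dv=5),
]


def sizes(p):
    """Return (secret key, public key, ciphertext) sizes in bytes."""
    pk = 384 * p.k + 32            # 12-bit packed vector t plus 32-byte seed
    sk = 384 * p.k + pk + 32 + 32  # packed s, embedded pk, H(pk), and z
    ct = 32 * (p.du * p.k + p.dv)  # 256 coefficients, du or dv bits each
    return sk, pk, ct


if __name__ == "__main__":
    for p in PARAMS:
        sk, pk, ct = sizes(p)
        print(f"{p.name}: SECRETKEYBYTES={sk} PUBLICKEYBYTES={pk} CIPHERTEXTBYTES={ct}")
```

The three tuples come out as (1632, 800, 768), (2400, 1184, 1088), and (3168, 1568, 1568), matching the `CRYPTO_*BYTES` values printed above.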
Building and running using nix environment from MLKEM-C-AArch64 (https://github.com/pq-code-package/mlkem-c-aarch64/commit/b84f0a307f869de21a3e5299653e8f5289579bea) as above, using perf for cycle counting.
bench ntt_kyber ntt_kyber_123_4567 1104 cycles 100 repeats
bench ntt_kyber ntt_kyber_123_4567_scalar_load 1268 cycles 100 repeats
bench ntt_kyber ntt_kyber_123_4567_scalar_load_store 1294 cycles 100 repeats
bench ntt_kyber ntt_kyber_123_4567_scalar_store 1119 cycles 100 repeats
bench ntt_kyber ntt_kyber_1234_567 960 cycles 100 repeats
bench ntt_kyber intt_kyber_123_4567 1058 cycles 100 repeats
bench ntt_kyber intt_kyber_123_4567_manual_ld4 1068 cycles 100 repeats
bench ntt_kyber ntt_kyber_123_4567_manual_st4_opt_a55 922 cycles 100 repeats
bench ntt_kyber ntt_kyber_123_4567_opt_a55 906 cycles 100 repeats
bench ntt_kyber ntt_kyber_123_4567_scalar_load_opt_a55 959 cycles 100 repeats
bench ntt_kyber ntt_kyber_123_4567_scalar_load_store_opt_a55 970 cycles 100 repeats
bench ntt_kyber ntt_kyber_123_4567_scalar_store_opt_a55 928 cycles 100 repeats
bench ntt_kyber intt_kyber_123_4567_opt_a55 1076 cycles 100 repeats
bench ntt_kyber intt_kyber_123_4567_manual_ld4_opt_a55 989 cycles 100 repeats
bench ntt_kyber ntt_kyber_123_4567_manual_st4_opt_a72 860 cycles 100 repeats
bench ntt_kyber ntt_kyber_123_4567_opt_a72 854 cycles 100 repeats
bench ntt_kyber ntt_kyber_123_4567_scalar_load_opt_a72 974 cycles 100 repeats
bench ntt_kyber ntt_kyber_123_4567_scalar_load_store_opt_a72 919 cycles 100 repeats
bench ntt_kyber ntt_kyber_123_4567_scalar_store_opt_a72 837 cycles 100 repeats
bench ntt_kyber intt_kyber_123_4567_opt_a72 956 cycles 100 repeats
bench ntt_kyber intt_kyber_123_4567_manual_ld4_opt_a72 936 cycles 100 repeats
bench ntt_kyber ntt_kyber_123_4567_opt_m1_firestorm 892 cycles 100 repeats
bench ntt_kyber ntt_kyber_123_4567_scalar_load_opt_m1_firestorm 970 cycles 100 repeats
bench ntt_kyber ntt_kyber_123_4567_scalar_load_store_opt_m1_firestorm 969 cycles 100 repeats
bench ntt_kyber ntt_kyber_123_4567_manual_st4_opt_m1_firestorm 887 cycles 100 repeats
bench ntt_kyber ntt_kyber_123_4567_scalar_store_opt_m1_firestorm 893 cycles 100 repeats
bench ntt_kyber intt_kyber_123_4567_opt_m1_firestorm 977 cycles 100 repeats
bench ntt_kyber intt_kyber_123_4567_manual_ld4_opt_m1_firestorm 1012 cycles 100 repeats
bench ntt_kyber ntt_kyber_123_4567_manual_st4_opt_m1_icestorm 887 cycles 100 repeats
bench ntt_kyber ntt_kyber_123_4567_opt_m1_icestorm 896 cycles 100 repeats
bench ntt_kyber ntt_kyber_123_4567_scalar_load_opt_m1_icestorm 1062 cycles 100 repeats
bench ntt_kyber ntt_kyber_123_4567_scalar_load_store_opt_m1_icestorm 1122 cycles 100 repeats
bench ntt_kyber ntt_kyber_123_4567_scalar_store_opt_m1_icestorm 927 cycles 100 repeats
bench ntt_kyber intt_kyber_123_4567_opt_m1_icestorm 964 cycles 100 repeats
bench ntt_kyber intt_kyber_123_4567_manual_ld4_opt_m1_icestorm 967 cycles 100 repeats
bench ntt_kyber ntt 968 cycles 100 repeats
bench ntt_kyber pqclean_ntt 872 cycles 100 repeats
bench ntt_kyber invntt 1062 cycles 100 repeats
bench ntt_kyber pqclean_invntt 983 cycles 100 repeats
Building and running in nix environment. Using perf for cycle counting.
keypair cycles=104370
encaps cycles=132276
decaps cycles=172496
CRYPTO_SECRETKEYBYTES: 1632
CRYPTO_PUBLICKEYBYTES: 800
CRYPTO_CIPHERTEXTBYTES: 768
keypair cycles=179728
encaps cycles=215611
decaps cycles=269038
CRYPTO_SECRETKEYBYTES: 2400
CRYPTO_PUBLICKEYBYTES: 1184
CRYPTO_CIPHERTEXTBYTES: 1088
keypair cycles=276640
encaps cycles=317802
decaps cycles=383131
CRYPTO_SECRETKEYBYTES: 3168
CRYPTO_PUBLICKEYBYTES: 1568
CRYPTO_CIPHERTEXTBYTES: 1568
Building and running using nix environment from MLKEM-C-AArch64 (https://github.com/pq-code-package/mlkem-c-aarch64/commit/b84f0a307f869de21a3e5299653e8f5289579bea) as above, using perf for cycle counting.
bench ntt_kyber ntt_kyber_123_4567 1091 cycles 100 repeats
bench ntt_kyber ntt_kyber_123_4567_scalar_load 1266 cycles 100 repeats
bench ntt_kyber ntt_kyber_123_4567_scalar_load_store 1292 cycles 100 repeats
bench ntt_kyber ntt_kyber_123_4567_scalar_store 1118 cycles 100 repeats
bench ntt_kyber ntt_kyber_1234_567 960 cycles 100 repeats
bench ntt_kyber intt_kyber_123_4567 1058 cycles 100 repeats
bench ntt_kyber intt_kyber_123_4567_manual_ld4 1068 cycles 100 repeats
bench ntt_kyber ntt_kyber_123_4567_manual_st4_opt_a55 921 cycles 100 repeats
bench ntt_kyber ntt_kyber_123_4567_opt_a55 906 cycles 100 repeats
bench ntt_kyber ntt_kyber_123_4567_scalar_load_opt_a55 959 cycles 100 repeats
bench ntt_kyber ntt_kyber_123_4567_scalar_load_store_opt_a55 969 cycles 100 repeats
bench ntt_kyber ntt_kyber_123_4567_scalar_store_opt_a55 927 cycles 100 repeats
bench ntt_kyber intt_kyber_123_4567_opt_a55 1076 cycles 100 repeats
bench ntt_kyber intt_kyber_123_4567_manual_ld4_opt_a55 988 cycles 100 repeats
bench ntt_kyber ntt_kyber_123_4567_manual_st4_opt_a72 859 cycles 100 repeats
bench ntt_kyber ntt_kyber_123_4567_opt_a72 853 cycles 100 repeats
bench ntt_kyber ntt_kyber_123_4567_scalar_load_opt_a72 972 cycles 100 repeats
bench ntt_kyber ntt_kyber_123_4567_scalar_load_store_opt_a72 917 cycles 100 repeats
bench ntt_kyber ntt_kyber_123_4567_scalar_store_opt_a72 836 cycles 100 repeats
bench ntt_kyber intt_kyber_123_4567_opt_a72 956 cycles 100 repeats
bench ntt_kyber intt_kyber_123_4567_manual_ld4_opt_a72 935 cycles 100 repeats
bench ntt_kyber ntt_kyber_123_4567_opt_m1_firestorm 891 cycles 100 repeats
bench ntt_kyber ntt_kyber_123_4567_scalar_load_opt_m1_firestorm 969 cycles 100 repeats
bench ntt_kyber ntt_kyber_123_4567_scalar_load_store_opt_m1_firestorm 968 cycles 100 repeats
bench ntt_kyber ntt_kyber_123_4567_manual_st4_opt_m1_firestorm 887 cycles 100 repeats
bench ntt_kyber ntt_kyber_123_4567_scalar_store_opt_m1_firestorm 891 cycles 100 repeats
bench ntt_kyber intt_kyber_123_4567_opt_m1_firestorm 977 cycles 100 repeats
bench ntt_kyber intt_kyber_123_4567_manual_ld4_opt_m1_firestorm 1011 cycles 100 repeats
bench ntt_kyber ntt_kyber_123_4567_manual_st4_opt_m1_icestorm 886 cycles 100 repeats
bench ntt_kyber ntt_kyber_123_4567_opt_m1_icestorm 896 cycles 100 repeats
bench ntt_kyber ntt_kyber_123_4567_scalar_load_opt_m1_icestorm 1060 cycles 100 repeats
bench ntt_kyber ntt_kyber_123_4567_scalar_load_store_opt_m1_icestorm 1123 cycles 100 repeats
bench ntt_kyber ntt_kyber_123_4567_scalar_store_opt_m1_icestorm 925 cycles 100 repeats
bench ntt_kyber intt_kyber_123_4567_opt_m1_icestorm 964 cycles 100 repeats
bench ntt_kyber intt_kyber_123_4567_manual_ld4_opt_m1_icestorm 967 cycles 100 repeats
bench ntt_kyber ntt 967 cycles 100 repeats
bench ntt_kyber pqclean_ntt 872 cycles 100 repeats
bench ntt_kyber invntt 1062 cycles 100 repeats
bench ntt_kyber pqclean_invntt 982 cycles 100 repeats
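Comparison of t4g.small (left) and c6g.4xlarge (right), in cycles. As expected (since everything is single-threaded) there is no meaningful performance difference: the largest gap is about 1.2% (ntt_kyber_123_4567, 1104 vs. 1091), every other NTT row differs by at most 2 cycles, and the keypair/encaps/decaps rows differ by less than 0.15%.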
bench ntt_kyber ntt_kyber_123_4567 1104 1091
bench ntt_kyber ntt_kyber_123_4567_scalar_load 1268 1266
bench ntt_kyber ntt_kyber_123_4567_scalar_load_store 1294 1292
bench ntt_kyber ntt_kyber_123_4567_scalar_store 1119 1118
bench ntt_kyber ntt_kyber_1234_567 960 960
bench ntt_kyber intt_kyber_123_4567 1058 1058
bench ntt_kyber intt_kyber_123_4567_manual_ld4 1068 1068
bench ntt_kyber ntt_kyber_123_4567_manual_st4_opt_a55 922 921
bench ntt_kyber ntt_kyber_123_4567_opt_a55 906 906
bench ntt_kyber ntt_kyber_123_4567_scalar_load_opt_a55 959 959
bench ntt_kyber ntt_kyber_123_4567_scalar_load_store_opt_a55 970 969
bench ntt_kyber ntt_kyber_123_4567_scalar_store_opt_a55 928 927
bench ntt_kyber intt_kyber_123_4567_opt_a55 1076 1076
bench ntt_kyber intt_kyber_123_4567_manual_ld4_opt_a55 989 988
bench ntt_kyber ntt_kyber_123_4567_manual_st4_opt_a72 860 859
bench ntt_kyber ntt_kyber_123_4567_opt_a72 854 853
bench ntt_kyber ntt_kyber_123_4567_scalar_load_opt_a72 974 972
bench ntt_kyber ntt_kyber_123_4567_scalar_load_store_opt_a72 919 917
bench ntt_kyber ntt_kyber_123_4567_scalar_store_opt_a72 837 836
bench ntt_kyber intt_kyber_123_4567_opt_a72 956 956
bench ntt_kyber intt_kyber_123_4567_manual_ld4_opt_a72 936 935
bench ntt_kyber ntt_kyber_123_4567_opt_m1_firestorm 892 891
bench ntt_kyber ntt_kyber_123_4567_scalar_load_opt_m1_firestorm 970 969
bench ntt_kyber ntt_kyber_123_4567_manual_st4_opt_m1_firestorm 887 887
bench ntt_kyber ntt_kyber_123_4567_scalar_store_opt_m1_firestorm 893 891
bench ntt_kyber intt_kyber_123_4567_opt_m1_firestorm 977 977
bench ntt_kyber intt_kyber_123_4567_manual_ld4_opt_m1_firestorm 1012 1011
bench ntt_kyber ntt_kyber_123_4567_manual_st4_opt_m1_icestorm 887 886
bench ntt_kyber ntt_kyber_123_4567_opt_m1_icestorm 896 896
bench ntt_kyber ntt_kyber_123_4567_scalar_load_opt_m1_icestorm 1062 1060
bench ntt_kyber ntt_kyber_123_4567_scalar_store_opt_m1_icestorm 927 925
bench ntt_kyber intt_kyber_123_4567_opt_m1_icestorm 964 964
bench ntt_kyber intt_kyber_123_4567_manual_ld4_opt_m1_icestorm 967 967
bench ntt_kyber ntt 968 967
bench ntt_kyber pqclean_ntt 872 872
bench ntt_kyber invntt 1062 1062
bench ntt_kyber pqclean_invntt 983 982
keypair cycles=104327 104370
encaps cycles=132266 132276
decaps cycles=172621 172496
keypair cycles=179978 179728
encaps cycles=215560 215611
decaps cycles=269163 269038
keypair cycles=276589 276640
encaps cycles=317553 317802
decaps cycles=382802 383131
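To put a number on "no meaningful difference", here is a minimal sketch that computes the relative difference between the two columns. It uses a handful of rows copied from the comparison above. The helper names (`parse_rows`, `relative_diff_percent`, `largest_deviation`) and the `(sk=... bytes)` row labels are made up for this illustration and are not part of the benchmark tooling.

```python
# Each row: benchmark name, t4g.small cycles, c6g.4xlarge cycles.
SAMPLE = """
ntt_kyber_123_4567 1104 1091
ntt_kyber_123_4567_opt_a72 854 853
pqclean_ntt 872 872
keypair (sk=2400 bytes) 179978 179728
decaps (sk=3168 bytes) 382802 383131
"""


def parse_rows(text):
    """Parse 'name left right' lines into (name, left, right) tuples."""
    rows = []
    for line in text.strip().splitlines():
        # Split from the right so names may themselves contain spaces.
        name, left, right = line.rsplit(None, 2)
        rows.append((name, int(left), int(right)))
    return rows


def relative_diff_percent(left, right):
    """Difference of right relative to left, in percent."""
    return 100.0 * (right - left) / left


def largest_deviation(rows):
    """Return (name, percent) for the row with the largest absolute gap."""
    name, left, right = max(
        rows, key=lambda r: abs(relative_diff_percent(r[1], r[2]))
    )
    return name, relative_diff_percent(left, right)


if __name__ == "__main__":
    rows = parse_rows(SAMPLE)
    for name, left, right in rows:
        print(f"{name:45s} {left:7d} {right:7d} {relative_diff_percent(left, right):+.3f}%")
    print("largest deviation:", largest_deviation(rows))
```

On these sample rows the largest gap is `ntt_kyber_123_4567` at roughly -1.2%; the other sampled rows stay below 0.15% in absolute value.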
@hanno-becker if you could try this on a fork and set up the configuration the way you like it using your credentials, it will be an easier lift for me once I have those credentials.
@ryjones I'll look into it, though probably not before Monday or Tuesday.
Relates to: https://github.com/pq-code-package/tsc/issues/75
@ryjones asks whether we could use small Graviton instances for benchmarking or whether larger ones are needed.
Let's collect some data and discuss here.