sunoru opened this issue 1 year ago
I thought it was an issue related to `reinterpret`, since `setindex!` took too much time when I was profiling it. I also found https://discourse.julialang.org/t/slowdown-with-reinterpret/91749.
But that turned out not to be the problem. Maybe I need to look into how to do AES in place?
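For the in-place question, one option is to bypass the `ReinterpretArray` entirely and move whole 16-byte blocks through raw pointers with `unsafe_load`/`unsafe_store!`. The sketch below is generic over a single-block function `f(::UInt128)::UInt128`. The names `ecb_apply!`, `ecb_apply`, `toy_block`, and `toy_block_inv` are hypothetical, and `toy_block` is a cheap invertible placeholder, not AES. In principle `f` could be a closure over a single-block method like the `encrypt(key::AESNI.Aes256EncryptKey, input::UInt128)` that shows up in the profile, but I have not checked how AESNI.jl exposes that key, and I have not measured whether this closes the gap with Nettle.

```julia
# Hypothetical placeholder for a single-block cipher. NOT AES: just a cheap,
# invertible UInt128 -> UInt128 map used to exercise the ECB loop.
const TOY_KEY = 0x0123456789abcdef0123456789abcdef  # 32 hex digits => UInt128
toy_block(x::UInt128) = bitrotate(x, 7) ⊻ TOY_KEY
toy_block_inv(y::UInt128) = bitrotate(y ⊻ TOY_KEY, -7)

"""
    ecb_apply!(f, out, input)

Apply the block function `f` to every 16-byte block of `input` and write the
results into `out`. Both vectors must have the same length, a multiple of 16.
Passing the same vector as `out` and `input` transforms it in place.
"""
function ecb_apply!(f::F, out::Vector{UInt8}, input::Vector{UInt8}) where {F}
    n = length(input)
    length(out) == n || throw(DimensionMismatch("out and input lengths differ"))
    n % 16 == 0 || throw(ArgumentError("length must be a multiple of 16"))
    # Keep both arrays rooted while raw pointers into them are in use.
    GC.@preserve out input begin
        pin = Ptr{UInt128}(pointer(input))
        pout = Ptr{UInt128}(pointer(out))
        for i in 1:(n ÷ 16)
            # Block i is read before it is written, so aliasing out === input is safe.
            unsafe_store!(pout, f(unsafe_load(pin, i)), i)
        end
    end
    return out
end

# Allocating convenience wrapper: one output buffer, no per-block allocations.
ecb_apply(f, input::Vector{UInt8}) = ecb_apply!(f, similar(input), input)
```

The `where {F}` parameter makes Julia specialize on the concrete block function, so the call inside the loop can be inlined. Usage: `buf = copy(plain); ecb_apply!(toy_block, buf, buf)` transforms `buf` in place without allocating.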
Script:
```julia
using BenchmarkTools
using Profile
import Nettle, AESNI

# The same 256-bit key for both libraries
key = hex2bytes("013ff43104f53f5c360a502dbff8adb7db39599be1ade3cc05a72e6e07103302")
aesni_ecb = AESNI.Aes256Ecb(key)
nettle_enc = Nettle.Encryptor("AES256", key)
nettle_dec = Nettle.Decryptor("AES256", key)

# 1 MiB of random plaintext, i.e. 65536 16-byte blocks
large_plain = rand(UInt8, 2^20)
# This call also compiles `encrypt`, so compilation is not profiled below
large_cipher = AESNI.encrypt(aesni_ecb, large_plain)

# Collect profile samples over repeated ECB encryption
@profile for _ in 1:10000
    AESNI.encrypt(aesni_ecb, large_plain)
end
Profile.print()

# `$` interpolation keeps global-variable access out of the timings
@info "Benchmarking ECB with larger data (1MB)"
@info "AESNI.encrypt"
@btime AESNI.encrypt($aesni_ecb, $large_plain)
@info "AESNI.decrypt"
@btime AESNI.decrypt($aesni_ecb, $large_cipher)
@info "Nettle.encrypt"
@btime Nettle.encrypt($nettle_enc, $large_plain)
@info "Nettle.decrypt"
@btime Nettle.decrypt($nettle_dec, $large_cipher)
```
Output:
```
Overhead ╎ [+additional indent] Count File:Line; Function
=========================================================
╎4860 @Base/client.jl:522; _start()
╎ 4860 @Base/client.jl:303; exec_options(opts::Base.JLOptions)
╎ 4860 @Base/Base.jl:419; include(mod::Module, _path::String)
╎ 4860 @Base/loading.jl:1488; _include(mapexpr::Function, mod::Module, _path::String)
╎ 4860 @Base/loading.jl:1428; include_string(mapexpr::typeof(identity), mod::Module, code::String...
╎ 4860 @Base/boot.jl:368; eval
╎ ╎ 4860 ...re/julia/stdlib/v1.8/Profile/src/Profile.jl:27; top-level scope
3╎ ╎ 4860 ...s/sunoru/AESNI.jl/benchmark/build/test-e.jl:15; macro expansion
╎ ╎ 165 @AESNI/src/modes/ecb.jl:21; encrypt(ctx::AESNI.Aes256Ecb, plain::Vector{UInt8})
165╎ ╎ 165 @Base/boot.jl:459; Array
╎ ╎ 4333 @AESNI/src/modes/ecb.jl:24; encrypt(ctx::AESNI.Aes256Ecb, plain::Vector{UInt8})
╎ ╎ 225 @Base/reinterpretarray.jl:343; getindex
╎ ╎ 225 @Base/reinterpretarray.jl:388; _getindex_ra
225╎ ╎ ╎ 225 @Base/pointer.jl:118; unsafe_store!
╎ ╎ 786 @Base/reinterpretarray.jl:483; setindex!
╎ ╎ 530 @Base/reinterpretarray.jl:523; _setindex_ra!
╎ ╎ ╎ 530 @Base/refpointer.jl:136; Ref
530╎ ╎ ╎ 530 @Base/refvalue.jl:8; RefValue
╎ ╎ 256 @Base/reinterpretarray.jl:529; _setindex_ra!
255╎ ╎ ╎ 256 @Base/array.jl:966; setindex!
136╎ ╎ 3322 @AESNI/src/core/common.jl:11; encrypt
656╎ ╎ 3186 @AESNI/src/core/aes256.jl:200; encrypt(key::AESNI.Aes256EncryptKey, input::UInt128)
╎ ╎ ╎ 257 @AESNI/src/core/aes256.jl:161; aes256_encrypt
╎ ╎ ╎ 257 @AESNI/src/Intrinsics.jl:11; _xor
135╎ ╎ ╎ 135 @Base/int.jl:366; xor
╎ ╎ ╎ 122 @Base/operators.jl:911; |>
╎ ╎ ╎ 122 @AESNI/src/Intrinsics.jl:8; to_m128i
╎ ╎ ╎ 122 @AESNI/src/utils.jl:6; unsafe_reinterpret_convert
╎ ╎ ╎ ╎ 122 @Base/pointer.jl:105; unsafe_load
122╎ ╎ ╎ ╎ 122 @Base/pointer.jl:105; unsafe_load
╎ ╎ ╎ 3 @AESNI/src/core/aes256.jl:162; aes256_encrypt
3╎ ╎ ╎ 3 @AESNI/src/Intrinsics.jl:18; aes_enc
╎ ╎ ╎ 2 @AESNI/src/core/aes256.jl:163; aes256_encrypt
2╎ ╎ ╎ 2 @AESNI/src/Intrinsics.jl:18; aes_enc
╎ ╎ ╎ 134 @AESNI/src/core/aes256.jl:164; aes256_encrypt
134╎ ╎ ╎ 134 @AESNI/src/Intrinsics.jl:18; aes_enc
╎ ╎ ╎ 6 @AESNI/src/core/aes256.jl:165; aes256_encrypt
6╎ ╎ ╎ 6 @AESNI/src/Intrinsics.jl:18; aes_enc
╎ ╎ ╎ 132 @AESNI/src/core/aes256.jl:166; aes256_encrypt
132╎ ╎ ╎ 132 @AESNI/src/Intrinsics.jl:18; aes_enc
╎ ╎ ╎ 14 @AESNI/src/core/aes256.jl:167; aes256_encrypt
14╎ ╎ ╎ 14 @AESNI/src/Intrinsics.jl:18; aes_enc
╎ ╎ ╎ 128 @AESNI/src/core/aes256.jl:168; aes256_encrypt
128╎ ╎ ╎ 128 @AESNI/src/Intrinsics.jl:18; aes_enc
╎ ╎ ╎ 49 @AESNI/src/core/aes256.jl:169; aes256_encrypt
49╎ ╎ ╎ 49 @AESNI/src/Intrinsics.jl:18; aes_enc
╎ ╎ ╎ 204 @AESNI/src/core/aes256.jl:170; aes256_encrypt
204╎ ╎ ╎ 204 @AESNI/src/Intrinsics.jl:18; aes_enc
╎ ╎ ╎ 162 @AESNI/src/core/aes256.jl:171; aes256_encrypt
162╎ ╎ ╎ 162 @AESNI/src/Intrinsics.jl:18; aes_enc
╎ ╎ ╎ 292 @AESNI/src/core/aes256.jl:172; aes256_encrypt
292╎ ╎ ╎ 292 @AESNI/src/Intrinsics.jl:18; aes_enc
╎ ╎ ╎ 317 @AESNI/src/core/aes256.jl:173; aes256_encrypt
317╎ ╎ ╎ 317 @AESNI/src/Intrinsics.jl:18; aes_enc
╎ ╎ ╎ 406 @AESNI/src/core/aes256.jl:174; aes256_encrypt
406╎ ╎ ╎ 406 @AESNI/src/Intrinsics.jl:18; aes_enc
╎ ╎ ╎ 424 @AESNI/src/core/aes256.jl:175; aes256_encrypt
424╎ ╎ ╎ 424 @AESNI/src/Intrinsics.jl:31; aes_enc_last
╎ ╎ 358 @AESNI/src/modes/ecb.jl:25; encrypt(ctx::AESNI.Aes256Ecb, plain::Vector{UInt8})
╎ ╎ 358 @Base/range.jl:883; iterate
358╎ ╎ 358 @Base/promotion.jl:477; ==
1╎ ╎ 1 @AESNI/src/modes/ecb.jl:26; encrypt(ctx::AESNI.Aes256Ecb, plain::Vector{UInt8})
Total snapshots: 4861. Utilization: 100% across all threads and tasks. Use the `groupby` kwarg to break down by thread and/or task
[ Info: Benchmarking ECB with larger data (1MB)
[ Info: AESNI.encrypt
375.802 μs (2 allocations: 1.00 MiB)
[ Info: AESNI.decrypt
373.802 μs (2 allocations: 1.00 MiB)
[ Info: Nettle.encrypt
251.501 μs (2 allocations: 1.00 MiB)
[ Info: Nettle.decrypt
251.401 μs (2 allocations: 1.00 MiB)
```
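Grouping the 4,860 samples under the ECB `encrypt(ctx::AESNI.Aes256Ecb, plain::Vector{UInt8})` frame by source line gives a rough picture of where the time goes. This is my own tally of the tree above, and sample counts are only a proxy for time:

$$
\begin{aligned}
\text{single-block encrypt (ecb.jl:24, via common.jl:11)} &: 3322/4860 \approx 68.4\% \\
\text{reinterpret getindex + setindex! (ecb.jl:24)} &: (225+786)/4860 \approx 20.8\% \\
\text{range iteration (ecb.jl:25)} &: 358/4860 \approx 7.4\% \\
\text{output allocation (ecb.jl:21)} &: 165/4860 \approx 3.4\%
\end{aligned}
$$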
AESNI is about 1.5x slower than Nettle here (roughly 376 µs vs. 251 µs for 1 MiB), which is not too bad, actually. But hopefully it can be improved.
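Working the benchmark numbers through, with 1 MiB = 2^20 bytes = 65,536 AES blocks of 16 bytes, the ratio and the per-block cost of ECB encryption come out as:

$$
\frac{375.802\,\mu\text{s}}{251.501\,\mu\text{s}} \approx 1.49,
\qquad
\frac{375.802\,\mu\text{s}}{65\,536} \approx 5.73\,\text{ns/block (AESNI)},
\qquad
\frac{251.501\,\mu\text{s}}{65\,536} \approx 3.84\,\text{ns/block (Nettle)}.
$$

So the gap is roughly 1.9 ns per 16-byte block.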
On my GitHub Codespace, it looks like this: ECB encryption/decryption took 1.5x as long as Nettle's, while per-block encryption/decryption was 25% faster.