sunoru / AESNI.jl

AES through AES-NI.
MIT License

ECB encrypt/decrypt is slow #2

Open sunoru opened 1 year ago

sunoru commented 1 year ago

On my GitHub Codespace, the benchmark output looks like this:

[ Info: Benchmarking block ciphers
[ Info: AESNI.encrypt
  36.928 ns (1 allocation: 80 bytes)
[ Info: AESNI.encrypt on UInt128
  6.800 ns (0 allocations: 0 bytes)
[ Info: AESNI.decrypt
  36.794 ns (1 allocation: 80 bytes)
[ Info: AESNI.decrypt on UInt128
  7.200 ns (0 allocations: 0 bytes)
[ Info: Nettle.encrypt
  48.531 ns (1 allocation: 80 bytes)
[ Info: Nettle.decrypt
  47.769 ns (1 allocation: 80 bytes)
[ Info: Benchmarking ECB with larger data (10MB)
[ Info: AESNI.encrypt
  3.886 ms (2 allocations: 10.00 MiB)
[ Info: AESNI.decrypt
  3.889 ms (2 allocations: 10.00 MiB)
[ Info: Nettle.encrypt
  2.621 ms (2 allocations: 10.00 MiB)
[ Info: Nettle.decrypt
  2.604 ms (2 allocations: 10.00 MiB)

On 10 MiB of data, AESNI's ECB mode takes about 1.5x as long as Nettle's, even though single-block encryption/decryption takes about 25% less time than Nettle's.
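For reference, the two ratios can be recomputed from the encrypt timings in the output above. This is only arithmetic on the pasted numbers; the variable names are made up for illustration.

```julia
# Timings copied from the benchmark output above (encrypt rows only).
aesni_block_ns  = 36.928   # AESNI.encrypt, single block
nettle_block_ns = 48.531   # Nettle.encrypt, single block
aesni_ecb_ms    = 3.886    # AESNI.encrypt, large ECB input
nettle_ecb_ms   = 2.621    # Nettle.encrypt, large ECB input

# A ratio below 1 means AESNI.jl is faster; above 1 means it is slower.
block_ratio = aesni_block_ns / nettle_block_ns
ecb_ratio   = aesni_ecb_ms / nettle_ecb_ms

println("single block: AESNI.jl takes ", round(100 * block_ratio; digits=1), "% of Nettle's time")
println("large ECB:    AESNI.jl takes ", round(ecb_ratio; digits=2), "x Nettle's time")
```

So "25% faster" here means roughly a quarter less time per block, and "1.5x" is the ECB time ratio rounded up slightly.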

sunoru commented 1 year ago

At first I thought this was related to reinterpret, since setindex! took a lot of time when I profiled it. I also found https://discourse.julialang.org/t/slowdown-with-reinterpret/91749

But that turned out not to be the problem. Maybe I need to look into how to do AES in place?
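To make the two options concrete, here is a minimal sketch of a block loop written once through reinterpret views (allocating the output) and once in place through raw pointers. It is not the AESNI.jl implementation: `toy_block`, `ecb_copy`, and `ecb_inplace!` are hypothetical names, and `toy_block` is a plain XOR stand-in for a 128-bit block cipher so that the example runs without the package.

```julia
# Hypothetical stand-in for a 128-bit block cipher. NOT AES, not part of AESNI.jl.
toy_block(x::UInt128) = x ⊻ 0x0123456789abcdef0123456789abcdef

# Out-of-place variant: allocates the output and goes through reinterpret views.
function ecb_copy(plain::Vector{UInt8})
    length(plain) % 16 == 0 || throw(ArgumentError("length must be a multiple of 16"))
    out = similar(plain)
    src = reinterpret(UInt128, plain)
    dst = reinterpret(UInt128, out)
    @inbounds for i in eachindex(src)
        dst[i] = toy_block(src[i])
    end
    return out
end

# In-place variant: overwrites the caller's buffer via pointer loads and stores.
function ecb_inplace!(buf::Vector{UInt8})
    length(buf) % 16 == 0 || throw(ArgumentError("length must be a multiple of 16"))
    GC.@preserve buf begin
        p = Ptr{UInt128}(pointer(buf))
        for i in 1:(length(buf) ÷ 16)
            unsafe_store!(p, toy_block(unsafe_load(p, i)), i)
        end
    end
    return buf
end

data = rand(UInt8, 64)
copied = ecb_copy(data)
inplace = ecb_inplace!(copy(data))
```

Both variants should produce the same bytes. The in-place one avoids the output allocation, but whether it is actually faster for real AES rounds would have to be measured.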

sunoru commented 1 year ago

Script:

using BenchmarkTools
using Profile
import Nettle, AESNI

key = hex2bytes("013ff43104f53f5c360a502dbff8adb7db39599be1ade3cc05a72e6e07103302")

aesni_ecb = AESNI.Aes256Ecb(key)
nettle_enc = Nettle.Encryptor("AES256", key)
nettle_dec = Nettle.Decryptor("AES256", key)

large_plain = rand(UInt8, 2^20)
large_cipher = AESNI.encrypt(aesni_ecb, large_plain)

@profile for _ in 1:10000
    AESNI.encrypt(aesni_ecb, large_plain)
end
Profile.print()

@info "Benchmarking ECB with larger data (1MB)"
@info "AESNI.encrypt"
@btime AESNI.encrypt($aesni_ecb, $large_plain)
@info "AESNI.decrypt"
@btime AESNI.decrypt($aesni_ecb, $large_cipher)
@info "Nettle.encrypt"
@btime Nettle.encrypt($nettle_enc, $large_plain)
@info "Nettle.decrypt"
@btime Nettle.decrypt($nettle_dec, $large_cipher)

Output:

Overhead ╎ [+additional indent] Count File:Line; Function
=========================================================
   ╎4860 @Base/client.jl:522; _start()
   ╎ 4860 @Base/client.jl:303; exec_options(opts::Base.JLOptions)
   ╎  4860 @Base/Base.jl:419; include(mod::Module, _path::String)
   ╎   4860 @Base/loading.jl:1488; _include(mapexpr::Function, mod::Module, _path::String)
   ╎    4860 @Base/loading.jl:1428; include_string(mapexpr::typeof(identity), mod::Module, code::String...
   ╎     4860 @Base/boot.jl:368; eval
   ╎    ╎ 4860 ...re/julia/stdlib/v1.8/Profile/src/Profile.jl:27; top-level scope
  3╎    ╎  4860 ...s/sunoru/AESNI.jl/benchmark/build/test-e.jl:15; macro expansion
   ╎    ╎   165  @AESNI/src/modes/ecb.jl:21; encrypt(ctx::AESNI.Aes256Ecb, plain::Vector{UInt8})
165╎    ╎    165  @Base/boot.jl:459; Array
   ╎    ╎   4333 @AESNI/src/modes/ecb.jl:24; encrypt(ctx::AESNI.Aes256Ecb, plain::Vector{UInt8})
   ╎    ╎    225  @Base/reinterpretarray.jl:343; getindex
   ╎    ╎     225  @Base/reinterpretarray.jl:388; _getindex_ra
225╎    ╎    ╎ 225  @Base/pointer.jl:118; unsafe_store!
   ╎    ╎    786  @Base/reinterpretarray.jl:483; setindex!
   ╎    ╎     530  @Base/reinterpretarray.jl:523; _setindex_ra!
   ╎    ╎    ╎ 530  @Base/refpointer.jl:136; Ref
530╎    ╎    ╎  530  @Base/refvalue.jl:8; RefValue
   ╎    ╎     256  @Base/reinterpretarray.jl:529; _setindex_ra!
255╎    ╎    ╎ 256  @Base/array.jl:966; setindex!
136╎    ╎    3322 @AESNI/src/core/common.jl:11; encrypt
656╎    ╎     3186 @AESNI/src/core/aes256.jl:200; encrypt(key::AESNI.Aes256EncryptKey, input::UInt128)
   ╎    ╎    ╎ 257  @AESNI/src/core/aes256.jl:161; aes256_encrypt
   ╎    ╎    ╎  257  @AESNI/src/Intrinsics.jl:11; _xor
135╎    ╎    ╎   135  @Base/int.jl:366; xor
   ╎    ╎    ╎   122  @Base/operators.jl:911; |>
   ╎    ╎    ╎    122  @AESNI/src/Intrinsics.jl:8; to_m128i
   ╎    ╎    ╎     122  @AESNI/src/utils.jl:6; unsafe_reinterpret_convert
   ╎    ╎    ╎    ╎ 122  @Base/pointer.jl:105; unsafe_load
122╎    ╎    ╎    ╎  122  @Base/pointer.jl:105; unsafe_load
   ╎    ╎    ╎ 3    @AESNI/src/core/aes256.jl:162; aes256_encrypt
  3╎    ╎    ╎  3    @AESNI/src/Intrinsics.jl:18; aes_enc
   ╎    ╎    ╎ 2    @AESNI/src/core/aes256.jl:163; aes256_encrypt
  2╎    ╎    ╎  2    @AESNI/src/Intrinsics.jl:18; aes_enc
   ╎    ╎    ╎ 134  @AESNI/src/core/aes256.jl:164; aes256_encrypt
134╎    ╎    ╎  134  @AESNI/src/Intrinsics.jl:18; aes_enc
   ╎    ╎    ╎ 6    @AESNI/src/core/aes256.jl:165; aes256_encrypt
  6╎    ╎    ╎  6    @AESNI/src/Intrinsics.jl:18; aes_enc
   ╎    ╎    ╎ 132  @AESNI/src/core/aes256.jl:166; aes256_encrypt
132╎    ╎    ╎  132  @AESNI/src/Intrinsics.jl:18; aes_enc
   ╎    ╎    ╎ 14   @AESNI/src/core/aes256.jl:167; aes256_encrypt
 14╎    ╎    ╎  14   @AESNI/src/Intrinsics.jl:18; aes_enc
   ╎    ╎    ╎ 128  @AESNI/src/core/aes256.jl:168; aes256_encrypt
128╎    ╎    ╎  128  @AESNI/src/Intrinsics.jl:18; aes_enc
   ╎    ╎    ╎ 49   @AESNI/src/core/aes256.jl:169; aes256_encrypt
 49╎    ╎    ╎  49   @AESNI/src/Intrinsics.jl:18; aes_enc
   ╎    ╎    ╎ 204  @AESNI/src/core/aes256.jl:170; aes256_encrypt
204╎    ╎    ╎  204  @AESNI/src/Intrinsics.jl:18; aes_enc
   ╎    ╎    ╎ 162  @AESNI/src/core/aes256.jl:171; aes256_encrypt
162╎    ╎    ╎  162  @AESNI/src/Intrinsics.jl:18; aes_enc
   ╎    ╎    ╎ 292  @AESNI/src/core/aes256.jl:172; aes256_encrypt
292╎    ╎    ╎  292  @AESNI/src/Intrinsics.jl:18; aes_enc
   ╎    ╎    ╎ 317  @AESNI/src/core/aes256.jl:173; aes256_encrypt
317╎    ╎    ╎  317  @AESNI/src/Intrinsics.jl:18; aes_enc
   ╎    ╎    ╎ 406  @AESNI/src/core/aes256.jl:174; aes256_encrypt
406╎    ╎    ╎  406  @AESNI/src/Intrinsics.jl:18; aes_enc
   ╎    ╎    ╎ 424  @AESNI/src/core/aes256.jl:175; aes256_encrypt
424╎    ╎    ╎  424  @AESNI/src/Intrinsics.jl:31; aes_enc_last
   ╎    ╎   358  @AESNI/src/modes/ecb.jl:25; encrypt(ctx::AESNI.Aes256Ecb, plain::Vector{UInt8})
   ╎    ╎    358  @Base/range.jl:883; iterate
358╎    ╎     358  @Base/promotion.jl:477; ==
  1╎    ╎   1    @AESNI/src/modes/ecb.jl:26; encrypt(ctx::AESNI.Aes256Ecb, plain::Vector{UInt8})
Total snapshots: 4861. Utilization: 100% across all threads and tasks. Use the `groupby` kwarg to break down by thread and/or task
[ Info: Benchmarking ECB with larger data (1MB)
[ Info: AESNI.encrypt
  375.802 μs (2 allocations: 1.00 MiB)
[ Info: AESNI.decrypt
  373.802 μs (2 allocations: 1.00 MiB)
[ Info: Nettle.encrypt
  251.501 μs (2 allocations: 1.00 MiB)
[ Info: Nettle.decrypt
  251.401 μs (2 allocations: 1.00 MiB)

A 1.5x slowdown is not too bad, actually, but hopefully it can be improved.
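As a rough sanity check, the 1 MB timings can be converted to a per-block cost. This is only arithmetic on the numbers above; the input size of 2^20 bytes comes from the script, and the variable names are made up for illustration.

```julia
# 1 MB run from the script: 2^20 bytes, 16 bytes per AES block.
nblocks = 2^20 ÷ 16

# Encrypt timings copied from the output above, converted from μs to ns.
aesni_ns  = 375.802 * 1000
nettle_ns = 251.501 * 1000

aesni_per_block  = aesni_ns / nblocks
nettle_per_block = nettle_ns / nblocks

println("AESNI.jl: ", round(aesni_per_block; digits=2), " ns per block")
println("Nettle:   ", round(nettle_per_block; digits=2), " ns per block")
```

The AESNI.jl figure comes out a little below the 6.8 ns measured for a single UInt128 block, which suggests (but does not prove) that the loop itself adds little overhead and the remaining gap is in per-block throughput.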