jnk0le / cortexm-AES

high performance AES implementations optimized for cortex-m microcontrollers
MIT License

cortexm AES

Collection of software AES implementations optimized for real-world microcontrollers.

build

The repository root directory is expected to be the only include path.

If the repo is added as an Eclipse linked folder, the root folder has to be added to the ASM, C and CPP include paths (-I) (project properties -> C/C++ Build -> Settings).

Includes also have to start from the root (e.g. #include <aes/cipher.hpp>).

No cmake yet.

notes

cryptanalysis

Some of the cryptanalysis works/papers that tested one or more of the provided implementations:

https://webthesis.biblio.polito.it/secure/26870/1/tesi.pdf - (CM3_1T on cortex-m4 @ 1871e94)

base implementations

modes implementations

generic

CBC_GENERIC

CTR32_GENERIC

cortex-m0/m0+

cortex-m3/m4

CTR32_CM3_1T

Implements counter mode caching. Do not use it if the IV/counter is secret, as it leads to a timing leak of a single byte every 256 aligned counter steps.
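The "every 256 aligned counter steps" behaviour can be illustrated with a small model. This is only a sketch, assuming a 32-bit counter whose lowest byte changes on every block, so that any intermediate state derived from the upper 24 bits can be reused until the low byte wraps. `count_cache_refreshes` is a hypothetical helper written for this illustration; it is not part of the library and does not reproduce the assembly.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical model (not library code): counts how often a cache keyed on
// the upper 24 bits of a 32-bit block counter would have to be recomputed
// while processing `nblocks` consecutive blocks.
std::size_t count_cache_refreshes(std::uint32_t start_counter, std::size_t nblocks) {
    if (nblocks == 0) {
        return 0;
    }
    std::size_t refreshes = 1;                        // initial fill of the cache
    std::uint32_t cached_upper = start_counter >> 8;  // part reused across blocks
    std::uint32_t ctr = start_counter;
    for (std::size_t i = 1; i < nblocks; ++i) {
        ++ctr;                                        // unsigned wrap-around is well defined
        if ((ctr >> 8) != cached_upper) {             // low byte wrapped: cache is stale
            cached_upper = ctr >> 8;
            ++refreshes;
        }
    }
    return refreshes;
}
```

In this model, 256 blocks starting at counter 0 need a single fill, while 2 blocks starting at counter 0xFF need two. The position of the slower "refresh" block therefore depends on the low byte of the starting counter, which is the kind of information a timing observer could recover.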

CTR32_CM3_1T_unrolled

Unrolled version of CTR32_CM3_1T.

performance (in cycles per byte)

| Mode cipher function | STM32F1 (0ws/2ws) - CM3_1T | STM32F4 (0ws/5ws) - CM3_1T |
|---|---|---|
| CBC_GENERIC<> | | |
| CTR32_GENERIC<> | | |
| CTR32<128> | 32.09/43.79 | 32.09 |
| CTR32<256> | 46.59/63.79 | 46.59 |
| CTR32_unrolled<128> | 30.59/41.60 | 30.59/38.48 |
| CTR32_unrolled<256> | 44.34/59.98 | 44.34/55.73 |

Results assume that the input, the expanded round key and the stack all lie in the same memory block (on the F407, for example, SRAM1, SRAM2 and CCM are separate blocks).
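To relate cycles-per-byte figures to wall-clock numbers, the arithmetic can be sketched as follows. The helper names and any clock frequency passed in are assumptions made for illustration; only the cycles-per-byte values come from the measurements above.

```cpp
#include <cstddef>

// Hypothetical helpers (not library code) for converting cycles per byte.

// Estimated cycle count for encrypting `bytes` bytes.
double cycles_for_message(std::size_t bytes, double cycles_per_byte) {
    return static_cast<double>(bytes) * cycles_per_byte;
}

// Estimated throughput in bytes per second at core clock `clock_hz`.
double throughput_bytes_per_second(double clock_hz, double cycles_per_byte) {
    return clock_hz / cycles_per_byte;
}
```

For example, `throughput_bytes_per_second(72e6, 43.79)` estimates the rate for a core assumed to run at 72 MHz with the 2ws CTR32<128> figure. Real throughput also depends on call overhead and on where the data lives.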

specific function sizes

| Function | code size in bytes | stack usage in bytes | notes |
|---|---|---|---|
| CM3_1T_AES_CTR32_enc | 862 | 68(72) (+1 arg passed on stack) | uses Te2 table |
| CM3_1T_AES128_CTR32_enc_unrolled | 1996 | 64 | uses Te2 table |
| CM3_1T_AES192_CTR32_enc_unrolled | 2366 | 64 | uses Te2 table |
| CM3_1T_AES256_CTR32_enc_unrolled | 2734 | 64 | uses Te2 table |

The extra 4 bytes on the stack (the value in parentheses) come from aligning the stack to 8 bytes on ISR entry.
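The parenthesised figure can be reproduced by rounding the function's own stack usage up to the 8-byte alignment applied on ISR entry. A minimal sketch, assuming the exception frame itself is not counted; `worst_case_stack` is a hypothetical name used only for this illustration:

```cpp
#include <cstddef>

// Stack alignment enforced on ISR entry, in bytes.
constexpr std::size_t kIsrStackAlign = 8;

// Hypothetical helper (not library code): rounds a function's stack usage
// up to the next multiple of the ISR entry alignment.
constexpr std::size_t worst_case_stack(std::size_t used_bytes) {
    return (used_bytes + kIsrStackAlign - 1) / kIsrStackAlign * kIsrStackAlign;
}

static_assert(worst_case_stack(68) == 72, "68 bytes pad to 72");
static_assert(worst_case_stack(64) == 64, "64 bytes are already aligned");
```

This matches the listed numbers: a function using 68 bytes can cost 72 if it is interrupted, while the unrolled variants at 64 bytes need no padding.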

cortex-m7

CTR32_CM7_1T

Implements counter mode caching. Do not use it if the IV/counter is secret, as it leads to a timing leak of a single byte every 256 aligned counter steps.

Preloads input data in case it's in SDRAM or QSPI memory.
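The idea of preloading can be sketched in portable C++ as requesting the next cache line while the current one is being processed. This is an illustration only, not the library's assembly: `xor_with_preload` is a hypothetical function, `__builtin_prefetch` is a GCC/Clang builtin, and the 32-byte line size is that of the Cortex-M7 data cache.

```cpp
#include <cstddef>
#include <cstdint>

// Cortex-M7 data cache line size in bytes.
constexpr std::size_t kCacheLine = 32;

// Hypothetical sketch (not library code): XORs input with a keystream and
// hints the next input cache line to the memory system ahead of use.
void xor_with_preload(const std::uint8_t* in, const std::uint8_t* keystream,
                      std::uint8_t* out, std::size_t len) {
    for (std::size_t i = 0; i < len; ++i) {
        // At the start of each line, request the following one if it exists.
        if (i % kCacheLine == 0 && i + kCacheLine < len) {
            __builtin_prefetch(in + i + kCacheLine);
        }
        out[i] = static_cast<std::uint8_t>(in[i] ^ keystream[i]);
    }
}
```

The prefetch is only a hint and does not change the result; its benefit depends on the input sitting in slow, cacheable memory such as SDRAM or QSPI.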

CTR32_CM7_1T_unrolled

Unrolled version of CTR32_CM7_1T. It does not preload input data, except for the first cache line.

performance (in cycles per byte)

| Mode cipher function | STM32H7 - CM7_1T |
|---|---|
| CBC_GENERIC<> | |
| CTR32_GENERIC<> | |
| CTR32<128> | 15.21 |
| CTR32<256> | 21.96 |
| CTR32_unrolled<128> | 14.46 |
| CTR32_unrolled<256> | 20.95 |

specific function sizes

| Function | code size in bytes | stack usage in bytes | notes |
|---|---|---|---|
| CM7_1T_AES_CTR32_enc | 860 | 72 (+1 arg passed on stack) | uses Te2 table |
| CM7_1T_AES128_CTR32_enc_unrolled | | | uses Te2 table |
| CM7_1T_AES192_CTR32_enc_unrolled | | | uses Te2 table |
| CM7_1T_AES256_CTR32_enc_unrolled | | | uses Te2 table |