pinkforest closed this issue 1 week ago.
Hm, the table-based implementation fails to be constant-time only when the whole 5 KB (or 10 KB) of tables can't fit in the L1 cache. In that case an access might miss the cache, and the resulting timing difference depends on which table entry was accessed.
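To make the cache-timing point concrete, here is a minimal sketch (not code from this crate or from `aes`) that contrasts a direct table lookup, whose memory address depends on the secret index, with a scan-based lookup that reads every entry and keeps the right one with a mask. The function names are hypothetical. The scan only helps if the compiler keeps it branch-free, which is the LLVM concern raised later in this thread.

```rust
/// Direct lookup: the address touched depends on `index`, so which
/// cache line gets loaded (or missed) can leak `index`.
fn table_lookup(table: &[u8], index: usize) -> u8 {
    table[index]
}

/// Scan-based lookup: touches every entry in the same order regardless
/// of `index`, and keeps only the matching entry via a mask.
fn ct_lookup(table: &[u8], index: usize) -> u8 {
    let mut result = 0u8;
    for (i, &entry) in table.iter().enumerate() {
        // diff == 0 exactly when i == index
        let diff = (i ^ index) as u64;
        // (diff | -diff) has its top bit set iff diff != 0
        let nonzero = ((diff | diff.wrapping_neg()) >> 63) as u8; // 1 if i != index, else 0
        let mask = nonzero.wrapping_sub(1); // 0xFF if i == index, else 0x00
        result |= entry & mask;
    }
    result
}
```

The cost is visible at a glance: a 256-entry table means 256 loads per lookup instead of one, and that is before doing anything about the compiler.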
The difference between this crate and `aes` is that this crate is meant for implementing higher-level cryptographic primitives (e.g. AEGIS, AEZ, OCB), like the `hazmat` API of the `aes` crate. However, `aes` uses run-time CPU feature detection, which costs at least an atomic load and compare per `aesenc`, and I don't think we can afford that. So our only option is a bitsliced implementation, but bitsliced implementations are awfully slow.
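For readers unfamiliar with bitslicing, here is a toy sketch of the idea applied to one small piece of AES: multiplication by x in GF(2^8) (`xtime`, used in MixColumns). 64 bytes are transposed into 8 `u64` "bit-planes" so one bitwise operation processes all 64 bytes at once, with no tables and no branches. This is not a bitsliced AES; the hard (and slow) part of a real one is expressing the S-box as a boolean circuit, and the transposition itself also costs time. All names here are hypothetical.

```rust
/// Transpose 64 bytes into 8 bit-planes: bit j of planes[b] = bit b of bytes[j].
fn to_planes(bytes: &[u8; 64]) -> [u64; 8] {
    let mut planes = [0u64; 8];
    for (j, &byte) in bytes.iter().enumerate() {
        for b in 0..8 {
            planes[b] |= (((byte >> b) & 1) as u64) << j;
        }
    }
    planes
}

/// Inverse of `to_planes`.
fn from_planes(planes: &[u64; 8]) -> [u8; 64] {
    let mut bytes = [0u8; 64];
    for (j, byte) in bytes.iter_mut().enumerate() {
        for b in 0..8 {
            *byte |= (((planes[b] >> j) & 1) as u8) << b;
        }
    }
    bytes
}

/// Bitsliced xtime on 64 bytes at once: shift every byte left by one,
/// then reduce by 0x1B (bits 0, 1, 3, 4), driven by the old top bit (plane 7).
fn xtime_bitsliced(p: &[u64; 8]) -> [u64; 8] {
    [
        p[7],
        p[0] ^ p[7],
        p[1],
        p[2] ^ p[7],
        p[3] ^ p[7],
        p[4],
        p[5],
        p[6],
    ]
}

/// Scalar reference for comparison only; note the data-dependent branch.
fn xtime_scalar(x: u8) -> u8 {
    let shifted = x << 1;
    if x & 0x80 != 0 { shifted ^ 0x1b } else { shifted }
}
```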
So is it worth drastically slowing down the code to prevent a cache-timing attack that only applies to small microcontroller CPUs? x86 and ARM application processors both have more than enough L1 cache, I think.
We can't guarantee that in a portable implementation, and the problems don't stop there: DJB has a paper that goes deep into MARS, beyond the obvious "do the S-boxes fit in cache?" question.
There are also problems with LLVM inserting unwanted optimizations, e.g. with masking done without protection; we've been bitten by this and had to put a best-effort guard into `subtle`. AES is not easy to implement in software with guarantees, especially in Rust, which lacks effective optimization barriers even more than plain C does.
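A minimal sketch of what such a best-effort guard can look like: a mask-based select where the choice bit first passes through a volatile read, so LLVM cannot see that it is only ever 0 or 1 and rewrite the select into a branch. This is written from scratch in the spirit of the volatile-read trick, not copied from `subtle`; `barrier` and `ct_select` are hypothetical names. It is not a guarantee, and `core::hint::black_box` is likewise documented as best-effort only.

```rust
use core::ptr;

/// Best-effort optimization barrier: the volatile read hides the value's
/// origin from the optimizer. Nothing here is guaranteed by the language.
#[inline(never)]
fn barrier(x: u8) -> u8 {
    // SAFETY: `&x` is a valid, aligned pointer to an initialized u8.
    unsafe { ptr::read_volatile(&x) }
}

/// Returns `a` if `choice == 1` and `b` if `choice == 0`,
/// with no branch on `choice` in the source.
fn ct_select(choice: u8, a: u32, b: u32) -> u32 {
    debug_assert!(choice <= 1);
    let bit = barrier(choice) as u32;
    let mask = bit.wrapping_neg(); // 0xFFFF_FFFF if bit == 1, else 0
    (a & mask) | (b & !mask)
}
```

Whether the compiled output is actually branch-free still has to be checked in the generated assembly for each target; that is the part no portable Rust code can promise.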
Also worth mentioning: there is already the `ocb3` crate (which composes with `aes`) and AEGIS by Frank (which relies on the C libaegis, but also has a pure-Rust mode that relies on `softaes` instead of `aes`).
There is also this open issue, including re: AEGIS; please feel free to contribute :)
I guess I can then introduce a `constant-time` feature, which will massively slow down software AES but make it constant-time (TBH, a machine without any kind of AES acceleration is rare these days, excluding microcontrollers of course).
It should use bitslicing etc. Or maybe fall back to the `aes` crate instead, for portability?
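If the `constant-time` feature goes ahead, one hypothetical way to wire it up is to resolve the backend at compile time with `cfg!`, so the choice costs nothing per block (unlike per-`aesenc` run-time detection). The feature name comes from the thread; `backend_name` and the labels are made up for illustration, and the crate would also need `constant-time = []` under `[features]` in `Cargo.toml`. The feature only changes the software fallback, since hardware AES has no secret-dependent table lookups.

```rust
/// Hypothetical compile-time backend selection; no atomics, no per-block checks.
fn backend_name() -> &'static str {
    if cfg!(all(any(target_arch = "x86", target_arch = "x86_64"), target_feature = "aes")) {
        // Hardware AES: no secret-dependent memory accesses, feature irrelevant.
        "AES-NI"
    } else if cfg!(all(target_arch = "aarch64", target_feature = "aes")) {
        "ARMv8 Crypto Extensions"
    } else if cfg!(feature = "constant-time") {
        // Opt-in: slow, but no secret-dependent table lookups.
        "bitsliced software (constant-time, slow)"
    } else {
        // Default software path: fast, constant-time only if tables stay cached.
        "table-based software (fast, not constant-time)"
    }
}
```

The trade-off is that `target_feature = "aes"` is only set when the crate is compiled with e.g. `-C target-feature=+aes` or `-C target-cpu=native`, so a generic x86-64 build falls through to the software path even on hardware with AES-NI. Catching that case is exactly what run-time detection buys you.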