martindisch / aes-ccm

Rust port of TinyCrypt's CCM mode implementation using RustCrypto's AES
Other
13 stars 4 forks source link

Use traits from the `aead` crate #3

Closed tarcieri closed 4 years ago

tarcieri commented 4 years ago

Doesn't compile yet, but curious about the general direction.

The current aead traits heavily leverage generic-array and typenum. Are you ok with that? They're a bit awful for now, but const generics seem like they're shaping up and may hopefully be available in 2020.

To fit mlen into the Aead traits, I made it a generic parameter. It still enforces the valid set of tag sizes, but using the type system via a TagSize marker trait which is only impl'd for the valid sizes.

Note that there are default impls of the rest of the methods of the Aead trait (e.g. encrypt, encrypt_in_place) which you get "for free" with any cipher with a postfix tag, which AES-CCM appears to be.

Using in-place encryption in heapless environments requires the heapless crate, but is hopefully a bit more ergonomic than the current API. See here for an example:

https://docs.rs/chacha20poly1305/0.3.0/chacha20poly1305/#in-place-usage-eliminates-alloc-requirement

Finally, it's using the upstream aead::Error type, which is fully opaque. I hope that's ok: we prefer it for preventing potential sidechannel leakage. Note it doesn't (yet) have an optional impl std::error::Error which is an oversight I'll open an upstream PR for.

martindisch commented 4 years ago

I haven't had the chance to use generic-array and typenum yet (except as a consumer of crypto libraries) and my knowledge of them and const generics in general is pretty limited, but I understand we need them to build stuff that is generic over something like a key size, or in our case the tag size. They don't look too horrible to me, and since it's the best way of doing things right now I'm completely fine with it.

I saw the default implementations when I checked out the trait, it's pretty neat! And thanks for the the link to the example, that was very instructive to get an idea of how it's going to look when it's finished.

It's too bad we're going to lose detail in the error, but I understand the reasoning behind it and if that's the price for making the crate usable generically, it's well worth it. The std::error::Error implementation is something I only thought of last-minute, because I saw some other crates doing it and it makes sense. If you can get it into the aead crate, that would be great.

I really like where this is going, great job so far!

tarcieri commented 4 years ago

Nice! I'll finish it out, then

tarcieri commented 4 years ago

Also, opened a PR to add an optional std::error::Error impl for aead::Error:

https://github.com/RustCrypto/traits/pull/63

tarcieri commented 4 years ago

Removed WIP. The tests are now passing locally for me (and the doc tests pass with --all-features)

tarcieri commented 4 years ago

Sidebar: it might be good to call the CcmMode type AesCcm since it's specialized to AES, or make it generic around the cipher type and make AesCcm a type alias.

martindisch commented 4 years ago

Excellent!

Sidebar: it might be good to call the CcmMode type AesCcm since it's specialized to AES, or make it generic around the cipher type and make AesCcm a type alias.

I renamed CcmMode to AesCcm, good idea.

Also, I publicly exported the CcmTagSize marker trait, because that makes it show up in the documentation and it's easy to see for which tag sizes it's implemented (= which tag sizes are valid). Do you think that makes sense, or is there a reason why you didn't export it?

Otherwise, the code looks (very) good to me and the tests check out. To further test it, I updated a small embedded project of mine to use this version and do some analysis. It's one where I can use allocation, so it only uses the encrypt and decrypt methods. For binary size, there's barely any difference. Interestingly, it's even slightly smaller than before, which is great. But the real surprise was stack usage. I did a static analysis of the LLVM IR with the call-stack tool, to get a rough idea of the memory requirements. Of course it can only look at the stack, but since I use the heap very sparingly, that's the vast majority of the memory.

So I looked at a function, which indirectly uses AES-CCM through a library that uses it. There's some inlining going on which I can't prevent without plastering the library with #[inline(never)], so to call-stack it looks as if the CcmMode::new function is called directly in the function, for example. And for some reason, the new version uses 1760 bytes less. Can you imagine where these savings come from? And that's not all. I don't really see where exactly it happens because of the inlining, but there are another 4872 bytes less on the stack somewhere in code that is called by this function. The only thing that is different is the AES-CCM implementation. Because I use the encrypt and decrypt methods, my ciphertext and plain text are allocated on the heap, but that can't be the reason. The buffers I used in the previous version were heap-allocated too, and the length of my plaintexts is too short to make a difference anyway. It's curious stuff, which I could probably get to the bottom of by building a separate example that only does AES-CCM and nothing else, so my results are not polluted. I might do that if I find the time. Just thought it could be worth posting, in case you have an inkling of how a small difference in the new code could exaplain that.

tarcieri commented 4 years ago

Also, I publicly exported the CcmTagSize marker trait, because that makes it show up in the documentation and it's easy to see for which tag sizes it's implemented (= which tag sizes are valid). Do you think that makes sense, or is there a reason why you didn't export it?

Nope, just an oversight.

And for some reason, the new version uses 1760 bytes less. Can you imagine where these savings come from?

If you only have one invocation and it's inlining, it could be saving you the overhead of setting up the call site and associated C stack, but that's just a guess.

martindisch commented 4 years ago

If you only have one invocation and it's inlining, it could be saving you the overhead of setting up the call site and associated C stack, but that's just a guess.

Ok, I guess I'll try and look into it some more. In any case, I've only observed positive effects, so we're all good.

From your side, do you think everything's ready to merge in the current state and for me to release it as 0.4.0?

tarcieri commented 4 years ago

Yep, looks good on my end

martindisch commented 4 years ago

I built a standalone test where I managed to inhibit most inlining. For every function, the first number is its the full stack usage (local + that of all functions it calls) and the second one is that of just its local variables. This is a function encrypting and decrypting something with the old version:

     16744   6536    aes_test::encrypt1

 0:  10208   8664    aes_ccm::ccm::CcmMode::new
 1:    512    104    aes_ccm::ccm::CcmMode::decrypt_verify
 2:    496     88    aes_ccm::ccm::CcmMode::generate_encrypt
 3:    288     48    alloc::vec::Vec<T>::with_capacity
 4:    264      8    core::result::Result<T,E>::unwrap::h3076ded6e4a3354d
 5:    256      0    core::result::Result<T,E>::unwrap::h37fafb6e29ae071b
 6:    248      8    core::ptr::real_drop_in_place::h862ef114a5dead20
 7:    200     32    core::panicking::panic_fmt
 8:     24      0    core::cmp::impls::<impl core::cmp::PartialEq<&B> for &A>::eq

And here's the same thing with the new version:

     11776   3328    aes_test::encrypt1

 0:   8448   6904    <aes_ccm::ccm::AesCcm<TagSize> as aead::NewAead>::new
 1:    544    120    aead::Aead::decrypt
 2:    544    120    aead::Aead::encrypt
 3:    272      0    core::ptr::real_drop_in_place::h2dcc130d85d46944
 4:    256      0    core::result::Result<T,E>::unwrap::h81a1198cbaa19f98
 5:    200     32    core::panicking::panic_fmt
 6:     24      8    core::cmp::impls::<impl core::cmp::PartialEq<&B> for &A>::eq

Overall it uses 4968 bytes less stack, of which the local usage makes up 3208 and the remaining difference of 1760 bytes comes from the CcmMode::new/AesCcm::new function, which is now less demanding.

martindisch commented 4 years ago

Thank you so much for your work! It's been great to have you come and implement this, making the crate that much better.

tarcieri commented 4 years ago

Awesome, thanks!