randombit / botan

Cryptography Toolkit
https://botan.randombit.net
BSD 2-Clause "Simplified" License
2.49k stars 554 forks source link

AV on certain AEAD OCB decryption input calls under Windows. #3812

Closed ni4 closed 7 months ago

ni4 commented 8 months ago

We use Botan as cryptography backend for RNP OpenPGP library. And recently, after updating CI runners to Botan 3.1.1, started to receive failures, only for Windows platform. After isolating the issue and enabling sanitizers got the following call stack. Suspecting something in RNP which may corrupt the memory (however sanitizers doesn't repor that), I was able to isolate the issue to pure botan calls with some certain input.

Do you have any idea why this happens? Should I provide some more input, like test data file/order of decryption calls causing the failure?

2: ==6280==ERROR: AddressSanitizer: access-violation on unknown address 0x00004e000017 (pc 0x7ff9c7f0f990 bp 0x00e1edaf5fb0 sp 0x00e1edaf5eb0 T0)
2: ==6280==The signal is caused by a READ memory access.
2:     #0 0x7ff9c7f0f98f in Botan::L_computer::compute_offsets(unsigned __int64, unsigned __int64) C:\vcpkg\buildtrees\botan\src\3.1.1-d5e04c5040.clean\src\lib\modes\aead\ocb\ocb.cpp:68
2:     #1 0x7ff9c7f1024e in Botan::OCB_Decryption::decrypt(unsigned char *const, unsigned __int64) C:\vcpkg\buildtrees\botan\src\3.1.1-d5e04c5040.clean\src\lib\modes\aead\ocb\ocb.cpp:413
2:     #2 0x7ff9c7f10617 in Botan::OCB_Decryption::finish_msg(class std::vector<unsigned char, class Botan::secure_allocator<unsigned char>> &, unsigned __int64) C:\vcpkg\buildtrees\botan\src\3.1.1-d5e04c5040.clean\src\lib\modes\aead\ocb\ocb.cpp:451
2:     #3 0x7ff9c7e59139 in Botan::Cipher_Mode::finish C:\vcpkg\buildtrees\botan\x64-windows-rel\build\include\botan\cipher_mode.h:146
2:     #4 0x7ff9c7e59139 in `botan_cipher_update'::`2'::<lambda_1>::operator() C:\vcpkg\buildtrees\botan\src\3.1.1-d5e04c5040.clean\src\lib\ffi\ffi_cipher.cpp:159
2:     #5 0x7ff9c7e4ab83 in std::_Func_class<int>::operator() C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\VC\Tools\MSVC\14.29.30133\include\functional:869
2:     #6 0x7ff9c7e4ab83 in Botan_FFI::ffi_guard_thunk(char const *, class std::function<(void)> const &) C:\vcpkg\buildtrees\botan\src\3.1.1-d5e04c5040.clean\src\lib\ffi\ffi.cpp:119
2:     #7 0x7ff9c7e5a9a2 in botan_cipher_update C:\vcpkg\buildtrees\botan\src\3.1.1-d5e04c5040.clean\src\lib\ffi\ffi_cipher.cpp:141
**[ botan calls start here ]**
2:     #8 0x7ff667daed44 in pgp_cipher_aead_finish(struct pgp_crypt_t *, unsigned char *, unsigned char const *, unsigned __int64) D:\a\rnp\rnp\src\lib\crypto\symmetric.cpp:589
2:     #9 0x7ff667e15435 in encrypted_src_read_aead_part D:\a\rnp\rnp\src\librepgp\stream-parse.cpp:652
2:     #10 0x7ff667e14ed3 in encrypted_src_read_aead D:\a\rnp\rnp\src\librepgp\stream-parse.cpp:728
2:     #11 0x7ff667de6234 in src_read(struct pgp_source_t *, void *, unsigned __int64, unsigned __int64 *) D:\a\rnp\rnp\src\librepgp\stream-common.cpp:115
2:     #12 0x7ff667e1b5ed in partial_pkt_src_read D:\a\rnp\rnp\src\librepgp\stream-parse.cpp:273
2:     #13 0x7ff667de6234 in src_read(struct pgp_source_t *, void *, unsigned __int64, unsigned __int64 *) D:\a\rnp\rnp\src\librepgp\stream-common.cpp:115
2:     #14 0x7ff667e12e8e in compressed_src_read D:\a\rnp\rnp\src\librepgp\stream-parse.cpp:399
2:     #15 0x7ff667de6234 in src_read(struct pgp_source_t *, void *, unsigned __int64, unsigned __int64 *) D:\a\rnp\rnp\src\librepgp\stream-common.cpp:115
2:     #16 0x7ff667e1b5ed in partial_pkt_src_read D:\a\rnp\rnp\src\librepgp\stream-parse.cpp:273
2:     #17 0x7ff667de6234 in src_read(struct pgp_source_t *, void *, unsigned __int64, unsigned __int64 *) D:\a\rnp\rnp\src\librepgp\stream-common.cpp:115
2:     #18 0x7ff667e1b27b in literal_src_read D:\a\rnp\rnp\src\librepgp\stream-parse.cpp:344
2:     #19 0x7ff667de6234 in src_read(struct pgp_source_t *, void *, unsigned __int64, unsigned __int64 *) D:\a\rnp\rnp\src\librepgp\stream-common.cpp:115
2:     #20 0x7ff667e1c192 in process_pgp_source(struct pgp_parse_handler_t *, struct pgp_source_t &) D:\a\rnp\rnp\src\librepgp\stream-parse.cpp:2979
2:     #21 0x7ff667da3225 in rnp_op_verify_execute D:\a\rnp\rnp\src\lib\rnp.cpp:3358
2:     #22 0x7ff667d08905 in cli_rnp_process_file(class cli_rnp_t *) D:\a\rnp\rnp\src\rnp\fficli.cpp:3193
2:     #23 0x7ff667ce3468 in rnp_cmd D:\a\rnp\rnp\src\rnp\rnp.cpp:264
2:     #24 0x7ff667ce7656 in main D:\a\rnp\rnp\src\rnp\rnp.cpp:707
2:     #25 0x7ff667eb0193 in invoke_main D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:78
2:     #26 0x7ff667eb0193 in __scrt_common_main_seh D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:288
2:     #27 0x7ff9f9667ac3  (C:\Windows\System32\KERNEL32.DLL+0x180017ac3)
2:     #28 0x7ff9fbf4a4e0  (C:\Windows\SYSTEM32\ntdll.dll+0x18005a4e0)
reneme commented 8 months ago

Nothing that comes to mind immediately, unfortunately. Which version did you update from?

ni4 commented 8 months ago

@reneme Thanks for the reply! It was latest 2.x from vcpkg sources. Don't have Windows setup yet unfortunately to dig deeper. PR with added isolated test case (among with runner information) is here in case it would be useful: https://github.com/rnpgp/rnp/pull/2148, please see changes to src/tests/cipher.cpp

reneme commented 8 months ago

I managed to build RNP on my machine and can reproduce the crash. Looking into it.

Edit: I have a lead and am digging deeper.

reneme commented 8 months ago

The root cause is indeed a bug in Botan's OCB implementation. Perhaps already in 2.x, actually.

ocb.cpp contains a helper class called L_computer. And this has a method get(size_t i) that lazily creates new m_L values as necessary. The parameter i is the number of least-significant zeros in the binary representation of the current block index (ctz()); i.e. the bigger the input data, the larger i gets eventually.

Here's the implementation of get(size_t i):

https://github.com/randombit/botan/blob/3005ae6ef44105d6da261dbaed94abed67a3042e/src/lib/modes/aead/ocb/ocb.cpp#L45-L51

... now, if we hit an i that we haven't hit before, we calculate the new value and .push_back() it into the m_L vector. The new value for i is then returned as a reference into that vector.

Below is an excerpt of the function that uses get(). Note, in particular, how the references to L0 and L1 are retained and kept outside the loop.

https://github.com/randombit/botan/blob/3005ae6ef44105d6da261dbaed94abed67a3042e/src/lib/modes/aead/ocb/ocb.cpp#L58-L84

We're calling get() inside the loop (line 79), which is unsafe. In your reproducer, once we hit block_index = 8192 we call get(ctz(8192)), which needs to lazily add a value to m_L. This re-allocates the m_L vector, therefore invalidates the references to L0 and L1 and crashes in the next loop iteration.

I suspect, the reason that this crashes on some platforms and not on all is, that std::vector<> has some freedom to decide how much memory is pre-allocated when .push_back() is invoked. I.e. not every .push_back() will actually cause a re-allocation of the vector. Also, there are other places in the code that call get(), and that won't trigger this issue when re-allocating. Thus, there are some circumstances that have to align for this to become evident. That's probably, why it remained undetected for 6 years (https://github.com/randombit/botan/commit/444eeb5ebcb65de8f063e90a31a4709214dfe78f), when this was released in Botan 2.4.0.

Because of the above, it might be tough to build a reasonable regression test, but I'll try regardless and of course build a fix.

Thanks a lot for reporting this!

@randombit I'm assuming we want to have a backport to 2.x for this?

reneme commented 8 months ago

The patch in #3814 does fix the issue, though I'm not sure its the optimal thing to do. I propose we take further technical discussion over into the pull request.

@ni4 Thanks again for reporting and the valuable reproducer.

ni4 commented 8 months ago

@reneme Wow, thanks for fixing this and for the details! Another lurking magical bug with simple and logical explanation :) At some point started to think that we should abandon random-input tests, but this case reverts the idea.

ni4 commented 5 months ago

@reneme Was this backported to 2.19.4? Do not see it in the changelog.

reneme commented 5 months ago

I'm afraid that fell through the cracks. 😨 I'm going to open a PR for it. Unfortunately, only for the next maintenance release, then. Sorry for that and thanks for the reminder.

ni4 commented 5 months ago

Np, thanks for confirming!

reneme commented 5 months ago

As it turns out, this bug is much much easier to hit in 3.x than it is in 2.x. I had to invent a toy cipher with a different intrinsic block-parallelism to be able to produce a situation where the access violation would hit. Though, the odds are also dependent on the re-allocation behavior of std::vector<>. Hence, its still worth fixing in 2.x, I would say. Thanks again for following up.