rnpgp / rnp

RNP: high performance C++ OpenPGP library used by Mozilla Thunderbird
https://www.rnpgp.org
Other
194 stars 55 forks source link

RNP should gracefully handle exceptions from Botan::hex_decode #1287

Closed kaie closed 3 years ago

kaie commented 4 years ago

rnp_locate_key calls rnp_hex_decode which calls Botan::hex_decode

We have multiple Thunderbird crash reports, with an unhandled exception thrown in Botan::hex_decode

RNP should ensure to catch this potential exception to prevent crashes.

dewyatt commented 4 years ago

What revision? It shouldn't happen after https://github.com/rnpgp/rnp/pull/1212.

kaie commented 4 years ago

We are still using the revision that had been used as the base for the audit. What is a new stable revision that Thunderbird should update to? cc @ni4

kaie commented 4 years ago

650b3bce5bad75b64c7d3768d2df74a55740df15

kaie commented 4 years ago

wait a minute. It might have been a crash reported with an older binary. Let me check

kaie commented 4 years ago

This crash https://crash-stats.mozilla.org/report/index/857f563e-6fdc-4efc-b34c-a5ee10200830 was experienced with Thunderbird 78.2.1 which uses RNP revision 650b3bce5bad75b64c7d3768d2df74a55740df15

dewyatt commented 4 years ago

Weird, the signature at that rev is like:

rnp_result_t
rnp_locate_key(rnp_ffi_t         ffi,
               const char *      identifier_type,
               const char *      identifier,
               rnp_key_handle_t *handle)
try {
  //...
}
catch (std::bad_alloc &) {
    return ffi_exception((fp), __func__, "bad_alloc", RNP_ERROR_OUT_OF_MEMORY);
}
catch (std::exception & e) {
    return ffi_exception((fp), __func__, e.what());
}
catch (...) {
     return ffi_exception((fp), __func__, "unknown exception");
}

Where ffi_exception doesn't throw, so I wonder what's going on there.

ni4 commented 4 years ago

We are still using the revision that had been used as the base for the audit.

During last month we were mostly fixing issues related to data processing/possible security flaws, and ones reported during the audit. So the latest master would be good one to pick.

nwalfield commented 4 years ago

@ni4: Are the results of the audit available yet?

kaie commented 4 years ago

Not yet. But we'll make them available soon.

ni4 commented 4 years ago

@dewyatt @kaie And, additionally, Botan also guards FFI layer with ffi_guard_thunk(). If there is non-empty env variable BOTAN_FFI_PRINT_EXCEPTIONS then it will printf exception details to stderr.

Maybe @randombit has some quick thoughts on this/seen this previously?

randombit commented 4 years ago

The call stack in the crash report is interesting. Like all other functions in the Botan C layer, botan_hex_decode in all exceptions and translates them into an error return code. https://github.com/randombit/botan/blob/master/src/lib/ffi/ffi.cpp#L89 But, AIUI that crash report, it doesn't look like the problem is that an exception was thrown and the stack was unwound. Instead it looks like, an exception is being thrown, but when we called into the runtime, it crashed instead. This suggests to me either some memory corruption elsewhere which has corrupted the runtime, or (since this appears to be a Windows build which I assume means MSVC) a runtime discrepency, eg where some of the code was built with runtime X and something else with runtime Y, and so when the botan code goes to throw it calls the runtime in some incorrect way (wrt the runtime that is actually being used) and that causes a crash. [I don't know for sure that crashing when going to throw is a symptom of runtime lib mismatch but I've seen other similarly odd behavior so it seems possible].

ronaldtse commented 4 years ago

Thanks for the analysis @randombit , felt like a message coming from the depths... :wink:

ni4 commented 4 years ago

@randombit Thanks for the detailed reply! Actually, there is another report with similar circumstances, during the RSA key generation.

@kaie Do I correctly remember that rnp for TB is built using the MinGW/MSYS, but TB (most likely) via MSVC? In this case just today we merged PR #1232, which allows building with MSVC, and this could be a fix.

kaie commented 4 years ago

@kaie Do I correctly remember that rnp for TB is built using the MinGW/MSYS, but TB (most likely) via MSVC? In this case just today we merged PR #1232, which allows building with MSVC, and this could be a fix.

@Jfx2006 could you please answer this question?

jfx2006 commented 4 years ago

Windows builds use clang-cl for everything. Still links to MSVC libraries, at build time that is Visual Studio 2017 15.8.4 / SDK 10.0.17134.0. Up until a couple days ago it was Clang-9.0.1, but now it's Clang-11rc2.

It shouldn't make a difference but the production builds are cross-compiled on Linux hosts. There are builds done on Windows as well that CI runs to make sure that doesn't break for local development.

kaie commented 3 years ago

It looks like the problem was bad binaries, that somehow disabled the usual C++ exception handling. We don't understand yet why. RNP/Botan binaries produced manually (cross compiled from Linux using MinGW for Windows) don't crash.

kaie commented 3 years ago

The issue is fixed by using the /EHs clang compiler flag. (Which Botan's build system already uses, but the Thunderbird build system didn't...) Thanks everyone for your help, closing.