openfheorg / openfhe-development

This is the development repository for the OpenFHE library. The current (stable) version is v1.2.1 (released on September 10, 2024).
BSD 2-Clause "Simplified" License

tcmalloc causes crash during throwing of OpenFHE exception #761

Closed (j2kun closed this 4 months ago)

j2kun commented 5 months ago

I have a somewhat strange situation. Google internally uses clang+tcmalloc by default for all its builds, and with OpenFHE v1.1.4 I've encountered a few crashes that occur whenever an OpenFHE exception is thrown, with a trace like this:

2216 third_party/tcmalloc/tcmalloc.cc:909] size check failed for 0x33c1bfc3e000: claimed 8, actual 1024, class 1
2216 third_party/tcmalloc/tcmalloc.cc:854] CHECK in do_free_with_size: CorrectSize(ptr, size, align) (false)                       
*** SIGABRT received by PID 2216 (TID 2216) on cpu 11 from PID 2216; stack trace: ***                                                                                                          
PC: @     0x7f86ec862347  (unknown)  gsignal                                                   
    @     0x7f86cfed4735       2544  base/process_state.cc:1237 FailureSignalHandler()
    @     0x7f877193d1c0  1281657408  (unknown)                          
    @     0x7f86c0b8b314        912  third_party/tcmalloc/internal/logging.cc:233 tcmalloc::tcmalloc_internal::Crash()                                                                         
    @     0x7f86c0b8ae1d         48  third_party/tcmalloc/internal/logging.cc:238 tcmalloc::tcmalloc_internal::CheckFailed()
    @     0x559c9eb7f962        688  ./third_party/tcmalloc/internal/logging.h:148 tcmalloc::tcmalloc_internal::CheckFailed<>()
    @     0x559c9eac6b1c       2640  third_party/tcmalloc/tcmalloc.cc:854 TCMallocInternalDeleteArraySized
    @     0x7f877128ba93         48  third_party/crosstool/v18/stable/toolchain/bin/../include/c++/v1/__memory/unique_ptr.h:73 std::__u::default_delete<>::operator()()           
    @     0x7f877128b874         64  third_party/crosstool/v18/stable/toolchain/bin/../include/c++/v1/__memory/unique_ptr.h:262 std::__u::unique_ptr<>::~unique_ptr()
    @     0x7f877128b6c6        128  third_party/openfhe/src/core/lib/utils/demangle.cpp:42 demangle()
    @     0x7f877128beb6       4384  third_party/openfhe/src/core/lib/utils/get-call-stack.cpp:84 get_call_stack()
    @     0x7f87743dae0d        352  third_party/openfhe/src/core/include/utils/exception.h:179 lbcrypto::OpenFHEException::OpenFHEException()
    @     0x7f87743e6a74        592  third_party/openfhe/src/core/include/math/nbtheory-impl.h:191 lbcrypto::RootOfUnity<>()                                                                   
    @     0x7f8773092189        544  third_party/openfhe/src/pke/lib/encoding/packedencoding.cpp:493 lbcrypto::PackedEncoding::SetParams_2n()
    @     0x7f8773090b17        912  third_party/openfhe/src/pke/lib/encoding/packedencoding.cpp:241 lbcrypto::PackedEncoding::SetParams()
    @     0x7f87730941a0        816  third_party/openfhe/src/pke/lib/encoding/packedencoding.cpp:329 lbcrypto::PackedEncoding::Pack<>()
    @     0x7f877308eb4f       6752  third_party/openfhe/src/pke/lib/encoding/packedencoding.cpp:117 lbcrypto::PackedEncoding::Encode()
    @     0x7f877442661d        624  ./third_party/openfhe/src/pke/include/encoding/plaintextfactory.h:100 lbcrypto::PlaintextFactory::MakePlaintext<>()
    @     0x7f8774425891        992  ./third_party/openfhe/src/pke/include/cryptocontext.h:246 lbcrypto::CryptoContextImpl<>::MakePlaintext()
    @     0x7f87743cb516        176  ./third_party/openfhe/src/pke/include/cryptocontext.h:1018 lbcrypto::CryptoContextImpl<>::MakePackedPlaintext()

The error comes from here: https://github.com/google/tcmalloc/blob/7d59e25cd84cdce95f137b79466dd4c4d56e6ff2/tcmalloc/tcmalloc.cc#L765

The exception itself is easy to trigger: for example, use a prime plaintext modulus q that violates the divisibility condition needed by packed encoding, namely that the cyclotomic order m divides (q-1). See the patch below for an example:

diff --git a/src/pke/examples/simple-integers-bgvrns.cpp b/src/pke/examples/simple-integers-bgvrns.cpp
index aaeed9c..d3fd960 100644
--- a/src/pke/examples/simple-integers-bgvrns.cpp
+++ b/src/pke/examples/simple-integers-bgvrns.cpp
@@ -41,7 +41,7 @@ int main() {
     // Sample Program: Step 1 - Set CryptoContext
     CCParams<CryptoContextBGVRNS> parameters;
     parameters.SetMultiplicativeDepth(2);
-    parameters.SetPlaintextModulus(65537);
+    parameters.SetPlaintextModulus(131101);  // a prime with bad divisibility

     CryptoContext<DCRTPoly> cryptoContext = GenCryptoContext(parameters);
     // Enable features that you wish to use
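
To make the divisibility condition concrete, here is a minimal sketch that checks whether m divides (q-1) for the original and the patched plaintext modulus. The value m = 32768 is an assumption for illustration only (the real cyclotomic order depends on the ring dimension the library selects), and `DividesOrderCondition` and `CheckPatchedModulus` are hypothetical helpers, not OpenFHE functions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of OpenFHE): true when m divides q - 1,
// the condition for a primitive m-th root of unity to exist modulo prime q.
constexpr bool DividesOrderCondition(std::uint64_t m, std::uint64_t q) {
    return (q - 1) % m == 0;
}

// Assumed cyclotomic order, for illustration only.
constexpr std::uint64_t kAssumedOrder = 32768;

// 65537 - 1 = 65536 = 2^16, so any power-of-two m up to 2^16 divides it.
static_assert(DividesOrderCondition(kAssumedOrder, 65537),
              "original modulus satisfies the condition");

// 131101 - 1 = 131100 = 4 * 32775, so no power of two above 4 divides it.
static_assert(!DividesOrderCondition(kAssumedOrder, 131101),
              "patched modulus violates the condition");

// Runtime variant of the same checks.
inline void CheckPatchedModulus() {
    assert(DividesOrderCondition(kAssumedOrder, 65537));
    assert(!DividesOrderCondition(kAssumedOrder, 131101));
}
```

Since 131100 is divisible by 4 but not by 8, the patched modulus fails the check for any power-of-two order of 8 or more, which is consistent with the exception being raised from RootOfUnity in the trace above.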

However, I am not able to reproduce the crash trace itself in the CMake build. My attempt (v1.1.4, commit 94fd76a1d965cfde13f2a540d78ce64146fc2700):

  1. Apply the patch above
  2. Configure with tcmalloc enabled
    mkdir build && cd build
    cmake .. -DWITH_TCM=ON -DBUILD_EXAMPLES=ON -DCMAKE_BUILD_TYPE=Debug
    make tcm
    make -j 25
  3. Run bin/examples/pke/simple-integers-bgvrns

As with last time, I suspect the issue lies in differing compiler flags. A few stand out: -fsized-deallocation and -fno-exceptions.
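
One way to confirm which of these flags a given build actually applies is to compile a small probe with the same options and inspect the standard feature-test macros. The sketch below assumes the compiler defines `__cpp_sized_deallocation` and `__cpp_exceptions` only when the corresponding feature is enabled, which is the documented behaviour for GCC and Clang. `BuildFlagSummary` is a hypothetical helper name.

```cpp
#include <cassert>
#include <string>

// Defined by the compiler only when sized deallocation is enabled
// (for example via -fsized-deallocation).
#if defined(__cpp_sized_deallocation)
constexpr bool kSizedDeallocation = true;
#else
constexpr bool kSizedDeallocation = false;
#endif

// Not defined when the translation unit is built with -fno-exceptions.
#if defined(__cpp_exceptions)
constexpr bool kExceptions = true;
#else
constexpr bool kExceptions = false;
#endif

// Hypothetical helper: summarizes the flags seen by this translation unit.
inline std::string BuildFlagSummary() {
    return std::string("sized-deallocation=") + (kSizedDeallocation ? "on" : "off") +
           " exceptions=" + (kExceptions ? "on" : "off");
}
```

In a CMake build, extra flags can be passed at configure time through the standard CMAKE_CXX_FLAGS variable (for instance -DCMAKE_CXX_FLAGS=-fsized-deallocation), and the probe can then be built the same way to check that the flag took effect.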

Here is the complete list

How would I test these compiler flags in the CMake config to see if I can reproduce this? Any idea what could be the root cause here?
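
One hypothesis suggested by the trace, not a confirmed diagnosis: the failing frame is a std::unique_ptr default deleter inside demangle(), which ends up in tcmalloc's sized delete[]. If the pointer held there came from malloc (as is the case for buffers returned by abi::__cxa_demangle, which the caller must release with free), then releasing it through operator delete would be a mismatch that a size-checking allocator could flag. The sketch below shows the free-based pattern under that assumption. `FreeDeleter` and `DemangleSketch` are hypothetical names and this is not the OpenFHE implementation.

```cpp
#include <cxxabi.h>

#include <cassert>
#include <cstdlib>
#include <memory>
#include <string>

// Deleter that releases malloc-allocated memory with std::free,
// matching the allocation contract of abi::__cxa_demangle.
struct FreeDeleter {
    void operator()(void* p) const noexcept { std::free(p); }
};

// Hypothetical sketch: demangle a symbol name, falling back to the
// mangled input when demangling fails.
inline std::string DemangleSketch(const char* mangled) {
    int status = -1;
    // __cxa_demangle returns a malloc'ed buffer (or nullptr on failure).
    std::unique_ptr<char, FreeDeleter> buffer(
        abi::__cxa_demangle(mangled, nullptr, nullptr, &status));
    if (status == 0 && buffer) {
        return std::string(buffer.get());
    }
    return std::string(mangled);
}
```

For example, DemangleSketch("_Z3foov") yields "foo()" on toolchains that follow the Itanium C++ ABI, and the buffer is released with free rather than delete, so no size is ever claimed to the allocator.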