souffle-lang / souffle

Soufflé is a variant of Datalog for tool designers crafting analyses in Horn clauses. Soufflé synthesizes a native parallel C++ program from a logic specification.
http://souffle-lang.github.io/
Universal Permissive License v1.0
918 stars 208 forks source link

Compiling with _GLIBCXX_ASSERTIONS triggers tons of errors #2083

Closed cmuellner closed 2 years ago

cmuellner commented 3 years ago

libstdc++ offers the preprocessor macro _GLIBCXX_ASSERTIONS, which does the following according to the manual:

Undefined by default. When defined, enables extra error checking in the form of precondition assertions,
such as bounds checking in strings and null pointer checks when dereferencing smart pointers.

When running Souffle's self-tests (ctest --output-on-failure --progress), this changes the result from 0 errors to tons of failing test cases.

Some Linux distributions that follow best practices for system hardening (e.g. Fedora or CentOS 8) add -Wp,-D_GLIBCXX_ASSERTIONS to the CXXFLAGS by default (when packaging), so there the tests pass only when explicitly disabling these checks with -U_GLIBCXX_ASSERTIONS. However, I think we all agree, that disabling additional checks to make self-tests pass is a bad idea.

The following steps show how to reproduce this issue with current master (e4184ab103bf4 from Sep 21, 2021):

cd path/to/source
mkdir builddir
export CXXFLAGS="-Wp,-D_GLIBCXX_ASSERTIONS"
cmake -S . -B builddir -DSOUFFLE_DOMAIN_64BIT=ON -DSOUFFLE_GIT=OFF -DSOUFFLE_VERSION=2.1
cmake --build builddir -j$(nproc) --verbose
cd builddir
ctest --output-on-failure --force-new-ctest-process -j$(nproc) --output-on-failure --progress

Result (only first few lines):

Test project /home/cm/src/souffle/builddir                                                                             
  51/3082 Test #2890: interface/signal_error_run_souffle.................................***Failed    0.10 sec         

 761/3082 Test #2891: interface/signal_error_compile_cppFailed test dependencies: interface/signal_error_run_souffle   
 762/3082 Test #2891: interface/signal_error_compile_cpp.................................***Not Run   0.00 sec
 806/3082 Test #3060: profile/recursive_run_souffle......................................***Failed    0.15 sec         

 807/3082 Test   #56: evaluation/adt-binary-constraint_run_souffle.......................***Failed    0.15 sec         

 808/3082 Test #3036: profile/lrg_attr_id_run_souffle....................................***Failed    0.19 sec         

 811/3082 Test   #60: evaluation/adt-binary-constraint_c_run_souffle.....................***Failed    0.15 sec         

 812/3082 Test   #72: evaluation/aggregates_run_souffle..................................***Failed    0.19 sec         

 815/3082 Test   #88: evaluation/aggregates3_run_souffle.................................***Failed    0.15 sec         

 816/3082 Test   #76: evaluation/aggregates_c_run_souffle................................***Failed    0.34 sec         

 817/3082 Test   #92: evaluation/aggregates3_c_run_souffle...............................***Failed    0.19 sec         

 818/3082 Test   #96: evaluation/aggregates4_run_souffle.................................***Failed    0.17 sec         

 819/3082 Test  #104: evaluation/aggregates5_run_souffle.................................***Failed    0.15 sec         

 820/3082 Test  #108: evaluation/aggregates5_c_run_souffle...............................***Failed    0.14 sec         

 821/3082 Test  #100: evaluation/aggregates4_c_run_souffle...............................***Failed    0.19 sec         

 822/3082 Test  #112: evaluation/aggregates6_run_souffle.................................***Failed    0.13 sec

I could reproduce this on Fedora 34 (x86_64) and CentOS 8 (AArch64) with the system's default toolchain. On Fedora the toolchain has the following properties:

$ gcc -v
Using built-in specs.
COLLECT_GCC=/usr/bin/gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/11/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-redhat-linux
Configured with: ../configure --enable-bootstrap --enable-languages=c,c++,fortran,objc,obj-c++,ada,go,d,lto --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-shared --enable-threads=posix --enable-checking=release --enable-multilib --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-gcc-major-version-only --with-linker-hash-style=gnu --enable-plugin --enable-initfini-array --with-isl=/builddir/build/BUILD/gcc-11.2.1-20210728/obj-x86_64-redhat-linux/isl-install --enable-offload-targets=nvptx-none --without-cuda-driver --enable-gnu-indirect-function --enable-cet --with-tune=generic --with-arch_32=i686 --build=x86_64-redhat-linux
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 11.2.1 20210728 (Red Hat 11.2.1-1) (GCC)

Does anyone have an idea, what causes these fails?

b-scholz commented 3 years ago

Can you pinpoint where these tests are failing, i.e. could you compile the programs in debug mode and report the C++ line numbers?

quentin commented 3 years ago

Building on Fedora, with the indicated instructions + -DCMAKE_BUILD_TYPE=Debug

I reproduce the assertions. For instance on tests/profile/recursive:

#0  0x00007ffff786e2a2 in raise () from /lib64/libc.so.6
#1  0x00007ffff78578a4 in abort () from /lib64/libc.so.6
#2  0x0000000000419c46 in std::__replacement_assert (__file=0xd04528 "/usr/include/c++/11/bits/unique_ptr.h", __line=407, 
    __function=0xd04bc0 "typename std::add_lvalue_reference<_Tp>::type std::unique_ptr<_Tp, _Dp>::operator*() const [with _Tp = souffle::ast2ram::ValueIndex; _Dp = std::default_delete<souffle::ast2ram::ValueIndex>; typename s"..., 
    __condition=0xd04414 "get() != pointer()") at /usr/include/c++/11/x86_64-redhat-linux/bits/c++config.h:2660
#3  0x00000000006a07b7 in std::unique_ptr<souffle::ast2ram::ValueIndex, std::default_delete<souffle::ast2ram::ValueIndex> >::operator* (this=0x130b4c0) at /usr/include/c++/11/bits/unique_ptr.h:407
#4  0x00000000006b3467 in souffle::ast2ram::seminaive::ClauseTranslator::createInsertion (this=0x130b490, clause=...) at /home/quentin/projects/souffle/src/ast2ram/seminaive/ClauseTranslator.cpp:201
#5  0x00000000006b2b19 in souffle::ast2ram::seminaive::ClauseTranslator::createRamFactQuery (this=0x130b490, clause=...) at /home/quentin/projects/souffle/src/ast2ram/seminaive/ClauseTranslator.cpp:154
#6  0x00000000006b2871 in souffle::ast2ram::seminaive::ClauseTranslator::translateNonRecursiveClause (this=0x130b490, clause=...) at /home/quentin/projects/souffle/src/ast2ram/seminaive/ClauseTranslator.cpp:131
#7  0x00000000006e3a2a in souffle::ast2ram::TranslatorContext::translateNonRecursiveClause (this=0x1305740, clause=...) at /home/quentin/projects/souffle/src/ast2ram/utility/TranslatorContext.cpp:219
#8  0x00000000006c9d9c in souffle::ast2ram::seminaive::UnitTranslator::generateNonRecursiveRelation (this=0x130bfb0, rel=...) at /home/quentin/projects/souffle/src/ast2ram/seminaive/UnitTranslator.cpp:105
#9  0x00000000006cb4ad in souffle::ast2ram::seminaive::UnitTranslator::generateStratumPreamble (this=0x130bfb0, scc=std::set with 1 element = {...}) at /home/quentin/projects/souffle/src/ast2ram/seminaive/UnitTranslator.cpp:257
#10 0x00000000006cc91a in souffle::ast2ram::seminaive::UnitTranslator::generateRecursiveStratum (this=0x130bfb0, scc=std::set with 1 element = {...}) at /home/quentin/projects/souffle/src/ast2ram/seminaive/UnitTranslator.cpp:357
#11 0x00000000006ca65f in souffle::ast2ram::seminaive::UnitTranslator::generateStratum (this=0x130bfb0, scc=0) at /home/quentin/projects/souffle/src/ast2ram/seminaive/UnitTranslator.cpp:162
#12 0x00000000006cdf1f in souffle::ast2ram::seminaive::UnitTranslator::generateProgram (this=0x130bfb0, translationUnit=...) at /home/quentin/projects/souffle/src/ast2ram/seminaive/UnitTranslator.cpp:478
#13 0x00000000006ce549 in souffle::ast2ram::seminaive::UnitTranslator::translateUnit (this=0x130bfb0, tu=...) at /home/quentin/projects/souffle/src/ast2ram/seminaive/UnitTranslator.cpp:513
#14 0x0000000000414594 in souffle::main (argc=4, argv=0x7fffffffddd8) at /home/quentin/projects/souffle/src/main.cpp:600
#15 0x0000000000417692 in main (argc=4, argv=0x7fffffffddd8) at /home/quentin/projects/souffle/src/main.cpp:768
|   195         Own<ram::Operation> ClauseTranslator::createInsertion(const ast::Clause& clause) const {
│   196             const auto head = clause.getHead();
│   197             auto headRelationName = getClauseAtomName(clause, head);
│   198
│   199             VecOwn<ram::Expression> values;
│   200             for (const auto* arg : head->getArguments()) {
│  >201                 values.push_back(context.translateValue(*valueIndex, arg));
│   202             }
│   203 

Print of valueIndex at line 201:

(gdb) p valueIndex
$1 = std::unique_ptr<souffle::ast2ram::ValueIndex> = {
  get() = 0x0
}
cmuellner commented 3 years ago

Here is the assertion itself (from the testcase that @quentin described):

      /// Dereference the stored pointer.
      typename add_lvalue_reference<element_type>::type
      operator*() const
      {
        __glibcxx_assert(get() != pointer());
        return *get();
      }
b-scholz commented 3 years ago

We need to tidy this up. These issues don't break the code but are nasty and need to be fixed long-term.

In the example above, we dereference a null-pointer that is not accessed in the callee.

So either we need to create empty containers or switch to pointer semantics in the interface, i.e.,

 Own<ram::Expression> translateValue(const ValueIndex& index, const ast::Argument* arg) const;

needs to change to

 Own<ram::Expression> translateValue(const ValueIndex* index, const ast::Argument* arg) const;
b-scholz commented 3 years ago

I prefer empty containers because pointer semantics requires continuous checks of whether the pointer is non-null.

b-scholz commented 3 years ago

I would prefer an empty container semantics.

b-scholz commented 3 years ago

Would you like to help with this effort?

b-scholz commented 3 years ago

Most of these issues won't be hard to fix.

cmuellner commented 2 years ago

Sorry, for the late response. I was not sure if and when I find the time to work on this as I expected quite some time to understand what is going on and how a solution could look like.

It turned out that all errors were triggered by the same assertion. So the fundamental problem is that the code dereferences valueIndex although it is not allocated (i.e. holding a NULL pointer). I agree, that switching to pointers and allowing NULL-pointers is not a preferred solution.

But instead of creating empty containers, I made the allocation of valueIndex a part of the constructor of the class ClauseTranslator. This way it is guaranteed that valueIndex is valid and the assertion does not trigger.

The PR can be found in #2121.

b-scholz commented 2 years ago

Great - that is wonderful!

b-scholz commented 2 years ago

Can we close this issue or are there still issues remaining?

cmuellner commented 2 years ago

The errors were initially triggered when using Fedora's packaging service (copr). So far I was just reproducing the issue locally and fixing what I was able to reproduce. I'd like to test again with copr (will take some time to build) and will update or close this ticket then.

b-scholz commented 2 years ago

Great - no worries.

cmuellner commented 2 years ago

Bad news: packaging still fails:

 823/3061 Test   #59: evaluation/adt-binary-constraint_c_run_souffle.....................***Failed   48.50 sec

 825/3061 Test   #67: evaluation/adt-enum_c_run_souffle..................................***Failed   49.97 sec
[...]
71% tests passed, 894 tests failed out of 3061
[...]

All failing tests end with c_run_souffle, so I assume what is failing is either compiling to C++ (with souffle) or the compilation of the generated C++ code (with the host compiler).

I'm struggling a bit to isolate the issue because I can only reproduce it when compiling via rpmbuild (not when invoking cmake/ctest directly) and there is no way to turn on verbosity high enough to see the error. Once I've isolated the remaining issue, I'll provide an update here.

cmuellner commented 2 years ago

The second issue is not related to the _GLIBCXX_ASSERTIONS macro. Therefore I'm closing this ticket.

The errors mentioned in the previous commit are discussed in #2122.