whisperity commented 2 years ago

Due to the interaction with dataflow, @steakhal suggested that I tag @Xazax-hun, @sgatev and @ymand. @ymand is also the author of the check.

When analysing "relatively complex"(?) code with clang-tidy, the bugprone-unchecked-optional-access checker either hangs, fails with seemingly infinite recursion, or crashes with a non-infinite stack trace.

Running nightly PPA on Ubuntu 20.04 LTS:

Package: clang-tidy-15
Version: 1:15~++20220513052831+6716e2055dde-1~exp1~20220513172924.250

Non-infinite stack trace

[ERROR 2022-05-17 11:45] - Analyzing SmallIndexMapTest.cpp with clang-tidy failed!

PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.Program arguments: /usr/lib/llvm-15/bin/clang-tidy /mnt/test/adt/SmallIndexMapTest.cpp --export-fixes /mnt/Reports/fixit/SmallIndexMapTest.cpp_clang-tidy_edc91faa5cbe120c7656946b7203686b.yaml -- -Qunused-arguments -Wall -Wextra -x c++ --target=x86_64-linux-gnu -I/mnt/include/core -I/mnt/include/implementation -I/mnt/AnalBuild/include -I/mnt/include -I/mnt/src -isystem /mnt/AnalBuild/_deps/googletest-src/googletest/include -isystem /mnt/AnalBuild/_deps/googletest-src/googletest -std=c++17 -isystem /usr/include/c++/9 -isystem /usr/include/x86_64-linux-gnu/c++/9 -isystem /usr/include/c++/9/backward -isystem /usr/local/include -isystem /usr/include/x86_64-linux-gnu -isystem /usr/include
1.<eof> parser at end of file
2.ASTMatcher: Processing 'bugprone-unchecked-optional-access' against:
CXXMethodDecl monomux::SmallIndexMap<int, 4, true, false>::isMapped : </mnt/include/core/monomux/adt/SmallIndexMap.hpp:352:3, line:358:3>
--- Bound Nodes Begin ---
    T - { BuiltinType : int }
    fun - { CXXMethodDecl monomux::SmallIndexMap<int, 4, true, false>::isMapped : </mnt/include/core/monomux/adt/SmallIndexMap.hpp:352:3, line:358:3> }
--- Bound Nodes End ---
 #0 0x00007f55d774eff1 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/usr/lib/llvm-15/bin/../lib/libLLVM-15.so.1+0xe64ff1)
 #1 0x00007f55d774cd3e llvm::sys::RunSignalHandlers() (/usr/lib/llvm-15/bin/../lib/libLLVM-15.so.1+0xe62d3e)
 #2 0x00007f55d774f51b (/usr/lib/llvm-15/bin/../lib/libLLVM-15.so.1+0xe6551b)
 #3 0x00007f55e0d3e420 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x14420)
 #4 0x00007f55df1a9c44 clang::dataflow::Environment::setValue(clang::dataflow::StorageLocation const&, clang::dataflow::Value&) (/usr/lib/llvm-15/bin/../lib/libclang-cpp.so.15+0x1df2c44)
 #5 0x00007f55df1bf294 (/usr/lib/llvm-15/bin/../lib/libclang-cpp.so.15+0x1e08294)
 #6 0x00007f55df1c196c (/usr/lib/llvm-15/bin/../lib/libclang-cpp.so.15+0x1e0a96c)
 #7 0x00007f55df1c5248 (/usr/lib/llvm-15/bin/../lib/libclang-cpp.so.15+0x1e0e248)
 #8 0x00007f55df1bccf3 clang::dataflow::UncheckedOptionalAccessModel::transfer(clang::Stmt const*, clang::dataflow::SourceLocationsLattice&, clang::dataflow::Environment&) (/usr/lib/llvm-15/bin/../lib/libclang-cpp.so.15+0x1e05cf3)
 #9 0x00007f55df1b0ee1 clang::dataflow::transferBlock(clang::dataflow::ControlFlowContext const&, std::vector<llvm::Optional<clang::dataflow::TypeErasedDataflowAnalysisState>, std::allocator<llvm::Optional<clang::dataflow::TypeErasedDataflowAnalysisState>>>&, clang::CFGBlock const&, clang::dataflow::Environment const&, clang::dataflow::TypeErasedDataflowAnalysis&, std::function<void (clang::CFGStmt const&, clang::dataflow::TypeErasedDataflowAnalysisState const&)>) (/usr/lib/llvm-15/bin/../lib/libclang-cpp.so.15+0x1df9ee1)
#10 0x00007f55df1b1684 clang::dataflow::runTypeErasedDataflowAnalysis(clang::dataflow::ControlFlowContext const&, clang::dataflow::TypeErasedDataflowAnalysis&, clang::dataflow::Environment const&) (/usr/lib/llvm-15/bin/../lib/libclang-cpp.so.15+0x1dfa684)
#11 0x000055932a5d6094 llvm::Expected<std::vector<llvm::Optional<clang::dataflow::DataflowAnalysisState<clang::dataflow::UncheckedOptionalAccessModel::Lattice>>, std::allocator<llvm::Optional<clang::dataflow::DataflowAnalysisState<clang::dataflow::UncheckedOptionalAccessModel::Lattice>>>>> clang::dataflow::runDataflowAnalysis<clang::dataflow::UncheckedOptionalAccessModel>(clang::dataflow::ControlFlowContext const&, clang::dataflow::UncheckedOptionalAccessModel&, clang::dataflow::Environment const&) (/usr/lib/llvm-15/bin/clang-tidy+0x398094)
#12 0x000055932a5d5b19 clang::tidy::bugprone::UncheckedOptionalAccessCheck::check(clang::ast_matchers::MatchFinder::MatchResult const&) (/usr/lib/llvm-15/bin/clang-tidy+0x397b19)
#13 0x00007f55de47892c (/usr/lib/llvm-15/bin/../lib/libclang-cpp.so.15+0x10c192c)
#14 0x00007f55de4ac31c clang::ast_matchers::internal::BoundNodesTreeBuilder::visitMatches(clang::ast_matchers::internal::BoundNodesTreeBuilder::Visitor*) (/usr/lib/llvm-15/bin/../lib/libclang-cpp.so.15+0x10f531c)
#15 0x00007f55de47834e (/usr/lib/llvm-15/bin/../lib/libclang-cpp.so.15+0x10c134e)
#16 0x00007f55de47af6f (/usr/lib/llvm-15/bin/../lib/libclang-cpp.so.15+0x10c3f6f)
#17 0x00007f55de47f57b (/usr/lib/llvm-15/bin/../lib/libclang-cpp.so.15+0x10c857b)
#18 0x00007f55de47b16e (/usr/lib/llvm-15/bin/../lib/libclang-cpp.so.15+0x10c416e)
#19 0x00007f55de47e346 (/usr/lib/llvm-15/bin/../lib/libclang-cpp.so.15+0x10c7346)
#20 0x00007f55de47b4fe (/usr/lib/llvm-15/bin/../lib/libclang-cpp.so.15+0x10c44fe)
#21 0x00007f55de47cedb (/usr/lib/llvm-15/bin/../lib/libclang-cpp.so.15+0x10c5edb)
#22 0x00007f55de47b0c6 (/usr/lib/llvm-15/bin/../lib/libclang-cpp.so.15+0x10c40c6)
#23 0x00007f55de48388b (/usr/lib/llvm-15/bin/../lib/libclang-cpp.so.15+0x10cc88b)
#24 0x00007f55de47afa7 (/usr/lib/llvm-15/bin/../lib/libclang-cpp.so.15+0x10c3fa7)
#25 0x00007f55de44af71 clang::ast_matchers::MatchFinder::matchAST(clang::ASTContext&) (/usr/lib/llvm-15/bin/../lib/libclang-cpp.so.15+0x1093f71)
#26 0x00007f55dfa1132c clang::MultiplexConsumer::HandleTranslationUnit(clang::ASTContext&) (/usr/lib/llvm-15/bin/../lib/libclang-cpp.so.15+0x265a32c)
#27 0x00007f55dde085a4 clang::ParseAST(clang::Sema&, bool, bool) (/usr/lib/llvm-15/bin/../lib/libclang-cpp.so.15+0xa515a4)
#28 0x00007f55df9d4297 clang::FrontendAction::Execute() (/usr/lib/llvm-15/bin/../lib/libclang-cpp.so.15+0x261d297)
#29 0x00007f55df94ac96 clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (/usr/lib/llvm-15/bin/../lib/libclang-cpp.so.15+0x2593c96)
#30 0x00007f55dfbe484c clang::tooling::FrontendActionFactory::runInvocation(std::shared_ptr<clang::CompilerInvocation>, clang::FileManager*, std::shared_ptr<clang::PCHContainerOperations>, clang::DiagnosticConsumer*) (/usr/lib/llvm-15/bin/../lib/libclang-cpp.so.15+0x282d84c)
#31 0x000055932aa46f7d (/usr/lib/llvm-15/bin/clang-tidy+0x808f7d)
#32 0x00007f55dfbe45af clang::tooling::ToolInvocation::runInvocation(char const*, clang::driver::Compilation*, std::shared_ptr<clang::CompilerInvocation>, std::shared_ptr<clang::PCHContainerOperations>) (/usr/lib/llvm-15/bin/../lib/libclang-cpp.so.15+0x282d5af)
#33 0x00007f55dfbe3635 clang::tooling::ToolInvocation::run() (/usr/lib/llvm-15/bin/../lib/libclang-cpp.so.15+0x282c635)
#34 0x00007f55dfbe603e clang::tooling::ClangTool::run(clang::tooling::ToolAction*) (/usr/lib/llvm-15/bin/../lib/libclang-cpp.so.15+0x282f03e)
#35 0x000055932aa426ed clang::tidy::runClangTidy(clang::tidy::ClangTidyContext&, clang::tooling::CompilationDatabase const&, llvm::ArrayRef<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>, llvm::IntrusiveRefCntPtr<llvm::vfs::OverlayFileSystem>, bool, bool, llvm::StringRef) (/usr/lib/llvm-15/bin/clang-tidy+0x8046ed)
#36 0x000055932a446257 clang::tidy::clangTidyMain(int, char const**) (/usr/lib/llvm-15/bin/clang-tidy+0x208257)
#37 0x00007f55d63ce083 __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x24083)
#38 0x000055932a4421de _start (/usr/lib/llvm-15/bin/clang-tidy+0x2041de)

The affected isMapped() function looks like this:


  /// \returns whether the stored element represents a mapped value.
  bool isMapped(const E& Elem) const noexcept
  {
    if constexpr (IntrusiveDefaultSentinel)
      return Elem != E{};
    else
      return static_cast<bool>(Elem);
  }

where E is a type such as int or T* when IntrusiveDefaultSentinel is true, and std::unique_ptr<T> or std::optional<T> otherwise. The latter two are castable to bool and are false when "empty".
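A minimal, self-contained sketch of that branch structure follows. SentinelProbe and the probe* wrappers are hypothetical names made up for illustration; only the body of isMapped() comes from the snippet above, and this reduction has not been verified to reproduce the crash.

```cpp
#include <cassert>
#include <memory>
#include <optional>

// Hypothetical reduction; SentinelProbe is not a monomux type.
template <typename E, bool IntrusiveDefaultSentinel>
struct SentinelProbe
{
  /// \returns whether the stored element represents a mapped value.
  static bool isMapped(const E& Elem) noexcept
  {
    if constexpr (IntrusiveDefaultSentinel)
      return Elem != E{};             // int, T*: compare against the default value
    else
      return static_cast<bool>(Elem); // unique_ptr<T>, optional<T>: explicit bool conversion
  }
};

// One wrapper per kind of element type named above.
inline bool probeInt(int V) { return SentinelProbe<int, true>::isMapped(V); }
inline bool probeOptional(const std::optional<int>& V)
{
  return SentinelProbe<std::optional<int>, false>::isMapped(V);
}
inline bool probeUnique(const std::unique_ptr<int>& V)
{
  return SentinelProbe<std::unique_ptr<int>, false>::isMapped(V);
}

// Usage: an engaged optional counts as mapped even when it holds 0.
inline void demoIsMapped()
{
  assert(probeInt(7) && !probeInt(0));
  assert(probeOptional(std::optional<int>{0}) && !probeOptional(std::nullopt));
  assert(probeUnique(std::make_unique<int>(1)) && !probeUnique(nullptr));
}
```

Because the condition is a template parameter, only one branch is instantiated per specialization, so the int specialization never touches the bool conversion at all.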

Infinite recursion depth

[ERROR 2022-05-17 12:22] - Analyzing Dispatch.cpp with clang-tidy failed!

I observed a crash with a seemingly infinite stack trace.

PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.Program arguments: /usr/lib/llvm-15/bin/clang-tidy /mnt/src/client/Dispatch.cpp --export-fixes /mnt/Reports/fixit/Dispatch.cpp_clang-tidy_76b81b409a195383fc3c434ff5b14a4b.yaml -- -Qunused-arguments -Wall -Wextra -x c++ --target=x86_64-linux-gnu -I/mnt/include/core -I/mnt/include/implementation -I/mnt/AnalBuild/include -I/mnt/src -std=c++17 -isystem /usr/include/c++/9 -isystem /usr/include/x86_64-linux-gnu/c++/9 -isystem /usr/include/c++/9/backward -isystem /usr/local/include -isystem /usr/include/x86_64-linux-gnu -isystem /usr/include
1.<eof> parser at end of file
2.ASTMatcher: Processing 'bugprone-unchecked-optional-access' against:
CXXMethodDecl monomux::client::Client::responseClientID : </mnt/src/client/Dispatch.cpp:50:1 <Spelling=line:42:3>, line:59:1>
--- Bound Nodes Begin ---
    T - { RecordType : monomux::message::response::ClientID }
    fun - { CXXMethodDecl monomux::client::Client::responseClientID : </mnt/src/client/Dispatch.cpp:50:1 <Spelling=line:42:3>, line:59:1> }
--- Bound Nodes End ---
  #0 0x00007f7284b07ff1 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/usr/lib/llvm-15/bin/../lib/libLLVM-15.so.1+0xe64ff1)
  #1 0x00007f7284b05d3e llvm::sys::RunSignalHandlers() (/usr/lib/llvm-15/bin/../lib/libLLVM-15.so.1+0xe62d3e)
  #2 0x00007f7284b0851b (/usr/lib/llvm-15/bin/../lib/libLLVM-15.so.1+0xe6551b)
  #3 0x00007f728e0f7420 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x14420)
  #4 0x00007f728c55fd5b clang::dataflow::DataflowAnalysisContext::addTransitiveFlowConditionConstraints(clang::dataflow::AtomicBoolValue&, llvm::DenseSet<clang::dataflow::BoolValue*, llvm::DenseMapInfo<clang::dataflow::BoolValue*, void>>&, llvm::DenseSet<clang::dataflow::AtomicBoolValue*, llvm::DenseMapInfo<clang::dataflow::AtomicBoolValue*, void>>&) const (/usr/lib/llvm-15/bin/../lib/libclang-cpp.so.15+0x1defd5b)

[#4 repeated until #255 where the stack printout stops]

The affected code snippet is generated partially from a macro:

#define HANDLER(NAME)                                                          \
  void Client::NAME(Client& Client, std::string_view Message)

#define MSG(TYPE)                                                              \
  std::optional<TYPE> Msg = TYPE::decode(Message);                             \
  if (!Msg)                                                                    \
    return;

HANDLER(responseClientID)
{
  MSG(response::ClientID);

  Client.ClientID = Msg->Client.ID;

  /* ... unrelated code ... */
}
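For readers who want to try this outside the project, here is a self-contained sketch of what the handler amounts to after macro expansion. ClientIDMessage, its decode() behaviour (failing on an empty buffer) and the handle() function are assumptions made up for illustration; the real decode() is not shown in this report, and this sketch has not been verified to trigger the recursion.

```cpp
#include <cassert>
#include <optional>
#include <string_view>

// Hypothetical stand-in for the message type; only the shape mirrors the report.
struct ClientIDMessage
{
  int ID = 0;

  // Assumption: decoding fails on an empty buffer and otherwise stores the length.
  static std::optional<ClientIDMessage> decode(std::string_view Buffer)
  {
    if (Buffer.empty())
      return std::nullopt;
    return ClientIDMessage{static_cast<int>(Buffer.size())};
  }
};

// What MSG(...) followed by the handler body looks like once expanded.
inline int handle(std::string_view Message, int Fallback)
{
  std::optional<ClientIDMessage> Msg = ClientIDMessage::decode(Message);
  if (!Msg)
    return Fallback; // early return guards the access below
  return Msg->ID;    // only reached when Msg holds a value
}

// Usage: a non-empty message is decoded, an empty one falls back.
inline void demoHandle()
{
  assert(handle("abc", -1) == 3);
  assert(handle("", -1) == -1);
}
```

The access is checked on every path that reaches it, so the check would be expected to stay silent on this function rather than crash.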

Non-halting (?) execution

While the first crash happened almost immediately and the second one after about 20 minutes of execution time, the last analysis job that I am running for my project, on a file with a pattern similar to the second case, has been running at ~100% CPU use, currently at 496M resident memory, for about 80 minutes already.

Disabling the check, or running Tidy with only the other checks, lets the analyses conclude successfully. With every other check used by the project (and also running Clang SA), the total analysis time is about 7 minutes.

llvmbot commented 2 years ago

@llvm/issue-subscribers-clang-tidy

ymand commented 2 years ago

Thanks for bringing this to our attention. We'll look into those bugs, but in the meantime are there any other steps we should take (e.g. disable the tidy elsewhere)?

whisperity commented 2 years ago

Thanks for bringing this to our attention. We'll look into those bugs, but in the meantime are there any other steps we should take (e.g. disable the tidy elsewhere)?

@ymand I am not sure whether the dataflow framework or simply the matchers within Tidy are the culprit. In the first case, T is int... that gives me the hunch that the matchers somehow try to consider a type that is most certainly not dereferenceable as a dereferenceable entity...? I have not observed crashes or hangs related to this checker on other projects I am involved in, but I cannot safely confirm whether those projects were running this checker in the first place.

(The affected project is my personal one, and I have bugprone-* enabled unconditionally, and my CI runs using the nightly LLVM PPA and that is why I observed this bug. Here is a job that hung and GitHub auto-killed it at a timeout.)

Note that the affected structures in my code, especially the first crash, involve heavy "magic" with if constexpr and such!

I am involved in one more C++ project that uses features from C++17 onwards. I will try to run the analysis on that with this checker and report back with the findings.

The act of disabling the check on the user's side alleviates the problem. I would not say a revert is needed here...

ymand commented 2 years ago

Sounds good. I wasn't sure if there was some global clang-tidy disabling (short of a revert), so thanks for confirming that.

Agreed on the nature of the examples involving unusual constructs. We've also just encountered this ourselves with templates. I think that the check needs to do a better job filtering the functions it sends to the analysis engine, but we also have issues in the core that we're still ironing out.

whisperity commented 2 years ago

If there are too many problems, I think a partial revert could be done by removing the registerCheck line from BugproneTidyModule.cpp. That way, Tidy will behave as if the check does not exist, but the checker's code need not be pulled out of the repository (which would cause a lot of unnecessary noise and pollution in the Git history).

However,

    T - { RecordType : monomux::message::response::ClientID }

This is not a template. It is not even dereference-able:

#define MONOMUX_MESSAGE(KIND, NAME)                                            \
  static constexpr MessageKind Kind = MessageKind::KIND;                       \
  static std::optional<NAME> decode(std::string_view Buffer);                  \
  static std::string encode(const NAME& Object);

struct ClientID
{
  MONOMUX_MESSAGE(ClientIDResponse, ClientID);
  monomux::message::ClientID Client;
};

The second example (the infinite recursion crash) feels like the most trivial use case. I have a type, which I put inside an optional... I receive an optional from a function returning an optional. If the optional is false, I return, if it is not, I dereference it. This is analogous to the pointer case through and through... So an infinite recursion in there is definitely something wrong.

For the first crash case, I am not sure if T is supposedly the optional<int> or the int inside the optional...

@AaronBallman What do you think? Should they do a partial revert for the availability of the checker?

ymand commented 2 years ago

T is always bound to the type argument of the std::optional. So, the binding to ClientID makes sense because of this line:
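As a rough analogy in plain C++ (this is not the matcher code itself), the type argument can be recovered from the return type of decode() through value_type. The ClientID struct below is a hypothetical stand-in with only the declaration that matters, and Returned, Bound and boundTypeIsClientID are names invented for this sketch.

```cpp
#include <optional>
#include <string_view>
#include <type_traits>

// Hypothetical stand-in for monomux::message::response::ClientID.
struct ClientID
{
  static std::optional<ClientID> decode(std::string_view Buffer); // declaration only
};

// The type that decode() returns, and the type argument inside it.
using Returned = decltype(ClientID::decode(std::string_view{}));
using Bound = Returned::value_type;

static_assert(std::is_same_v<Returned, std::optional<ClientID>>);
static_assert(std::is_same_v<Bound, ClientID>);

// Compile-time helper so the relationship can also be queried from other code.
constexpr bool boundTypeIsClientID() { return std::is_same_v<Bound, ClientID>; }
```

So a binding of T to a record type says nothing about that record being dereferenceable; it only names what the optional wraps.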

static std::optional<NAME> decode(std::string_view Buffer);

The issue of templates relates to the other binding: fun. Fwiw, I'm not suggesting that's the issue here -- only that it's another example of "unusual" code that we've seen issues with.

whisperity commented 2 years ago

I am involved in one more C++ project that uses features from C++17 onwards. I will try to run the analysis on that with this checker and report back with the findings.

I have started a run on that other project in GitHub Actions, and the only change against a working master branch was enabling this check. (Sadly, due to GHA hanging in a way that the output is not recoverable, I can't give more details...) While the analysis of that project usually concludes in about 1 hr 30 minutes, the job has been running for more than 5 hours at this point (and GitHub terminates jobs at the 6 hour mark), indicating that many of the TUs in that project are also crashing, hanging indefinitely, etc.

(I will try to compile a local LLVM latest and run the analysis of the project with that, not the PPA version, but running the build of this "other project" is nontrivial due to all the dependencies it has. The stack traces of my own project come from a local execution with the PPA. Unfortunately I can't seem to get the traces out of GitHub if the job hangs.)

ymand commented 2 years ago

Regarding the apparent infinite loop: it's not actually an infinite loop, and it's not even tied to this checker. I can repro in the core tests with this code:

    struct Lookup {
      int x;
    };

    void target(Lookup val, bool b) {
      const Lookup* l = nullptr;
      while (b) {
        l = &val;
      }
    }

The immediate cause of this issue is that Environment::MemberLocToStruct differs each time we compare the environments at the conclusion of the loop body. That difference is caused by a chain of events triggered by the join at the back edge of the loop body. First, the incoming edge maps l to nullptr, while the back edge maps l to &val. Then, when we try to merge those two, we reach mergeDistinctValues, which results in a call to createValue for the type const Lookup *. That allocates a fresh StructValue, which triggers the update to MemberLocToStruct.

Now, we have a (merged) environment with a fresh entry in MemberLocToStruct. Note that it's isomorphic to the previous entry, but since it's fresh it looks different and invalidates the equality comparison.

A solution is to remove this field from the comparison. It's a supporting data structure and I think it can safely be ignored. That will solve the problem in this case, because l is assigned the same location on each iteration. Moreover, the comparison for indirection nodes is rather permissive (for the time being), so even different addresses would not necessarily trigger a problem, depending on the permissiveness of the model's comparison function: https://github.com/llvm/llvm-project/blob/86617256864ebcbda03b6ce843deeb6a41a85800/clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp#L61-L69.
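The effect can be illustrated with a toy fixpoint loop. Everything below is a hypothetical model that is far simpler than clang::dataflow::Environment: ToyEnv, joinAtBackEdge, sameState and iterateToFixpoint are invented names, and only the field name MemberLocToStruct is borrowed from the description above. The sketch shows a join that allocates a fresh id each time, and how including or ignoring the supporting map in the comparison decides whether the iteration settles.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy model only; not the real Environment.
struct ToyEnv
{
  std::map<std::string, int> VarToLoc;  // analysis state proper: variable -> abstract location
  std::map<int, int> MemberLocToStruct; // supporting index: location -> allocated value id
};

// Models the join at the loop back edge, which allocates a fresh value every time.
inline ToyEnv joinAtBackEdge(const ToyEnv& Prev, int& NextFreshId)
{
  ToyEnv Out;
  Out.VarToLoc = Prev.VarToLoc;
  Out.VarToLoc["l"] = 1;                    // l ends up at the same location on every pass
  Out.MemberLocToStruct[1] = NextFreshId++; // isomorphic entry, but with a new id
  return Out;
}

enum class Compare { IncludeSupportMap, IgnoreSupportMap };

inline bool sameState(const ToyEnv& A, const ToyEnv& B, Compare Mode)
{
  if (A.VarToLoc != B.VarToLoc)
    return false;
  return Mode == Compare::IgnoreSupportMap ||
         A.MemberLocToStruct == B.MemberLocToStruct;
}

// Returns the pass on which the state stopped changing, or -1 if MaxIters is exhausted.
inline int iterateToFixpoint(Compare Mode, int MaxIters)
{
  assert(MaxIters > 0);
  int NextFreshId = 0;
  ToyEnv Cur;
  Cur.VarToLoc["l"] = 0; // entry edge: l is nullptr (location 0 in this toy)
  for (int I = 1; I <= MaxIters; ++I) {
    ToyEnv Next = joinAtBackEdge(Cur, NextFreshId);
    if (sameState(Cur, Next, Mode))
      return I;
    Cur = Next;
  }
  return -1;
}
```

In this toy, iterateToFixpoint(Compare::IgnoreSupportMap, 1000) settles on the second pass, while Compare::IncludeSupportMap runs out of iterations because the fresh id makes every comparison fail.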

tsteven4 commented 1 year ago

I described a test to reproduce the hang in https://github.com/llvm/llvm-project/issues/59492

Skylion007 commented 11 months ago

Is this related to https://github.com/llvm/llvm-project/issues/59492? One of the recent comments on that issue suggests it may be fixed on main for LLVM 18.

whisperity commented 11 months ago

Is this related to #59492? One of the recent comments on that issue suggests it may be fixed on main for LLVM 18.

@Skylion007 Could very well be; there are a lot of issues related to this checker. The data-flow library/framework is an evolving part of the suite. I will try to circle back and re-run the analysis on my originally reported project.

whisperity commented 1 month ago

clang-tidy-18 at Ubuntu LLVM version 18.1.8 no longer reproduces these issues, so it looks like this is fixed.

llvm / llvm-project

[clang-tidy] `bugprone-unchecked-optional-access` crashes, hangs #55530
