llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
28.6k stars 11.82k forks source link

clang -cc1 -analyze crashes with Assertion `(!BB || Parent == NodeTrait::getParent(const_cast<NodeT *>(BB))) && "cannot get DomTreeNode of block with different parent"' failed. #105512

Open mikaelholmen opened 2 months ago

mikaelholmen commented 2 months ago

llvm commit: 4f075086e7b8d Reproduce with: clang -cc1 -analyze -analyzer-checker=core bbi-98084.c Result:

clang: ../include/llvm/Support/GenericDomTree.h:401: DomTreeNodeBase<NodeT> *llvm::DominatorTreeBase<clang::CFGBlock, true>::getNode(const NodeT *) const [NodeT = clang::CFGBlock, IsPostDom = true]: Assertion `(!BB || Parent == NodeTrait::getParent(const_cast<NodeT *>(BB))) && "cannot get DomTreeNode of block with different parent"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0.  Program arguments: build-all/bin/clang -cc1 -analyze -analyzer-checker=core bbi-98084.c
1.  <eof> parser at end of file
 #0 0x00005607f89d9a77 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (build-all/bin/clang+0x7f1ca77)
 #1 0x00005607f89d75de llvm::sys::RunSignalHandlers() (build-all/bin/clang+0x7f1a5de)
 #2 0x00005607f89da13f SignalHandler(int) Signals.cpp:0:0
 #3 0x00007f67d9f0bcf0 __restore_rt (/lib64/libpthread.so.0+0x12cf0)
 #4 0x00007f67d7ac4acf raise (/lib64/libc.so.6+0x4eacf)
 #5 0x00007f67d7a97ea5 abort (/lib64/libc.so.6+0x21ea5)
 #6 0x00007f67d7a97d79 _nl_load_domain.cold.0 (/lib64/libc.so.6+0x21d79)
 #7 0x00007f67d7abd426 (/lib64/libc.so.6+0x47426)
 #8 0x00005607fa6878ff (build-all/bin/clang+0x9bca8ff)
 #9 0x00005607fa686d26 llvm::IDFCalculatorBase<clang::CFGBlock, true>::calculate(llvm::SmallVectorImpl<clang::CFGBlock*>&) (build-all/bin/clang+0x9bc9d26)
#10 0x00005607fa685416 clang::ControlDependencyCalculator::getControlDependencies(clang::CFGBlock*) DebugCheckers.cpp:0:0
#11 0x00005607fa9f61a1 (anonymous namespace)::TrackControlDependencyCondBRVisitor::VisitNode(clang::ento::ExplodedNode const*, clang::ento::BugReporterContext&, clang::ento::PathSensitiveBugReport&) BugReporterVisitors.cpp:0:0
#12 0x00005607fa9e1095 generateVisitorsDiagnostics(clang::ento::PathSensitiveBugReport*, clang::ento::ExplodedNode const*, clang::ento::BugReporterContext&) BugReporter.cpp:0:0
#13 0x00005607fa9dc984 clang::ento::PathSensitiveBugReporter::generatePathDiagnostics(llvm::ArrayRef<clang::ento::PathDiagnosticConsumer*>, llvm::ArrayRef<clang::ento::PathSensitiveBugReport*>&) (build-all/bin/clang+0x9f1f984)
#14 0x00005607fa9dffab clang::ento::PathSensitiveBugReporter::generateDiagnosticForConsumerMap(clang::ento::BugReport*, llvm::ArrayRef<clang::ento::PathDiagnosticConsumer*>, llvm::ArrayRef<clang::ento::BugReport*>) (build-all/bin/clang+0x9f22fab)
#15 0x00005607fa9da2f6 clang::ento::BugReporter::FlushReport(clang::ento::BugReportEquivClass&) (build-all/bin/clang+0x9f1d2f6)
#16 0x00005607fa9da0cb clang::ento::BugReporter::FlushReports() (build-all/bin/clang+0x9f1d0cb)
#17 0x00005607fa5bc7ed (anonymous namespace)::AnalysisConsumer::HandleCode(clang::Decl*, unsigned int, clang::ento::ExprEngine::InliningModes, llvm::DenseSet<clang::Decl const*, llvm::DenseMapInfo<clang::Decl const*, void>>*) AnalysisConsumer.cpp:0:0
#18 0x00005607fa5954bb (anonymous namespace)::AnalysisConsumer::HandleTranslationUnit(clang::ASTContext&) AnalysisConsumer.cpp:0:0
#19 0x00005607fab3fd37 clang::ParseAST(clang::Sema&, bool, bool) (build-all/bin/clang+0xa082d37)
#20 0x00005607f968ccf0 clang::FrontendAction::Execute() (build-all/bin/clang+0x8bcfcf0)
#21 0x00005607f95f7b4f clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (build-all/bin/clang+0x8b3ab4f)
#22 0x00005607f977a8de clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (build-all/bin/clang+0x8cbd8de)
#23 0x00005607f6249576 cc1_main(llvm::ArrayRef<char const*>, char const*, void*) (build-all/bin/clang+0x578c576)
#24 0x00005607f6245d1d ExecuteCC1Tool(llvm::SmallVectorImpl<char const*>&, llvm::ToolContext const&) driver.cpp:0:0
#25 0x00005607f6244a64 clang_main(int, char**, llvm::ToolContext const&) (build-all/bin/clang+0x5787a64)
#26 0x00005607f6256247 main (build-all/bin/clang+0x5799247)
#27 0x00007f67d7ab0d85 __libc_start_main (/lib64/libc.so.6+0x3ad85)
#28 0x00005607f624362e _start (build-all/bin/clang+0x578662e)
Abort (core dumped)

This starts happening with 5ce47a581350

Reland "[Support] Assert that DomTree nodes share parent" (#102782)

and I originally reported it here https://github.com/llvm/llvm-project/pull/102782#issuecomment-2291055153 but that didn't really lead anywhere so now also writing a proper issue to not only have the problem hidden in that merged PR.

bbi-98084.c.gz

llvmbot commented 2 months ago

@llvm/issue-subscribers-clang-static-analyzer

Author: Mikael Holmén (mikaelholmen)

llvm commit: 4f075086e7b8d Reproduce with: ```clang -cc1 -analyze -analyzer-checker=core bbi-98084.c``` Result: ``` clang: ../include/llvm/Support/GenericDomTree.h:401: DomTreeNodeBase<NodeT> *llvm::DominatorTreeBase<clang::CFGBlock, true>::getNode(const NodeT *) const [NodeT = clang::CFGBlock, IsPostDom = true]: Assertion `(!BB || Parent == NodeTrait::getParent(const_cast<NodeT *>(BB))) && "cannot get DomTreeNode of block with different parent"' failed. PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script. Stack dump: 0. Program arguments: build-all/bin/clang -cc1 -analyze -analyzer-checker=core bbi-98084.c 1. <eof> parser at end of file #0 0x00005607f89d9a77 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (build-all/bin/clang+0x7f1ca77) #1 0x00005607f89d75de llvm::sys::RunSignalHandlers() (build-all/bin/clang+0x7f1a5de) #2 0x00005607f89da13f SignalHandler(int) Signals.cpp:0:0 #3 0x00007f67d9f0bcf0 __restore_rt (/lib64/libpthread.so.0+0x12cf0) #4 0x00007f67d7ac4acf raise (/lib64/libc.so.6+0x4eacf) #5 0x00007f67d7a97ea5 abort (/lib64/libc.so.6+0x21ea5) #6 0x00007f67d7a97d79 _nl_load_domain.cold.0 (/lib64/libc.so.6+0x21d79) #7 0x00007f67d7abd426 (/lib64/libc.so.6+0x47426) #8 0x00005607fa6878ff (build-all/bin/clang+0x9bca8ff) #9 0x00005607fa686d26 llvm::IDFCalculatorBase<clang::CFGBlock, true>::calculate(llvm::SmallVectorImpl<clang::CFGBlock*>&) (build-all/bin/clang+0x9bc9d26) #10 0x00005607fa685416 clang::ControlDependencyCalculator::getControlDependencies(clang::CFGBlock*) DebugCheckers.cpp:0:0 #11 0x00005607fa9f61a1 (anonymous namespace)::TrackControlDependencyCondBRVisitor::VisitNode(clang::ento::ExplodedNode const*, clang::ento::BugReporterContext&, clang::ento::PathSensitiveBugReport&) BugReporterVisitors.cpp:0:0 #12 0x00005607fa9e1095 generateVisitorsDiagnostics(clang::ento::PathSensitiveBugReport*, clang::ento::ExplodedNode const*, clang::ento::BugReporterContext&) BugReporter.cpp:0:0 #13 0x00005607fa9dc984 clang::ento::PathSensitiveBugReporter::generatePathDiagnostics(llvm::ArrayRef<clang::ento::PathDiagnosticConsumer*>, llvm::ArrayRef<clang::ento::PathSensitiveBugReport*>&) (build-all/bin/clang+0x9f1f984) #14 0x00005607fa9dffab clang::ento::PathSensitiveBugReporter::generateDiagnosticForConsumerMap(clang::ento::BugReport*, llvm::ArrayRef<clang::ento::PathDiagnosticConsumer*>, llvm::ArrayRef<clang::ento::BugReport*>) (build-all/bin/clang+0x9f22fab) #15 0x00005607fa9da2f6 clang::ento::BugReporter::FlushReport(clang::ento::BugReportEquivClass&) (build-all/bin/clang+0x9f1d2f6) #16 0x00005607fa9da0cb clang::ento::BugReporter::FlushReports() (build-all/bin/clang+0x9f1d0cb) #17 0x00005607fa5bc7ed (anonymous namespace)::AnalysisConsumer::HandleCode(clang::Decl*, unsigned int, clang::ento::ExprEngine::InliningModes, llvm::DenseSet<clang::Decl const*, llvm::DenseMapInfo<clang::Decl const*, void>>*) AnalysisConsumer.cpp:0:0 #18 0x00005607fa5954bb (anonymous namespace)::AnalysisConsumer::HandleTranslationUnit(clang::ASTContext&) AnalysisConsumer.cpp:0:0 #19 0x00005607fab3fd37 clang::ParseAST(clang::Sema&, bool, bool) (build-all/bin/clang+0xa082d37) #20 0x00005607f968ccf0 clang::FrontendAction::Execute() (build-all/bin/clang+0x8bcfcf0) #21 0x00005607f95f7b4f clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (build-all/bin/clang+0x8b3ab4f) #22 0x00005607f977a8de clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (build-all/bin/clang+0x8cbd8de) #23 0x00005607f6249576 cc1_main(llvm::ArrayRef<char const*>, char const*, void*) (build-all/bin/clang+0x578c576) #24 0x00005607f6245d1d ExecuteCC1Tool(llvm::SmallVectorImpl<char const*>&, llvm::ToolContext const&) driver.cpp:0:0 #25 0x00005607f6244a64 clang_main(int, char**, llvm::ToolContext const&) (build-all/bin/clang+0x5787a64) #26 0x00005607f6256247 main (build-all/bin/clang+0x5799247) #27 0x00007f67d7ab0d85 __libc_start_main (/lib64/libc.so.6+0x3ad85) #28 0x00005607f624362e _start (build-all/bin/clang+0x578662e) Abort (core dumped) ``` This starts happening with 5ce47a581350 ``` Reland "[Support] Assert that DomTree nodes share parent" (#102782) ``` and I originally reported it here https://github.com/llvm/llvm-project/pull/102782#issuecomment-2291055153 but that didn't really lead anywhere so now also writing a proper issue to not only have the problem hidden in that merged PR. [bbi-98084.c.gz](https://github.com/user-attachments/files/16691278/bbi-98084.c.gz)
steakhal commented 2 months ago

I'll have a look next week. Thanks.

steakhal commented 1 month ago

Node reclamation bites us (again). Here is a smaller repro: https://compiler-explorer.com/z/KTjxP11vd (Note that to make the repro reproduce, you need to use the graph-trim-interval=19. Without overriding that interval, one can only came up with really ugly reproducers otherwise to trigger the reclamation.)

struct wrapper { int member; };
void create_vla(const struct wrapper *w) {
  int vla[w->member * 1];
}
void top(struct wrapper wrp) {
  if (wrp.member != 0)
    return;
  create_vla(&wrp);
}

Basically, the VLA size checker detects the bug and creates a bug reports - and also calls bugreporter::trackExpressionValue on the report to track how we ended up with a zero size for the VLA. That will call Tracker::create(Report)->track(E, InputNode, Opts); then that inside will decompose the expression - in our case the multiplication binop, and track the left operand of it. It is done by:

const Expr *Inner = peelOffOuterExpr(E, N);
const ExplodedNode *LVNode = findNodeForExpression(N, Inner);
Handler->handle(Inner, N, LVNode, Opts);

Remember that the ExplodedNode that contains the report is protected from node reclamation (ExplodedGraph::shouldCollect), but the LVNode returned by findNodeForExpression is not guaranteed to live long. And in this case, node relamation will indeed reuse that ExplodedNode for a different node, turning this LVNode dangling eventually.

Now, the problem is that we will construct a BugReportVisitor saving this pointer, which will be dangling by the time we get to the report construction - when the bug report visitors kick in. It will refer to a completely different ExplodedNode :D

I'd say we could solve this by "defering" the tracking in trackExpressionValue and exploit the fact that the report node is never reclaimed. What I'd prefer is to simply disable node reclamation and that would bring the immutability assumption back.

steakhal commented 1 month ago

That said, I think this happens so infrequently, without much actual consequences (without assertions), that I don't think this bug deserves an immediate action.

Users that trip on this, we can suggest setting the graph-trim-interval=0 as a mitigation.

NagyDonat commented 1 month ago

What I'd prefer is to simply disable node reclamation and that would bring the immutability assumption back.

I agree that node reclamation is inelegant, and I would like to see it eliminated from the sources unless that change has a prohibitive impact on performance. What prevents us from disabling it?

necto commented 1 month ago

What prevents us from disabling it?

We have tried to disable it and ran the analysis on a couple of hundreds of open-source projects of various sizes. For most of them, the effect was on par with regular CI noise, but a couple of projects experienced a noticeable slowdown. In particular, suffer translation units with large static arrays. For example, llvm/lib/Target/MSP430/MSP430ISelDAGToDAG.cpp analysis takes twice the time if node reclamation is disabled.

Some other performance losers:

| project     | translation unit                                  | w/o  | with | noise |
|-------------+---------------------------------------------------+------+------+-------|
| alibaba-mnn | express/MathOp.cpp                                | 35s  | 24s  | 24s   |
| Clang       | llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp   | 33s  | 26s  | 50s   |
| Clang       | llvm/lib/Target/BPF/BPFISelDAGToDAG.cpp           | 30s  | 23s  | 1s    |
| Clang       | llvm/lib/Target/MSP430/MSP430ISelDAGToDAG.cpp     | 50s  | 32s  | 5s    |
| CPython     | Python/compile.c                                  | 34s  | 22s  | 10s   |
| Firefox     | js/src/vm/Interpreter.cpp                         | 148s | 145s | 4s    |
| HHVM        | hphp/runtime/ext/fileinfo/libmagic/apprentice.cpp | 94s  | 65s  | 1s    |
| Linux       | drivers/media/usb/gspca/topro.c                   | 190s | 56s  | 21s   |
| PHP         | ext/opcache/jit/dynasm/minilua.c                  | 53s  | 22s  | 21s   |
| Ruby        | enc/unicode.c                                     | 55s  | 49s  | 10s   |
| taskflow    | unittests/algorithms.cpp                          | 30s  | 19s  | 16s   |
| taskflow    | unittests/control_flow.cpp                        | 47s  | 29s  | 31s   |
| taskflow    | unittests/pipelines.cpp                           | 54s  | 43s  | 23s   |
| taskflow    | unittests/scalable_pipelines.cpp                  | 43s  | 24s  | 21s   |
bjope commented 1 month ago

IIUC graph-trim-interval is about saving memory, rather that saving CPU time (even thought those might be a bit related). Did you perhaps measure the impact on memory usage as well, or only speed?

necto commented 1 month ago

I did not measure memory impact as we are usually less concerned with memory footprint as long as it is reasonable, and I can only say the footprint remained reasonable.

bjope commented 1 month ago

I did not measure memory impact as we are usually less concerned with memory footprint as long as it is reasonable, and I can only say the footprint remained reasonable.

Ok, thanks.

NagyDonat commented 2 weeks ago

@steakhal Could you send me the example files where you observed the large slowdowns when you disabled the node reclamation? We are planning that somebody from my team will start working on analyzing them (probably not me, but I'll contribute if needed).

By the way, what was the average performance impact of this change? I see that there were some translation units with +100% or more runtime; but what level of slowdown did you observe on whole projects or if you take an average over all the projects that you analyzed?

necto commented 2 weeks ago

@steakhal Could you send me the example files where you observed the large slowdowns when you disabled the node reclamation? We are planning that somebody from my team will start working on analyzing them (probably not me, but I'll contribute if needed).

I'll prepare the files and try to attach them to this issue.

By the way, what was the average performance impact of this change? I see that there were some translation units with +100% or more runtime; but what level of slowdown did you observe on whole projects or if you take an average over all the projects that you analyzed?

Across all TU of all analyzed projects, on average there was no noticeable impact on the analysis time. The Standard deviation of the impact is ~4s.

Here are several runs of the same 12 most impacted projects (b2b is the run-to-run difference in CSA times without any changes in the analyzer):


Project                               b2b             466-642     616-583    462-634   561-496
FlatBuffers                            +65.33ms     +287.90ms    +471.71ms   -1069ms  -1858ms
bash-linaro-aarch64                   +336.80ms     +382.42ms   +1461.31ms    -107ms    -52ms
ccache                                -170.31ms     +339.15ms    +498.57ms     +20ms   +305ms
fmt                                  -1671.79ms    +1219.31ms  +18097.66ms    -820ms  +3391ms
libmicrohttpd                           -5.60ms     +133.42ms    +127.73ms    +138ms    +74ms
libuv                                  -39.41ms     +450.00ms    +112.68ms     -12ms   +100ms
nnn                                  -1268.00ms    +5910.00ms   +4063.00ms   -4867ms  +2144ms
rocksdb                               +143.71ms     +391.11ms    +919.06ms    +408ms   +850ms
taskflow                             -5027.56ms    +6565.17ms   +2718.92ms  +13398ms +12486ms
tdengine                               +59.11ms     +307.10ms    +373.49ms   +1840ms    -83ms
tesseract                             +105.49ms    +4079.23ms    +115.40ms    +118ms   -744ms
zabbix                                 +65.86ms     +499.26ms   +1426.33ms     -81ms   +279ms

As you can see, it is difficult to make reliable conclusions out of such noisy data. However, the few most impacted TUs consistently demonstrate a regression.

necto commented 2 weeks ago

@steakhal Could you send me the example files where you observed the large slowdowns when you disabled the node reclamation?

Here you go: repros.tar.gz In the tarball you find 3 TUs that consistently run slower with node reclamation disabled, as well as, "run-clang.sh" that demonstrates the regression in analysis time.

Here are the analysis times on my machine with clang-18:

steakhal commented 2 weeks ago

For me the repros.tar.gz also demonstrates the issue, on llvmorg-19.1.1:

Here is how the FlameGraph looks like without reclamation for ruby-unicode.c: image

Here how with node reclamation: image

Notice the frames containing the word EnvironmentEntry are highlighted with purple on both charts. This is a sign that the environment immutable tree grows huge without reclamation, probably wasting a lot of time doing lookups and insertions (and tree-canonicalizations).

We also spend a HUGE amount of time in frames containing BindingKey, which means we also suffer in the Store for the same reason (with and without node reclamation), so it anyways makes sense to look at these immutable trees - eventually, but that doesn't seem to be much worse by disabling node reclamation. Interestingly, clang spends its time withing frames EnvironmentEntry|BindingKey in 82-88% of its time in this example.

To get FlameGraphs, do this:

perf record -F99 -a -g --  clang -cc1 <command>...
perf script | ~/git/FlameGraph/stackcollapse-perf.pl > perf.folded
~/git/FlameGraph/flamegraph.pl perf.folded > with-reclamation.svg
firefox with-reclamation.svg
NagyDonat commented 2 weeks ago

Thanks for uploading repros.tar.gz!

I tried to start experimenting with it (on up-to date main clang), but interestingly I ran into an unrelated crash on ruby-unicode.c: https://github.com/ericsson/llvm-project/commit/b1e4ddbfad5baa1ba243a38653aa8c39715371b0 .

Edit: I created a github issue to track this crash: https://github.com/llvm/llvm-project/issues/111147 . I briefly looked at the code of BoolAssignmentChecker.cpp and the assertion itself and I didn't spot any obvious cause.

tigbr commented 1 week ago

Hi Everyone! Today I talked with @NagyDonat about this issue and I have started investigating it.

NagyDonat commented 1 week ago

@steakhal @haoNoQ @Xazax-hun What would you think about setting node reclamation to off-by-default as a temporary countermeasure to avoid this crash?

I hope that with @tigbr we can find a solution that removes the reclamation and ensures good performance on all translation units, but while we're working on that, it would be good to ensure that this crash doesn't trouble the users who use the main clang -- while temporary stomaching a measurable, but not highly visible slowdown is acceptable IMO.

steakhal commented 1 week ago

I can stand by disabling node reclamation given that our measurements were correct - so I'd like someone else to repeat the measurements at some scale to gain confidence that our decision is based of real datapoints. That measurements should also include ffmpeg (as a project showcasing the slowdown), and confirm that even in worst case translation unit analysis times behave as we described.

Assuming the observed numbers align with the expectation, we can turn it off by default.

Xazax-hun commented 1 week ago

Note that this is not just about performance, but about memory. If the change in the memory consumption is significant, that might be the difference whether some users can do -j 12 or -j 8 before running out of memory which would be a huge difference in the throughput. I don't think we can make this decision without looking at the memory consumption.

NagyDonat commented 1 week ago

I created PR #111843 for the temporary workaround that I suggested, let's continue discussion about it there. (Obviously I won't merge it until there's a consensus about it, but let's leave this ticket for discussing the permanent solution.)