llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
28k stars 11.56k forks source link

Stack object turned into invalid state during exception-caused stack unwinding #44366

Open boris-kolpackov opened 4 years ago

boris-kolpackov commented 4 years ago
Bugzilla Link 45021
Version 13.0
OS Windows NT
Attachments Stack trace
CC @boris-kolpackov,@DougGregor,@karen-arutyunov,@zygoloid,@rnk

Extended Description

We have a program that crashes in a destructor called for a stack object that appears to have never been constructed. The destructor is called during stack unwinding caused by throwing (and catching) an exception.

This only happens with Clang targeting MSVC and only when the program is compiled with optimization (-O3) and linked statically (but still using the DLL msvcrt). This affects Clang 9.0.1 and 10.0.0-rc2.

So far we were unable to create a small reproducer and I am not very hopeful that we will since the issue appears to be specific to the code setup/path that we have. However, I would be happy to provide instructions on how to reproduce the issue under a debugger in the program directly (it is open source). Below I am also including extra details in case they may help.

You can see the two complete build logs that only differ in the static linking bit here (this is using Clang 9.0.1):

https://stage.build2.org/?builds=build2&cf=windows_10-clang_9.0_llvm_msvc_16.4-*O3

In particular, the C++ compiler invocations are the same sans a few macros that setup DLL importing in the shared case.

Below is a stack trace. The debugger attaches the the following message to the b.exe!butl::fdclose(int fd) Line 1482 C++ line:

Unhandled exception at 0x00007FFCCC24D5B8 (ucrtbase.dll) in b.exe: An invalid parameter was passed to a function that considers invalid parameters fatal.

The crash happens when an exception is thrown at line 1250[1] trying to destroy the fdpipe object. The only fdpipe object that could possibly be being destroyed here is constructed below at line 1520[2]. So it feels like something is wrong with the exception handing tables.

[1] https://github.com/build2/build2/blob/master/libbuild2/test/script/runner.cxx#L1250 [2] https://github.com/build2/build2/blob/master/libbuild2/test/script/runner.cxx#L1520

  ucrtbase.dll!00007ffccc24d5b8()   Unknown
  ucrtbase.dll!00007ffccc24d481()   Unknown
  ucrtbase.dll!00007ffccc24d559()   Unknown
  ucrtbase.dll!00007ffccc226d33()   Unknown
  b.exe!butl::fdclose(int fd) Line 1482 C++
  [Inline Frame] b.exe!butl::auto_fd::reset(int fd) Line 12 C++
  [Inline Frame] b.exe!butl::auto_fd::~auto_fd() Line 27    C++
  [Inline Frame] b.exe!butl::fdpipe::~fdpipe() Line 821 C++
  b.exe!build2::test::script::run_pipe(build2::test::script::scope & sp, std::_Vector_const_iterator<std::_Vector_val<std::_Simple_types<build2::test::script::command>>> bc, std::_Vector_const_iterator<std::_Vector_val<std::_Simple_types<build2::test::script::command>>> ec, butl::auto_fd ifd, unsigned __int64 ci, unsigned __int64 li, const build2::location & ll, bool diag) Line 1994   C++
  [External Code]
  [Inline Frame] b.exe!build2::fail_mark_base::fail_mark_base::<unnamed-tag>::operator()(const butl::diag_record & r) Line 447  C++
  b.exe!build2::fail_mark_base::fail_mark_base::<unnamed-tag>::__invoke(const butl::diag_record & r) Line 443   C++
  [Inline Frame] b.exe!butl::diag_record::flush() Line 192  C++
  b.exe!butl::diag_record::~diag_record() Line 206  C++
  b.exe!build2::test::script::run_pipe(build2::test::script::scope & sp, std::_Vector_const_iterator<std::_Vector_val<std::_Simple_types<build2::test::script::command>>> bc, std::_Vector_const_iterator<std::_Vector_val<std::_Simple_types<build2::test::script::command>>> ec, butl::auto_fd ifd, unsigned __int64 ci, unsigned __int64 li, const build2::location & ll, bool diag) Line 1250   C++
  b.exe!build2::test::script::run_pipe(build2::test::script::scope & sp, std::_Vector_const_iterator<std::_Vector_val<std::_Simple_types<build2::test::script::command>>> bc, std::_Vector_const_iterator<std::_Vector_val<std::_Simple_types<build2::test::script::command>>> ec, butl::auto_fd ifd, unsigned __int64 ci, unsigned __int64 li, const build2::location & ll, bool diag) Line 1820   C++
  b.exe!build2::test::script::run_expr(build2::test::script::scope & sp, const std::vector<build2::test::script::expr_term,std::allocator<build2::test::script::expr_term>> & expr, unsigned __int64 li, const build2::location & ll, bool diag) Line 2049  C++
  b.exe!build2::test::script::default_runner::run(build2::test::script::scope & sp, const std::vector<build2::test::script::expr_term,std::allocator<build2::test::script::expr_term>> & expr, build2::test::script::command_type ct, unsigned __int64 li, const build2::location & ll) Line 2084   C++
  b.exe!build2::test::script::parser::exec_lines(std::_Vector_iterator<std::_Vector_val<std::_Simple_types<build2::test::script::line>>> i, std::_Vector_iterator<std::_Vector_val<std::_Simple_types<build2::test::script::line>>> e, unsigned __int64 & li, build2::test::script::command_type ct) Line 3210  C++
  b.exe!build2::test::script::parser::exec_scope_body() Line 3115   C++
  [Inline Frame] b.exe!build2::test::script::parser::execute(build2::test::script::scope & sc, build2::test::script::script & s, build2::test::script::runner & r) Line 2910    C++
  b.exe!build2::test::script::execute_impl(build2::test::script::scope & s, build2::test::script::script & scr, build2::test::script::runner & r) Line 2920 C++
  [Inline Frame] b.exe!build2::test::script::parser::exec_scope_body::<unnamed-tag>::operator()(const build2::diag_frame * ds, build2::test::script::scope & s, build2::test::script::script & scr, build2::test::script::runner & r) Line 3075 C++
  b.exe!build2::scheduler::async<`lambda at C:\temp\cfg\build2-0.13.0-a.0.20200224125633.f749aab34924\libbuild2\test\script\parser.cxx:3069:39',const build2::diag_frame *&,std::reference_wrapper<build2::test::script::scope>,std::reference_wrapper<build2::test::script::script>,std::reference_wrapper<build2::test::script::runner>>(unsigned __int64 start_count, std::atomic<unsigned long long> & task_count, build2::test::script::parser::exec_scope_body::<unnamed-tag> && f, const build2::diag_frame * & a, std::reference_wrapper<build2::test::script::scope> && a, std::reference_wrapper<build2::test::script::script> && a, std::reference_wrapper<build2::test::script::runner> && a) Line 25   C++
  [Inline Frame] b.exe!build2::scheduler::async(std::atomic<unsigned long long> & task_count, build2::test::script::parser::exec_scope_body::<unnamed-tag> && f, const build2::diag_frame * & a, std::reference_wrapper<build2::test::script::scope> && a, std::reference_wrapper<build2::test::script::script> && a, std::reference_wrapper<build2::test::script::runner> && a) Line 85    C++
  b.exe!build2::test::script::parser::exec_scope_body() Line 3068   C++
  [Inline Frame] b.exe!build2::test::script::parser::execute(build2::test::script::scope & sc, build2::test::script::script & s, build2::test::script::runner & r) Line 2910    C++
  b.exe!build2::test::script::parser::execute(build2::test::script::script & s, build2::test::script::runner & r) Line 2890 C++
  [Inline Frame] b.exe!build2::test::perform_script_impl(const build2::target & t, const build2::test::testscript & ts, const butl::basic_path<char,butl::dir_path_kind<char>> & wd, const build2::test::common & c) Line 370   C++
  b.exe!build2::test::rule::perform_script::<unnamed-tag>::operator()(const build2::diag_frame * ds, build2::test::script::scope_state & r, const build2::target & t, const build2::test::testscript & ts, const butl::basic_path<char,butl::dir_path_kind<char>> & wd) Line 553    C++
  [Inline Frame] b.exe!build2::scheduler::async(unsigned __int64 start_count, std::atomic<unsigned long long> & task_count, build2::test::rule::perform_script::<unnamed-tag> && f, const build2::diag_frame * && a, std::reference_wrapper<build2::test::script::scope_state> && a, std::reference_wrapper<const build2::target> && a, std::reference_wrapper<const build2::test::testscript> && a, std::reference_wrapper<const butl::basic_path<char,butl::dir_path_kind<char>>> && a) Line 25   C++
  b.exe!build2::test::rule::perform_script(build2::action a, const build2::target & t, unsigned __int64 pass_n) Line 544    C++
  [External Code]
  [Inline Frame] b.exe!std::_Func_class<build2::target_state,build2::action,const build2::target &>::operator()(build2::action _Args, const build2::target & _Args) Line 969    C++
  b.exe!build2::execute_recipe(build2::action t, build2::target & r, const std::function<build2::target_state (build2::action, const build2::target &)> &) Line 1020    C++
  b.exe!build2::execute_impl(build2::action t, build2::target &) Line 1583  C++
  [Inline Frame] b.exe!build2::execute::<unnamed-tag>::operator()(const build2::diag_frame * ds, build2::target & t) Line 1696  C++
  [Inline Frame] b.exe!build2::scheduler::async(unsigned __int64 start_count, std::atomic<unsigned long long> & task_count, build2::execute::<unnamed-tag> && f, const build2::diag_frame * && a, std::reference_wrapper<build2::target> && a) Line 25  C++
  b.exe!build2::execute(build2::action a, const build2::target & ct, unsigned __int64 start_count, std::atomic<unsigned long long> * task_count) Line 1691  C++
  [Inline Frame] b.exe!build2::execute_async(build2::action a, const build2::target & t, unsigned __int64 sc, std::atomic<unsigned long long> & tc, bool fail) Line 561 C++
  b.exe!build2::execute(const butl::small_vector<build2::value,1> &, build2::action a, build2::action_targets & ts, unsigned short diag, bool prog) Line 349    C++
  b.exe!build2::main(int argc, char * * argv) Line 1595 C++
  [External Code]
rnk commented 2 years ago

mentioned in issue llvm/llvm-bugzilla-archive#45064

boris-kolpackov commented 2 years ago

This bug is still present in Clang 13.0.0 with the same state as for 12.0.1 (see the previous comment):

  1. The original reproducer nor longer triggers it.

  2. The reduced reproducer still triggers it with -O2 and -O3 provided -fexternc-nounwind is not passed.

We've also hit another similar issue that is likely a manifestation of the same underlying bug in Clang (i.e., it is triggered when an exception is thrown but only in -O2/-O3 optimized code). We've tried to reduce it to something that we can attach but so far unsuccessfully (things tend to change/disappear with slight code changes). One thing that is different in this case is that if running under the Visual Studio debugger, the process gets interrupted with "Unhandled exception" that points to the throw statement that is actually handled. Also the stack trace at this point looks wrong (no main(), etc).

boris-kolpackov commented 3 years ago

This still appears to be an issue in Clang 12.0.1 however, things have shifted around:

  1. The original reproducer nor longer triggers it.

  2. The reduced reproducer still triggers it with -O2 and -O3 provided -fexternc-nounwind is not passed.

I haven't tried the test mentioned in comment 9 as "program B" but could do so if you think it would be helpful.

rnk commented 3 years ago

Hans said he'd take a look at this.

boris-kolpackov commented 3 years ago

No, an exception is not thrown from an extern "C" function. My guess is -fexternc-nounwind makes a difference because it affects the shape of the generated code (because the compiler makes different assumptions/provisions when calling extern "C" functions with/without this option).

rnk commented 3 years ago

Sorry, I was on leave over the summer, so I didn't see the 2020 June/July updates. I haven't had time to read over this again, but if the problem is related to fexternc-nounwind, doesn't that mean the program is invoking UB by throwing an exception from an extern "C" function? In any case, I'll add this to our next bug triage round and do more thorough investigation later.

boris-kolpackov commented 3 years ago

Still an issue in Clang 11.0.1 and all the observations regarding -fexternc-nounwind and -O2/-O3 still hold.

boris-kolpackov commented 4 years ago

We've run into this issue in another, unrelated, program. I've done some experiments with results described below in hopes it will help to fix this.

Let's call the original code that triggers this A and this second program B. Let's also call the reduced version of A that is attached to this report (the latest version by Reid Kleckner) A'.

One thing that I was wondering is why nobody else seems to be running into this and one potential answer is that most (all) other projects used clang-cl rather than the vanilla driver like we do. So I tried clang-cl on A' and the result does not trigger the issue with /O2. In the end, I've tracked the difference down to -fexternc-nounwind that clang-cl passes. So if we pass this option to vanilla clang, then A' does not trigger the issue, neither with -O2 nor with -O3.

This, however, does not help A: for the original code the only thing that seems to help is lowering the optimization level to -O1.

B is in between: with -fexternc-nounwind the issue is gone with -O2 but not with -O3.

boris-kolpackov commented 4 years ago

Any progress on this? I would have liked to help but unfortunately this is out of my depth. I would be happy to do any legwork (testing, etc) if needed.

rnk commented 4 years ago

little more reduced Reduced the program a bit.

rnk commented 4 years ago

Thanks for the reproducer, I did some investigation. I think it's not that a destructor is running twice when it should not, it's that somehow a stale ofd field is being read in the ofd destructor.

The main thing that seems to cause this is SROA. We end up with a PHI on the ofd cleanuppad:

ehcleanup17:                                      ; preds = %ehcleanup, %"?fdopen_pipe@@YA?AUfdpipe@@XZ.exit"
  %ofd.sroa.11.0 = phi i32 [ -1, %ehcleanup ], [ %2, %"?fdopen_pipe@@YA?AUfdpipe@@XZ.exit" ], !dbg !1282
  %ofd.sroa.0.1 = phi i32 [ %ofd.sroa.0.0, %ehcleanup ], [ %1, %"?fdopen_pipe@@YA?AUfdpipe@@XZ.exit" ], !dbg !1282
  %10 = cleanuppad within none [], !dbg !1332

Anything that makes the PHI go away, such as removing potentially throwing operations from run_pipe or the extra callbacks destructor, hides the problem.

I'm not sure we should hold up 10.0 for this bug, but I will keep looking into it.

boris-kolpackov commented 4 years ago

It would be really helpful if this bug is at least looked at before the final 10.0.0 release. Because it is only triggered by throwing an exception, this could be a latent issue present in a lot more codebases than just ours. So I took the liberty to raise the importance to release blocker.

84475f08-3b62-4301-99f6-03104a4e4e71 commented 4 years ago

Reproducing program stack trace

84475f08-3b62-4301-99f6-03104a4e4e71 commented 4 years ago

Issue reproducing program

84475f08-3b62-4301-99f6-03104a4e4e71 commented 4 years ago

We have managed to reproduce the issue with a single (though not exactly trivial) source file.

Normally, the attached program creates a pipe, closes its ends and exits:

c:\tmp>clang++ -g test.cxx

c:\tmp>a.exe
pipe: 3 4
fdclose: 4
fdclose: 3
exit

However, when it is built optimized it tries to close the pipe input twice and crashes inside the _close() function:

c:\tmp>clang++ -O3 -g test.cxx

c:\tmp>a.exe
pipe: 3 4
fdclose: 4
fdclose: 3
fdclose: 3

The last fdclose() call is not expected to happen. The stack trace is attached.

boris-kolpackov commented 4 years ago

assigned to @zmodem