google / sanitizers

AddressSanitizer, ThreadSanitizer, MemorySanitizer

support swapcontext #189

Closed ramosian-glider closed 6 years ago

ramosian-glider commented 9 years ago

Originally reported on Google Code with ID 189

AddressSanitizer does not fully support swapcontext. 
Sometimes, swapcontext causes the entire shadow region (16T) 
to be written by asan-internal routines (e.g. __asan_handle_no_return)
because the location of the stack changes w/o asan noticing it.
This may cause the machine to die or hang for a long time. 

I am not at all sure if asan can fully support swapcontext, 
but we at least should collect more test cases. 

Reported by konstantin.s.serebryany on 2013-05-22 07:40:59
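
To make the failure mode described above concrete, here is a minimal sketch (not part of the original report) of how `makecontext`/`swapcontext` moves execution onto a heap-allocated stack. The names `stackdemo`, `on_fiber` and `fiber_local_is_in_heap_buffer` are hypothetical. The sketch assumes Linux with glibc and a build without asan; it contains no annotations, so this is exactly the kind of stack change that asan cannot see by itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <ucontext.h>

namespace stackdemo {

static ucontext_t g_main_ctx, g_fiber_ctx;
static std::uintptr_t g_probe_addr = 0;

// Runs on the malloc'ed stack and records where one of its locals lives.
static void on_fiber() {
  volatile char probe = 0;
  g_probe_addr = reinterpret_cast<std::uintptr_t>(&probe);
  swapcontext(&g_fiber_ctx, &g_main_ctx);  // jump back; this frame is never resumed
}

// Returns true if the fiber's local variable was located inside the heap buffer.
bool fiber_local_is_in_heap_buffer() {
  const std::size_t kStackSize = 64 * 1024;
  char *stack = static_cast<char *>(std::malloc(kStackSize));
  assert(stack != nullptr);

  getcontext(&g_fiber_ctx);
  g_fiber_ctx.uc_stack.ss_sp = stack;
  g_fiber_ctx.uc_stack.ss_size = kStackSize;
  g_fiber_ctx.uc_link = nullptr;  // on_fiber swaps back explicitly and never returns
  makecontext(&g_fiber_ctx, on_fiber, 0);

  swapcontext(&g_main_ctx, &g_fiber_ctx);  // the stack pointer moves into `stack` here

  const std::uintptr_t lo = reinterpret_cast<std::uintptr_t>(stack);
  const bool inside = g_probe_addr >= lo && g_probe_addr < lo + kStackSize;
  std::free(stack);
  return inside;
}

}  // namespace stackdemo
```

Calling `stackdemo::fiber_local_is_in_heap_buffer()` from `main` shows that the "stack" variable sits in a `malloc` region far from the thread's original stack, which is why a routine that assumes the original stack bounds can end up touching a huge address range.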

ramosian-glider commented 9 years ago
http://llvm.org/viewvc/llvm-project?rev=182456&view=rev adds a workaround and a better
test.

A full fix may require significant surgery, so I'd like to see if a simple thing
is enough. 

Reported by konstantin.s.serebryany on 2013-05-22 09:04:27

ramosian-glider commented 9 years ago
I've got a test case that gives a false-positive error around swapcontext:

"ERROR: AddressSanitizer: SEGV on unknown address 0x000000000"

When I blacklist the file making that call, the code then prints a warning referring
to this bug:

"WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: . . . 
False positive error reports may follow
For details see http://code.google.com/p/address-sanitizer/issues/detail?id=189"

It's from the test suite of the Charm++ parallel runtime system (http://charmplusplus.org).
If test cases for this would be useful, I'd be happy to help in understanding that
code. If you want it reduced, I can probably do some, but it's a fairly large system
with a lot of cross-dependencies.

Reported by unmobile on 2014-01-08 20:30:01

ramosian-glider commented 9 years ago
Does asan actually report false positives after the warning about swapcontext?
A minimized test is always welcome, but we can not promise that we'll fix it -- 
swapcontext is a really tricky beast. 

Reported by konstantin.s.serebryany on 2014-01-09 04:50:37

ramosian-glider commented 9 years ago
Note that it generally makes little sense to blacklist the code that performs a NULL
dereference.

Reported by ramosian.glider on 2014-01-10 10:39:22

ramosian-glider commented 9 years ago
Here is a false positive.

When you destroy a std::exception_ptr allocated from another stack without rethrowing
it, it crashes.

GCC 4.9.2 (on Gentoo). Boost 1.56.0 compiled with C++11 support.

{{{
==26409==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7fff0420b000;
bottom 0x63100000f000; size: 0x1cef041fc000 (31812891951104)
False positive error reports may follow
For details see http://code.google.com/p/address-sanitizer/issues/detail?id=189
=================================================================
==26409==ERROR: AddressSanitizer: stack-buffer-underflow on address 0x6310000104a0
at pc 0x7fd9fccdcde3 bp 0x631000010320 sp 0x63100000fac8
WRITE of size 240 at 0x6310000104a0 thread T0
    #0 0x7fd9fccdcde2 (/usr/lib/gcc/x86_64-pc-linux-gnu/4.9.2/libasan.so.1+0x2fde2)
    #1 0x7fd9fbe8b046 in _Unwind_Resume (/usr/lib/gcc/x86_64-pc-linux-gnu/4.9.2/libgcc_s.so.1+0x10046)
    #2 0x406dc9 in my_coroutine(boost::coroutines::pull_coroutine<std::__exception_ptr::exception_ptr>&)
(/tmp/a.out+0x406dc9)
    #3 0x41e7f4 in boost::coroutines::detail::push_coroutine_object<boost::coroutines::pull_coroutine<std::__exception_ptr::exception_ptr>,
std::__exception_ptr::exception_ptr, void (&)(boost::coroutines::pull_coroutine<std::__exception_ptr::exception_ptr>&),
boost::coroutines::basic_standard_stack_allocator<boost::coroutines::stack_traits>
>::run(std::__exception_ptr::exception_ptr*) (/tmp/a.out+0x41e7f4)
    #4 0x41bb88 in void boost::coroutines::detail::trampoline_push<boost::coroutines::detail::push_coroutine_object<boost::coroutines::pull_coroutine<std::__exception_ptr::exception_ptr>,
std::__exception_ptr::exception_ptr, void (&)(boost::coroutines::pull_coroutine<std::__exception_ptr::exception_ptr>&),
boost::coroutines::basic_standard_stack_allocator<boost::coroutines::stack_traits>
> >(long) (/tmp/a.out+0x41bb88)
    #5 0x7fd9fc89e710 in make_fcontext (/usr/lib64/libboost_context-cxx11-gcc4_9_2.so.1.56.0+0x710)

0x6310000104a0 is located 64672 bytes inside of 65536-byte region [0x631000000800,0x631000010800)
allocated by thread T0 here:
    #0 0x7fd9fcd04787 in malloc (/usr/lib/gcc/x86_64-pc-linux-gnu/4.9.2/libasan.so.1+0x57787)
    #1 0x414890 in boost::coroutines::basic_standard_stack_allocator<boost::coroutines::stack_traits>::allocate(boost::coroutines::stack_context&,
unsigned long) (/tmp/a.out+0x414890)
    #2 0x40d975 in boost::coroutines::push_coroutine<std::__exception_ptr::exception_ptr>::push_coroutine<void
(&)(boost::coroutines::pull_coroutine<std::__exception_ptr::exception_ptr>&)>(void
(&)(boost::coroutines::pull_coroutine<std::__exception_ptr::exception_ptr>&), boost::coroutines::attributes
const&) (/tmp/a.out+0x40d975)
    #3 0x406ecf in main (/tmp/a.out+0x406ecf)
    #4 0x7fd9fbaf8dc4 in __libc_start_main (/lib64/libc.so.6+0x24dc4)
}}}

Reported by vdavid@vizrt.com on 2014-12-10 18:51:48


ramosian-glider commented 9 years ago

Reported by ramosian.glider on 2015-07-30 09:05:31

ramosian-glider commented 9 years ago
Adding Project:AddressSanitizer as part of GitHub migration.

Reported by ramosian.glider on 2015-07-30 09:06:55

hbowden commented 8 years ago

I ran into this bug as well and made a test case. It's derived from the test suite in a fuzzer I'm writing. https://github.com/2trill2spill/nextgen . This was tested on Mac OSX 10.11.12, and below is the output from clang --version.

nahs-MBP:desktop nah$ clang --version
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.2.0
Thread model: posix

And the test case, which was compiled with clang -fsanitize=address -o test test.c:

#include <setjmp.h>
#include <stdio.h>
#include <string.h>
#include <signal.h>

static jmp_buf return_jump;

static void signal_handler(int sig)
{
    longjmp(return_jump, 1);
}

static void setup_test_sig_handler(void)
{
    struct sigaction sa;
    sigset_t ss;
    unsigned int i;

    for(i = 1; i < 512; i++)
    {
        (void)sigfillset(&ss);
        sa.sa_flags = SA_RESTART;
        sa.sa_handler = signal_handler;
        sa.sa_mask = ss;
        (void)sigaction((int)i, &sa, NULL);
    }

    return;
}

int main(void)
{
    int rtrn = setjmp(return_jump);
    if(rtrn < 0)
    {
        perror("setjmp");
        return (-1);
    }

    setup_test_sig_handler();

    /* Cause signal. */
    memmove(NULL, "123456789", 9);

    return (0);
}

kcc commented 8 years ago

2trill2spill, how is the previous comment related to this bug? The code does not even have a swapcontext call. Please open a separate bug explaining what exactly went wrong.

hbowden commented 8 years ago

The error message I get from running the test case points to this page, so I assumed that it was the same issue.

ASAN:SIGSEGV
=================================================================
==8425==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x000105ab39e2 bp 0x7fff5a1ab9a0 sp 0x7fff5a1ab128 T0)
    #0 0x105ab39e1 in __sanitizer::internal_memmove(void*, void const*, unsigned long) (/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/7.0.2/lib/darwin/libclang_rt.asan_osx_dynamic.dylib+0x569e1)
    #1 0x105a54a9e in main (/Users/nah/Desktop/./test+0x100000a9e)
    #2 0x7fff95a7c5ac in start (/usr/lib/system/libdyld.dylib+0x35ac)
    #3 0x0  (<unknown module>)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV ??:0 __sanitizer::internal_memmove(void*, void const*, unsigned long)
==8425==ABORTING
==8425==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7fff5a1ac000; bottom 0x000106b5d000; size: 0x7ffe5364f000 (140730297544704)
False positive error reports may follow
For details see http://code.google.com/p/address-sanitizer/issues/detail?id=189
ASAN:SIGSEGV
==8425==AddressSanitizer: while reporting a bug found another one. Ignoring.

kcc commented 8 years ago

I don't see such a message on Linux, so it might be an OSX-specific issue, unrelated to swapcontext. Please file a separate bug and we will discuss it there.

felixguendling commented 8 years ago

Can anyone explain how https://github.com/facebook/folly/commit/2ea64dd24946cbc9f3f4ac3f6c6b98a486c56e73 works? I could not find anything like "__asan_enter_fiber"?

Is there something like VALGRIND_STACK_REGISTER / VALGRIND_STACK_DEREGISTER available for address sanitizer?

blastrock commented 8 years ago

Since there is nothing on the Internet about those functions, I guess that Facebook has a fork of clang where they have implemented them.

I tried to implement the functions myself here, but I have very little knowledge of how things work, so feedback is welcome. Note that the function prototypes are a little different from those used by folly. I tested this code with an implementation of coroutines on top of boost context v2. The warning about handle_no_return has indeed disappeared and it seems to work.

felixguendling commented 8 years ago

Thank you! I will try your modified version as soon as possible. Do you just include the "asan_interface_internal.h" header in your binary, or do you use the approach from https://github.com/facebook/folly/commit/2ea64dd24946cbc9f3f4ac3f6c6b98a486c56e73 (dlsym)?

avikivity commented 8 years ago

@pdziepak

blastrock commented 8 years ago

For the moment I have only done some tests, where I just declared the functions in my project:

extern "C"
{
void __asan_enter_fiber(void const* stack_top, void const* stack_bottom);
void __asan_exit_fiber();
}

Of course this fails to link if I don't compile with asan. I think folly's approach (with the weak symbol attributes and the fallback on dlsym) should work too.
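
The weak-symbol idea mentioned above can be sketched as follows. This is an illustration, not folly's code and not the patch under discussion: it uses the annotation names that appear later in this thread (`__sanitizer_start_switch_fiber` / `__sanitizer_finish_switch_fiber`, in the three-argument form of `finish`), and it assumes GCC or Clang on an ELF platform where undefined weak functions resolve to a null address. The `fiber_asan` namespace and its wrappers are hypothetical names.

```cpp
#include <cstddef>

// Weak declarations: if the asan runtime is not linked in, these symbols stay
// undefined and their addresses compare equal to null instead of failing to link.
extern "C" {
__attribute__((weak)) void __sanitizer_start_switch_fiber(
    void **fake_stack_save, const void *bottom, std::size_t size);
__attribute__((weak)) void __sanitizer_finish_switch_fiber(
    void *fake_stack_save, const void **bottom_old, std::size_t *size_old);
}

namespace fiber_asan {

// True only when both annotation entry points are present in this process.
inline bool available() {
  return &__sanitizer_start_switch_fiber != nullptr &&
         &__sanitizer_finish_switch_fiber != nullptr;
}

// Forwards to the runtime when present; returns whether the call was forwarded.
inline bool start_switch(void **fake_stack_save, const void *bottom,
                         std::size_t size) {
  if (!available()) return false;
  __sanitizer_start_switch_fiber(fake_stack_save, bottom, size);
  return true;
}

// Same for the second half of the switch.
inline bool finish_switch(void *fake_stack_save, const void **bottom_old,
                          std::size_t *size_old) {
  if (!available()) return false;
  __sanitizer_finish_switch_fiber(fake_stack_save, bottom_old, size_old);
  return true;
}

}  // namespace fiber_asan
```

With this in place the same source builds with and without `-fsanitize=address`; in a non-asan build both wrappers do nothing and return `false`. A compile-time guard on `__SANITIZE_ADDRESS__` or `__has_feature(address_sanitizer)` is an alternative when a runtime check is not wanted.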

blastrock commented 8 years ago

FYI my patch finally passed review (a lot of things were fixed in the meantime), http://llvm.org/viewvc/llvm-project?view=revision&revision=273260 . Now let's wait for clang 3.9 :)

felixguendling commented 8 years ago

That's great news! Thank you very much for your effort! :+1:

ioquatix commented 7 years ago

I just ran into this issue. I see the solution is to notify asan if switching stacks? I'm implementing coroutines.

avikivity commented 7 years ago

@ioquatix on gcc? Whoa!

I think gcc 7 and latest clang have better support for makecontext and friends.

blastrock commented 7 years ago

Yes, the idea is to annotate your code to notify asan when you switch context.

You can find some documentation here https://github.com/llvm-mirror/compiler-rt/blob/master/include/sanitizer/common_interface_defs.h#L166 and an example in the tests https://github.com/llvm-mirror/compiler-rt/blob/master/test/asan/TestCases/Linux/swapcontext_annotation.cc .

Since the test is still there, I don't think asan has gained any further support for swapcontext itself.

ioquatix commented 7 years ago

Cool. I'm not using makecontext/swapcontext but using ASM to switch stacks directly. I'll try out the annotations.

ioquatix commented 7 years ago

Okay, so I've tried to implement this and it appears to be compiling, but I'm having some issues.

First, the changes I made:

Fiber::resume which swaps from main stack to fiber stack (and potentially nested fibers):

https://github.com/kurocha/concurrent/blob/6315ca4da220bdffec8fd292a04150a9eacea41d/source/Concurrent/Fiber.cpp#L64-L75

Fiber::yield which swaps from fiber stack back to main stack (or potentially parent stack if nested):

https://github.com/kurocha/concurrent/blob/6315ca4da220bdffec8fd292a04150a9eacea41d/source/Concurrent/Fiber.cpp#L97-L108

cocall which is the first function executed on the stack:

https://github.com/kurocha/concurrent/blob/6315ca4da220bdffec8fd292a04150a9eacea41d/source/Concurrent/Fiber.hpp#L160-L165

So, the order is always balanced, e.g. call resume, start stack, then in cocall, finish stack, then in yield, start stack, back to resume exit stack, finish.

Is this a reasonable implementation?

I wasn't entirely sure what I should be doing with all the stack pointers/sizes, I guess that start stack should be details of the stack you are transferring to, and finish stack should be the details of the stack you came from. However, what is the purpose of fake stack and how should I handle it given that fibers can transfer in a non-nested way?

Finally, even though this seems to work, I now get an error:

--- Concurrent::Fiber ---
__sanitizer_start_switch_fiber (resume)
__sanitizer_finish_switch_fiber (call)
__sanitizer_start_switch_fiber (yield)
__sanitizer_finish_switch_fiber (resume)
[it should resume] 1 passed out of 1 total
__sanitizer_start_switch_fiber (resume)
__sanitizer_finish_switch_fiber (call)
__sanitizer_start_switch_fiber (yield)
__sanitizer_finish_switch_fiber (resume)
__sanitizer_start_switch_fiber (resume)
__sanitizer_finish_switch_fiber (yield)
__sanitizer_start_switch_fiber (yield)
__sanitizer_finish_switch_fiber (resume)
[it should yield] 2 passed out of 2 total
__sanitizer_start_switch_fiber (resume)
__sanitizer_finish_switch_fiber (call)
__sanitizer_start_switch_fiber (yield)
__sanitizer_finish_switch_fiber (resume)
==14488==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x000000000000; bottom 0x7ffff98be000; size: 0xffff800006742000 (-140737380081664)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
[it should throw exceptions] 1 passed out of 1 total
__sanitizer_start_switch_fiber (resume)
__sanitizer_finish_switch_fiber (call)
__sanitizer_start_switch_fiber (yield)
__sanitizer_finish_switch_fiber (resume)
__sanitizer_start_switch_fiber (resume)
__sanitizer_finish_switch_fiber (yield)
__sanitizer_start_switch_fiber (yield)
__sanitizer_finish_switch_fiber (resume)
[it can be stopped] 4 passed out of 4 total
__sanitizer_start_switch_fiber (resume)
__sanitizer_finish_switch_fiber (call)
=================================================================
==14488==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fa0ccbfebc8 at pc 0x55baaadf659f bp 0x7fa0ccbfea20 sp 0x7fa0ccbfea18
WRITE of size 8 at 0x7fa0ccbfebc8 thread T0
    #0 0x55baaadf659e in std::exception_ptr::exception_ptr() /usr/bin/../include/c++/v1/exception:143:59
    #1 0x55baaadf659e in Concurrent::Fiber::Fiber<Concurrent::$_4::operator()(UnitTest::Examiner&) const::{lambda()#1}::operator()() const::{lambda()#1}>(Concurrent::$_4::operator()(UnitTest::Examiner&) const::{lambda()#1}::operator()() const::{lambda()#1}&&, unsigned long) include/Concurrent/Fiber.hpp:53
    #2 0x55baaadf5941 in Concurrent::$_4::operator()(UnitTest::Examiner&) const::{lambda()#1}::operator()() const concurrent/test/Concurrent/Test.Fiber.cpp:112:12
    #3 0x55baaadf43d3 in Concurrent::Coentry<Concurrent::$_4::operator()(UnitTest::Examiner&) const::{lambda()#1}>::cocall(void*) include/Concurrent/Fiber.hpp:169:4
    #4 0x55baaae94396 in coro_init concurrent/source/Concurrent/coro.c:97:3
    #5 0x7fa0cf71ed3f  (/usr/lib/libc.so.6+0x35d3f)

Address 0x7fa0ccbfebc8 is located in stack of thread T0 at offset 104 in frame
    #0 0x55baaadf55bf in Concurrent::$_4::operator()(UnitTest::Examiner&) const::{lambda()#1}::operator()() const concurrent/test/Concurrent/Test.Fiber.cpp:109

  This frame has 2 object(s):
    [32, 144) 'inner' <== Memory access at offset 104 is inside this variable
    [176, 184) 'ref.tmp'
HINT: this may be a false positive if your program uses some custom stack unwind mechanism or swapcontext
      (longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-buffer-overflow /usr/bin/../include/c++/v1/exception:143:59 in std::exception_ptr::exception_ptr()
Shadow bytes around the buggy address:
  0x0ff499977d20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0ff499977d30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0ff499977d40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0ff499977d50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0ff499977d60: 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
=>0x0ff499977d70: 00 00 00 00 00 00 00 00 00[f3]f3 f3 00 00 f2 f2
  0x0ff499977d80: f2 f2 00 f3 f3 f3 f3 f3 00 00 00 00 00 00 00 00
  0x0ff499977d90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0ff499977da0: 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
  0x0ff499977db0: 00 f2 f2 f2 00 f2 f2 f2 00 f3 f3 f3 00 00 00 00
  0x0ff499977dc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==14488==ABORTING

It appears as if exception_ptr of an exception thrown on another stack is not working? However, that stack is not deallocated yet, until after the exception is handled, AFAIK. I will review further but just wondering if anyone can give me feedback on my implementation.

ioquatix commented 7 years ago

Okay, I changed all fake_stack to nullptr and I no longer get any error, but I still get the warning:

==14488==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x000000000000; bottom 0x7ffff98be000; size: 0xffff800006742000 (-140737380081664)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189

So, I guess I'm doing something a bit wrong. I'll read documentation a bit more. Any ideas appreciated.

ioquatix commented 7 years ago

Okay, so after reading the documentation and playing around a bit, I tried implementing it like so:

#if defined(VARIANT_SANITIZE)
        void * fake_stack = nullptr;
        __sanitizer_start_switch_fiber(&fake_stack, _stack.base(), _stack.allocated_size());
        std::cerr << "__sanitizer_start_switch_fiber (resume, fake_stack=" << fake_stack << ")" << std::endl;
#endif

        coro_transfer(&_caller->_context, &_context);

#if defined(VARIANT_SANITIZE)
        std::cerr << "__sanitizer_finish_switch_fiber (resume, fake_stack=" << fake_stack << ")" << std::endl;
        __sanitizer_finish_switch_fiber(fake_stack, nullptr, nullptr);
#endif

The first (odd?) thing is that the pointer is always null? Is it using the address of the pointer to mark the stack somehow?

Secondly, I followed the instructions regarding the last __sanitizer_start_switch_fiber having nullptr as the first argument. It seems to work as expected. Still, I'm not completely confident I understand how it should be used. The test code is a bit messy, it would be nice to have a simple annotated example, not a test designed to stress test asan.

Finally, I still couldn't get the warning to go away.
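
As a response to the request above for a simple annotated example, here is a minimal sketch of one switch into a fiber and back, using `swapcontext` and the two annotations. It follows the pattern of the `swapcontext_annotation.cc` test linked earlier but is not copied from it. The names `oneshot`, `ONESHOT_ASAN`, `State` and `run_fiber_once` are hypothetical, and the sketch assumes Linux with glibc. Without asan the wrappers compile to no-ops.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <ucontext.h>

// Detect an asan build (GCC defines __SANITIZE_ADDRESS__, Clang has __has_feature).
#if defined(__SANITIZE_ADDRESS__)
#  define ONESHOT_ASAN 1
#elif defined(__has_feature)
#  if __has_feature(address_sanitizer)
#    define ONESHOT_ASAN 1
#  endif
#endif
#if defined(ONESHOT_ASAN)
#  include <sanitizer/common_interface_defs.h>
#endif

namespace oneshot {

// Call right BEFORE switching: describe the stack we are about to run on.
static void start_switch(void **fake, const void *bottom, std::size_t size) {
#if defined(ONESHOT_ASAN)
  __sanitizer_start_switch_fiber(fake, bottom, size);
#else
  (void)fake; (void)bottom; (void)size;
#endif
}

// Call right AFTER switching: optionally receive the bounds of the stack we left.
static void finish_switch(void *fake, const void **old_bottom, std::size_t *old_size) {
#if defined(ONESHOT_ASAN)
  __sanitizer_finish_switch_fiber(fake, old_bottom, old_size);
#else
  (void)fake; (void)old_bottom; (void)old_size;
#endif
}

struct State {
  ucontext_t main_ctx, fiber_ctx;
  const void *main_bottom = nullptr;  // filled in by the fiber's finish_switch
  std::size_t main_size = 0;
  int steps = 0;
};
static State *g_state = nullptr;  // makecontext can only pass int arguments

static void fiber_entry() {
  State *s = g_state;
  // First code on the new stack: a fresh fiber has no saved fake stack (null),
  // and we ask for the bounds of the stack we came from.
  finish_switch(nullptr, &s->main_bottom, &s->main_size);
  s->steps += 1;
  // Leaving for good: null as first argument lets asan drop this fiber's fake stack.
  start_switch(nullptr, s->main_bottom, s->main_size);
  swapcontext(&s->fiber_ctx, &s->main_ctx);
}

int run_fiber_once() {
  const std::size_t kStackSize = 64 * 1024;
  void *stack = std::malloc(kStackSize);
  assert(stack != nullptr);
  State s;
  g_state = &s;

  getcontext(&s.fiber_ctx);
  s.fiber_ctx.uc_stack.ss_sp = stack;
  s.fiber_ctx.uc_stack.ss_size = kStackSize;
  s.fiber_ctx.uc_link = nullptr;
  makecontext(&s.fiber_ctx, fiber_entry, 0);

  void *fake = nullptr;                    // lives on the stack we are leaving
  start_switch(&fake, stack, kStackSize);  // destination: the fiber's stack
  swapcontext(&s.main_ctx, &s.fiber_ctx);
  finish_switch(fake, nullptr, nullptr);   // back on the original stack
  s.steps += 1;

  std::free(stack);
  g_state = nullptr;
  return s.steps;
}

}  // namespace oneshot
```

The point that is easy to get wrong is the direction: `start` always describes the stack being switched to, while `finish` runs on the new stack and can report the bounds of the stack that was left.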

blastrock commented 7 years ago

The first argument (the fake stack save) is only used when the fake stack is enabled.

To enable it, define the variable ASAN_OPTIONS=detect_stack_use_after_return=1 .

It should be unrelated to the warning you get. You get this warning when you use the API incorrectly, or when there is a bug in ASAN...

I am not sure (since things have changed since my patch), but I think you need to put the stack of the coroutine you come from, here https://github.com/kurocha/concurrent/blob/6315ca4da220bdffec8fd292a04150a9eacea41d/source/Concurrent/Fiber.cpp#L74

EDIT: same for yield
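
As an alternative to setting the environment variable mentioned above, the same default can be baked into the binary. The sketch below uses the `__asan_default_options` hook from the asan interface; as far as I understand, values given in `ASAN_OPTIONS` at run time still take precedence over these defaults. In a build without asan the function is simply never called.

```cpp
// The asan runtime calls this hook, if the program defines it, during
// initialization to obtain default runtime options for this binary.
extern "C" const char *__asan_default_options() {
  // Turn on the fake stack so the fake_stack_save argument is actually used.
  return "detect_stack_use_after_return=1";
}
```

This is convenient for test binaries that should always exercise the fake-stack path of the fiber annotations, regardless of how they are launched.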

ioquatix commented 7 years ago

Thanks for the useful information, I will try it out and report back.

ioquatix commented 7 years ago

Okay I managed to get it to work with no warnings.

https://github.com/kurocha/concurrent/blob/663aacb14430777fc61c86c03b5eb10b8a93611c/source/Concurrent/Fiber.hpp#L110-L120

Conceptually, I had to break the functions into 4 variations.

Once I did this, I could understand and reason about how they should be called from resume, yield, transfer and so on. It makes sense now.
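
The four variations can be sketched roughly as below. This is a reconstruction for illustration only, not the code in the linked repository: the helper names are borrowed from the log output later in this thread, while `fourway`, `Side`, `FOURWAY_ASAN`, `body` and `run` are hypothetical. It assumes Linux with glibc and uses `swapcontext` instead of a custom switch routine. Without asan the annotations compile to no-ops.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <ucontext.h>

#if defined(__SANITIZE_ADDRESS__)
#  define FOURWAY_ASAN 1
#elif defined(__has_feature)
#  if __has_feature(address_sanitizer)
#    define FOURWAY_ASAN 1
#  endif
#endif
#if defined(FOURWAY_ASAN)
#  include <sanitizer/common_interface_defs.h>
#endif

namespace fourway {

struct Side {  // one end of a switch: its context, stack bounds, saved fake stack
  ucontext_t ctx;
  const void *bottom = nullptr;
  std::size_t size = 0;
  void *fake = nullptr;
};
static Side g_caller, g_fiber;
static int g_trace = 0;

static void start(void **fake, const void *bottom, std::size_t size) {
#if defined(FOURWAY_ASAN)
  __sanitizer_start_switch_fiber(fake, bottom, size);
#else
  (void)fake; (void)bottom; (void)size;
#endif
}
static void finish(void *fake, const void **bottom, std::size_t *size) {
#if defined(FOURWAY_ASAN)
  __sanitizer_finish_switch_fiber(fake, bottom, size);
#else
  (void)fake; (void)bottom; (void)size;
#endif
}

// 1. On the caller, before entering or resuming the fiber.
static void start_push_stack() { start(&g_caller.fake, g_fiber.bottom, g_fiber.size); }
// 2. On the fiber, right after it starts or resumes running.
static void finish_push_stack() { finish(g_fiber.fake, &g_caller.bottom, &g_caller.size); }
// 3. On the fiber, before going back; null fake-stack slot when it will never resume.
static void start_pop_stack(bool terminating) {
  start(terminating ? nullptr : &g_fiber.fake, g_caller.bottom, g_caller.size);
}
// 4. On the caller, right after control comes back.
static void finish_pop_stack() { finish(g_caller.fake, nullptr, nullptr); }

static void resume() {
  start_push_stack();
  swapcontext(&g_caller.ctx, &g_fiber.ctx);
  finish_pop_stack();
}
static void yield(bool terminating) {
  start_pop_stack(terminating);
  swapcontext(&g_fiber.ctx, &g_caller.ctx);
  finish_push_stack();  // reached only when the fiber is resumed again
}

static void body() {
  finish_push_stack();  // first entry onto the fiber stack
  g_trace = g_trace * 10 + 1;
  yield(false);
  g_trace = g_trace * 10 + 2;
  yield(true);  // terminating: never resumed after this
}

int run() {
  const std::size_t kStackSize = 64 * 1024;
  void *stack = std::malloc(kStackSize);
  assert(stack != nullptr);
  g_trace = 0;
  g_fiber.fake = nullptr;
  g_fiber.bottom = stack;
  g_fiber.size = kStackSize;
  getcontext(&g_fiber.ctx);
  g_fiber.ctx.uc_stack.ss_sp = stack;
  g_fiber.ctx.uc_stack.ss_size = kStackSize;
  g_fiber.ctx.uc_link = nullptr;
  makecontext(&g_fiber.ctx, body, 0);

  resume();                    // body runs up to its first yield
  g_trace = g_trace * 10 + 9;  // caller does some work in between
  resume();                    // body runs to its terminating yield
  std::free(stack);
  return g_trace;              // digits record the order: 1, 9, 2
}

}  // namespace fourway
```

Each switch pairs a `start_*` on the stack being left with a `finish_*` on the stack being entered, which matches the push/pop ordering visible in the log output below.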

ioquatix commented 7 years ago

So, I got the core code working without warnings/errors, but found an issue here:

/home/samuel/Documents/kurocha/async/teapot/platforms/development/linux-sanitize/test/Async-tests

--- Async::Protocol::Buffer ---
[it can append data] 2 passed out of 2 total
[it can read data from a file] 1 passed out of 1 total
[it can read data into non-contiguous buffer] 12 passed out of 12 total
[it can read data from a file in chunks] 3 passed out of 3 total

--- Async::Notification ---
Fiber::start_push_stack(resume, 0x7f7844efe000, 4202496)
Fiber::finish_push_stack(cocall, 0x7ffef3e66000, 8388608)
Fiber::start_pop_stack(yield, 0x7ffef3e66000, 8388608, 0)
Fiber::finish_pop_stack(resume, 0x7f7844efe000, 4202496)
Fiber::start_push_stack(resume, 0x7f7844efe000, 4202496)
Fiber::finish_push_stack(yield, 0x7ffef3e66000, 8388608)
Fiber::start_pop_stack(coreturn, 0x7ffef3e66000, 8388608, 1)
Fiber::finish_pop_stack(resume, 0x7f7844efe000, 4202496)
[it can notify the fiber to continue] 1 passed out of 1 total

--- Async::Writable ---
Fiber::start_push_stack(resume, 0x7f7844efe000, 4202496)
Fiber::finish_push_stack(cocall, 0x7ffef3e66000, 8388608)
Fiber::start_pop_stack(yield, 0x7ffef3e66000, 8388608, 0)
Fiber::finish_pop_stack(resume, 0x7f7844efe000, 4202496)
Fiber::start_push_stack(resume, 0x7f78442fb000, 4202496)
Fiber::finish_push_stack(cocall, 0x7ffef3e66000, 8388608)
Fiber::start_pop_stack(coreturn, 0x7ffef3e66000, 8388608, 1)
Fiber::finish_pop_stack(resume, 0x7f78442fb000, 4202496)
Fiber::start_push_stack(resume, 0x7f7844efe000, 4202496)
Fiber::finish_push_stack(yield, 0x7ffef3e66000, 8388608)
Fiber::start_pop_stack(coreturn, 0x7ffef3e66000, 8388608, 1)
Fiber::finish_pop_stack(resume, 0x7f7844efe000, 4202496)
[it can wait for writing] 1 passed out of 1 total

--- Async::Readable ---
Fiber::start_push_stack(resume, 0x7f7844efe000, 4202496)
Fiber::finish_push_stack(cocall, 0x7ffef3e66000, 8388608)
Fiber::start_pop_stack(yield, 0x7ffef3e66000, 8388608, 0)
Fiber::finish_pop_stack(resume, 0x7f7844efe000, 4202496)
Fiber::start_push_stack(resume, 0x7f78442fb000, 4202496)
Fiber::finish_push_stack(cocall, 0x7ffef3e66000, 8388608)
Fiber::start_pop_stack(coreturn, 0x7ffef3e66000, 8388608, 1)
Fiber::finish_pop_stack(resume, 0x7f78442fb000, 4202496)
Fiber::start_push_stack(resume, 0x7f7844efe000, 4202496)
Fiber::finish_push_stack(yield, 0x7ffef3e66000, 8388608)
Fiber::start_pop_stack(coreturn, 0x7ffef3e66000, 8388608, 1)
Fiber::finish_pop_stack(resume, 0x7f7844efe000, 4202496)
[it can wait for reading] 2 passed out of 2 total

--- Async::Job ---
Fiber::start_push_stack(resume, 0x7f7844efe000, 4202496)
Fiber::finish_push_stack(cocall, 0x7ffef3e66000, 8388608)
=================================================================
==13117==AddressSanitizer CHECK failed: /build/llvm/src/llvm-4.0.1.src/projects/compiler-rt/lib/asan/asan_thread.cc:320 "((ptr[0] == kCurrentStackFrameMagic)) != (0)" (0x0, 0x0)
    #0 0x55e6f67cb527 in __asan::AsanCheckFailed(char const*, int, char const*, unsigned long long, unsigned long long) (/home/samuel/Documents/kurocha/async/teapot/platforms/development/linux-sanitize/test/Async-tests+0x1d0527)
    #1 0x55e6f67e7655 in __sanitizer::CheckFailed(char const*, int, char const*, unsigned long long, unsigned long long) (/home/samuel/Documents/kurocha/async/teapot/platforms/development/linux-sanitize/test/Async-tests+0x1ec655)
    #2 0x55e6f67d076a in __asan::AsanThread::GetStackFrameAccessByAddr(unsigned long, __asan::AsanThread::StackFrameAccess*) (/home/samuel/Documents/kurocha/async/teapot/platforms/development/linux-sanitize/test/Async-tests+0x1d576a)
    #3 0x55e6f6718148 in __asan::AddressDescription::AddressDescription(unsigned long, unsigned long, bool) (/home/samuel/Documents/kurocha/async/teapot/platforms/development/linux-sanitize/test/Async-tests+0x11d148)
    #4 0x55e6f671a8e0 in __asan::ErrorGeneric::ErrorGeneric(unsigned int, unsigned long, unsigned long, unsigned long, unsigned long, bool, unsigned long) (/home/samuel/Documents/kurocha/async/teapot/platforms/development/linux-sanitize/test/Async-tests+0x11f8e0)
    #5 0x55e6f67cac9e in __asan::ReportGenericError(unsigned long, unsigned long, unsigned long, unsigned long, bool, unsigned long, unsigned int, bool) (/home/samuel/Documents/kurocha/async/teapot/platforms/development/linux-sanitize/test/Async-tests+0x1cfc9e)
    #6 0x55e6f67cbe5b in __asan_report_store8 (/home/samuel/Documents/kurocha/async/teapot/platforms/development/linux-sanitize/test/Async-tests+0x1d0e5b)
    #7 0x55e6f688df37 in std::__1::function<void ()>::function<Async::$_0::operator()(UnitTest::Examiner&) const::{lambda()#1}::operator()() const::{lambda()#1}, void>(Async::$_0::operator()(UnitTest::Examiner&) const::{lambda()#1}::operator()() const::{lambda()#1}) /usr/bin/../include/c++/v1/functional:1763:7
    #8 0x55e6f688d8b3 in Async::$_0::operator()(UnitTest::Examiner&) const::{lambda()#1}::operator()() const /home/samuel/Documents/kurocha/async/test/Async/Test.Job.cpp:32:23
    #9 0x55e6f688c2ff in Concurrent::Coentry<Async::$_0::operator()(UnitTest::Examiner&) const::{lambda()#1}>::cocall(void*) /home/samuel/Documents/kurocha/async/test/../teapot/platforms/development/linux-sanitize/include/Concurrent/Fiber.hpp:175:4
    #10 0x55e6f69605d6 in coro_init /home/samuel/Documents/kurocha/async/teapot/packages/development/concurrent/source/Concurrent/coro.c:97:3
    #11 0x7f7847da4d3f  (/usr/lib/libc.so.6+0x35d3f)

Task #<TaskClassForAsyncTests_47339649637900:0x00561c3e102e20> failed: "Async-tests" exited with status 256
Task #<TaskClassForAsyncTests_47339649637900:0x00561c3e1c0c90> failed: Children tasks failed!
Task #<TaskClassForAsyncTests_47339649637900:0x00561c3f58c058> failed: Children tasks failed!

It's also a little odd in that if I only run that test, it fails a bit later:

/home/samuel/Documents/kurocha/async/teapot/platforms/development/linux-sanitize/test/Async-tests Async::Job

--- Async::Job ---
Fiber::start_push_stack(resume, 0x7fb119ef6000, 4202496)
Fiber::finish_push_stack(cocall, 0x7ffdf69a5000, 8388608)
Fiber::start_pop_stack(yield, 0x7ffdf69a5000, 8388608, 0)
Fiber::finish_pop_stack(resume, 0x7fb119ef6000, 4202496)
Fiber::start_push_stack(resume, 0x7fb119ef6000, 4202496)
Fiber::finish_push_stack(yield, 0x7ffdf69a5000, 8388608)
Fiber::start_pop_stack(coreturn, 0x7ffdf69a5000, 8388608, 1)
Fiber::finish_pop_stack(resume, 0x7fb119ef6000, 4202496)
[it can wait for result] 1 passed out of 1 total
Fiber::start_push_stack(resume, 0x7fb119ef6000, 4202496)
Fiber::finish_push_stack(cocall, 0x7ffdf69a5000, 8388608)
=================================================================
==13215==AddressSanitizer CHECK failed: /build/llvm/src/llvm-4.0.1.src/projects/compiler-rt/lib/asan/asan_thread.cc:320 "((ptr[0] == kCurrentStackFrameMagic)) != (0)" (0x0, 0x0)
    #0 0x55c23ae6d527 in __asan::AsanCheckFailed(char const*, int, char const*, unsigned long long, unsigned long long) (/home/samuel/Documents/kurocha/async/teapot/platforms/development/linux-sanitize/test/Async-tests+0x1d0527)
    #1 0x55c23ae89655 in __sanitizer::CheckFailed(char const*, int, char const*, unsigned long long, unsigned long long) (/home/samuel/Documents/kurocha/async/teapot/platforms/development/linux-sanitize/test/Async-tests+0x1ec655)
    #2 0x55c23ae7276a in __asan::AsanThread::GetStackFrameAccessByAddr(unsigned long, __asan::AsanThread::StackFrameAccess*) (/home/samuel/Documents/kurocha/async/teapot/platforms/development/linux-sanitize/test/Async-tests+0x1d576a)
    #3 0x55c23adba148 in __asan::AddressDescription::AddressDescription(unsigned long, unsigned long, bool) (/home/samuel/Documents/kurocha/async/teapot/platforms/development/linux-sanitize/test/Async-tests+0x11d148)
    #4 0x55c23adbc8e0 in __asan::ErrorGeneric::ErrorGeneric(unsigned int, unsigned long, unsigned long, unsigned long, unsigned long, bool, unsigned long) (/home/samuel/Documents/kurocha/async/teapot/platforms/development/linux-sanitize/test/Async-tests+0x11f8e0)
    #5 0x55c23ae6cc9e in __asan::ReportGenericError(unsigned long, unsigned long, unsigned long, unsigned long, bool, unsigned long, unsigned int, bool) (/home/samuel/Documents/kurocha/async/teapot/platforms/development/linux-sanitize/test/Async-tests+0x1cfc9e)
    #6 0x55c23ae6dcab in __asan_report_store1 (/home/samuel/Documents/kurocha/async/teapot/platforms/development/linux-sanitize/test/Async-tests+0x1d0cab)
    #7 0x55c23af4bb06 in UnitTest::Expectation<UnitTest::Examiner, Async::$_1::operator()(UnitTest::Examiner&) const::{lambda()#1}::operator()() const::{lambda()#2}>::Expectation(UnitTest::Examiner&, {lambda()#1} const&, bool) /home/samuel/Documents/kurocha/async/test/../teapot/platforms/development/linux-sanitize/include/UnitTest/Expectation.hpp:22:120
    #8 0x55c23af43ea4 in UnitTest::Expectation<UnitTest::Examiner, Async::$_1::operator()(UnitTest::Examiner&) const::{lambda()#1}::operator()() const::{lambda()#2}> UnitTest::Examiner::expect<Async::$_1::operator()(UnitTest::Examiner&) const::{lambda()#1}::operator()() const::{lambda()#2}>(Async::$_1::operator()(UnitTest::Examiner&) const::{lambda()#1}::operator()() const::{lambda()#2} const&) /home/samuel/Documents/kurocha/async/test/../teapot/platforms/development/linux-sanitize/include/UnitTest/UnitTest.hpp:53:11
    #9 0x55c23af428cd in Async::$_1::operator()(UnitTest::Examiner&) const::{lambda()#1}::operator()() const /home/samuel/Documents/kurocha/async/test/Async/Test.Job.cpp:63:15
    #10 0x55c23af4102f in Concurrent::Coentry<Async::$_1::operator()(UnitTest::Examiner&) const::{lambda()#1}>::cocall(void*) /home/samuel/Documents/kurocha/async/test/../teapot/platforms/development/linux-sanitize/include/Concurrent/Fiber.hpp:175:4
    #11 0x55c23b0025d6 in coro_init /home/samuel/Documents/kurocha/async/teapot/packages/development/concurrent/source/Concurrent/coro.c:97:3
    #12 0x7fb120e3ed3f  (/usr/lib/libc.so.6+0x35d3f)

Task #<TaskClassForAsyncTests_47187663006060:0x0055d5794d4ff0> failed: "Async-tests" exited with status 256
Task #<TaskClassForAsyncTests_47187663006060:0x0055d5792c1ab0> failed: Children tasks failed!
Task #<TaskClassForAsyncTests_47187663006060:0x0055d5792c2dc0> failed: Children tasks failed!

These tests check that a fiber adding a job to a thread pool works as expected. The tests pass without sanity checks.

The only thing I can think of, is that between tests sometimes stacks are allocated at the same address. Perhaps there is something left over from a previous invocation that's causing it to fail?

ioquatix commented 7 years ago

So, I checked, and individually the tests work fine.

ioquatix commented 7 years ago

Okay, I updated from clang 4.x to 5.x and the problem is... gone.

morehouse commented 6 years ago

@kcc: Seems that we have a workaround now with fiber annotations. Can we close this?

kcc commented 6 years ago

Let's close. If anyone sees a remaining problem, please open a new bug with new details.

stsp commented 2 years ago

Hi guys.

Does a custom swapcontext() need to be annotated for asan somehow? I've got asan working with glibc's swapcontext() plus the __sanitizer_start_switch_fiber and __sanitizer_finish_switch_fiber annotations. But with a custom, asm-written swapcontext()-like function, I can't get things to work even with the same switch_fiber annotations. So should I also annotate the custom swapcontext function somehow?

It crashes in a function epilogue that looks like this:

   0x00005555560f49af <+1191>:  je     0x5555560f49d2 <co_switch_context+1226>
   0x00005555560f49b1 <+1193>:  movq   $0x45e0360e,(%rbx)
   0x00005555560f49b8 <+1200>:  movabs $0xf5f5f5f5f5f5f5f5,%rax
   0x00005555560f49c2 <+1210>:  mov    %rax,0x7fff8000(%r14)
   0x00005555560f49c9 <+1217>:  mov    0x38(%rbx),%rax
=> 0x00005555560f49cd <+1221>:  movb   $0x0,(%rax)

rax==0 here. I don't understand what this epilogue code does or why it crashes only with the custom swapcontext().

stsp commented 2 years ago

https://github.com/gcc-mirror/gcc/blob/master/libsanitizer/asan/asan_interceptors.cpp#L243 Obviously asan intercepts swapcontext() and the other *context functions. So it seems there is no way to use a custom swapcontext() with asan?

ioquatix commented 2 years ago

Just for reference, this is how I implemented it: https://github.com/kurocha/concurrent/blob/6eee988ba7263f017a8d74560afde2f0396c1370/source/Concurrent/Fiber.cpp#L46-L70

felixguendling commented 2 years ago

Asan support is working for us like this: https://github.com/motis-project/ctx/blob/master/include/ctx/impl/operation.h#L51-L86

We're using this in combination with deboost.context.

stsp commented 2 years ago

Thanks, deboost.context indeed looks like it uses its own asm for context switching, and yet it works for you with asan with just the basic *_switch_fiber() annotations... Interesting.

As for "concurrent" project mentioned by @ioquatix - I can't find the custom context switching primitives there.

stsp commented 2 years ago

https://github.com/septag/deboost.context/blob/master/asm/jump_x86_64_ms_pe_gas.asm#L164

        movq  %gs:(0x30), %r10
        /* restore fiber local storage */
        movq  0xb0(%rsp), %rax
        movq  %rax, 0x20(%r10)
        /* restore current deallocation stack */
        movq  0xb8(%rsp), %rax
        movq  %rax, 0x1478(%r10)
        /* restore current stack limit */
        movq  0xc0(%rsp), %rax
        movq  %rax, 0x10(%r10)
        /* restore current stack base */
        movq  0xc8(%rsp), %rax
        movq  %rax, 0x08(%r10)

@felixguendling - is this code snip written specifically for asan? Or some other purpose?

ioquatix commented 2 years ago

https://github.com/orgs/kurocha/repositories?q=coroutine&type=all&language=&sort= for all native implementations.

stsp commented 2 years ago

Thanks! It's very simplistic: https://github.com/kurocha/coroutine-amd64/blob/master/source/Coroutine/Context.s It just pushes a few regs on the stack. And yet it works with asan... Then perhaps I need to find a problem in the context-switching code I took from libtask...

ioquatix commented 2 years ago

Coroutine transfer is a simple operation, it's a function call and return with a stack swap in the middle. Any implementation that makes it more complicated than that is wrong. IMHO :)

stsp commented 2 years ago

You are right. :) And still some guys (like myself) can shoot themselves in the foot even here. I had a C-ish wrapper around asm getcontext, and it wasn't marked with the always_inline attribute. As a result, it was saving its own stack frame to the context struct... Your example, being that simple, helped me realize the stupidity. I wonder why it never broke w/o asan...

kyrieSun-wow commented 1 year ago

Hi everyone, @ioquatix I ran into an issue when enabling ASan in a C program that uses the swapcontext function. Could you offer some advice?

The following are the details: Build server: gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)

With the help of __sanitizer_start_switch_fiber/__sanitizer_finish_switch_fiber, my program works fine with ASan turned on when jumping from one coroutine to another (e.g. main thread to coroutine func2, coroutine func2 to coroutine func1). But as soon as I try to restore an old coroutine by jumping back to it, ASan stops working (a stack-buffer-overflow is no longer detected).

And I noticed that the argument func2_thread_stack below never receives a value (it stays NULL):

    void *func2_thread_stack = NULL;
    __sanitizer_start_switch_fiber(&func2_thread_stack,uctx_func1.uc_stack.ss_sp,uctx_func1.uc_stack.ss_size);
    ……… (swap from func2 to func1, and then func1 swap back func2)
    __sanitizer_finish_switch_fiber(func2_thread_stack, &from_stack, &from_stacksize);

Could you give me some advice?

Sincerely,

idleroamer commented 1 year ago

I need to briefly share my experience with boost-asio and asan for everyone who is where I was three days ago.

If this is your problem: "You need to make asan and boost-asio with coroutines get along" then you are in for a treat.

I put some pieces together, but it is essentially a no-brainer. boost-asio does not have the incentive to move to coroutine2 (https://github.com/chriskohlhoff/asio/issues/603), but thanks to https://github.com/cbodley/spawn (a standalone header-only library of the latter PR) it can work. After switching all boost::asio::spawn to spawn::spawn you are through the tedious parts.

Obviously you need to build boost with asan support as mentioned here https://github.com/boostorg/coroutine/issues/30#issuecomment-325583085

context-impl=ucontext -DBOOST_USE_ASAN -DBOOST_USE_UCONTEXT

And the simplest part: build your project with the -DBOOST_USE_ASAN -DBOOST_USE_UCONTEXT flags. Voilà.

kyrieSun-wow commented 1 year ago

Thanks to @ioquatix, the compatibility issue between ASAN and swapcontext is solved in my program. He was very friendly and very patient, and he taught me a lot about asan. I have compiled some of the lessons he taught me so that more people like me can learn from them:

Compatibility issue between ASAN and swapcontext()

1. Phenomenon

ASAN does not fully support swapcontext, as it indicates in its log:

==1000==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!

Under this constraint, if swapcontext() is used in your program, false positives may be reported after a coroutine switch. ASAN's detection capability becomes almost ineffective, and it can even seriously affect the normal operation of the program.

2. Mechanism of asan

To solve this problem, we need to understand why these false positives occur, and for that we need to learn how asan works: ASAN needs to allocate and store a shadow stack for each fiber, to track usage. You should also poison the stack when it's no longer in use (e.g. if you track a high water mark, or completely free it).

3. Way to make swapcontext() compatible with asan

Note: The flag 'ASAN_OPTIONS=detect_stack_use_after_return=true' is necessary when the swapcontext() function is used in your program.

Therefore, we need to find a way to notify ASAN before and after we switch fibers.

To make things easier, I recommend adding a fake_stack pointer to every fiber when ASAN is enabled.

For this fake_stack: when we jump to a new (target) coroutine by executing swapcontext(), we need to store the fake_stack of the old (current) fiber, so that when we return to the old fiber, we can restore its stack with the fake_stack we stored earlier.

ASAN provides two functions to manage the fake_stack:

// Fiber annotation interface.
// Before switching to a different stack, one must call
// __sanitizer_start_switch_fiber with a pointer to the bottom of the
// destination stack and its size. When code starts running on the new stack,
// it must call __sanitizer_finish_switch_fiber to finalize the switch.
// The start_switch function takes a void** to store the current fake stack if
// there is one (it is needed when detect_stack_use_after_return is enabled).
// When restoring a stack, this pointer must be given to the finish_switch
// function. In most cases, this void* can be stored on the stack just before
// switching. When leaving a fiber definitely, null must be passed as first
// argument to the start_switch function so that the fake stack is destroyed.
// If you do not want support for stack use-after-return detection, you can
// always pass null to these two functions.
// Note that the fake stack mechanism is disabled during fiber switch, so if a
// signal callback runs during the switch, it will not benefit from the stack
// use-after-return detection.
void __sanitizer_start_switch_fiber(void **fake_stack_save,
                                    const void *bottom, size_t size);

void __sanitizer_finish_switch_fiber(void *fake_stack_save,
                                     const void **bottom_old,
                                     size_t *size_old);

The implementation of these two functions is here: https://github.com/llvm/llvm-project/blob/a2ef44a5d65932c7bb0f483217826856325b60df/compiler-rt/lib/asan/asan_thread.cpp#L526-L551

From the source code, we can see that, __sanitizer_start_switch_fiber will assign the fake_stack IF and ONLY IF you provide a pointer.
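To make the role of that first argument concrete, here is a toy model in C. It is only a sketch of the behaviour described above, not the runtime's code: `model_thread`, `model_start_switch`, `model_finish_switch` and `model_roundtrip` are hypothetical names invented for illustration, and the real functions also update stack bounds and shadow memory, which the model ignores.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for ASan's per-thread state; not the real runtime structure. */
typedef struct model_thread {
    void *current_fake_stack; /* fake stack used by the running fiber */
    int destroyed_count;      /* how many fake stacks the model has freed */
} model_thread;

/* Models how the start call treats its first argument. */
void model_start_switch(model_thread *t, void **fake_stack_save) {
    if (fake_stack_save)
        *fake_stack_save = t->current_fake_stack; /* caller keeps it for later */
    else if (t->current_fake_stack)
        t->destroyed_count++;                     /* NULL: fiber is gone for good */
    t->current_fake_stack = NULL;                 /* disabled during the switch */
}

/* Models how the finish call treats its first argument. */
void model_finish_switch(model_thread *t, void *fake_stack_save) {
    if (fake_stack_save)
        t->current_fake_stack = fake_stack_save;  /* resume with the saved one */
}

/* Save, restore, then leave for good; returns the number of destroyed stacks. */
int model_roundtrip(void) {
    int dummy_fake_stack = 0; /* placeholder object standing in for a fake stack */
    model_thread t = { &dummy_fake_stack, 0 };
    void *saved = NULL;

    model_start_switch(&t, &saved);      /* suspend: pointer is handed to us */
    assert(saved == &dummy_fake_stack);
    model_finish_switch(&t, saved);      /* resume: pointer is given back */
    assert(t.current_fake_stack == &dummy_fake_stack);
    model_start_switch(&t, NULL);        /* final exit: fake stack is dropped */
    return t.destroyed_count;
}
```

In this model, passing NULL to the finish call when resuming a fiber that was suspended earlier means its saved fake stack is never reinstalled, which is why the saved pointer has to be kept per fiber.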

This is how I handle the swapcontext() issue:

//vthctx: Context of main fiber/coroutine
//vth:  Context of fiber/coroutine A 

Step1: Try to exchange from main fiber to fiber A =========================================================================:
//On the main fiber.

//Argument0: Where asan stores the fake_stack of the current fiber.
//           - If the current fiber stays alive (we are going to jump back later), pass a valid pointer (&vthctx->fake_stack here) so that asan can save the fake_stack of the current fiber;
//           - If the current fiber will not be resumed (we won't jump back), pass 'NULL' so that asan deletes the fake_stack of the current fiber.
//Argument1: The bottom of the stack of the target fiber we are going to jump to.
//Argument2: The size of the stack of the target fiber we are going to jump to.
__sanitizer_start_switch_fiber(&vthctx->fake_stack, vth->uctx.uc_stack.ss_sp, vth->uctx.uc_stack.ss_size);

//exchange to target fiber A. 
swapcontext(&vthctx->tmp_outer_uctx, &vth->uctx);

Step2: On the trigger function of fiber A =========================================================================:
//On the fiber A
const void *from_stack;
size_t from_stacksize;

//Argument0: This is the first time we enter this fiber, so NULL shall be passed as argument 0;
//           - Passing 'NULL' means that we have no saved fake_stack to restore for this fiber;
//           - If we have been on this fiber before and have a saved fake_stack for it, pass that saved fake_stack as argument 0.
//Argument1: Out-parameter where asan returns the stack bottom of the old fiber we were on before the jump.
//Argument2: Out-parameter where asan returns the stack size of the old fiber we were on before the jump.
__sanitizer_finish_switch_fiber(NULL, &from_stack, &from_stacksize);

Step3: Jump back from fiber A to main fiber =========================================================================:
//Argument0: Where asan stores the fake_stack of the current fiber (fiber A) before jumping out.
//           - Pass 'NULL' if we won't jump back to fiber A; asan will then delete the fake_stack of fiber A for us.
//           - Pass '&vth->fake_stack' if we plan to keep fiber A alive and jump back in the future; asan will then keep the fake_stack of fiber A for us.
//Argument1: The bottom of the stack of the target fiber we are going to jump to.
//Argument2: The size of the stack of the target fiber we are going to jump to.
__sanitizer_start_switch_fiber(NULL, vthctx->tmp_outer_uctx.uc_stack.ss_sp, vthctx->tmp_outer_uctx.uc_stack.ss_size);

//exchange to main fiber.  
swapcontext(&vth->uctx, &vthctx->tmp_outer_uctx);

Step4: Restore the fake_stack on main fiber =========================================================================:
//At the point of the main fiber we're jumping back to

//Argument0: The fake_stack stored earlier (see Step1), which is the one we restore for this fiber.
//Argument1: Out-parameter where asan returns the stack bottom of the old fiber we were on before the jump.
//Argument2: Out-parameter where asan returns the stack size of the old fiber we were on before the jump.
__sanitizer_finish_switch_fiber(vthctx->fake_stack, &from_stack, &from_stacksize);

ASAN only cares about tracking the stack swapping, so as long as you wrap the stack exchange operation (coroutine transfer) correctly, ASAN should work well with swapcontext().
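The four steps above can be put together in one small C sketch. It is a minimal illustration under stated assumptions, not a reference implementation: every `demo_*` name is hypothetical, the worker stack is a static array, and the annotations compile to no-ops when the file is built without -fsanitize=address. One deliberate difference from the steps above: instead of reading the main fiber's bounds from its ucontext, the worker records them from the out-parameters of its first finish call.

```c
#include <assert.h>
#include <stddef.h>
#include <ucontext.h>

/* Detect ASan under GCC (__SANITIZE_ADDRESS__) and Clang (__has_feature). */
#if defined(__SANITIZE_ADDRESS__)
#  define DEMO_HAVE_ASAN 1
#elif defined(__has_feature)
#  if __has_feature(address_sanitizer)
#    define DEMO_HAVE_ASAN 1
#  endif
#endif
#ifdef DEMO_HAVE_ASAN
#  include <sanitizer/common_interface_defs.h>
#endif

/* Hypothetical per-fiber record: context, saved fake stack, stack bounds. */
typedef struct demo_fiber {
    ucontext_t ctx;
    void *fake_stack;         /* filled in by ASan while the fiber is suspended */
    const void *stack_bottom; /* lowest address of the fiber's stack */
    size_t stack_size;
} demo_fiber;

static demo_fiber demo_main, demo_worker;
static char demo_worker_stack[256 * 1024];
static const void *demo_prev_bottom; /* bounds of the stack we just left */
static size_t demo_prev_size;
static int demo_steps;

static void demo_start_switch(void **save, const demo_fiber *to) {
#ifdef DEMO_HAVE_ASAN
    __sanitizer_start_switch_fiber(save, to->stack_bottom, to->stack_size);
#else
    (void)save; (void)to; /* no-op when built without ASan */
#endif
}

static void demo_finish_switch(void *fake_stack) {
#ifdef DEMO_HAVE_ASAN
    __sanitizer_finish_switch_fiber(fake_stack, &demo_prev_bottom, &demo_prev_size);
#else
    (void)fake_stack;
#endif
}

/* Switch from `from` to `to`; `exiting` means `from` is never resumed. */
static void demo_switch(demo_fiber *from, demo_fiber *to, int exiting) {
    demo_start_switch(exiting ? NULL : &from->fake_stack, to);
    swapcontext(&from->ctx, &to->ctx);
    /* Only reached when `from` is resumed: restore its fake stack. */
    demo_finish_switch(from->fake_stack);
}

static void demo_worker_entry(void) {
    /* First entry: nothing to restore, but learn the main stack's bounds. */
    demo_finish_switch(NULL);
    demo_main.stack_bottom = demo_prev_bottom;
    demo_main.stack_size = demo_prev_size;
    demo_steps++;
    demo_switch(&demo_worker, &demo_main, 0); /* yield, expecting to resume */
    demo_steps++;
    demo_switch(&demo_worker, &demo_main, 1); /* final exit, never resumed */
}

/* Runs the worker in two slices and returns how many steps it executed. */
int demo_run(void) {
    demo_steps = 0;
    demo_worker.fake_stack = demo_main.fake_stack = NULL;
    getcontext(&demo_worker.ctx);
    demo_worker.ctx.uc_stack.ss_sp = demo_worker_stack;
    demo_worker.ctx.uc_stack.ss_size = sizeof demo_worker_stack;
    demo_worker.ctx.uc_link = &demo_main.ctx; /* safety net; not normally used */
    demo_worker.stack_bottom = demo_worker_stack;
    demo_worker.stack_size = sizeof demo_worker_stack;
    makecontext(&demo_worker.ctx, demo_worker_entry, 0);

    demo_switch(&demo_main, &demo_worker, 0); /* worker runs to its yield */
    assert(demo_steps == 1);
    demo_switch(&demo_main, &demo_worker, 0); /* worker resumes and exits */
    return demo_steps;
}
```

Calling `demo_run()` from `main` exercises Step1 to Step4 twice over: once for the yield (fake stack saved and later restored) and once for the final exit (NULL passed so the worker's fake stack is released). Keeping the start call, the swapcontext() and the finish call in a single helper makes it harder to forget one half of the pair.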

ioquatix commented 1 year ago

I wonder if the above comment can be added to the Wiki?

https://github.com/google/sanitizers/wiki

javeme commented 5 months ago

We also encountered the problem: libasan hangs in pthread_create() and never returns (it hangs sometimes, but not always).

Stack trace with symbols:

Thread 2 (LWP 1236467):
#0  __sanitizer::atomic_exchange<__sanitizer::atomic_uint32_t> (mo=__sanitizer::memory_order_acquire, v=2, a=0x640000001b00)
    at /gcc8_x86_64/src/gcc/libsanitizer/sanitizer_common/sanitizer_atomic_clang.h:61
#1  __sanitizer::BlockingMutex::Lock (this=this@entry=0x640000001b00) at /gcc8_x86_64/src/gcc/libsanitizer/sanitizer_common/sanitizer_linux.cc:618
#2  0x00007f0c9958f37d in __sanitizer::GenericScopedLock<__sanitizer::BlockingMutex>::GenericScopedLock (mu=0x640000001b00, this=<synthetic pointer>)
    at /gcc8_x86_64/src/gcc/libsanitizer/sanitizer_common/sanitizer_mutex.h:183
#3  __sanitizer::SizeClassAllocator64<__asan::AP64>::GetFromAllocator (this=this@entry=0x7f0c996c7e40 <__asan::instance>, stat=stat@entry=0x7f0c7c67bc40, class_id=class_id@entry=36, 
    chunks=chunks@entry=0x7f0c7c677330, n_chunks=n_chunks@entry=8)
    at /gcc8_x86_64/src/gcc/libsanitizer/sanitizer_common/sanitizer_allocator_primary64.h:126
#4  0x00007f0c9958f4ac in __sanitizer::SizeClassAllocator64LocalCache<__sanitizer::SizeClassAllocator64<__asan::AP64> >::Refill (this=this@entry=0x7f0c7c66e0e0, c=c@entry=0x7f0c7c677320, 
    allocator=allocator@entry=0x7f0c996c7e40 <__asan::instance>, class_id=class_id@entry=36)
    at /gcc8_x86_64/src/gcc/libsanitizer/sanitizer_common/sanitizer_allocator_local_cache.h:105
#5  0x00007f0c99593a78 in __sanitizer::SizeClassAllocator64LocalCache<__sanitizer::SizeClassAllocator64<__asan::AP64> >::Allocate (class_id=36, allocator=0x7f0c996c7e40 <__asan::instance>, 
    this=0x7f0c7c66e0e0) at /gcc8_x86_64/src/gcc/libsanitizer/sanitizer_common/sanitizer_common.h:439
#6  __sanitizer::CombinedAllocator<__sanitizer::SizeClassAllocator64<__asan::AP64>, __sanitizer::SizeClassAllocatorLocalCache<__sanitizer::SizeClassAllocator64<__asan::AP64> >, __sanitizer::LargeMmapAllocator<__asan::AsanMapUnmapCallback, __sanitizer::ReturnNullOrDieOnFailure> >::Allocate (alignment=1, size=8192, cache=0x7f0c7c66e0e0, this=0x7f0c996c7e40 <__asan::instance>)
    at /gcc8_x86_64/src/gcc/libsanitizer/sanitizer_common/sanitizer_allocator_combined.h:60
#7  __asan::QuarantineCallback::Allocate (size=8192, this=<synthetic pointer>) at /gcc8_x86_64/src/gcc/libsanitizer/asan/asan_allocator.cc:163
#8  __sanitizer::QuarantineCache<__asan::QuarantineCallback>::Enqueue (size=32, ptr=0x60300013a7f0, cb=..., this=0x7f0c7c66e060)
    at /gcc8_x86_64/src/gcc/libsanitizer/sanitizer_common/sanitizer_quarantine.h:212
#9  __sanitizer::Quarantine<__asan::QuarantineCallback, __asan::AsanChunk>::Put (size=32, ptr=0x60300013a7f0, cb=..., c=0x7f0c7c66e060, this=0x7f0c998c80b8 <__asan::instance+2097784>)
    at /gcc8_x86_64/src/gcc/libsanitizer/sanitizer_common/sanitizer_quarantine.h:102
#10 __asan::Allocator::QuarantineChunk (stack=0x60300013a800, ptr=0x60300013a800, m=0x60300013a7f0, this=0x7f0c996c7e40 <__asan::instance>)
    at /gcc8_x86_64/src/gcc/libsanitizer/asan/asan_allocator.cc:564
#11 __asan::Allocator::Deallocate (this=this@entry=0x7f0c996c7e40 <__asan::instance>, ptr=ptr@entry=0x60300013a800, delete_size=delete_size@entry=0, stack=stack@entry=0x7f0c7ce83c70, 
    alloc_type=alloc_type@entry=__asan::FROM_MALLOC) at /gcc8_x86_64/src/gcc/libsanitizer/asan/asan_allocator.cc:609
#12 0x00007f0c9958e657 in __asan::asan_free (ptr=ptr@entry=0x60300013a800, stack=stack@entry=0x7f0c7ce83c70, alloc_type=alloc_type@entry=__asan::FROM_MALLOC)
    at /gcc8_x86_64/src/gcc/libsanitizer/asan/asan_allocator.cc:803
#13 0x00007f0c9964ffdb in __interceptor_free (ptr=0x60300013a800) at /gcc8_x86_64/src/gcc/libsanitizer/asan/asan_malloc_linux.cc:69
#14 0x00007f0c98d254bd in __pthread_attr_destroy (attr=attr@entry=0x7f0c7ce84510) at pthread_attr_destroy.c:38
#15 0x00007f0c9966c892 in __sanitizer::GetThreadStackTopAndBottom (at_initialization=at_initialization@entry=false, stack_top=stack_top@entry=0x7f0c7ce845a0, 
    stack_bottom=stack_bottom@entry=0x7f0c7ce845a8) at /gcc8_x86_64/src/gcc/libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cc:110
#16 0x00007f0c9966cbf3 in __sanitizer::GetThreadStackAndTls (main=<optimized out>, stk_addr=stk_addr@entry=0x7f0c7c66e020, stk_size=stk_size@entry=0x7f0c7ce845f8, 
    tls_addr=tls_addr@entry=0x7f0c7c66e040, tls_size=tls_size@entry=0x7f0c7ce845f0)
    at /gcc8_x86_64/src/gcc/libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cc:415
#17 0x00007f0c9965e7bf in __asan::AsanThread::SetThreadStackAndTls (this=this@entry=0x7f0c7c66e000, options=<optimized out>)
    at /gcc8_x86_64/src/gcc/libsanitizer/asan/asan_thread.h:80
#18 0x00007f0c9965ea31 in __asan::AsanThread::Init (this=this@entry=0x7f0c7c66e000, options=options@entry=0x0)
    at /gcc8_x86_64/src/gcc/libsanitizer/asan/asan_thread.cc:224
#19 0x00007f0c9965ee34 in __asan::AsanThread::ThreadStart (this=0x7f0c7c66e000, os_id=1236467, signal_thread_is_registered=0x7f0c830bfda8)
    at /gcc8_x86_64/src/gcc/libsanitizer/asan/asan_thread.cc:241
#20 0x00007f0c98d23c79 in start_thread (arg=0x7f0c7ce8a700) at pthread_create.c:486
#21 0x00007f0c986d7a4f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 1 (LWP 1236463):
#0  __sanitizer::internal_sched_yield () at /gcc8_x86_64/src/gcc/libsanitizer/sanitizer_common/sanitizer_syscall_linux_x86_64.inc:18
#1  0x00007f0c995b7f45 in __interceptor_pthread_create (thread=thread@entry=0x60f0000000e0, attr=<optimized out>, attr@entry=0x0, 
    start_routine=start_routine@entry=0x3502c00 <bvar::detail::SamplerCollector::sampling_thread(void*)>, arg=arg@entry=0x60f000000040)
    at /gcc8_x86_64/src/gcc/libsanitizer/asan/asan_interceptors.cc:242
#2  0x00000000035021d3 in bvar::detail::SamplerCollector::create_sampling_thread (this=0x60f000000040)
    at code/third_party/submodule/brpc/src/bvar/detail/sampler.cpp:104
#3  bvar::detail::SamplerCollector::after_forked_as_child (this=0x60f000000040) at code/third_party/submodule/brpc/src/bvar/detail/sampler.cpp:104
#4  bvar::detail::SamplerCollector::child_callback_atfork () at code/third_party/submodule/brpc/src/bvar/detail/sampler.cpp:86
#5  0x00007f0c986e4cf8 in __run_fork_handlers (who=who@entry=atfork_run_child) at register-atfork.c:134
#6  0x00007f0c986a672d in __libc_fork () at ../sysdeps/nptl/fork.c:137
#7  0x00007f0c98652824 in _IO_new_proc_open (fp=fp@entry=0x6110000c1100, command=command@entry=0x603000131380 "/usr/bin/chronyc tracking 2>&1", mode=<optimized out>, 
    mode@entry=0x4414e60 "r") at iopopen.c:122
#8  0x00007f0c98652ac8 in _IO_new_popen (command=0x603000131380 "/usr/bin/chronyc tracking 2>&1", mode=0x4414e60 "r") at iopopen.c:203
#9  0x0000000002e24af2 in bytebase::common::CommandRunner::Exec (command=...) at code/src/common/command_runner.cc:13
#18 0x0000000002d5aa2f in make_pcontext ()
#19 0x0000000000000000 in ?? ()

Looks similar to this issue: https://github.com/google/sanitizers/issues/945

stsp commented 5 months ago

It seems like these days detect_stack_use_after_return breaks fiber switching. When detect_stack_use_after_return is enabled, asan malloc's the "fake" stack and puts the locals there together with redzones. That process is (partially) documented here: https://github.com/google/sanitizers/wiki/AddressSanitizerUseAfterReturn That very same stack ptr is put into the first argument of __sanitizer_start_switch_fiber(). If detect_stack_use_after_return is disabled, then the "fake stack" machinery is not used, so __sanitizer_start_switch_fiber() always puts NULL into its first arg.

Now the problem is that the fake stack is per-thread, not per-fiber. When some fiber exits, we put NULL into the first arg of __sanitizer_start_switch_fiber(), and that unmaps the entire per-thread fake stack: https://gnu.googlesource.com/gcc/+/refs/heads/trunk/libsanitizer/asan/asan_thread.cpp#166 which, as noted above, contains the current locals and redzones. So everything crashes.

Probably __sanitizer_start_switch_fiber() should allocate and free its own fake stacks, and not touch the per-thread one?

stsp commented 5 months ago

I opened #1760 for that problem.