wolfcw / libfaketime

libfaketime modifies the system time for a single application
https://github.com/wolfcw/libfaketime
GNU General Public License v2.0
2.71k stars 325 forks source link

deadlock when address sanitizer is used under clang since v0.9.7 #365

Closed LtdSauce closed 2 years ago

LtdSauce commented 2 years ago

Hi!

we use AddressSanitizer to test our executables. Recently we wanted to decrease time-of-test and tried libfaketime. We then noticed that programs compiled with clang and with address sanitizer freezes with the following message:

libfaketime: Unexpected recursive calls to clock_gettime() without proper initialization. Trying alternative.

When we don't compile them with address sanitizer everything works as expected. Furthermore, when compiled with gcc and address sanitizer everything works when we load libasan.so before libfaketime.so. Gcc links it dynamically by default while clang links it statically. We want to avoid using the non-default linkage in clang and keep it linked statically.

I have not found another issue related to address sanitizers.

Environment

OS: Ubuntu Impish 21.10 x86_64 clang: Ubuntu clang version 13.0.0-2 libfaketime: version 0.9.8-9 from apt (and also when build from https://github.com/wolfcw/libfaketime/commit/f26242b655daf672a2e83b6ce669d8dec5f392ff)

When linked against 0.9.6 everything seems to work.

Steps to reproduce

To reproduce this issue we created a little program (or just a simple main):

//main.cpp/main.c
int main() {
}

When compiled with the following:

// C
clang -fsanitize=address main.c
// C++
clang++ -fsanitize=address main.cpp

Running under gdb produces the following stacktrace:

Backtrace for `LD_PRELOAD=libfaketime.so.1 FAKETIME="-15d" gdb --args ./a.out` ``` GNU gdb (Ubuntu 11.1-0ubuntu2) 11.1 Copyright (C) 2021 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-linux-gnu". Type "show configuration" for configuration details. For bug reporting instructions, please see: . Find the GDB manual and other documentation resources online at: . For help, type "help". Type "apropos word" to search for commands related to "word"... Reading symbols from ./main.gcc... (No debugging symbols found in ./main.gcc) (gdb) run Starting program: /home/lucas/Develop/faketime-sanitizer-tests/main.gcc [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". ^C Program received signal SIGINT, Interrupt. 0x00000000004af0e0 in __sanitizer::BlockingMutex::Lock() () (gdb) bt #0 0x00000000004af0e0 in __sanitizer::BlockingMutex::Lock() () #1 0x0000000000421cea in __sanitizer::SizeClassAllocator64<__asan::AP64<__sanitizer::LocalAddressSpaceView> >::GetFromAllocator(__sanitizer::AllocatorStats*, unsigned long, unsigned int*, unsigned long) () #2 0x0000000000421c03 in __sanitizer::SizeClassAllocator64LocalCache<__sanitizer::SizeClassAllocator64<__asan::AP64<__sanitizer::LocalAddressSpaceView> > >::Refill(__sanitizer::SizeClassAllocator64LocalCache<__sanitizer::SizeClassAllocator64<__asan::AP64<__sanitizer::LocalAddressSpaceView> > >::PerClass*, __sanitizer::SizeClassAllocator64<__asan::AP64<__sanitizer::LocalAddressSpaceView> >*, unsigned long) () #3 0x00000000004218b7 in __sanitizer::CombinedAllocator<__sanitizer::SizeClassAllocator64<__asan::AP64<__sanitizer::LocalAddressSpaceView> >, __sanitizer::LargeMmapAllocatorPtrArrayDynamic>::Allocate(__sanitizer::SizeClassAllocator64LocalCache<__sanitizer::SizeClassAllocator64<__asan::AP64<__sanitizer::LocalAddressSpaceView> > >*, unsigned long, unsigned long) () #4 0x000000000041d46e in __asan::Allocator::Allocate(unsigned long, unsigned long, __sanitizer::BufferedStackTrace*, __asan::AllocType, bool) () #5 0x000000000041d214 in __asan::asan_malloc(unsigned long, __sanitizer::BufferedStackTrace*) () #6 0x0000000000498a9c in malloc () #7 0x00007ffff7fe12ca in malloc (size=74) at ../include/rtld-malloc.h:56 #8 __GI__dl_exception_create_format (exception=exception@entry=0x7fffffffca70, objname=0x7ffff7ffef30 "../libfaketime/libfaketime/src/libfaketime.so.1", fmt=fmt@entry=0x7ffff7ff2e1b "undefined symbol: %s%s%s") at dl-exception.c:157 #9 0x00007ffff7fd4762 in _dl_lookup_symbol_x (undef_name=0x7ffff7fb6a56 "__ftime", undef_map=0x7ffff7fc0000, ref=0x7fffffffcb18, symbol_scope=0x7ffff7ffe5c0, version=0x0, type_class=0, flags=0, skip_map=0x7ffff7fc0000) at dl-lookup.c:872 #10 0x00007ffff7de1e14 in do_sym (handle=, name=0x7ffff7fb6a56 "__ftime", who=0x7ffff7fb1aba , vers=vers@entry=0x0, flags=flags@entry=2) at dl-sym.c:146 #11 0x00007ffff7de21b1 in _dl_sym (handle=, name=, who=) at dl-sym.c:195 #12 0x00007ffff7cfe628 in dlsym_doit (a=a@entry=0x7fffffffcd90) at dlsym.c:40 #13 0x00007ffff7de23b8 in __GI__dl_catch_exception (exception=exception@entry=0x7fffffffccf0, operate=0x7ffff7cfe610 , args=0x7fffffffcd90) at dl-error-skeleton.c:208 #14 0x00007ffff7de2483 in __GI__dl_catch_error (objname=0x7fffffffcd48, errstring=0x7fffffffcd50, mallocedp=0x7fffffffcd47, operate=, args=) at dl-error-skeleton.c:227 #15 0x00007ffff7cfe02e in _dlerror_run (operate=operate@entry=0x7ffff7cfe610 , args=args@entry=0x7fffffffcd90) at dlerror.c:138 #16 0x00007ffff7cfe69e in dlsym_implementation (dl_caller=, name=, handle=) at dlsym.c:54 #17 ___dlsym (handle=, name=) at dlsym.c:68 #18 0x00007ffff7fb1aba in ftpl_init () from ../libfaketime/libfaketime/src/libfaketime.so.1 #19 0x00007ffff7fb0d48 in clock_gettime () from ../libfaketime/libfaketime/src/libfaketime.so.1 #20 0x00000000004b6b8b in __sanitizer::MonotonicNanoTime() () #21 0x00000000004222f5 in __sanitizer::SizeClassAllocator64<__asan::AP64<__sanitizer::LocalAddressSpaceView> >::PopulateFreeArray(__sanitizer::AllocatorStats*, unsigned long, __sanitizer::SizeClassAllocator64<__asan::AP64<__sanitizer::LocalAddressSpaceView> >::RegionInfo*, unsigned long) () #22 0x0000000000421efa in __sanitizer::SizeClassAllocator64<__asan::AP64<__sanitizer::LocalAddressSpaceView> >::GetFromAllocator(__sanitizer::AllocatorStats*, unsigned long, unsigned int*, unsigned long) () #23 0x0000000000421c03 in __sanitizer::SizeClassAllocator64LocalCache<__sanitizer::SizeClassAllocator64<__asan::AP64<__sanitizer::LocalAddressSpaceView> > >::Refill(__sanitizer::SizeClassAllocator64LocalCache<__sanitizer::SizeClassAllocator64<__asan::AP64<__sanitizer::LocalAddressSpaceView> > >::PerClass*, __sanitizer::SizeClassAllocator64<__asan::AP64<__sanitizer::LocalAddressSpaceView> >*, unsigned long) () #24 0x00000000004218b7 in __sanitizer::CombinedAllocator<__sanitizer::SizeClassAllocator64<__asan::AP64<__sanitizer::LocalAddressSpaceView> >, __sanitizer::LargeMmapAllocatorPtrArrayDynamic>::Allocate(__sanitizer::SizeClassAllocator64LocalCache<__sanitizer::SizeClassAllocator64<__asan::AP64<__sanitizer::LocalAddressSpaceView> > >*, unsigned long, unsigned long) () #25 0x000000000041d46e in __asan::Allocator::Allocate(unsigned long, unsigned long, __sanitizer::BufferedStackTrace*, __asan::AllocType, bool) () #26 0x000000000041d214 in __asan::asan_malloc(unsigned long, __sanitizer::BufferedStackTrace*) () #27 0x0000000000498a9c in malloc () #28 0x00007ffff7fe12ca in malloc (size=74) at ../include/rtld-malloc.h:56 #29 __GI__dl_exception_create_format (exception=exception@entry=0x7fffffffd9f0, objname=0x7ffff7ffef30 "../libfaketime/libfaketime/src/libfaketime.so.1", fmt=fmt@entry=0x7ffff7ff2e1b "undefined symbol: %s%s%s") at dl-exception.c:157 #30 0x00007ffff7fd4762 in _dl_lookup_symbol_x (undef_name=0x7ffff7fb6a56 "__ftime", undef_map=0x7ffff7fc0000, ref=0x7fffffffda98, symbol_scope=0x7ffff7ffe5c0, version=0x0, type_class=0, flags=0, skip_map=0x7ffff7fc0000) at dl-lookup.c:872 #31 0x00007ffff7de1e14 in do_sym (handle=, name=0x7ffff7fb6a56 "__ftime", who=0x7ffff7fb1aba , vers=vers@entry=0x0, flags=flags@entry=2) at dl-sym.c:146 #32 0x00007ffff7de21b1 in _dl_sym (handle=, name=, who=) at dl-sym.c:195 #33 0x00007ffff7cfe628 in dlsym_doit (a=a@entry=0x7fffffffdd10) at dlsym.c:40 #34 0x00007ffff7de23b8 in __GI__dl_catch_exception (exception=exception@entry=0x7fffffffdc70, operate=0x7ffff7cfe610 , args=0x7fffffffdd10) at dl-error-skeleton.c:208 #35 0x00007ffff7de2483 in __GI__dl_catch_error (objname=0x7fffffffdcc8, errstring=0x7fffffffdcd0, mallocedp=0x7fffffffdcc7, operate=, args=) at dl-error-skeleton.c:227 #36 0x00007ffff7cfe02e in _dlerror_run (operate=operate@entry=0x7ffff7cfe610 , args=args@entry=0x7fffffffdd10) at dlerror.c:138 #37 0x00007ffff7cfe69e in dlsym_implementation (dl_caller=, name=, handle=) at dlsym.c:54 #38 ___dlsym (handle=, name=) at dlsym.c:68 #39 0x00007ffff7fb1aba in ftpl_init () from ../libfaketime/libfaketime/src/libfaketime.so.1 #40 0x00007ffff7fd9b1e in call_init (l=, argc=argc@entry=1, argv=argv@entry=0x7fffffffded8, env=env@entry=0x7fffffffdee8) at dl-init.c:70 #41 0x00007ffff7fd9c08 in call_init (env=0x7fffffffdee8, argv=0x7fffffffded8, argc=1, l=) at dl-init.c:33 #42 _dl_init (main_map=0x7ffff7ffe220, argc=1, argv=0x7fffffffded8, env=0x7fffffffdee8) at dl-init.c:117 #43 0x00007ffff7fc910a in _dl_start_user () from /lib64/ld-linux-x86-64.so.2 #44 0x0000000000000001 in ?? () #45 0x00007fffffffe237 in ?? () #46 0x0000000000000000 in ?? () ```
Backtrace when compiled with clang++ ``` GNU gdb (Ubuntu 11.1-0ubuntu2) 11.1 Copyright (C) 2021 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-linux-gnu". Type "show configuration" for configuration details. For bug reporting instructions, please see: . Find the GDB manual and other documentation resources online at: . For help, type "help". Type "apropos word" to search for commands related to "word"... Reading symbols from a.out... (No debugging symbols found in a.out) (gdb) run Starting program: /home/lucas/Develop/faketime-sanitizer-tests/a.out [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". libfaketime: Unexpected recursive calls to clock_gettime() without proper initialization. Trying alternative. ^C Program received signal SIGINT, Interrupt. 0x00000000004af160 in __sanitizer::BlockingMutex::Lock() () (gdb) bt #0 0x00000000004af160 in __sanitizer::BlockingMutex::Lock() () #1 0x0000000000421d6a in __sanitizer::SizeClassAllocator64<__asan::AP64<__sanitizer::LocalAddressSpaceView> >::GetFromAllocator(__sanitizer::AllocatorStats*, unsigned long, unsigned int*, unsigned long) () #2 0x0000000000421c83 in __sanitizer::SizeClassAllocator64LocalCache<__sanitizer::SizeClassAllocator64<__asan::AP64<__sanitizer::LocalAddressSpaceView> > >::Refill(__sanitizer::SizeClassAllocator64LocalCache<__sanitizer::SizeClassAllocator64<__asan::AP64<__sanitizer::LocalAddressSpaceView> > >::PerClass*, __sanitizer::SizeClassAllocator64<__asan::AP64<__sanitizer::LocalAddressSpaceView> >*, unsigned long) () #3 0x0000000000421937 in __sanitizer::CombinedAllocator<__sanitizer::SizeClassAllocator64<__asan::AP64<__sanitizer::LocalAddressSpaceView> >, __sanitizer::LargeMmapAllocatorPtrArrayDynamic>::Allocate(__sanitizer::SizeClassAllocator64LocalCache<__sanitizer::SizeClassAllocator64<__asan::AP64<__sanitizer::LocalAddressSpaceView> > >*, unsigned long, unsigned long) () #4 0x000000000041d4ee in __asan::Allocator::Allocate(unsigned long, unsigned long, __sanitizer::BufferedStackTrace*, __asan::AllocType, bool) () #5 0x000000000041d294 in __asan::asan_malloc(unsigned long, __sanitizer::BufferedStackTrace*) () #6 0x0000000000498b1c in malloc () #7 0x00007ffff7fe12ca in malloc (size=74) at ../include/rtld-malloc.h:56 #8 __GI__dl_exception_create_format (exception=exception@entry=0x7fffffffc070, objname=0x7ffff7ffef30 "../libfaketime/libfaketime/src/libfaketime.so.1", fmt=fmt@entry=0x7ffff7ff2e1b "undefined symbol: %s%s%s") at dl-exception.c:157 #9 0x00007ffff7fd4762 in _dl_lookup_symbol_x (undef_name=0x7ffff7fb6a56 "__ftime", undef_map=0x7ffff7fc0000, ref=0x7fffffffc118, symbol_scope=0x7ffff7ffe5c0, version=0x0, type_class=0, flags=0, skip_map=0x7ffff7fc0000) at dl-lookup.c:872 #10 0x00007ffff7bc8e14 in do_sym (handle=, name=0x7ffff7fb6a56 "__ftime", who=0x7ffff7fb1aba , vers=vers@entry=0x0, flags=flags@entry=2) at dl-sym.c:146 #11 0x00007ffff7bc91b1 in _dl_sym (handle=, name=, who=) at dl-sym.c:195 #12 0x00007ffff7ae5628 in dlsym_doit (a=a@entry=0x7fffffffc390) at dlsym.c:40 #13 0x00007ffff7bc93b8 in __GI__dl_catch_exception (exception=exception@entry=0x7fffffffc2f0, operate=0x7ffff7ae5610 , args=0x7fffffffc390) at dl-error-skeleton.c:208 #14 0x00007ffff7bc9483 in __GI__dl_catch_error (objname=0x7fffffffc348, errstring=0x7fffffffc350, mallocedp=0x7fffffffc347, operate=, args=) at dl-error-skeleton.c:227 #15 0x00007ffff7ae502e in _dlerror_run (operate=operate@entry=0x7ffff7ae5610 , args=args@entry=0x7fffffffc390) at dlerror.c:138 #16 0x00007ffff7ae569e in dlsym_implementation (dl_caller=, name=, handle=) at dlsym.c:54 #17 ___dlsym (handle=, name=) at dlsym.c:68 #18 0x00007ffff7fb1aba in ftpl_init () from ../libfaketime/libfaketime/src/libfaketime.so.1 #19 0x00007ffff7fb0c7f in clock_gettime () from ../libfaketime/libfaketime/src/libfaketime.so.1 #20 0x00000000004b6c0b in __sanitizer::MonotonicNanoTime() () #21 0x0000000000422375 in __sanitizer::SizeClassAllocator64<__asan::AP64<__sanitizer::LocalAddressSpaceView> >::PopulateFreeArray(__sanitizer::AllocatorStats*, unsigned long, __sanitizer::SizeClassAllocator64<__asan::AP64<__sanitizer::LocalAddressSpaceView> >::RegionInfo*, unsigned long) () #22 0x0000000000421f7a in __sanitizer::SizeClassAllocator64<__asan::AP64<__sanitizer::LocalAddressSpaceView> >::GetFromAllocator(__sanitizer::AllocatorStats*, unsigned long, unsigned int*, unsigned long) () #23 0x0000000000421c83 in __sanitizer::SizeClassAllocator64LocalCache<__sanitizer::SizeClassAllocator64<__asan::AP64<__sanitizer::LocalAddressSpaceView> > >::Refill(__sanitizer::SizeClassAllocator64LocalCache<__sanitizer::SizeClassAllocator64<__asan::AP64<__sanitizer::LocalAddressSpaceView> > >::PerClass*, __sanitizer::SizeClassAllocator64<__asan::AP64<__sanitizer::LocalAddressSpaceView> >*, unsigned long) () #24 0x0000000000421937 in __sanitizer::CombinedAllocator<__sanitizer::SizeClassAllocator64<__asan::AP64<__sanitizer::LocalAddressSpaceView> >, __sanitizer::LargeMmapAllocatorPtrArrayDynamic>::Allocate(__sanitizer::SizeClassAllocator64LocalCache<__sanitizer::SizeClassAllocator64<__asan::AP64<__sanitizer::LocalAddressSpaceView> > >*, unsigned long, unsigned long) () #25 0x000000000041d4ee in __asan::Allocator::Allocate(unsigned long, unsigned long, __sanitizer::BufferedStackTrace*, __asan::AllocType, bool) () #26 0x000000000041d294 in __asan::asan_malloc(unsigned long, __sanitizer::BufferedStackTrace*) () #27 0x0000000000498b1c in malloc () #28 0x00007ffff7fe12ca in malloc (size=74) at ../include/rtld-malloc.h:56 #29 __GI__dl_exception_create_format (exception=exception@entry=0x7fffffffcff0, objname=0x7ffff7ffef30 "../libfaketime/libfaketime/src/libfaketime.so.1", fmt=fmt@entry=0x7ffff7ff2e1b "undefined symbol: %s%s%s") at dl-exception.c:157 #30 0x00007ffff7fd4762 in _dl_lookup_symbol_x (undef_name=0x7ffff7fb6a56 "__ftime", undef_map=0x7ffff7fc0000, ref=0x7fffffffd098, symbol_scope=0x7ffff7ffe5c0, version=0x0, type_class=0, flags=0, skip_map=0x7ffff7fc0000) at dl-lookup.c:872 #31 0x00007ffff7bc8e14 in do_sym (handle=, name=0x7ffff7fb6a56 "__ftime", who=0x7ffff7fb1aba , vers=vers@entry=0x0, flags=flags@entry=2) at dl-sym.c:146 #32 0x00007ffff7bc91b1 in _dl_sym (handle=, name=, who=) at dl-sym.c:195 #33 0x00007ffff7ae5628 in dlsym_doit (a=a@entry=0x7fffffffd310) at dlsym.c:40 #34 0x00007ffff7bc93b8 in __GI__dl_catch_exception (exception=exception@entry=0x7fffffffd270, operate=0x7ffff7ae5610 , args=0x7fffffffd310) at dl-error-skeleton.c:208 #35 0x00007ffff7bc9483 in __GI__dl_catch_error (objname=0x7fffffffd2c8, errstring=0x7fffffffd2d0, mallocedp=0x7fffffffd2c7, operate=, args=) at dl-error-skeleton.c:227 #36 0x00007ffff7ae502e in _dlerror_run (operate=operate@entry=0x7ffff7ae5610 , args=args@entry=0x7fffffffd310) at dlerror.c:138 #37 0x00007ffff7ae569e in dlsym_implementation (dl_caller=, name=, handle=) at dlsym.c:54 #38 ___dlsym (handle=, name=) at dlsym.c:68 #39 0x00007ffff7fb1aba in ftpl_init () from ../libfaketime/libfaketime/src/libfaketime.so.1 #40 0x00007ffff7fb0d48 in clock_gettime () from ../libfaketime/libfaketime/src/libfaketime.so.1 #41 0x00000000004b6c0b in __sanitizer::MonotonicNanoTime() () #42 0x0000000000422375 in __sanitizer::SizeClassAllocator64<__asan::AP64<__sanitizer::LocalAddressSpaceView> >::PopulateFreeArray(__sanitizer::AllocatorStats*, unsigned long, __sanitizer::SizeClassAllocator64<__asan::AP64<__sanitizer::LocalAddressSpaceView> >::RegionInfo*, unsigned long) () #43 0x0000000000421f7a in __sanitizer::SizeClassAllocator64<__asan::AP64<__sanitizer::LocalAddressSpaceView> >::GetFromAllocator(__sanitizer::AllocatorStats*, unsigned long, unsigned int*, unsigned long) () #44 0x0000000000421c83 in __sanitizer::SizeClassAllocator64LocalCache<__sanitizer::SizeClassAllocator64<__asan::AP64<__sanitizer::LocalAddressSpaceView> > >::Refill(__sanitizer::SizeClassAllocator64LocalCache<__sanitizer::SizeClassAllocator64<__asan::AP64<__sanitizer::LocalAddressSpaceView> > >::PerClass*, __sanitizer::SizeClassAllocator64<__asan::AP64<__sanitizer::LocalAddressSpaceView> >*, unsigned long) () #45 0x0000000000421937 in __sanitizer::CombinedAllocator<__sanitizer::SizeClassAllocator64<__asan::AP64<__sanitizer::LocalAddressSpaceView> >, __sanitizer::LargeMmapAllocatorPtrArrayDynamic>::Allocate(__sanitizer::SizeClassAllocator64LocalCache<__sanitizer::SizeClassAllocator64<__asan::AP64<__sanitizer::LocalAddressSpaceView> > >*, unsigned long, unsigned long) () #46 0x000000000041d4ee in __asan::Allocator::Allocate(unsigned long, unsigned long, __sanitizer::BufferedStackTrace*, __asan::AllocType, bool) () #47 0x000000000041d294 in __asan::asan_malloc(unsigned long, __sanitizer::BufferedStackTrace*) () #48 0x0000000000498b1c in malloc () #49 0x00007ffff7e20b7a in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6 #50 0x00007ffff7fd9b1e in call_init (l=, argc=argc@entry=1, argv=argv@entry=0x7fffffffded8, env=env@entry=0x7fffffffdee8) at dl-init.c:70 #51 0x00007ffff7fd9c08 in call_init (env=0x7fffffffdee8, argv=0x7fffffffded8, argc=1, l=) at dl-init.c:33 --Type for more, q to quit, c to continue without paging-- #52 _dl_init (main_map=0x7ffff7ffe220, argc=1, argv=0x7fffffffded8, env=0x7fffffffdee8) at dl-init.c:117 #53 0x00007ffff7fc910a in _dl_start_user () from /lib64/ld-linux-x86-64.so.2 #54 0x0000000000000001 in ?? ( #55 0x00007fffffffe23d in ?? () #56 0x0000000000000000 in ?? () ```

EDIT: Linking the pthread-enabled .so had no effect. EDIT: linking the sanitizer runtime dynamically like gcc does it had no effect and produced the same backtrace

wolfcw commented 2 years ago

This apparently has parallels to our long-standing jemalloc issue (e.g., #130), where jemalloc depends on time-related functions calls while libfaketime depends on memory allocation first during its own initialisation. The information about clang asan's malloc on AddressSanitizerIncompatiblity does not make me very confident that there is anything we can do on libfaketime's end, but I would love to get proven wrong by someone with deeper understanding of asan's internal workings.

wolfcw commented 2 years ago

I'm leaving this open for now in case someone can bring some new ideas into this issue given that we were not able to solve the similar jemalloc incompatibility (#130) for quite some time. However, there are no solutions paths at the moment we can pursue.

psychon commented 2 years ago

Hi! I came up with a fixhack to make this "work". With this, the example from the OP prints for me:

$ clang -fsanitize=address foo.c && LD_PRELOAD=/path/to/libfaketime/src/libfaketimeMT.so.1 ./a.out
libfaketime: Cannot recover from unexpected recursive calls to clock_gettime().
libfaketime:  Please check whether any other libraries are in use that clash with libfaketime.
libfaketime:  Returning -1 on clock_gettime() to break recursion now... if that does not work, please check other libraries' error handling.
libfaketime: Cannot recover from unexpected recursive calls to clock_gettime().
libfaketime:  Please check whether any other libraries are in use that clash with libfaketime.
libfaketime:  Returning -1 on clock_gettime() to break recursion now... if that does not work, please check other libraries' error handling.

The hack is based on the already existing hack to detect recursive calls to clock_gettime. In my case, there was a __constructor__ call to ftpl_init() and inside that one happened a call to clock_gettime and thus another call to ftpl_init. Thus, there was no recursive call to clock_gettime and the existing detecting did not, well, detect this. Hence, I added a new variable to track whether ftpl_init() is already running.

(I also had to fix the usage of the recursion_depth counter since a decrement is missing)

diff --git a/src/libfaketime.c b/src/libfaketime.c
index f92ecf8..9223934 100644
--- a/src/libfaketime.c
+++ b/src/libfaketime.c
@@ -294,6 +294,7 @@ static bool check_missing_real(const char *name, bool missing)
   check_missing_real(#name, (NULL == real_##name))

 static int initialized = 0;
+static int initializing = 0;

 /* prototypes */
 static int    fake_gettimeofday(struct timeval *tv);
@@ -2287,7 +2288,7 @@ int clock_gettime(clockid_t clk_id, struct timespec *tp)
       fprintf(stderr, "libfaketime: Unexpected recursive calls to clock_gettime() without proper initialization. Trying alternative.\n");
       DONT_FAKE_TIME(ftpl_init()) ;
     }
-    else if (recursion_depth == 3)
+    else if (recursion_depth == 3 || initializing)
     {
       fprintf(stderr, "libfaketime: Cannot recover from unexpected recursive calls to clock_gettime().\n");
       fprintf(stderr, "libfaketime:  Please check whether any other libraries are in use that clash with libfaketime.\n");
@@ -2297,6 +2298,7 @@ int clock_gettime(clockid_t clk_id, struct timespec *tp)
         tp->tv_sec = 0;
         tp->tv_nsec = 0;
       }
+      recursion_depth--;
       return -1;
     }
     else {
@@ -2557,6 +2559,7 @@ static void ftpl_init(void)
   /* moved up here from below the dlsym calls #130 */
   dont_fake = true; // Do not fake times during initialization
   dont_fake_final = false;
+  initializing = true;

 #ifdef __APPLE__
   const char *progname = getprogname();
@@ -2948,6 +2951,7 @@ static void ftpl_init(void)
   }

   dont_fake = dont_fake_final;
+  initializing = false;
 }

Doing this in the earlier branch (so that no time is faked during init) did not make the hang go away.

I'm not saying that this is pretty... but it "works" for me!

LtdSauce commented 2 years ago

@psychon and i found out that his above "hack" does not work with c++/clang++. After a little bit of fiddling he came up with another try. The following patch seems to work with clang and clang++ on our use-cases:

diff --git a/src/libfaketime.c b/src/libfaketime.c
index f92ecf8..2e8e94d 100644
--- a/src/libfaketime.c
+++ b/src/libfaketime.c
@@ -294,6 +294,7 @@ static bool check_missing_real(const char *name, bool missing)
   check_missing_real(#name, (NULL == real_##name))

 static int initialized = 0;
+static int initializing = 0;

 /* prototypes */
 static int    fake_gettimeofday(struct timeval *tv);
@@ -2282,12 +2283,12 @@ int clock_gettime(clockid_t clk_id, struct timespec *tp)
   if (!initialized)
   {
     recursion_depth++;
-    if (recursion_depth == 2)
+    if (recursion_depth == 2 && false)
     {
       fprintf(stderr, "libfaketime: Unexpected recursive calls to clock_gettime() without proper initialization. Trying alternative.\n");
       DONT_FAKE_TIME(ftpl_init()) ;
     }
-    else if (recursion_depth == 3)
+    else if (recursion_depth >= 2 || initializing)
     {
       fprintf(stderr, "libfaketime: Cannot recover from unexpected recursive calls to clock_gettime().\n");
       fprintf(stderr, "libfaketime:  Please check whether any other libraries are in use that clash with libfaketime.\n");
@@ -2297,6 +2298,7 @@ int clock_gettime(clockid_t clk_id, struct timespec *tp)
         tp->tv_sec = 0;
         tp->tv_nsec = 0;
       }
+      recursion_depth--;
       return -1;
     }
     else {
@@ -2557,6 +2559,7 @@ static void ftpl_init(void)
   /* moved up here from below the dlsym calls #130 */
   dont_fake = true; // Do not fake times during initialization
   dont_fake_final = false;
+  initializing = true;

 #ifdef __APPLE__
   const char *progname = getprogname();
@@ -2948,6 +2951,7 @@ static void ftpl_init(void)
   }

   dont_fake = dont_fake_final;
+  initializing = false;
 }

For the record:

We will stick with the patch above for now. @wolfcw if you think this would make a "good-enough" fix i can file a PR from https://github.com/LtdSauce/libfaketime-asan-fixed.

wolfcw commented 2 years ago

Sounds interesting and glad it works for you so far!

However, I'm still trying to figure out how your patch works. :-) It looks like you effectively cap the recursion depth counter to 2 (based on the new recursion_depth--) and effectively replace it with a flag that is set during initialisation. It also does no longer attempt ftpl_init() wrapped in DONT_FAKE_TIME() as suggested to improve changes for jemalloc compatibility. If that fixes stuff with libasan, I guess we might simplify it a bit (leave out unreachable code) and wrap it in #ifdef for yet another compile-time flag, yes. :-)

LtdSauce commented 2 years ago

(Hopefully the following is related enough!)

We found that ASan and libfaketime have another issue. When freeing memory a deadlock in libfaketimes Code that reads in the FAKETIME_TIMESTAMP_FILE can happen. On our case it happen quite deterministic when ASans quarantine zone is full and ASan actually starts to deallocate memory.

See the following stacktrace:

futex_wait 0x00007fac60c7d160
__GI___lll_lock_wait 0x00007fac60c7d160
lll_mutex_lock_optimized 0x00007fac60c83ee2
___pthread_mutex_lock 0x00007fac60c83ee2
fake_clock_gettime 0x00007fac65f5e3d8
clock_gettime 0x00007fac65f5c5eb
__sanitizer::MonotonicNanoTime() 0x000000000057d41b
__sanitizer::SizeClassAllocator64<__asan::AP64<__sanitizer::LocalAddressSpaceView> >::MaybeReleaseToOS(__sanitizer::MemoryMapper<__sanitizer::SizeClassAllocator64<__asan::AP64<__sanitizer::LocalAddressSpaceView> > >*, unsigned long, bool) 0x00000000004e6f08
__sanitizer::SizeClassAllocator64<__asan::AP64<__sanitizer::LocalAddressSpaceView> >::ReturnToAllocator(__sanitizer::MemoryMapper<__sanitizer::SizeClassAllocator64<__asan::AP64<__sanitizer::LocalAddressSpaceView> > >*, __sanitizer::AllocatorStats*, unsigned long, unsigned int const*, unsigned long) 0x00000000004e6cb4
__sanitizer::SizeClassAllocator64LocalCache<__sanitizer::SizeClassAllocator64<__asan::AP64<__sanitizer::LocalAddressSpaceView> > >::DrainHalfMax(__sanitizer::SizeClassAllocator64LocalCache<__sanitizer::SizeClassAllocator64<__asan::AP64<__sanitizer::LocalAddressSpaceView> > >::PerClass*, __sanitizer::SizeClassAllocator64<__asan::AP64<__sanitizer::LocalAddressSpaceView> >*, unsigned long) 0x00000000004e6924
__sanitizer::SizeClassAllocator64LocalCache<__sanitizer::SizeClassAllocator64<__asan::AP64<__sanitizer::LocalAddressSpaceView> > >::Deallocate(__sanitizer::SizeClassAllocator64<__asan::AP64<__sanitizer::LocalAddressSpaceView> >*, unsigned long, void*) 0x00000000004e677e
__sanitizer::Quarantine<__asan::QuarantineCallback, __asan::AsanChunk>::DoRecycle(__sanitizer::QuarantineCache<__asan::QuarantineCallback>*, __asan::QuarantineCallback) 0x00000000004e61dd
__sanitizer::Quarantine<__asan::QuarantineCallback, __asan::AsanChunk>::Recycle(unsigned long, __asan::QuarantineCallback) 0x00000000004e5d64
__asan::Allocator::QuarantineChunk(__asan::AsanChunk*, void*, __sanitizer::BufferedStackTrace*) 0x00000000004e7e02
free 0x000000000055f0c6
_IO_deallocate_file 0x00007fac60c6adc7
_IO_new_fclose 0x00007fac60c6adc7
fclose 0x00000000005347b5
read_config_file 0x00007fac65f5e1be
fake_clock_gettime 0x00007fac65f5ea84
clock_gettime 0x00007fac65f5c5eb
clock_gettime 0x0000000000507eab
<unknown> 0x00007fac6141f705
<unknown> 0x00007fac6141ff96
mosquitto_loop 0x00007fac61416597
mosquitto_loop_forever 0x00007fac614167f3
<unknown> 0x00007fac6141f533
start_thread 0x00007fac60c80947
clone 0x00007fac60d10a44

Because ASan calls clock_gettime when deallocating memory it calls faketime again which wants to read in the file again and deadlocks. EDIT: Because ASan calls the MonotonicNanoTime i checked if building with FORCE_MONOTONIC_FIx resolves this, but it doesn't.

@psychon suggests that this could be fixed by using open instead of fopen.

If needed i could try to make a reproducible example. Probably calling malloc/free in a loop should be sufficient to eventually trigger this deadlock.

I'll check if using -shared-libsan does have this issue too. EDIT: At first glance it seems that this issue does not happen with -shared-libsan... why, we do not know

psychon commented 2 years ago

However, I'm still trying to figure out how your patch works.

Imagine the simplified model of asan's malloc hook (I actually never looked at the real code):

void *asan_malloc(size_t size) {
  pthread_mutex_lock(&some_global_mutex);
  clock_gettime(I dont know the arguments and dont care for this example);
  void *result = real_malloc(size);
  pthread_mutex_unlock(&some_global_mutex);
  return result;
}

Important here: some_global_mutex is non-recursive. Thus, any recursive calls to asan_malloc() will deadlock.

The other "ingredient" is dlsym() calling malloc().

Let's begin with the simple case:

Clang / C

libfaketime has ftpl_init() set up with __constructor, so this function is called after loading the library. ftpl_init() will now use dlsym() to find "the real" time functions. This calls malloc, which we hooked above with asan_malloc(). This now locks some_global_mutex and calls clock_gettime() while the mutex is held.

We are back in libfaketime because it hooked clock_gettime() with fake_clock_gettime() (I think I got this function name wrong... whatever). At this point, faketime notices that ftpl_init() did not yet complete and calls this function again, which calls dlsym() again, which calls malloc() which calls pthread_mutex_lock() again. However, this is a non-recursive mutex and we get a deadlock!

The callchain at this point looks something like this:

pthread_mutex_lock (this hangs because this thread already locked the mutex) was called by
asan_malloc was called by
dlsym was called by
ftpl_init was called by
clock_gettime 
asan_malloc
dlsym
ftpl_init
the dynamic loader calling the __constructor__ function of libfaketime.so

In my hack/patch above, I introduced the variable initializing to "handle" this case. The second call to clock_gettime will now notice that ftpl_init is already on the stack and thus will not call it again.

Clang++ / C++

The "new ingredient" here is that the program is linked against libstdc++.so. This shared object has some constructor that calls malloc(). I don't know why, but this constructor function is called before libfaketime's. I'll skip the details of the calls and go straight to the resulting call stack (again ordered so that the oldest call is at the end):

pthread_mutex_lock
asan_malloc
dlsym
ftpl_init
clock_gettime
asan_malloc
???
the dynamic loader calling some __constructor__ function of libstdc++.so

Here, there is not get a call to ftpl_init on the stack, so the new initializing variable does not help. The other part of my hack/patch breaks the existing recursion_depth code so that it does not call ftpl_init for "early calls" (=before ftpl_init()) of clock_gettime. That way, we again avoid the recursive call to asan_malloc.

Yet another issue

Edit: This should be fixed with #391

As @LtdSauce wrote:

We found that ASan and libfaketime have another issue

I feel like this is a separate issue, but okay, now I'll have to write down the details here.

We are using libfaketime with the environment variable set that causes it to reload the faketime file all the time (sorry, I forgot the name of this env var; I hope you know what I mean).

Here, we get again into problems with malloc. Well, actually, this time it is free calling clock_gettime (but only sometimes?). The backtrace shown above shows that some code calls clock_gettime (thus, actually fake_clock_gettime). This reloads the config file. To read the file, this uses fopen/fgets/fclose. These functions maintain an internal buffer of "data already read, but not yet passed to the caller". fclose has to free this buffer. Somehow, ASAN decides to call clock_gettime again on this call to free. This now recursively calls into libfaketime and deadlocks here: https://github.com/wolfcw/libfaketime/blob/f836ea3eb392378bc5166b3d68283f69e5330542/src/libfaketime.c#L3075

There are two calls to fake_clock_gettime on the stack. The newest one hangs in pthread_mutex_lock. That must be in the line above, since that seems to be the only such call.

The earlier call is in read_config_file -> fclose. This must be here: https://github.com/wolfcw/libfaketime/blob/f836ea3eb392378bc5166b3d68283f69e5330542/src/libfaketime.c#L3198

The above is inside the critical section which only ends here: https://github.com/wolfcw/libfaketime/blob/f836ea3eb392378bc5166b3d68283f69e5330542/src/libfaketime.c#L3282

Thus, this thread already locked this non-recursive mutex and a second pthread_mutex_lock is undefined behaviour according to POSIX, which in practice (at least on Linux) means: Deadlock / hang.

wolfcw commented 2 years ago

Thanks, I'm fine with the initializing flag, but I guess we could leave out ifs whose conditions will always be false and the recursion depth counter probably could also be simplified to a flag as it effectively can never exceed a value of 2 across function calls and has a minimum value of 1 during operation.

I'm also fine with replacing fopen and related calls with open if that avoids recursive calls due to a buffer we don't need anyway, though it might not be a long-term solution if implementation internals in glibc or libasan change (the force-monontonic-fix you mentioned is a good example for a workaround only needed for some middle versions, but not the older or the current glibc versions).