wolfcw / libfaketime

libfaketime modifies the system time for a single application
https://github.com/wolfcw/libfaketime
GNU General Public License v2.0

Program hangs due to mixup between monotonic clock and system clock (pthread_cond_clockwait) #469

Open psychon opened 1 month ago

psychon commented 1 month ago

Hi,

When I run the following program under libfaketime on Ubuntu 23.10 with G++ 13.2.0, it hangs. Specifically, ./a.out exits after one second, whereas faketime now ./a.out never returns.

#include <chrono>
#include <mutex>
#include <condition_variable>

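int main() {
    std::mutex mutex;
    std::condition_variable cv;
    std::unique_lock<std::mutex> lock(mutex);
    // Deadline on the monotonic clock (steady_clock), one second from now.
    const auto now = std::chrono::steady_clock::now();
    // Nothing ever notifies cv, so this should return once the deadline
    // passes (or earlier on a spurious wakeup).
    cv.wait_until(lock, now + std::chrono::seconds(1));
    return 0;
}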

The hang happens both with Ubuntu's faketime package and with a version built from git.

A backtrace taken when the program hangs shows that pthread_cond_clockwait() is involved.

Looking at this with strace shows that the program is stuck in a call to futex. The tv_sec value of the timeout is the current wall-clock (epoch) time even though FUTEX_CLOCK_REALTIME is not set, so the kernel measures that timeout against the monotonic clock and it is decades away. Thus, something here seems to mix up the realtime clock and the monotonic clock, I guess.
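
One way to check for this kind of mixup is to compare the two clocks directly. On Linux, CLOCK_MONOTONIC normally counts from an arbitrary starting point (typically boot), so its value is far smaller than the epoch-based CLOCK_REALTIME. The sketch below reads both through clock_gettime() and flags a monotonic reading that looks like an epoch timestamp. The helper names and the one-hour threshold are assumptions made for illustration; none of this is part of libfaketime.

```cpp
#include <cassert>
#include <cstdint>
#include <time.h>

// Read a POSIX clock and return its value in whole seconds.
std::int64_t clock_seconds(clockid_t id) {
    timespec ts{};
    const int rc = clock_gettime(id, &ts);
    assert(rc == 0 && "clock_gettime failed");
    (void)rc;  // avoid an unused-variable warning when NDEBUG is defined
    return static_cast<std::int64_t>(ts.tv_sec);
}

// Heuristic (an assumption of this sketch): if the monotonic clock is
// within one hour of the realtime clock, it is probably reporting
// epoch-based values instead of time since an arbitrary start point.
bool monotonic_looks_like_epoch() {
    const std::int64_t mono = clock_seconds(CLOCK_MONOTONIC);
    const std::int64_t real = clock_seconds(CLOCK_REALTIME);
    const std::int64_t diff = real > mono ? real - mono : mono - real;
    return diff < 3600;
}
```

Calling monotonic_looks_like_epoch() from a small main(), once directly and once under faketime now, should show whether the process sees a faked monotonic clock. A deadline built from such a value and then handed to the kernel as a real CLOCK_MONOTONIC timeout would lie decades in the future, which would match the hang.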

Sorry for the vagueness in some of these sentences. I do not have access to my GitHub account on my Ubuntu machine and thus couldn't easily copy and paste output.

Edit: Heh, I had never tried to reproduce this on this computer. I can report: this is reproducible on Debian testing.

$ faketime --version

faketime: Version 0.9.10
For usage information please use 'faketime --help'.
$ g++ reproducer.c
$ time ./a.out
./a.out  0,00s user 0,01s system 0% cpu 1,007 total
$ time faketime now ./a.out
[no more output; this just hangs]
^C
$ date '+%s' ; faketime now strace ./a.out 
1716291918
execve("./a.out", ["./a.out"], 0x7ffd6c1b9390 /* 37 vars */) = 0
[...]
futex(0x7ffdd0780688, FUTEX_WAIT_BITSET_PRIVATE, 0, {tv_sec=1716291919, tv_nsec=574690363}, FUTEX_BITSET_MATCH_ANY
^C
$ faketime now ./a.out &
$ gdb -p $(pidof a.out)  # I actually rebuilt the program with debug symbols to get a better backtrace
(gdb) bt
#0  __futex_abstimed_wait_common (futex_word=futex_word@entry=0x7ffce4e305a8, expected=expected@entry=0, clockid=clockid@entry=1, abstime=abstime@entry=0x7ffce4e304d0, 
    private=private@entry=0, cancel=cancel@entry=true) at ./nptl/futex-internal.c:103
#1  0x00007f43d04a1a8b in __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x7ffce4e305a8, expected=expected@entry=0, clockid=clockid@entry=1, 
    abstime=abstime@entry=0x7ffce4e304d0, private=private@entry=0) at ./nptl/futex-internal.c:139
#2  0x00007f43d04a4745 in __pthread_cond_wait_common (abstime=<optimized out>, clockid=<optimized out>, mutex=0x7ffce4e305b0, cond=0x7ffce4e30580) at ./nptl/pthread_cond_wait.c:503
#3  ___pthread_cond_clockwait64 (abstime=<optimized out>, clockid=<optimized out>, mutex=0x7ffce4e305b0, cond=0x7ffce4e30580) at ./nptl/pthread_cond_wait.c:682
#4  ___pthread_cond_clockwait64 (cond=0x7ffce4e30580, mutex=0x7ffce4e305b0, clockid=<optimized out>, abstime=<optimized out>) at ./nptl/pthread_cond_wait.c:670
#5  0x0000555b00eba431 in std::__condvar::wait_until (this=0x7ffce4e30580, __m=..., __clock=1, __abs_time=...) at /usr/include/c++/13/bits/std_mutex.h:185
#6  0x0000555b00eba7d9 in std::condition_variable::__wait_until_impl<std::chrono::duration<long, std::ratio<1l, 1000000000l> > > (this=0x7ffce4e30580, __lock=..., 
    __atime=std::chrono::_V2::steady_clock time_point = { 1716292123259790559ns }) at /usr/include/c++/13/condition_variable:203
#7  0x0000555b00eba565 in std::condition_variable::wait_until<std::chrono::duration<long, std::ratio<1l, 1000000000l> > > (this=0x7ffce4e30580, __lock=..., 
    __atime=std::chrono::_V2::steady_clock time_point = { 1716292123259790559ns }) at /usr/include/c++/13/condition_variable:113
#8  0x0000555b00eba24d in main () at /tmp/reproducer.c:10

Importantly, the above backtrace does not go through libfaketime.

Edit: Another data point:

$ readelf -C --dyn-syms a.out --wide | grep pthread_cond_clockwait
     1: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND pthread_cond_clockwait@GLIBC_2.34 (3)

...and only now do I notice that libfaketime does not implement this function at all.
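
For illustration, here is a sketch of the deadline translation that a wrapper around pthread_cond_clockwait() would presumably need. It takes an absolute deadline expressed in the faked clock, computes how much time remains relative to the faked "now", and re-bases that remainder on the real clock. All names are hypothetical, this is not libfaketime's code, and it assumes fake time runs at the same rate as real time (no speed-up or slow-down).

```cpp
#include <cassert>
#include <time.h>

constexpr long kNanosPerSec = 1000000000L;

// Return a - b, normalized so that 0 <= tv_nsec < 1e9.
timespec ts_sub(timespec a, timespec b) {
    timespec r{};
    r.tv_sec = a.tv_sec - b.tv_sec;
    r.tv_nsec = a.tv_nsec - b.tv_nsec;
    if (r.tv_nsec < 0) {
        r.tv_sec -= 1;
        r.tv_nsec += kNanosPerSec;
    }
    return r;
}

// Return a + b, normalized so that 0 <= tv_nsec < 1e9.
timespec ts_add(timespec a, timespec b) {
    timespec r{};
    r.tv_sec = a.tv_sec + b.tv_sec;
    r.tv_nsec = a.tv_nsec + b.tv_nsec;
    if (r.tv_nsec >= kNanosPerSec) {
        r.tv_sec += 1;
        r.tv_nsec -= kNanosPerSec;
    }
    return r;
}

// Hypothetical helper: convert a deadline on the faked clock into a
// deadline on the real clock by preserving the remaining duration.
timespec to_real_deadline(timespec fake_deadline, timespec fake_now,
                          timespec real_now) {
    assert(fake_deadline.tv_nsec >= 0 && fake_deadline.tv_nsec < kNanosPerSec);
    assert(fake_now.tv_nsec >= 0 && fake_now.tv_nsec < kNanosPerSec);
    assert(real_now.tv_nsec >= 0 && real_now.tv_nsec < kNanosPerSec);
    timespec remaining = ts_sub(fake_deadline, fake_now);
    if (remaining.tv_sec < 0) {
        remaining = timespec{};  // deadline already passed: wait 0 ns
    }
    return ts_add(real_now, remaining);
}
```

An interposing wrapper could then look up the real function with dlsym(RTLD_NEXT, "pthread_cond_clockwait") and pass it the translated deadline. How that should interact with libfaketime's existing handling of the other pthread_cond_* functions is left open here.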