wolfcw / libfaketime

libfaketime modifies the system time for a single application
https://github.com/wolfcw/libfaketime
GNU General Public License v2.0
2.62k stars 319 forks source link

Faketime leads a boost::asio::deadline_timer to block #431

Open vogel76 opened 1 year ago

vogel76 commented 1 year ago

Please take a look on trivial testcase where 2 timers are scheduled to trigger. This program should continue in infinite loop, but after some time (rather random) blocks (scheduled action is not executed).

This testcase is part of bigger opensource project (Hive Blockchain, see Hive where our testing pipelines are failing randomly. As far we analyzed it, problems are similar to shown in the example.

I'd like to ask for any idea how we can track down the issue to allow you fixing the problem (this library is great and we would like to continue using it). Moreover, such problems appeared in last few months - maybe due to some system updates.

Testing example does not work on host nor docker (we of course are mostly using dockerized setup).

How to reproduce:

  1. Build: g++ -O2 ./test.cpp -lboost_system
  2. Run: LD_PRELOAD=/usr/local/lib/faketime/libfaketimeMT.so.1 FAKETIME="+0 x20" FAKETIME_DONT_RESET=1 FAKETIME_FORCE_MONOTONIC_FIX=1 ./a.out

We are using faketime in version 0.9.10. Boost library is in version 1.74. OS: Ubuntu 22.04 LTS on amd64 platform.

Could you help us to solve the issue ?

test.cpp.txt

I have tried also current head of faketime master branch - it does not works too.

wolfcw commented 1 year ago

I can't get this to block with boost 1.74 on Debian 11.7 with glibc 2.31 but the printed counter values decrease with higher speed-up factors, which is what I assume should not happen? Any chance you could backtrace this to the last boost version for which it worked?

vogel76 commented 1 year ago

Thanks for your answer. Here is a backtrace:

Reading symbols from /lib/x86_64-linux-gnu/libc.so.6...
Reading symbols from /usr/lib/debug/.build-id/69/389d485a9793dbe873f0ea2c93e02efaa9aa3d.debug...
Reading symbols from /lib/x86_64-linux-gnu/libm.so.6...
Reading symbols from /usr/lib/debug/.build-id/27/e82301dba6c3f644404d504e1bb1c97894b433.debug...
Reading symbols from /lib64/ld-linux-x86-64.so.2...
Reading symbols from /usr/lib/debug/.build-id/61/ef896a699bb1c2e4e231642b2e1688b2f1a61e.debug...
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007f09c4d99f9a in epoll_wait (epfd=6, events=0x7fffb51b8200, maxevents=128, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
30      ../sysdeps/unix/sysv/linux/epoll_wait.c: No such file or directory.
(gdb) bt
#0  0x00007f09c4d99f9a in epoll_wait (epfd=6, events=0x7fffb51b8200, maxevents=128, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
#1  0x00007f09c50f678e in epoll_wait (epfd=6, events=0x7fffb51b8200, maxevents=128, timeout=-1) at libfaketime.c:1622
#2  0x00005653ef87290d in boost::asio::detail::epoll_reactor::run (this=0x5653f09c6570, usec=<optimized out>, ops=...) at /usr/include/boost/asio/detail/impl/epoll_reactor.ipp:471
#3  0x00005653ef873407 in boost::asio::detail::scheduler::do_run_one (this=this@entry=0x5653f09c6140, lock=..., this_thread=..., ec=...) at /usr/include/boost/asio/detail/impl/scheduler.ipp:465
#4  0x00005653ef86e201 in boost::asio::detail::scheduler::run (this=0x5653f09c6140, ec=...) at /usr/include/boost/asio/detail/impl/scheduler.ipp:204
#5  0x00005653ef86db89 in boost::asio::io_context::run (this=0x7fffb51b8990, this=0x7fffb51b8990) at /usr/include/boost/asio/impl/io_context.ipp:63
#6  main () at ./test.cpp:75
(gdb)

I suspect that some delay got screwed up and here we have a -1 passed to epoll_wait (which should block it accoring to documentation)

vogel76 commented 1 year ago

Also, here is a prebuilt CI base image (development environment) which I used to build, run and test given testcase (it is also used as base image for testing Pipelines): registry.gitlab.syncad.com/hive/hive/ci-base-image:ubuntu22.04-4

you can get it by specifying

ForzCross commented 2 days ago

I also got this problem on version 0.9.10, reverting to 0.9.9 fixed the deadline_timer hang.