Closed maurerle closed 2 years ago
Thanks for reporting and posting a workaround!
Since you clone the repository and compile, have you checked whether "make test" also hangs in the container? If so, compiling with FORCE_MONOTONIC_FIX enabled in the Makefile might help.
I also get this with just:
from threading import Event
event = Event()
print("Waiting")
event.wait(1.0)
print("Done")
I get this, hanging after "waiting":
doc@debian64:~/test$ cat test_sem_clockwait.py
from threading import Event, Lock
event = Event()
print("Waiting")
event.wait(1.0)
print("Done")
doc@debian64:~/test$ time python3 test_sem_clockwait.py
Waiting
Done
real 0m1,042s
user 0m0,030s
sys 0m0,011s
doc@debian64:~/test$ faketime -f '@2021-01-01 00:00:00' python3 test_sem_clockwait.py
Waiting
gdb stacktrace:
(gdb) bt
__futex_abstimed_wait_common64 (futex_word=futex_word@entry=0x55e1d61fbb20, expected=expected@entry=0, clockid=clockid@entry=1, abstime=abstime@entry=0x7fffd671d8b0, private=<optimized out>, cancel=cancel@entry=true) at ../sysdeps/nptl/futex-internal.c:87
87 ../sysdeps/nptl/futex-internal.c: Aucun fichier ou dossier de ce type.
(gdb) bt
#0 __futex_abstimed_wait_common64 (futex_word=futex_word@entry=0x555555b3b370, expected=expected@entry=0, clockid=clockid@entry=1,
abstime=abstime@entry=0x7fffffffdfb0, private=<optimized out>, cancel=cancel@entry=true) at ../sysdeps/nptl/futex-internal.c:87
#1 0x00007ffff7dfa15b in __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x555555b3b370, expected=expected@entry=0, clockid=clockid@entry=1,
abstime=abstime@entry=0x7fffffffdfb0, private=<optimized out>) at ../sysdeps/nptl/futex-internal.c:123
#2 0x00007ffff7df6740 in do_futex_wait (sem=sem@entry=0x555555b3b370, clockid=clockid@entry=1, abstime=abstime@entry=0x7fffffffdfb0) at sem_waitcommon.c:112
#3 0x00007ffff7df67ee in __new_sem_wait_slow64 (sem=0x555555b3b370, clockid=1, abstime=0x7fffffffdfb0) at sem_waitcommon.c:184
#4 0x00007ffff7df685d in __sem_clockwait (sem=sem@entry=0x555555b3b370, clockid=clockid@entry=1, abstime=abstime@entry=0x7fffffffdfb0) at sem_clockwait.c:46
#5 0x000055555569f187 in PyThread_acquire_lock_timed (lock=0x555555b3b370, microseconds=1000000, intr_flag=1) at ../Python/thread_pthread.h:488
See https://sources.debian.org/src/python3.10/3.10.4-3/Python/thread_pthread.h/#L488
I'm using libfaketime v0.9.10 and python 3.10.4 on Debian sid.
Python 3.10.4 uses sem_clockwait which seems not implemented in libfaketime, I don't find it in https://github.com/wolfcw/libfaketime/blob/v0.9.10/src/libfaketime.c.
Using FAKETIME_DONT_FAKE_MONOTONIC=1 faketime -f '@2021-01-01 00:00:00' python3 test_sem_clockwait.py
works (no hang).
I also get the hang with faketime +0minutes python3 test_sem_clockwait.py
(so without time offset).
When compiling a custom program to print clock_gettime, I see that the monotonic and real time became the same with faketime:
doc@debian64:~/debian$ cat faketimetest.c
#include <semaphore.h>
#include <time.h>
#include <unistd.h>
#include <stdio.h>
int main() {
struct timespec ts;
clock_gettime(CLOCK_REALTIME, &ts);
printf("gettime CLOCK_REALTIME: %d.%09d\n", ts.tv_sec, ts.tv_nsec);
clock_gettime(CLOCK_MONOTONIC, &ts);
printf("gettime CLOCK_MONOTONIC: %d.%09d\n", ts.tv_sec, ts.tv_nsec);
}
doc@debian64:~/debian$ gcc -o faketimetest faketimetest.c
doc@debian64:~/debian$ ./faketimetest
gettime CLOCK_REALTIME: 1652131778.217399576
gettime CLOCK_MONOTONIC: 31483.274837271
doc@debian64:~/debian$ faketime +0minutes ./faketimetest
gettime CLOCK_REALTIME: 1652131783.856929605
gettime CLOCK_MONOTONIC: 1652131783.856986596
doc@debian64:~/debian$ FAKETIME_FORCE_MONOTONIC_FIX=1 faketime +0minutes ./faketimetest
gettime CLOCK_REALTIME: 1652131924.381097987
gettime CLOCK_MONOTONIC: 1652131924.381167182
With sem_clockwait with CLOCK_MONOTONIC, the glibc is calling the kernel directly with a monotonic clock, and as the kernel is expected something in the lines of 31483.274837271
(a small number of seconds), it "hangs".
Python 3.10 does this:
clock_gettime(CLOCK_MONOTONIC, &ts)
Thanks for your detailed analysis, all plausible. Let's hope it's related to the yet missing sem_clockwait()
support. I can't reproduce with Python 3.10.4 on Linux, but likely just my glibc is too old on that system.
f706373 experimentally adds support for sem_clockwait()
with no testing procedure yet, but eventually you could give it a shot.
I will try the commit now.
I've tested commit f706373bc2cc63b34f662fa6d89a0e07f5613482 with FAKETIME="+200d" LD_PRELOAD=$PWD/lib/faketime/libfaketime.so.1 python3.10 test_sem_clockwait.py
And it doesn't hang, the delay is the expected one.
When I'm using x2
or x0.5
I get the reversed behavior, where x2
slows down the real delay:
doc@debian64:faketime$ time env FAKETIME="+200d x2" LD_PRELOAD=$PWD/libfaketime.so.1 python3.10 ~/test/test_sem_clockwait.py
Waiting
Done
real 0m2,127s
user 0m0,018s
sys 0m0,022s
doc@debian64:faketime$ time env FAKETIME="+200d x0.5" LD_PRELOAD=$PWD/libfaketime.so.1 python3.10 ~/test/test_sem_clockwait.py
Waiting
Done
real 0m0,527s
user 0m0,048s
sys 0m0,000s
Compared to sleep:
doc@debian64:faketime$ time FAKETIME="+200d x2" LD_PRELOAD=$PWD/libfaketime.so.1 sleep 1
real 0m0,505s
user 0m0,000s
sys 0m0,004s
doc@debian64:faketime$ time FAKETIME="+200d x0.5" LD_PRELOAD=$PWD/libfaketime.so.1 sleep 1
real 0m2,004s
user 0m0,001s
sys 0m0,003s
So it doesn't hang anymore, but the clock speed feature is not totally ok.
Thanks for testing and pointing this out! This latest commit should address the reversed factor.
Good, I've not found any issues now with 859751e2cb58175ca51895f91fdd5f7f9e74bc76:
doc@debian64:faketime$ time env FAKETIME="+200d" LD_PRELOAD=$PWD/libfaketime.so.1 python3.10 ~/test/test_sem_clockwait.py
Waiting
Done
real 0m1,065s
user 0m0,035s
sys 0m0,013s
doc@debian64:faketime$ time env FAKETIME="+200d x2" LD_PRELOAD=$PWD/libfaketime.so.1 python3.10 ~/test/test_sem_clockwait.py
Waiting
Done
real 0m0,563s
user 0m0,041s
sys 0m0,000s
doc@debian64:faketime$ time env FAKETIME="+200d x0.5" LD_PRELOAD=$PWD/libfaketime.so.1 python3.10 ~/test/test_sem_clockwait.py
Waiting
Done
real 0m2,058s
user 0m0,023s
sys 0m0,019s
Hi, currently my initial test case does not work. This broke my build, as currently there isn't any waiting time at all between the calls
leaving away FAKETIME_DONT_FAKE_MONOTONIC resolves this. Otherwise nothing is outputted or everything is outputted very fast without watining.
Somehow this does also not work if
FAKETIME=
is set but empty
The latest commit should fix that the intercepted sem_clockwait()
also honors when "don't fake monotonic" is set.
If the issue persists for you, does it happen no matter whether FAKETIME_DONT_FAKE_MONOTONIC
is set or not? Does it happen with both negative and positive faketime offsets? We'd then probably need to strace a bit to figure out whether libfaketime uses wrong values or something goes wrong when the real sem_clockwait()
is called.
Great. That works great now.
Empty faketime works too. Increasing the multiplier works too. Thank you very much for the fast fixes!
I will close this issue as nothing else is left to do.
I had the Problem that my background jobs did not execute in a docker container with mocked time through the great libfaketime library.
My MWE (minimum working example) to reproduce this issue looks like this:
This does not print anything. Yet, if the environment variable
FAKETIME_DONT_FAKE_MONOTONIC=1
is set, it works.Exemplary Dockerfile:
This was an issue I struggled with, so I wanted to document it so that others can find the solution. Maybe its worth noting that
FAKETIME_DONT_FAKE_MONOTONIC
is not only needed for Java