wolfcw / libfaketime

libfaketime modifies the system time for a single application
https://github.com/wolfcw/libfaketime
GNU General Public License v2.0
2.64k stars 321 forks source link

Python Threading Ticker needs FAKETIME_DONT_FAKE_MONOTONIC #390

Closed maurerle closed 2 years ago

maurerle commented 2 years ago

I had the Problem that my background jobs did not execute in a docker container with mocked time through the great libfaketime library.

My MWE (minimum working example) to reproduce this issue looks like this:

import threading

import logging
from datetime import datetime
logging.basicConfig()
logging.getLogger().setLevel('INFO')
def print_something(something):
    ticker = threading.Event()
    while not ticker.wait(2):
        logging.info(f"{something} {datetime.now()}")

#Created the Threads
logging.info("starting")
t = threading.Thread(target=print_something, daemon=True, args=('hello world',))
t.start()
t.join()

This does not print anything. Yet, if the environment variable FAKETIME_DONT_FAKE_MONOTONIC=1 is set, it works.

Exemplary Dockerfile:

FROM debian:stable-slim as builder
RUN apt-get update                                                       \
    && apt-get install --no-install-recommends -y git build-essential ca-certificates\
    && git clone https://github.com/wolfcw/libfaketime.git /libfaketime  \
    && cd /libfaketime                                                   \
    && make                                                              \
    && make install

FROM python:3.10-slim
COPY --from=builder /usr/local/lib/faketime/libfaketimeMT.so.1 /faketimeMT.so

ENV LD_PRELOAD=/faketimeMT.so
ENV FAKETIME="@2021-01-01 00:00:00"
ENV FAKETIME_DONT_RESET=1
ENV FAKETIME_DONT_FAKE_MONOTONIC=1 # <- without this it does not work

COPY test.py test.py
CMD ["python", "test.py"]

This was an issue I struggled with, so I wanted to document it so that others can find the solution. Maybe its worth noting that FAKETIME_DONT_FAKE_MONOTONIC is not only needed for Java

wolfcw commented 2 years ago

Thanks for reporting and posting a workaround!

Since you clone the repository and compile, have you checked whether "make test" also hangs in the container? If so, compiling with FORCE_MONOTONIC_FIX enabled in the Makefile might help.

amurzeau commented 2 years ago

I also get this with just:

from threading import Event

event = Event()
print("Waiting")
event.wait(1.0)
print("Done")

I get this, hanging after "waiting":

doc@debian64:~/test$ cat test_sem_clockwait.py
from threading import Event, Lock

event = Event()
print("Waiting")
event.wait(1.0)
print("Done")

doc@debian64:~/test$ time python3 test_sem_clockwait.py
Waiting
Done

real    0m1,042s
user    0m0,030s
sys     0m0,011s
doc@debian64:~/test$ faketime -f '@2021-01-01 00:00:00' python3 test_sem_clockwait.py
Waiting

gdb stacktrace:

(gdb) bt
__futex_abstimed_wait_common64 (futex_word=futex_word@entry=0x55e1d61fbb20, expected=expected@entry=0, clockid=clockid@entry=1, abstime=abstime@entry=0x7fffd671d8b0, private=<optimized out>, cancel=cancel@entry=true) at ../sysdeps/nptl/futex-internal.c:87
87      ../sysdeps/nptl/futex-internal.c: Aucun fichier ou dossier de ce type.
(gdb) bt
#0  __futex_abstimed_wait_common64 (futex_word=futex_word@entry=0x555555b3b370, expected=expected@entry=0, clockid=clockid@entry=1,
    abstime=abstime@entry=0x7fffffffdfb0, private=<optimized out>, cancel=cancel@entry=true) at ../sysdeps/nptl/futex-internal.c:87
#1  0x00007ffff7dfa15b in __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x555555b3b370, expected=expected@entry=0, clockid=clockid@entry=1,
    abstime=abstime@entry=0x7fffffffdfb0, private=<optimized out>) at ../sysdeps/nptl/futex-internal.c:123
#2  0x00007ffff7df6740 in do_futex_wait (sem=sem@entry=0x555555b3b370, clockid=clockid@entry=1, abstime=abstime@entry=0x7fffffffdfb0) at sem_waitcommon.c:112
#3  0x00007ffff7df67ee in __new_sem_wait_slow64 (sem=0x555555b3b370, clockid=1, abstime=0x7fffffffdfb0) at sem_waitcommon.c:184
#4  0x00007ffff7df685d in __sem_clockwait (sem=sem@entry=0x555555b3b370, clockid=clockid@entry=1, abstime=abstime@entry=0x7fffffffdfb0) at sem_clockwait.c:46
#5  0x000055555569f187 in PyThread_acquire_lock_timed (lock=0x555555b3b370, microseconds=1000000, intr_flag=1) at ../Python/thread_pthread.h:488

See https://sources.debian.org/src/python3.10/3.10.4-3/Python/thread_pthread.h/#L488

I'm using libfaketime v0.9.10 and python 3.10.4 on Debian sid.

Python 3.10.4 uses sem_clockwait which seems not implemented in libfaketime, I don't find it in https://github.com/wolfcw/libfaketime/blob/v0.9.10/src/libfaketime.c.

Using FAKETIME_DONT_FAKE_MONOTONIC=1 faketime -f '@2021-01-01 00:00:00' python3 test_sem_clockwait.py works (no hang).

amurzeau commented 2 years ago

I also get the hang with faketime +0minutes python3 test_sem_clockwait.py (so without time offset). When compiling a custom program to print clock_gettime, I see that the monotonic and real time became the same with faketime:

doc@debian64:~/debian$ cat faketimetest.c
#include <semaphore.h>
#include <time.h>
#include <unistd.h>
#include <stdio.h>

int main() {
    struct timespec ts;
        clock_gettime(CLOCK_REALTIME, &ts);
    printf("gettime CLOCK_REALTIME: %d.%09d\n", ts.tv_sec, ts.tv_nsec);

        clock_gettime(CLOCK_MONOTONIC, &ts);
    printf("gettime CLOCK_MONOTONIC: %d.%09d\n", ts.tv_sec, ts.tv_nsec);
}
doc@debian64:~/debian$ gcc -o faketimetest faketimetest.c 
doc@debian64:~/debian$ ./faketimetest 
gettime CLOCK_REALTIME: 1652131778.217399576
gettime CLOCK_MONOTONIC: 31483.274837271
doc@debian64:~/debian$ faketime +0minutes ./faketimetest 
gettime CLOCK_REALTIME: 1652131783.856929605
gettime CLOCK_MONOTONIC: 1652131783.856986596
doc@debian64:~/debian$ FAKETIME_FORCE_MONOTONIC_FIX=1 faketime +0minutes ./faketimetest 
gettime CLOCK_REALTIME: 1652131924.381097987
gettime CLOCK_MONOTONIC: 1652131924.381167182

With sem_clockwait with CLOCK_MONOTONIC, the glibc is calling the kernel directly with a monotonic clock, and as the kernel is expected something in the lines of 31483.274837271 (a small number of seconds), it "hangs".

Python 3.10 does this:

wolfcw commented 2 years ago

Thanks for your detailed analysis, all plausible. Let's hope it's related to the yet missing sem_clockwait() support. I can't reproduce with Python 3.10.4 on Linux, but likely just my glibc is too old on that system.

f706373 experimentally adds support for sem_clockwait() with no testing procedure yet, but eventually you could give it a shot.

amurzeau commented 2 years ago

I will try the commit now.

amurzeau commented 2 years ago

I've tested commit f706373bc2cc63b34f662fa6d89a0e07f5613482 with FAKETIME="+200d" LD_PRELOAD=$PWD/lib/faketime/libfaketime.so.1 python3.10 test_sem_clockwait.py

And it doesn't hang, the delay is the expected one. When I'm using x2 or x0.5 I get the reversed behavior, where x2 slows down the real delay:

doc@debian64:faketime$ time env FAKETIME="+200d x2" LD_PRELOAD=$PWD/libfaketime.so.1 python3.10 ~/test/test_sem_clockwait.py
Waiting
Done

real    0m2,127s
user    0m0,018s
sys     0m0,022s
doc@debian64:faketime$ time env FAKETIME="+200d x0.5" LD_PRELOAD=$PWD/libfaketime.so.1 python3.10 ~/test/test_sem_clockwait.py
Waiting
Done

real    0m0,527s
user    0m0,048s
sys     0m0,000s

Compared to sleep:

doc@debian64:faketime$ time FAKETIME="+200d x2" LD_PRELOAD=$PWD/libfaketime.so.1 sleep 1

real    0m0,505s
user    0m0,000s
sys     0m0,004s
doc@debian64:faketime$ time FAKETIME="+200d x0.5" LD_PRELOAD=$PWD/libfaketime.so.1 sleep 1

real    0m2,004s
user    0m0,001s
sys     0m0,003s

So it doesn't hang anymore, but the clock speed feature is not totally ok.

wolfcw commented 2 years ago

Thanks for testing and pointing this out! This latest commit should address the reversed factor.

amurzeau commented 2 years ago

Good, I've not found any issues now with 859751e2cb58175ca51895f91fdd5f7f9e74bc76:

doc@debian64:faketime$ time env FAKETIME="+200d" LD_PRELOAD=$PWD/libfaketime.so.1 python3.10 ~/test/test_sem_clockwait.py
Waiting
Done

real    0m1,065s
user    0m0,035s
sys     0m0,013s
doc@debian64:faketime$ time env FAKETIME="+200d x2" LD_PRELOAD=$PWD/libfaketime.so.1 python3.10 ~/test/test_sem_clockwait.py
Waiting
Done

real    0m0,563s
user    0m0,041s
sys     0m0,000s
doc@debian64:faketime$ time env FAKETIME="+200d x0.5" LD_PRELOAD=$PWD/libfaketime.so.1 python3.10 ~/test/test_sem_clockwait.py
Waiting
Done

real    0m2,058s
user    0m0,023s
sys     0m0,019s
maurerle commented 2 years ago

Hi, currently my initial test case does not work. This broke my build, as currently there isn't any waiting time at all between the calls

leaving away FAKETIME_DONT_FAKE_MONOTONIC resolves this. Otherwise nothing is outputted or everything is outputted very fast without watining.

maurerle commented 2 years ago

Somehow this does also not work if

FAKETIME= is set but empty

wolfcw commented 2 years ago

The latest commit should fix that the intercepted sem_clockwait() also honors when "don't fake monotonic" is set.

If the issue persists for you, does it happen no matter whether FAKETIME_DONT_FAKE_MONOTONIC is set or not? Does it happen with both negative and positive faketime offsets? We'd then probably need to strace a bit to figure out whether libfaketime uses wrong values or something goes wrong when the real sem_clockwait() is called.

maurerle commented 2 years ago

Great. That works great now.

Empty faketime works too. Increasing the multiplier works too. Thank you very much for the fast fixes!

I will close this issue as nothing else is left to do.