Open jetomit opened 1 year ago
Hi @jetomit!
Using CLOCK_MONOTONIC
as you suggest seemed like the right choice to me so I started working on it. However, the API of (fibers timers)
as well as schedule-task-at-time
expect "internal time units"; changing timer-wheel
to use CLOCK_MONOTONIC
would affect those interfaces similarly, which is not acceptable.
Instead we should probably change timer-wheel-advance!
to cope with large gaps.
@wingo, WDYT?
Thanks!
@jetomit Here's a proposed workaround on the Guix side: https://issues.guix.gnu.org/64966
Here's a proposed workaround on the Guix side: https://issues.guix.gnu.org/64966
This would work for aarch64, but I also encounter this issue on armhf and x86_64 systems. This happens whenever system time is pushed forward by a significant amount (a day or more), either by ntpd or manually.
As I understand it, guile’s internal-time-units
only depends on the platform and is the same for all clock types. The bigger problem with using CLOCK_MONOTONIC
might be that it doesn’t count time the system is suspended, which would probably break stuff.
Another report of shepherd
spinning once system time has changed: https://issues.guix.gnu.org/66684
@wingo Hello! Did you have a chance to look into that? I'd be happy to try and implement any suggestions you might have (I'd love to do that before Shepherd 1.0 is out).
Took a look at it but it requires a bit of concentration to not introduce bugs :) Do have a look if you like!
Just posting for the records another example "in the wild" of someone working around this issue. https://issues.guix.gnu.org/70892#3
Thanks for all your hard work! :smile:
Apologies if anything is inaccurate in this comment, but I had a look through the timer code to see if I could think of anything. I still don't fully grok the control flow, but from a cursory read it does look like timer-wheel-advance!
iterates through all layers of the timer wheel even when "skipping ahead" across a large gap.
I wonder if a solution to this might be to skip iterating through the inner wheels if:
This feels like it should allow you to pretty rapidly "warp" through a large space of mostly unset timers (by climbing up the hierarchy and skipping slots at the lowest granularity possible), which is what you would expect to see if the clock has suddenly jumped years into the future.
Since Guix upgraded to guile-fibers 1.3.1, shepherd hangs shortly after boot on systems without a RTC. I believe the problem comes from using
get-internal-real-time
in the guile-fibers timer wheel implementation. After NTP corrects the system time, this function returns a much larger value, and the CPU load (for one core) goes to 100%.Profiling suggests the process spends the CPU time in
timer-wheel-advance!
, so I imagine it is trying to tick through a five-year time diff. I tried increasing the system time manually by N days, which causes shepherd to be unresponsive (e.g. toherd status
) for about N×5 seconds. I observed similar behavior with the example from guile-fibers readme.Replacing all instances of
(get-internal-real-time)
with(clock-gettime 1)
in guile-fibers, and reconfiguring the system with the patched package, fixes this problem. I think using a monotonic clock makes sense, but there is probably a cleaner / more portable way to do it.Thanks!