oracle / truffleruby

A high performance implementation of the Ruby programming language, built on GraalVM.
https://www.graalvm.org/ruby/
Other
3.02k stars 185 forks source link

sleep seems too fast #2716

Closed eregon closed 2 years ago

eregon commented 2 years ago

Report by @ioquatix https://github.com/socketry/timers/blob/710aea96aa90b96a45df8e5fc0bab9dcbe447f6d/fixtures/timer_quantum.rb https://github.com/socketry/timers/runs/8149588088?check_suite_focus=true#step:4:8

I added some prints:

# Released under the MIT License.
# Copyright, 2022, by Samuel Williams.

class TimerQuantum
    def self.resolve
        self.new.to_f
    end

    def to_f
        precision
    end

    private

    def precision
        @precision ||= self.measure_host_precision
    end

    def measure_host_precision(repeats: 100, duration: 0.0001)
        # Measure the precision sleep using the monotonic clock:
        start_time = self.now
        repeats.times do
            sleep(duration)
        end
        end_time = self.now

        actual_duration = end_time - start_time
        expected_duration = repeats * duration
        p actual: actual_duration, expected: expected_duration

        if actual_duration < expected_duration
            warn "Invalid precision measurement: #{actual_duration} < #{expected_duration}"
            return 0.1
        end

        # This computes the overhead of sleep, called `repeats` times:
        return actual_duration - expected_duration
    end

    def now
        Process.clock_gettime(Process::CLOCK_MONOTONIC)
    end
end

TIMER_QUANTUM = TimerQuantum.resolve
p timer_quantum: TIMER_QUANTUM

Output:

$ ruby -v timer_quantum.rb
ruby 3.0.3p157 (2021-11-24 revision 3fb7d2cadc) [x86_64-linux]
{:actual=>0.015342843002144946, :expected=>0.01}
{:timer_quantum=>0.005342843002144946}

$ ruby -v timer_quantum.rb
truffleruby 22.3.0-dev-ea16721f, like ruby 3.0.3, GraalVM CE Native [x86_64-linux]
{:actual=>0.00025780400028452, :expected=>0.01}
Invalid precision measurement: 0.00025780400028452 < 0.01
{:timer_quantum=>0.1}
eregon commented 2 years ago

It's simple, we use java.lang.Thread.sleep(long millis) for Kernel#sleep, which uses milliseconds. And so we round 0.1ms to 0ms.

JRuby seems to currently have the same limitation: https://github.com/jruby/jruby/blob/f6ca739fbd4fc3fa101adb34ee319be6a78b232d/core/src/main/java/org/jruby/RubyKernel.java#L734

There is also java.lang.Thread.sleep(long millis, int nanos), I'll try that.

In general, I guess sleeps below 1ms are not portable across platforms and probably rarely accurate.

eregon commented 2 years ago

I have a fix. With it and repeats: 100, duration: 0.0001:

ruby 3.0.3p157 (2021-11-24 revision 3fb7d2cadc) [x86_64-linux]
{:actual=>0.015312562998587964, :expected=>0.01}
{:timer_quantum=>0.005312562998587964}

truffleruby 22.3.0-dev-95f5f97b, like ruby 3.0.3, GraalVM CE Native [x86_64-linux]
{:actual=>0.10606739700233447, :expected=>0.01}
{:timer_quantum=>0.09606739700233448}

truffleruby 22.3.0-dev-95f5f97b, like ruby 3.0.3, Interpreted JVM [x86_64-linux]
{:actual=>0.11037447399939992, :expected=>0.01}
{:timer_quantum=>0.10037447399939993}

So there is quite a large overhead for these tiny sub-millisecond sleeps.

With repeats: 1000, duration: 0.0001:

ruby 3.0.3p157 (2021-11-24 revision 3fb7d2cadc) [x86_64-linux]
{:actual=>0.15346887000123388, :expected=>0.1}
{:timer_quantum=>0.053468870001233876}

truffleruby 22.3.0-dev-95f5f97b, like ruby 3.0.3, GraalVM CE Native [x86_64-linux]
{:actual=>1.0592428449999716, :expected=>0.1}
{:timer_quantum=>0.9592428449999716}

truffleruby 22.3.0-dev-95f5f97b, like ruby 3.0.3, Interpreted JVM [x86_64-linux]
{:actual=>1.0684534419997362, :expected=>0.1}
{:timer_quantum=>0.9684534419997363}

With repeats: 1000, duration: 0.001 it's a lot more accurate for all:

ruby 3.0.3p157 (2021-11-24 revision 3fb7d2cadc) [x86_64-linux]
{:actual=>1.0571165940018545, :expected=>1.0}
{:timer_quantum=>0.05711659400185454}

truffleruby 22.3.0-dev-95f5f97b, like ruby 3.0.3, GraalVM CE Native [x86_64-linux]
{:actual=>1.0571321360002912, :expected=>1.0}
{:timer_quantum=>0.05713213600029121}

truffleruby 22.3.0-dev-95f5f97b, like ruby 3.0.3, Interpreted JVM [x86_64-linux]
{:actual=>1.0679009020022932, :expected=>1.0}
{:timer_quantum=>0.06790090200229315}
eregon commented 2 years ago

I was wondering if the extra clock_gettime()s around the actual sleep are the cause of the overhead but it seems not. In fact, using only Thread.sleep(sec=0, ns=100000) in Kernel#sleep it's still:

truffleruby 22.3.0-dev-15c99303*, like ruby 3.0.3, GraalVM CE Native [x86_64-linux]
{:actual=>1.0564394450011605, :expected=>0.1}
{:timer_quantum=>0.9564394450011605}

And it's not the overhead of the call either, because if I replace the sleep with itself:

{:actual=>0.00097326499962946, :expected=>0.1}

So I guess it's a potential issue with java.lang.Thread.sleep(long millis, int nanos) in the JDK/what it uses behind the scenes.

In any case, sub-millisecond sleep like 0.0001 is problematic. Even on CRuby it's a 50% overhead. 1ms sleep is fine on all 3 implementations (CRuby, TruffleRuby Native, TruffleRuby JVM).

ioquatix commented 2 years ago

Thanks for investigating this. One could consider "sleep being too fast" to be an optimisation to make the program faster :)

This could be a problem for an event loop - sleeping not long enough can cause it to burn cycles calling sleep again. I'd advise that we should never sleep for less than what was requested.

ioquatix commented 2 years ago

Thanks for fixing this so quickly.