oracle / truffleruby

A high performance implementation of the Ruby programming language, built on GraalVM.
https://www.graalvm.org/ruby/

TruffleRuby slower than MRI in toy benchmark #1707

Closed: ivoanjo closed this issue 2 years ago.

ivoanjo commented 5 years ago

Hello there!

I was just playing with a toy benchmark written by a friend, and was surprised that TruffleRuby was noticeably slower than MRI or JRuby. It's definitely not a priority, but I thought I'd report it anyway; maybe I'll learn something new.

Here are my quick results from my Core i7-8550U laptop:

I'm running it simply with `time ruby bench.rb output3.txt >> /dev/null`.

Here's the toy benchmark:

```ruby
#!/usr/bin/env ruby

$stderr.puts RUBY_DESCRIPTION

def str_to_i(str)
  num = str.to_i
  num.to_s == str ? num : nil
end

def is_valid(line)
  segs = line.split(' ', 3)
  if segs.length > 1
    num = str_to_i(segs[1])
    return true if num != nil and num > 10
  end
  return false
end

ARGF.each do |line|
  puts line if is_valid(line)
end
```
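
To make the filter concrete, here is a small sketch that copies `str_to_i` and `is_valid` unchanged and runs them on three hand-written lines in `ps aux` style. The sample lines are hypothetical and are not taken from the attached file.

```ruby
# Same helpers as the benchmark, copied so this snippet runs on its own.
def str_to_i(str)
  num = str.to_i
  # Only accept strings that round-trip exactly: "42" passes, "42abc" and "PID" do not.
  num.to_s == str ? num : nil
end

def is_valid(line)
  # Awk-style split on runs of whitespace, at most 3 fields: USER, PID, rest.
  segs = line.split(' ', 3)
  if segs.length > 1
    num = str_to_i(segs[1])
    return true if num != nil and num > 10
  end
  return false
end

# Hypothetical ps-aux-style lines (NOT from output3.txt), with the expected result.
SAMPLES = {
  "USER       PID %CPU %MEM COMMAND\n"       => false, # header: "PID" is not a number
  "root         1  0.0  0.1 /sbin/init\n"    => false, # PID 1 is not > 10
  "ivo      12345  1.5  2.0 ruby bench.rb\n" => true
}

SAMPLES.each do |line, expected|
  puts "#{is_valid(line)} (expected #{expected}): #{line}"
end
```

So only lines whose second whitespace-separated field is a plain integer greater than 10 get printed; the `ps aux` header is dropped because `"PID"` does not survive the `to_i`/`to_s` round trip.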

I've also attached the file I was using with it. It's small when compressed, but it decompresses to around 2.5 GB, and it's just two lines of `ps aux` output repeated over and over again:

output3.txt.zip
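
If the attachment isn't at hand, a file of the same shape can be generated. The sketch below is hypothetical: the two `ps aux`-style lines, the `write_sample` helper, and the command-line arguments are assumptions, not the contents of the real `output3.txt`. The first line (PID 1) is filtered out by the benchmark and the second (PID 12345) is printed, so half the input reaches the output.

```ruby
# Hypothetical generator for an input shaped like output3.txt:
# the same two ps-aux-style lines repeated many times.
# These two lines are ASSUMED examples, not the real attachment's contents.
LINES = [
  "root         1  0.0  0.1 169416 13080 ?        Ss   10:00   0:02 /sbin/init\n",
  "ivo      12345  1.5  2.0 123456 45678 pts/0    Sl+  10:01   0:10 ruby bench.rb\n"
].freeze

# Writes the line pair `repetitions` times to any IO-like object.
def write_sample(io, repetitions)
  repetitions.times do
    LINES.each { |l| io.write(l) }
  end
end

# Only runs when invoked as: ruby gen_sample.rb PATH REPETITIONS
if __FILE__ == $PROGRAM_NAME && ARGV.length == 2
  path = ARGV[0]
  repetitions = Integer(ARGV[1])
  File.open(path, "w") { |f| write_sample(f, repetitions) }
  warn "wrote #{File.size(path)} bytes to #{path}"
end
```

Increase the repetition count until the file reaches roughly the size of the original (around 2.5 GB).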

gogainda commented 2 years ago

Latest results:

```
bench % time ruby bench.rb output3.txt >> /dev/null
truffleruby 22.1.0-dev-8553bb8b, like ruby 3.0.2, GraalVM CE Native [x86_64-darwin]
ruby bench.rb output3.txt >> /dev/null  35.00s user 6.91s system 115% cpu 36.244 total

bench % time ruby bench.rb output3.txt >> /dev/null
ruby 3.0.2p107 (2021-07-07 revision 0db68f0233) [x86_64-darwin19]
ruby bench.rb output3.txt >> /dev/null  40.27s user 1.00s system 99% cpu 41.426 total
```

eregon commented 2 years ago

Locally, TruffleRuby is about 2x faster for me:

```
$ time ruby iobench.rb ~/Downloads/output3.txt > /dev/null
truffleruby 22.1.0-dev-8553bb8b, like ruby 3.0.2, GraalVM CE Native [x86_64-linux]
28.44s user 3.55s system 125% cpu 25.564 total

$ time ruby iobench.rb ~/Downloads/output3.txt > /dev/null
ruby 3.0.3p157 (2021-11-24 revision 3fb7d2cadc) [x86_64-linux]
48.13s user 0.74s system 99% cpu 49.052 total
```

I also redirected the output to /dev/null, since otherwise the terminal would likely account for most of the benchmark's time.
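
The same idea can be applied inside Ruby by timing the loop in-process and sending kept lines to the null device. This is a minimal sketch: `time_filter` is a hypothetical helper, and the predicate is an inlined copy of the benchmark's `is_valid` logic. Unlike `time`, it excludes process startup, so its numbers are not directly comparable to the ones above.

```ruby
require "benchmark"

# Runs a filter loop over input_path, writing lines for which the block
# returns truthy to `out`. Returns [lines_written, elapsed_seconds].
def time_filter(input_path, out)
  written = 0
  elapsed = Benchmark.realtime do
    File.foreach(input_path) do |line|
      next unless yield(line)
      out.puts line
      written += 1
    end
  end
  [written, elapsed]
end

# Only runs when invoked as: ruby time_filter.rb INPUT_PATH
if __FILE__ == $PROGRAM_NAME && ARGV.length == 1
  # Same check as the benchmark's is_valid: the second field must be a
  # plain integer (round-trips through to_i/to_s) greater than 10.
  keep = lambda do |line|
    field = line.split(' ', 3)[1]
    field && field.to_i.to_s == field && field.to_i > 10
  end
  # File::NULL is "/dev/null" on Unix-like systems, so nothing hits a terminal.
  File.open(File::NULL, "w") do |null|
    written, secs = time_filter(ARGV[0], null, &keep)
    warn format("%d lines kept in %.2fs", written, secs)
  end
end
```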

And this is a profile of the benchmark taken with `--cpusampler`. It's very heavy on IO (`POSIX.write_string_native` alone accounts for 74.2% of total time), so 2x is pretty good.

```
--------------------------------------------------------------------------------------------------------------------------
Sampling Histogram. Recorded 2453 samples with period 10ms. Missed 0 samples.
  Self Time: Time spent on the top of the stack.
  Total Time: Time spent somewhere on the stack.
--------------------------------------------------------------------------------------------------------------------------
Thread[main,5,main]
 Name                                   ||             Total Time    ||              Self Time    || Location
--------------------------------------------------------------------------------------------------------------------------
 Truffle::POSIX.write                   ||            13790ms  56.2% ||            13740ms  56.0% || resource:/truffleruby/core/posix.rb~129:3635-3691
 Truffle::Splitter.split                ||             3730ms  15.2% ||             3730ms  15.2% || resource:/truffleruby/core/splitter.rb~42-90:2000-3627
 FFI::Pointer#put_bytes                 ||             3380ms  13.8% ||             3380ms  13.8% || resource:/truffleruby/core/truffle/ffi/pointer.rb~185-200:5774-6338
 Object#str_to_i                        ||             1080ms   4.4% ||             1060ms   4.3% || iobench.rb~5-8:52-119
 POSIX.write_string_native              ||            18190ms  74.2% ||              940ms   3.8% || resource:/truffleruby/core/posix.rb~483-516:16522-17518
 POSIX.read_to_buffer_native            ||              570ms   2.3% ||              560ms   2.3% || resource:/truffleruby/core/posix.rb~422-446:14970-15607
```
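
Since the write path dominates the profile, one generic way to reduce per-line write overhead from user code is to batch output into larger chunks. The sketch below is hypothetical and is not something proposed in this thread or provided by TruffleRuby: the `BatchedWriter` class and its 64 KiB default limit are assumptions.

```ruby
# Hypothetical helper: collects lines in a String buffer and hands them to
# the underlying IO in large chunks, so the runtime sees far fewer write
# calls than one per kept line.
class BatchedWriter
  def initialize(io, limit = 64 * 1024)
    @io = io
    @limit = limit   # flush once the buffer holds at least this many bytes
    @buffer = +""    # unfrozen, growable buffer
  end

  # Mirrors puts for a single string: appends a newline if missing.
  def puts(line)
    @buffer << line
    @buffer << "\n" unless line.end_with?("\n")
    flush if @buffer.bytesize >= @limit
  end

  # Writes any buffered data; must be called once more at the end.
  def flush
    return if @buffer.empty?
    @io.write(@buffer)
    @buffer.clear
  end
end
```

In the benchmark, this would mean wrapping `$stdout` in a `BatchedWriter`, calling its `puts` inside the `ARGF.each` loop, and calling `flush` after the loop. Ruby's IO objects already do some buffering of their own, so whether this actually helps on either implementation would need measuring.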

Commit 450c3d92168ad1bc7d7be73b035a63961bbbf5c3 might help a bit, although it's about IO reading, not writing.