oracle / truffleruby

A high performance implementation of the Ruby programming language, built on GraalVM.
https://www.graalvm.org/ruby/

Performance of the Kiba ETL benchmarks is low #1078

Open chrisseaton opened 6 years ago

chrisseaton commented 6 years ago

https://github.com/thbar/kiba/ and https://github.com/thbar/kiba-ruby-benchmarks

From https://github.com/oracle/truffleruby/issues/1054

bjfish commented 3 years ago

I ran the benchmark from the linked issue. It appears to be a longer-running benchmark now, and it ran ~2.6x faster on native TruffleRuby than on MRI 2.6.6 (301.11 s vs. 787.86 s):

bundle install
brew install axel
bundle exec kiba setup.etl
bundle exec kiba csv_processing.etl

truffleruby 20.2.0-dev like ruby 2.6.6, GraalVM CE Native [x86_64-darwin]

I, [2020-07-23T16:29:42.462000 #62571]  INFO -- : Running with truffleruby 20.2.0-dev-201c3006, like ruby 2.6.6, GraalVM CE Native [x86_64-darwin]
I, [2020-07-23T16:29:42.464000 #62571]  INFO -- : Opening data/extract-1000k.csv
I, [2020-07-23T16:34:43.571000 #62571]  INFO -- : Processing done (took 301.11 seconds) - 999901 rows processed

MRI 2.6.6

I, [2020-07-23T16:36:50.687523 #63185]  INFO -- : Running with ruby 2.6.6p146 
I, [2020-07-23T16:36:50.687628 #63185]  INFO -- : Opening data/extract-1000k.csv
I, [2020-07-23T16:49:58.552304 #63185]  INFO -- : Processing done (took 787.86 seconds) - 999901 rows processed
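
For reference, the ~2.6x figure follows directly from the two "Processing done" lines above. A minimal sketch of the arithmetic, assuming the log format shown here; `elapsed_seconds` is a hypothetical helper written for this illustration and is not part of Kiba:

```ruby
# Hypothetical helper (not part of Kiba): pull the "took N seconds"
# figure out of a "Processing done" log line.
def elapsed_seconds(log_line)
  match = log_line.match(/took ([\d.]+) seconds/)
  raise ArgumentError, "no timing found in: #{log_line}" unless match
  Float(match[1])
end

# Message parts copied from the two runs above.
truffleruby_line = "Processing done (took 301.11 seconds) - 999901 rows processed"
mri_line         = "Processing done (took 787.86 seconds) - 999901 rows processed"

# Speedup = slower time / faster time.
speedup = elapsed_seconds(mri_line) / elapsed_seconds(truffleruby_line)
puts format("TruffleRuby speedup over MRI: %.2fx", speedup)
# prints "TruffleRuby speedup over MRI: 2.62x"
```

This compares a single run per implementation, so it is only a rough indication and not a statistically sound measurement.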

gogainda commented 3 years ago

I ran it with the latest TruffleRuby (first run below) and with MRI 2.7.1 (second run) and got the following results:

bundle exec kiba csv_processing.etl
I, [2021-03-05T20:49:34.039841 #46849]  INFO -- : Running with truffleruby 21.1.0-dev-ffeea561, like ruby 2.7.2, GraalVM CE Native [x86_64-darwin]
I, [2021-03-05T20:49:34.041098 #46849]  INFO -- : Opening data/extract-1000k.csv
I, [2021-03-05T20:52:48.918820 #46849]  INFO -- : Processing done (took 194.88 seconds) - 999901 rows processed
4068053997 51935714 data/output.csv
bundle exec kiba csv_processing.etl       
I, [2021-03-06T00:34:20.465825 #47890]  INFO -- : Running with ruby 2.7.1p83 
/Users/novoi/.rubies/ruby-2.7.1/lib/ruby/gems/2.7.0/gems/kiba-2.0.0/lib/kiba/runner.rb:68: warning: Using the last argument as keyword parameters is deprecated; maybe ** should be added to the call
/Users/novoi/tmp/kiba-ruby-benchmarks/etl/csv_source.rb:25: warning: The called method `initialize' is defined here
I, [2021-03-06T00:34:20.465936 #47890]  INFO -- : Opening data/extract-1000k.csv
/Users/novoi/tmp/kiba-ruby-benchmarks/etl/csv_source.rb:14: warning: Using the last argument as keyword parameters is deprecated; maybe ** should be added to the call
/Users/novoi/.rubies/ruby-2.7.1/lib/ruby/2.7.0/csv.rb:508: warning: The called method `foreach' is defined here
/Users/novoi/tmp/kiba-ruby-benchmarks/etl/csv_source.rb:32: warning: Using the last argument as keyword parameters is deprecated; maybe ** should be added to the call
/Users/novoi/.rubies/ruby-2.7.1/lib/ruby/2.7.0/csv.rb:635: warning: The called method `open' is defined here
I, [2021-03-06T00:37:41.237080 #47890]  INFO -- : Processing done (took 200.77 seconds) - 999901 rows processed
4068053997 51935714 data/output.csv
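
To put the two runs above on a common scale, the timings can be turned into rows per second. This is only a sketch of the arithmetic using the numbers from the logs; the `RUNS` table and `rows_per_second` helper are names made up for this illustration:

```ruby
# Figures copied from the two "Processing done" lines above.
ROWS = 999_901
RUNS = {
  "truffleruby 21.1.0-dev" => 194.88, # seconds
  "ruby 2.7.1p83"          => 200.77  # seconds
}.freeze

# Hypothetical helper: throughput in rows per second.
def rows_per_second(rows, seconds)
  rows / seconds
end

RUNS.each do |name, seconds|
  puts format("%-24s %8.1f rows/s", name, rows_per_second(ROWS, seconds))
end

# Ratio of MRI time to TruffleRuby time; a value just above 1.0
# means the two implementations are nearly on par for this run.
ratio = RUNS["ruby 2.7.1p83"] / RUNS["truffleruby 21.1.0-dev"]
puts format("MRI time / TruffleRuby time: %.2f", ratio)
```

Both runs report the same cksum output (4068053997 51935714) for data/output.csv, so the two implementations produced identical files.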

eregon commented 3 years ago

Thanks for the update, that looks pretty close. It would still be worth investigating how to get it faster on TruffleRuby.

(the deleted comment above was a test comment)