Closed thbar closed 4 years ago
Thanks for the report!
We've got some docs on reporting performance problems https://github.com/oracle/truffleruby/blob/master/doc/user/reporting-performance-problems.md.
FYI, I've begun looking into this issue and have submitted a few fixes already. Unfortunately, they didn't land in time for the next release (1.0.0 RC2), so you'll either need to build from master or wait for 1.0.0 RC3. But, I wanted you to know I haven't forgotten about your report.
@nirvdrum thanks for the update! FYI, I've re-discussed that with @eregon at RubyKaigi.
In all cases no hurry on anything, I'm just trying out things, no immediate real production need :-)
I created a small standalone script to measure CSV parsing with CSV.foreach
for a 10k-rows CSV file, as used in the latest kiba-ruby-benchmarks:
https://gist.github.com/eregon/577d2716c517aa2ec3018aa359073115
It gets faster after warmup, but it's still slower than MRI currently.
@eregon Checked latest dev build vs ruby 2.7. After couple of rounds of warmup, it finished with comparable to MRI performance.
truffleruby 20.2.0-dev-91e7ba04, like ruby 2.6.5, GraalVM CE Native [x86_64-linux]
24.871960829999807
28.58095245200002
23.962383978000616
9.586286597999788
6.702010275999783
6.586533140000029
6.447069820999786
6.553509518999817
6.646038615999714
6.462447326999609
ruby 2.7.0p0 (2019-12-25 revision 647ee6f091) [x86_64-linux]
5.9263605000005555
5.8942821059999915
5.90000361500006
5.886332756999764
5.880790037000224
5.912058664999677
5.898451475000002
5.8683086130004085
5.882981536999978
5.879075885999555
Code I used:
require 'csv'
require 'benchmark'
dir = "data"
Dir.mkdir(dir) unless Dir.exist?(dir)
input = "#{dir}/extract.csv"
unless File.exist?(input)
Dir.chdir(dir) do
zip = "sirene_2017111_E_Q.zip"
system "wget", "-O", zip, "http://files.data.gouv.fr/sirene/#{zip}"
system "unzip", zip
system "head -10000 sirc* > extract.csv"
end
end
10.times do
p Benchmark.realtime {
CSV.foreach(input, col_sep: ';', encoding: "ISO-8859-1:UTF-8") { |row|
}
}
end
Thanks, I think this is close enough to MRI to not be "slower" anymore (at least not 2x). Maybe it can be further optimized but the performance of that benchmark for CSV seems reasonable now, so I'd like to close this issue.
@gogainda thanks for checking!
@eregon agreed, this looks perfectly acceptable for now. I will provide other ETL-based benchmarks in the future ;-)
I made a test to compare speed of reading a CSV file with MRI 2.5.0 and TruffleRuby 0.30.2. It can be found here.
Copying it to make sure the reference doesn't break later:
I'm only counting the time to read, not the time to write.
I took care not to use exotic encodings (e.g. ISO-8859-1) for now.
Here is the outcome:
Measure csv-foreach - 2.69 seconds
Measure csv-foreach - 5.36 seconds