Open pelletier opened 11 years ago
I don't see an obvious reason for this, can you see if you can reproduce it by just calling read(8192)
over and over again to read the full file?
Something like that?
$ cat /tmp/test2.rb
f = File.open("/usr/share/dict/words", "r")
while f.read(8192) do
end
f.close()
$ time ruby /tmp/test2.rb
ruby /tmp/test2.rb 0.02s user 0.01s system 91% cpu 0.034 total
$ time topaz /tmp/test2.rb
topaz /tmp/test2.rb 0.01s user 0.01s system 84% cpu 0.025 total
So it's a bug in readlines()
itself, or something else readlines()
calls, not in the actual reading of data.
readlines()
delegates the IO work to each_line()
, and something wrong is probably happening there:
$ cat /tmp/test3.rb
f = File.open("/usr/share/dict/words", "r")
f.each_line { |line| }
f.close()
$ time ruby /tmp/test3.rb
ruby /tmp/test3.rb 0.09s user 0.01s system 96% cpu 0.105 total
$ time topaz /tmp/test3.rb
topaz /tmp/test3.rb 0.81s user 0.03s system 98% cpu 0.852 total
Oh, another thing occurred to me, this could just be the JIT kicking in and having slow warm up time, can you put ths in a loop and try doing it 10 times (or something) and see if it changes?
I ran it in a loop ten times, we're still super slow, it's something inefficent in that algorithm.
80% of the time is spent in String#split
, that's totally bonkers so that's a good place to start fixing.
I think the right startup point on this is to replace the split logic here with a buffer and indexes into it.
OK I will try that when I have some spare time.
I don't know what changed exactly since my last comment, but it seems to be a bit better now:
time ./bin/topaz test.rb
./bin/topaz test.rb 0.33s user 0.02s system 98% cpu 0.364 total
Hopefully addressed in #501
I just tested with your PR: things are actually worse:
~/code/topaz > cat test.rb
f = File.open("words", "r")
f.each_line { |line| }
f.close()
~/code/topaz > cat words|wc -l
5000
~/code/topaz > git checkout pr/501
Switched to branch 'pr/501'
~/code/topaz > time python -m topaz test.rb
python -m topaz test.rb 91.35s user 0.86s system 82% cpu 1:52.30 total
~/code/topaz > git checkout master
Switched to branch 'master'
~/code/topaz > time python -m topaz test.rb
python -m topaz test.rb 7.00s user 0.42s system 89% cpu 8.315 total
~/code/topaz > time ruby test.rb
ruby test.rb 0.02s user 0.01s system 90% cpu 0.029 total
Ha! Well, I can't take much credit for the buffering implementation, but I'd like to think improving it over in rpython is preferable to tuning it only for Topaz. Still waiting on feedback from @alex on #501 :-) On Mar 29, 2013 11:01 AM, "Thomas Pelletier" notifications@github.com wrote:
I just tested with your PR: things are actually worse:
~/code/topaz > cat test.rb f = File.open("words", "r") f.each_line { |line| } f.close() ~/code/topaz > cat words|wc -l 5000
~/code/topaz > git checkout pr/501 Switched to branch 'pr/501' ~/code/topaz > time python -m topaz test.rb python -m topaz test.rb 91.35s user 0.86s system 82% cpu 1:52.30 total
~/code/topaz > git checkout master Switched to branch 'master' ~/code/topaz > time python -m topaz test.rb python -m topaz test.rb 7.00s user 0.42s system 89% cpu 8.315 total
~/code/topaz > time ruby test.rb ruby test.rb 0.02s user 0.01s system 90% cpu 0.029 total
— Reply to this email directly or view it on GitHubhttps://github.com/topazproject/topaz/issues/336#issuecomment-15644514 .
Topaz was compiled with
-Ojit
on f5d88a0b7e0fe3e9293a9de90212651717321e80.