Closed houshuang closed 5 years ago
There have been some issues with MacRuby before, so I find it encouraging that the parsing seems to work, albeit at this slow speed.
I've been using two scripts for benchmarking and profiling that I'll add to the repository in a minute.
Meanwhile, can you check whether or not the following gems are compatible with you version of MacRuby:
Also, do you have the dot
executable installed? (check via which dot
) If you're using macports or homebrew you should be able to get it by installing the graphviz package.
Alright, the benchmark and profiling are already in the repository.
Unfortunately, ruby-prof
does not seem to be available for MacRuby yet: this will make it more difficult to find out what the problem is, but we'll try to make do with the benchmark script.
FIrst, see whether or not you can get benchmark.rb to run on your system. You may have to install gnuplot
(homebrew or macports and the ruby gem), or, alternatively, comment out everything after the line require 'gnuplot'
. Basically, what the script does, is to execute and benchmark two functions f and g. The idea is that f works on an individual BibTeX entry repeatedly, while g works on a larger bibliography. We would expect the performance of f and g to be extremely similar (because it should not make a difference whether you parse a thousand entries consecutively or a bibliography containing a thousand entries). Furthermore, both f and g should be linear: to see that it's convenient to look at the gnuplot graphs.
Anyway, there are two options n and k currently set to 1001 and 20. This means that the script will execute with 20 entries, then 40, then 60 and so on up to 1001. This typically gives you a nice plot. Since your test with 67 entries already took half a minute, you will probably want to use a smaller number for n first (for example, start with 100). It will still take a few minutes, but do try to run it and let me know the results.
Here is the output when I run the benchmark with MRI 1.9.2 on my system:
$ ruby test/benchmark.rb
user system total real
f(1): 0.000000 0.000000 0.000000 ( 0.001007)
g(1): 0.000000 0.000000 0.000000 ( 0.000678)
f(21): 0.020000 0.000000 0.020000 ( 0.020126)
g(21): 0.020000 0.010000 0.030000 ( 0.017425)
f(41): 0.040000 0.000000 0.040000 ( 0.036070)
g(41): 0.030000 0.000000 0.030000 ( 0.034123)
f(61): 0.050000 0.000000 0.050000 ( 0.055657)
g(61): 0.050000 0.000000 0.050000 ( 0.047118)
f(81): 0.060000 0.000000 0.060000 ( 0.057583)
g(81): 0.050000 0.000000 0.050000 ( 0.055533)
f(101): 0.070000 0.000000 0.070000 ( 0.068648)
g(101): 0.070000 0.000000 0.070000 ( 0.074981)
f(121): 0.080000 0.000000 0.080000 ( 0.081537)
g(121): 0.090000 0.000000 0.090000 ( 0.087914)
f(141): 0.100000 0.000000 0.100000 ( 0.105223)
g(141): 0.100000 0.000000 0.100000 ( 0.100114)
f(161): 0.110000 0.000000 0.110000 ( 0.111240)
g(161): 0.110000 0.000000 0.110000 ( 0.111883)
f(181): 0.140000 0.010000 0.150000 ( 0.134396)
g(181): 0.130000 0.000000 0.130000 ( 0.138483)
f(201): 0.140000 0.000000 0.140000 ( 0.137307)
g(201): 0.160000 0.000000 0.160000 ( 0.158308)
f(221): 0.150000 0.000000 0.150000 ( 0.156125)
g(221): 0.180000 0.000000 0.180000 ( 0.175792)
f(241): 0.170000 0.000000 0.170000 ( 0.172716)
g(241): 0.190000 0.000000 0.190000 ( 0.200111)
f(261): 0.170000 0.000000 0.170000 ( 0.180065)
g(261): 0.220000 0.010000 0.230000 ( 0.222491)
f(281): 0.190000 0.000000 0.190000 ( 0.191242)
g(281): 0.250000 0.000000 0.250000 ( 0.244426)
f(301): 0.200000 0.000000 0.200000 ( 0.207241)
g(301): 0.270000 0.000000 0.270000 ( 0.270698)
f(321): 0.220000 0.000000 0.220000 ( 0.223200)
g(321): 0.290000 0.010000 0.300000 ( 0.290713)
f(341): 0.230000 0.000000 0.230000 ( 0.238174)
g(341): 0.320000 0.000000 0.320000 ( 0.324253)
f(361): 0.250000 0.000000 0.250000 ( 0.249256)
g(361): 0.320000 0.010000 0.330000 ( 0.326222)
f(381): 0.250000 0.000000 0.250000 ( 0.275628)
g(381): 0.340000 0.000000 0.340000 ( 0.366691)
f(401): 0.270000 0.000000 0.270000 ( 0.280956)
g(401): 0.350000 0.010000 0.360000 ( 0.358961)
f(421): 0.290000 0.000000 0.290000 ( 0.294395)
g(421): 0.370000 0.000000 0.370000 ( 0.382586)
f(441): 0.310000 0.000000 0.310000 ( 0.310237)
g(441): 0.410000 0.010000 0.420000 ( 0.468336)
f(461): 0.320000 0.000000 0.320000 ( 0.352464)
g(461): 0.440000 0.000000 0.440000 ( 0.459599)
f(481): 0.320000 0.000000 0.320000 ( 0.324716)
g(481): 0.470000 0.010000 0.480000 ( 0.480357)
f(501): 0.340000 0.000000 0.340000 ( 0.366143)
g(501): 0.500000 0.010000 0.510000 ( 0.521129)
f(521): 0.360000 0.000000 0.360000 ( 0.382934)
g(521): 0.540000 0.010000 0.550000 ( 0.585427)
f(541): 0.380000 0.000000 0.380000 ( 0.409161)
g(541): 0.560000 0.010000 0.570000 ( 0.569871)
f(561): 0.380000 0.000000 0.380000 ( 0.392196)
g(561): 0.610000 0.000000 0.610000 ( 0.647884)
f(581): 0.390000 0.010000 0.400000 ( 0.401241)
g(581): 0.560000 0.000000 0.560000 ( 0.566974)
f(601): 0.420000 0.010000 0.430000 ( 0.455462)
g(601): 0.560000 0.010000 0.570000 ( 0.574165)
f(621): 0.410000 0.000000 0.410000 ( 0.419186)
g(621): 0.590000 0.000000 0.590000 ( 0.608821)
f(641): 0.420000 0.010000 0.430000 ( 0.426153)
g(641): 0.630000 0.000000 0.630000 ( 0.635338)
f(661): 0.440000 0.010000 0.450000 ( 0.449898)
g(661): 0.650000 0.000000 0.650000 ( 0.654837)
f(681): 0.440000 0.000000 0.440000 ( 0.446963)
g(681): 0.680000 0.010000 0.690000 ( 0.688274)
f(701): 0.470000 0.000000 0.470000 ( 0.474683)
g(701): 0.710000 0.010000 0.720000 ( 0.719676)
f(721): 0.480000 0.000000 0.480000 ( 0.484895)
g(721): 0.740000 0.000000 0.740000 ( 0.748554)
f(741): 0.490000 0.010000 0.500000 ( 0.491958)
g(741): 0.790000 0.000000 0.790000 ( 0.790580)
f(761): 0.480000 0.010000 0.490000 ( 0.488490)
g(761): 0.820000 0.000000 0.820000 ( 0.826844)
f(781): 0.510000 0.010000 0.520000 ( 0.513018)
g(781): 0.850000 0.000000 0.850000 ( 0.859994)
f(801): 0.520000 0.010000 0.530000 ( 0.534045)
g(801): 0.890000 0.000000 0.890000 ( 0.899764)
f(821): 0.550000 0.010000 0.560000 ( 0.575607)
g(821): 0.990000 0.010000 1.000000 ( 1.046289)
f(841): 0.560000 0.010000 0.570000 ( 0.588986)
g(841): 0.970000 0.010000 0.980000 ( 0.988280)
f(861): 0.580000 0.000000 0.580000 ( 0.585552)
g(861): 1.010000 0.010000 1.020000 ( 1.051506)
f(881): 0.580000 0.000000 0.580000 ( 0.590940)
g(881): 1.070000 0.010000 1.080000 ( 1.084322)
f(901): 0.610000 0.010000 0.620000 ( 0.644608)
g(901): 1.100000 0.010000 1.110000 ( 1.121152)
f(921): 0.600000 0.000000 0.600000 ( 0.604363)
g(921): 1.140000 0.010000 1.150000 ( 1.153237)
f(941): 0.610000 0.010000 0.620000 ( 0.618322)
g(941): 1.200000 0.000000 1.200000 ( 1.203676)
f(961): 0.640000 0.010000 0.650000 ( 0.656267)
g(961): 1.250000 0.010000 1.260000 ( 1.277296)
f(981): 0.650000 0.000000 0.650000 ( 0.658052)
g(981): 1.100000 0.020000 1.120000 ( 1.127124)
f(1001): 0.660000 0.000000 0.660000 ( 0.675562)
g(1001): 1.240000 0.010000 1.250000 ( 1.250755)
sum(f): 17.120000 0.140000 17.260000 ( 17.599892)
sum(g): 26.980000 0.230000 27.210000 ( 27.599565)
And here you can see the plotted output. As you can see both functions are (roughly) linear (that's good); but the function g executes considerably slower as the input size increases. This means that for large numbers of N, it takes longer to parse a bibliography containing N entries than to parse N individual entries. This is because the Bibliography class analyses stuff like resolving cross-references etc. Anyway, looking at the graph shows us that this should be optimized going forward. ;-)
I just tried doing this. First, I had to modify lib/bibtex.rb to add require 'rubygems', otherwise it chokes on require 'multi_json' (I guess macruby doesn't have automatic rubygems support.
Afterwards, it runs fine. However, the "problem" is that the output for MRI (1.9.2p180) and MacRuby (0.12, Ruby 1.9.2) are extremely similar. Here is the output for MRI:
user system total real
f(1): 0.000000 0.000000 0.000000 ( 0.000622)
g(1): 0.000000 0.000000 0.000000 ( 0.000403)
f(21): 0.010000 0.000000 0.010000 ( 0.011540)
g(21): 0.010000 0.000000 0.010000 ( 0.007310)
f(41): 0.020000 0.000000 0.020000 ( 0.018733)
g(41): 0.020000 0.000000 0.020000 ( 0.017633)
f(61): 0.030000 0.000000 0.030000 ( 0.029408)
g(61): 0.020000 0.000000 0.020000 ( 0.024942)
f(81): 0.040000 0.000000 0.040000 ( 0.037122)
g(81): 0.040000 0.000000 0.040000 ( 0.039202)
f(101): 0.040000 0.000000 0.040000 ( 0.046273)
g(101): 0.050000 0.000000 0.050000 ( 0.047190)
f(121): 0.060000 0.000000 0.060000 ( 0.056533)
g(121): 0.060000 0.000000 0.060000 ( 0.060450)
f(141): 0.060000 0.000000 0.060000 ( 0.064366)
g(141): 0.070000 0.000000 0.070000 ( 0.073773)
f(161): 0.080000 0.000000 0.080000 ( 0.078557)
g(161): 0.090000 0.000000 0.090000 ( 0.088499)
f(181): 0.080000 0.000000 0.080000 ( 0.081382)
g(181): 0.090000 0.000000 0.090000 ( 0.089241)
f(201): 0.090000 0.000000 0.090000 ( 0.087167)
g(201): 0.090000 0.000000 0.090000 ( 0.097179)
f(221): 0.100000 0.000000 0.100000 ( 0.092566)
g(221): 0.110000 0.000000 0.110000 ( 0.111960)
f(241): 0.100000 0.010000 0.110000 ( 0.105829)
g(241): 0.120000 0.000000 0.120000 ( 0.121754)
f(261): 0.120000 0.000000 0.120000 ( 0.113211)
g(261): 0.130000 0.000000 0.130000 ( 0.135470)
f(281): 0.120000 0.000000 0.120000 ( 0.118759)
g(281): 0.150000 0.000000 0.150000 ( 0.146476)
f(301): 0.130000 0.000000 0.130000 ( 0.134218)
g(301): 0.170000 0.000000 0.170000 ( 0.168745)
f(321): 0.140000 0.000000 0.140000 ( 0.140407)
g(321): 0.180000 0.000000 0.180000 ( 0.183246)
f(341): 0.150000 0.000000 0.150000 ( 0.147753)
g(341): 0.190000 0.000000 0.190000 ( 0.189896)
f(361): 0.150000 0.000000 0.150000 ( 0.153986)
g(361): 0.210000 0.000000 0.210000 ( 0.204915)
f(381): 0.160000 0.000000 0.160000 ( 0.166869)
g(381): 0.200000 0.010000 0.210000 ( 0.202102)
f(401): 0.180000 0.000000 0.180000 ( 0.182246)
g(401): 0.230000 0.000000 0.230000 ( 0.229675)
f(421): 0.190000 0.000000 0.190000 ( 0.186329)
g(421): 0.230000 0.000000 0.230000 ( 0.234171)
f(441): 0.200000 0.000000 0.200000 ( 0.196082)
g(441): 0.260000 0.000000 0.260000 ( 0.266406)
f(461): 0.200000 0.000000 0.200000 ( 0.203138)
g(461): 0.260000 0.000000 0.260000 ( 0.258192)
f(481): 0.210000 0.010000 0.220000 ( 0.208168)
g(481): 0.270000 0.000000 0.270000 ( 0.271574)
f(501): 0.210000 0.000000 0.210000 ( 0.212898)
g(501): 0.290000 0.000000 0.290000 ( 0.294501)
f(521): 0.230000 0.000000 0.230000 ( 0.224249)
g(521): 0.330000 0.000000 0.330000 ( 0.332794)
f(541): 0.230000 0.010000 0.240000 ( 0.236670)
g(541): 0.310000 0.000000 0.310000 ( 0.307345)
f(561): 0.240000 0.000000 0.240000 ( 0.245936)
g(561): 0.330000 0.000000 0.330000 ( 0.329643)
f(581): 0.260000 0.000000 0.260000 ( 0.254318)
g(581): 0.330000 0.000000 0.330000 ( 0.339464)
f(601): 0.260000 0.000000 0.260000 ( 0.261145)
g(601): 0.380000 0.010000 0.390000 ( 0.376544)
f(621): 0.260000 0.000000 0.260000 ( 0.263120)
g(621): 0.390000 0.000000 0.390000 ( 0.390627)
f(641): 0.270000 0.000000 0.270000 ( 0.271359)
g(641): 0.410000 0.000000 0.410000 ( 0.407785)
f(661): 0.280000 0.000000 0.280000 ( 0.283797)
g(661): 0.410000 0.010000 0.420000 ( 0.410276)
f(681): 0.290000 0.000000 0.290000 ( 0.299164)
g(681): 0.430000 0.000000 0.430000 ( 0.422964)
f(701): 0.290000 0.000000 0.290000 ( 0.299465)
g(701): 0.440000 0.000000 0.440000 ( 0.440954)
f(721): 0.310000 0.010000 0.320000 ( 0.312829)
g(721): 0.480000 0.000000 0.480000 ( 0.474714)
f(741): 0.320000 0.000000 0.320000 ( 0.327411)
g(741): 0.500000 0.000000 0.500000 ( 0.494715)
f(761): 0.320000 0.000000 0.320000 ( 0.323732)
g(761): 0.500000 0.010000 0.510000 ( 0.505016)
f(781): 0.340000 0.000000 0.340000 ( 0.343639)
g(781): 0.520000 0.000000 0.520000 ( 0.522267)
f(801): 0.350000 0.000000 0.350000 ( 0.350007)
g(801): 0.550000 0.010000 0.560000 ( 0.552032)
f(821): 0.350000 0.000000 0.350000 ( 0.352463)
g(821): 0.580000 0.000000 0.580000 ( 0.579625)
f(841): 0.370000 0.000000 0.370000 ( 0.374373)
g(841): 0.600000 0.000000 0.600000 ( 0.600615)
f(861): 0.360000 0.010000 0.370000 ( 0.364205)
g(861): 0.620000 0.000000 0.620000 ( 0.617375)
f(881): 0.380000 0.000000 0.380000 ( 0.380935)
g(881): 0.620000 0.000000 0.620000 ( 0.632298)
f(901): 0.400000 0.010000 0.410000 ( 0.394550)
g(901): 0.660000 0.000000 0.660000 ( 0.661010)
f(921): 0.400000 0.000000 0.400000 ( 0.406453)
g(921): 0.690000 0.000000 0.690000 ( 0.688062)
f(941): 0.400000 0.010000 0.410000 ( 0.405330)
g(941): 0.690000 0.000000 0.690000 ( 0.691564)
f(961): 0.430000 0.000000 0.430000 ( 0.429863)
g(961): 0.740000 0.010000 0.750000 ( 0.750301)
f(981): 0.420000 0.000000 0.420000 ( 0.418148)
g(981): 0.760000 0.000000 0.760000 ( 0.770694)
f(1001): 0.440000 0.000000 0.440000 ( 0.436495)
g(1001): 0.800000 0.010000 0.810000 ( 0.805949)
sum(f): 11.070000 0.070000 11.140000 ( 11.133818)
sum(g): 16.610000 0.070000 16.680000 ( 16.669537)
and here is the output for MacRuby
user system total real
f(1): 0.000000 0.000000 0.000000 ( 0.000597)
g(1): 0.000000 0.000000 0.000000 ( 0.000403)
f(21): 0.010000 0.000000 0.010000 ( 0.008394)
g(21): 0.010000 0.000000 0.010000 ( 0.010537)
f(41): 0.020000 0.000000 0.020000 ( 0.018743)
g(41): 0.020000 0.010000 0.030000 ( 0.017602)
f(61): 0.030000 0.000000 0.030000 ( 0.026597)
g(61): 0.020000 0.000000 0.020000 ( 0.028474)
f(81): 0.040000 0.000000 0.040000 ( 0.037569)
g(81): 0.040000 0.000000 0.040000 ( 0.034700)
f(101): 0.040000 0.000000 0.040000 ( 0.047319)
g(101): 0.050000 0.000000 0.050000 ( 0.046107)
f(121): 0.060000 0.000000 0.060000 ( 0.055845)
g(121): 0.060000 0.000000 0.060000 ( 0.062644)
f(141): 0.070000 0.000000 0.070000 ( 0.070522)
g(141): 0.070000 0.000000 0.070000 ( 0.067645)
f(161): 0.070000 0.000000 0.070000 ( 0.079994)
g(161): 0.090000 0.000000 0.090000 ( 0.081004)
f(181): 0.080000 0.000000 0.080000 ( 0.082702)
g(181): 0.080000 0.000000 0.080000 ( 0.086279)
f(201): 0.090000 0.000000 0.090000 ( 0.087983)
g(201): 0.100000 0.000000 0.100000 ( 0.102436)
f(221): 0.100000 0.000000 0.100000 ( 0.097282)
g(221): 0.110000 0.000000 0.110000 ( 0.108936)
f(241): 0.110000 0.000000 0.110000 ( 0.107562)
g(241): 0.120000 0.000000 0.120000 ( 0.123231)
f(261): 0.120000 0.000000 0.120000 ( 0.117738)
g(261): 0.130000 0.000000 0.130000 ( 0.138497)
f(281): 0.130000 0.000000 0.130000 ( 0.121723)
g(281): 0.160000 0.000000 0.160000 ( 0.163618)
f(301): 0.140000 0.010000 0.150000 ( 0.146252)
g(301): 0.170000 0.000000 0.170000 ( 0.167022)
f(321): 0.150000 0.000000 0.150000 ( 0.149100)
g(321): 0.160000 0.000000 0.160000 ( 0.166145)
f(341): 0.150000 0.000000 0.150000 ( 0.147195)
g(341): 0.190000 0.000000 0.190000 ( 0.187156)
f(361): 0.150000 0.000000 0.150000 ( 0.155126)
g(361): 0.190000 0.000000 0.190000 ( 0.190667)
f(381): 0.170000 0.000000 0.170000 ( 0.166908)
g(381): 0.210000 0.000000 0.210000 ( 0.211386)
f(401): 0.170000 0.000000 0.170000 ( 0.176037)
g(401): 0.220000 0.010000 0.230000 ( 0.218084)
f(421): 0.180000 0.000000 0.180000 ( 0.184280)
g(421): 0.230000 0.000000 0.230000 ( 0.227744)
f(441): 0.190000 0.000000 0.190000 ( 0.193176)
g(441): 0.260000 0.000000 0.260000 ( 0.258051)
f(461): 0.210000 0.000000 0.210000 ( 0.206932)
g(461): 0.250000 0.000000 0.250000 ( 0.254047)
f(481): 0.210000 0.000000 0.210000 ( 0.207435)
g(481): 0.290000 0.000000 0.290000 ( 0.291690)
f(501): 0.210000 0.010000 0.220000 ( 0.218241)
g(501): 0.290000 0.000000 0.290000 ( 0.284853)
f(521): 0.220000 0.000000 0.220000 ( 0.225706)
g(521): 0.320000 0.000000 0.320000 ( 0.321912)
f(541): 0.230000 0.000000 0.230000 ( 0.232489)
g(541): 0.310000 0.000000 0.310000 ( 0.308900)
f(561): 0.250000 0.000000 0.250000 ( 0.247742)
g(561): 0.350000 0.010000 0.360000 ( 0.347809)
f(581): 0.250000 0.000000 0.250000 ( 0.251443)
g(581): 0.420000 0.000000 0.420000 ( 0.430320)
f(601): 0.290000 0.000000 0.290000 ( 0.287576)
g(601): 0.450000 0.000000 0.450000 ( 0.450228)
f(621): 0.270000 0.000000 0.270000 ( 0.274842)
g(621): 0.390000 0.010000 0.400000 ( 0.394948)
f(641): 0.280000 0.000000 0.280000 ( 0.279406)
g(641): 0.420000 0.000000 0.420000 ( 0.417165)
f(661): 0.290000 0.000000 0.290000 ( 0.300725)
g(661): 0.450000 0.000000 0.450000 ( 0.451651)
f(681): 0.300000 0.010000 0.310000 ( 0.297297)
g(681): 0.470000 0.000000 0.470000 ( 0.476887)
f(701): 0.340000 0.000000 0.340000 ( 0.337852)
g(701): 0.480000 0.000000 0.480000 ( 0.482324)
f(721): 0.310000 0.010000 0.320000 ( 0.314285)
g(721): 0.540000 0.000000 0.540000 ( 0.546801)
f(741): 0.340000 0.000000 0.340000 ( 0.337292)
g(741): 0.490000 0.010000 0.500000 ( 0.497936)
f(761): 0.380000 0.000000 0.380000 ( 0.381007)
g(761): 0.610000 0.000000 0.610000 ( 0.611881)
f(781): 0.350000 0.010000 0.360000 ( 0.348679)
g(781): 0.580000 0.000000 0.580000 ( 0.585341)
f(801): 0.340000 0.000000 0.340000 ( 0.345329)
g(801): 0.610000 0.000000 0.610000 ( 0.620062)
f(821): 0.360000 0.010000 0.370000 ( 0.355276)
g(821): 0.540000 0.000000 0.540000 ( 0.545332)
f(841): 0.360000 0.000000 0.360000 ( 0.366477)
g(841): 0.630000 0.000000 0.630000 ( 0.631926)
f(861): 0.440000 0.010000 0.450000 ( 0.448708)
g(861): 0.650000 0.000000 0.650000 ( 0.648266)
f(881): 0.380000 0.000000 0.380000 ( 0.377770)
g(881): 0.580000 0.010000 0.590000 ( 0.591761)
f(901): 0.390000 0.000000 0.390000 ( 0.390795)
g(901): 0.650000 0.000000 0.650000 ( 0.653867)
f(921): 0.390000 0.010000 0.400000 ( 0.392912)
g(921): 0.710000 0.000000 0.710000 ( 0.709803)
f(941): 0.390000 0.000000 0.390000 ( 0.396474)
g(941): 0.710000 0.000000 0.710000 ( 0.711333)
f(961): 0.410000 0.010000 0.420000 ( 0.411303)
g(961): 0.750000 0.000000 0.750000 ( 0.756501)
f(981): 0.420000 0.000000 0.420000 ( 0.424169)
g(981): 0.800000 0.010000 0.810000 ( 0.793700)
f(1001): 0.410000 0.000000 0.410000 ( 0.419748)
g(1001): 0.760000 0.010000 0.770000 ( 0.762972)
sum(f): 11.290000 0.090000 11.380000 ( 11.356555)
sum(g): 17.190000 0.080000 17.270000 ( 17.278584)
I reran my testscript, thinking that perhaps there had been an update to MacRuby, bibtex-ruby, or one of the other libraries used, which caused a speedup, but the time difference between parsing a bibliography of 65 publications is still more than 150x.
Stian, thanks for getting back to this!
It is generally frowned upon in the Ruby community to force require rubygems in a library. If you don't want to edit the file, you can just run your script with ruby -rrubygems
or, alternatively, add the require to the benchmark script.
Regarding the results I can see that a) you have a much faster computer than I ;-) and b) you're, right that the results are extremely similar. This means that the parser is not significantly slower in MacRuby and therefore we're losing performance somewhere else.
Just to make sure, in your test script above you're printing the bibliography (the p b
); could you change that to something simpler, for instance, just print b.length
or something like that. This is just to make sure that it is opening the bibliography which is in fact slower and not printing it.
Since we cannot use ruby-prof we'll have to adapt the benchmark script to our needs.
The rubygems thing is no biggie. I saw somewhere that macruby removed it to remove overhead. Anyway I can fix that easily.
I removed the print line entirely, and still
[folders2web] time ruby speedtest.rb 4:16:02 ☁ master ☂ ⚡ ✭ ruby speedtest.rb 0.17s user 0.05s system 70% cpu 0.303 total [folders2web] time macruby speedtest.rb 4:16:38 ☁ master ☂ ⚡ ✭ macruby speedtest.rb 23.05s user 0.66s system 118% cpu 20.008 total
I have a 2.7 GhZ i7 MacBook Pro 13" :)
Stian
More numbers:
Only parsing bibtex-file, not printing MRI: 0.17 MacRuby: 23.05
Only parsing bibtex-file with 3 entries MRI: 0.15 MacRuby: 3.45
Only requiring rubygems and bibtex-ruby, not doing anything else MRI: 0.13 MacRuby: 2.74
Only printing "Hi", with no libraries required MRI: 0.06 MacRuby: 0.01
I did the last to check that there wasn't a longer startup time required for MacRuby, which it wasn't. It does take significantly longer to load bibtex-ruby, but even then, the time it spends on even just 3 entries is significantly longer.
Please try the following:
In benchmark.rb replace the entry with the contents of your bibtex file (the 3 entries). That is:
input = <<-END
< ... the 3 entries ... >
END
Next, change n, k to:
n, k = 51, 10
And rerun the test, make a note of the Results.
Then, change the two lines inside the f/g tests from BibTeX::Parser.new.parse(input)
to BibTeX.parse(input)
, and try again.
Did as you asked. Output before I changed the parse lines:
[test] macruby ./benchmark.rb 4:41:17 user system total real f(1): 0.550000 0.010000 0.560000 ( 0.521162) g(1): 0.200000 0.000000 0.200000 ( 0.174278) f(11): 2.130000 0.050000 2.180000 ( 1.855182) g(11): 6.640000 0.120000 6.760000 ( 5.462189) f(21): 4.390000 0.090000 4.480000 ( 3.594807) g(21): 21.710000 0.700000 22.410000 ( 17.545464) f(31): 6.860000 0.130000 6.990000 ( 5.537885) g(31): 45.450000 2.160000 47.610000 ( 36.967703) f(41): 9.000000 0.170000 9.170000 ( 7.142162) g(41): 77.760000 4.220000 81.980000 ( 62.952831) f(51): 11.680000 0.230000 11.910000 ( 9.023000) g(51):121.490000 6.940000 128.430000 ( 99.794021) sum(f): 34.610000 0.680000 35.290000 ( 27.674197) sum(g):273.250000 14.140000 287.390000 (222.896485)
And here it is with the two lines changed
user system total real
f(1): 0.590000 0.000000 0.590000 ( 0.569724)
g(1): 0.210000 0.010000 0.220000 ( 0.176379)
f(11): 2.330000 0.050000 2.380000 ( 2.051605)
g(11): 6.690000 0.130000 6.820000 ( 5.505299)
f(21): 4.580000 0.090000 4.670000 ( 3.763134)
g(21): 21.850000 0.700000 22.550000 ( 17.717141)
f(31): 6.750000 0.130000 6.880000 ( 5.419445)
g(31): 45.630000 2.170000 47.800000 ( 37.105453)
f(41): 9.050000 0.170000 9.220000 ( 7.159262)
g(41): 77.960000 4.240000 82.200000 ( 63.075123)
f(51): 11.890000 0.240000 12.130000 ( 9.149539)
g(51):119.660000 6.780000 126.440000 ( 96.518764)
sum(f): 35.190000 0.680000 35.870000 ( 28.112709)
sum(g):272.000000 14.030000 286.030000 (220.098160)
Interesting. Could you post the entries you're using as input? Are there by any chance cross references?
Plus, as a reference, could you run the tests above with MRI too?
input = <<-END %% This BibTeX bibliography file was created using BibDesk. %% http://bibdesk.sourceforge.net/
%% Created for Stian Haklev at 2011-09-20 00:43:44 -0400
%% Saved with string encoding Unicode (UTF-8)
@inproceedings{vladoiu2011open, Author = {Vladoiu, M.}, Booktitle = {Roedunet International Conference (RoEduNet), 2011 10th}, Date-Added = {2011-09-13 13:34:16 +0000}, Date-Modified = {2011-09-13 13:34:46 +0000}, Keywords = {cites-me; OER/open ed}, Organization = {IEEE}, Pages = {1--6}, Title = {Open courseware initiatives-after 10 years}, Bdsk-File-1 = {YnBsaXN0MDDUAQIDBAUIJidUJHRvcFgkb2JqZWN0c1gkdmVyc2lvblkkYXJjaGl2ZXLRBgdUcm9vdIABqAkKFRYXGyIjVSRudWxs0wsMDQ4RFFpOUy5vYmplY3RzV05TLmtleXNWJGNsYXNzog8QgASABqISE4ACgAOAB1lhbGlhc0RhdGFccmVsYXRpdmVQYXRo0hgNGRpXTlMuZGF0YU8RAYgAAAAAAYgAAgAABEhvbWUAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAMmm5vlIKwAAABLivBN2bGFkb2l1MjAxMW9wZW4ucGRmAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAALu7lypTT2gAAAAAAAAAAAAIAAwAACQAAAAAAAAAAAAAAAAAAAAAHQmliZGVzawAAEAAIAADJpx85AAAAEQAIAADKlQwaAAAAAQAMABLivAAAAF8AAABXAAIAM0hvbWU6c3RpYW46AERvY3VtZW50czoAQmliZGVzazoAdmxhZG9pdTIwMTFvcGVuLnBkZgAADgAoABMAdgBsAGEAZABvAGkAdQAyADAAMQAxAG8AcABlAG4ALgBwAGQAZgAPAAoABABIAG8AbQBlABIALC9zdGlhbi9Eb2N1bWVudHMvQmliZGVzay92bGFkb2l1MjAxMW9wZW4ucGRmABMADS9Wb2x1bWVzL0hvbWUAABUAAgAT//8AAIAF0hwdHh9YJGNsYXNzZXNaJGNsYXNzbmFtZaMfICFdTlNNdXRhYmxlRGF0YVZOU0RhdGFYTlNPYmplY3RfECsuLi8uLi9Eb2N1bWVudHMvQmliZGVzay92bGFkb2l1MjAxMW9wZW4ucGRm0hwdJCWiJSFcTlNEaWN0aW9uYXJ5EgABhqBfEA9OU0tleWVkQXJjaGl2ZXIACAARABYAHwAoADIANQA6ADwARQBLAFIAXQBlAGwAbwBxAHMAdgB4AHoAfACGAJMAmACgAiwCLgIzAjwCRwJLAlkCYAJpApcCnAKfAqwCsQAAAAAAAAIBAAAAAAAAACgAAAAAAAAAAAAAAAAAAALD}}
@inproceedings{valjataga2011open, Author = {Väljataga, T. and Põldoja, Hans and Laanpere, M.}, Booktitle = {Proceedings of the 4th International Network-Based Education 2011 Conference The Social Media in the Middle of Nowhere}, Date-Added = {2011-08-09 10:54:02 +0000}, Date-Modified = {2011-08-09 11:20:02 +0000}, Editor = {Ruokamo, H. and Eriksson, M. and Pekkala, L. and Vuojärvi, H.}, Keywords = {open courses}, Organization = {Rovaniemi: University of Lapland}, Pages = {68-7}, Title = {Open Online Courses: Responding to Design Challenges}, Year = {2011}, Bdsk-File-1 = {YnBsaXN0MDDUAQIDBAUIJidUJHRvcFgkb2JqZWN0c1gkdmVyc2lvblkkYXJjaGl2ZXLRBgdUcm9vdIABqAkKFRYXGyIjVSRudWxs0wsMDQ4RFFpOUy5vYmplY3RzV05TLmtleXNWJGNsYXNzog8QgASABqISE4ACgAOAB1lhbGlhc0RhdGFccmVsYXRpdmVQYXRo0hgNGRpXTlMuZGF0YU8RAZAAAAAAAZAAAgAABEhvbWUAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAMmm5vlIKwAAABLivBV2YWxqYXRhZ2EyMDExb3Blbi5wZGYAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAALLykymaMLgAAAAAAAAAAAAIAAwAACQAAAAAAAAAAAAAAAAAAAAAHQmliZGVzawAAEAAIAADJpx85AAAAEQAIAADKZsRuAAAAAQAMABLivAAAAF8AAABXAAIANUhvbWU6c3RpYW46AERvY3VtZW50czoAQmliZGVzazoAdmFsamF0YWdhMjAxMW9wZW4ucGRmAAAOACwAFQB2AGEAbABqAGEAdABhAGcAYQAyADAAMQAxAG8AcABlAG4ALgBwAGQAZgAPAAoABABIAG8AbQBlABIALi9zdGlhbi9Eb2N1bWVudHMvQmliZGVzay92YWxqYXRhZ2EyMDExb3Blbi5wZGYAEwANL1ZvbHVtZXMvSG9tZQAAFQACABP//wAAgAXSHB0eH1gkY2xhc3Nlc1okY2xhc3NuYW1lox8gIV1OU011dGFibGVEYXRhVk5TRGF0YVhOU09iamVjdF8QLS4uLy4uL0RvY3VtZW50cy9CaWJkZXNrL3ZhbGphdGFnYTIwMTFvcGVuLnBkZtIcHSQloiUhXE5TRGljdGlvbmFyeRIAAYagXxAPTlNLZXllZEFyY2hpdmVyAAgAEQAWAB8AKAAyADUAOgA8AEUASwBSAF0AZQBsAG8AcQBzAHYAeAB6AHwAhgCTAJgAoAI0AjYCOwJEAk8CUwJhAmgCcQKhAqYCqQK2ArsAAAAAAAACAQAAAAAAAAAoAAAAAAAAAAAAAAAAAAACzQ==}}
@unpublished{resendes2011comprehensive, Author = {Resendes, Monica}, Date-Added = {2011-08-08 01:10:03 +0000}, Date-Modified = {2011-08-08 01:10:38 +0000}, Title = {Comprehensive Exam}, Year = {2011}, Bdsk-File-1 = {YnBsaXN0MDDUAQIDBAUIJidUJHRvcFgkb2JqZWN0c1gkdmVyc2lvblkkYXJjaGl2ZXLRBgdUcm9vdIABqAkKFRYXGyIjVSRudWxs0wsMDQ4RFFpOUy5vYmplY3RzV05TLmtleXNWJGNsYXNzog8QgASABqISE4ACgAOAB1lhbGlhc0RhdGFccmVsYXRpdmVQYXRo0hgNGRpXTlMuZGF0YU8RAbAAAAAAAbAAAgAABEhvbWUAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAMmm5vlIKwAAABLivB1yZXNlbmRlczIwMTFjb21wcmVoZW5zaXZlLnBkZgAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAALKEbymSvkAAAAAAAAAAAAAIAAwAACQAAAAAAAAAAAAAAAAAAAAAHQmliZGVzawAAEAAIAADJpx85AAAAEQAIAADKZOfQAAAAAQAMABLivAAAAF8AAABXAAIAPUhvbWU6c3RpYW46AERvY3VtZW50czoAQmliZGVzazoAcmVzZW5kZXMyMDExY29tcHJlaGVuc2l2ZS5wZGYAAA4APAAdAHIAZQBzAGUAbgBkAGUAcwAyADAAMQAxAGMAbwBtAHAAcgBlAGgAZQBuAHMAaQB2AGUALgBwAGQAZgAPAAoABABIAG8AbQBlABIANi9zdGlhbi9Eb2N1bWVudHMvQmliZGVzay9yZXNlbmRlczIwMTFjb21wcmVoZW5zaXZlLnBkZgATAA0vVm9sdW1lcy9Ib21lAAAVAAIAE///AACABdIcHR4fWCRjbGFzc2VzWiRjbGFzc25hbWWjHyAhXU5TTXV0YWJsZURhdGFWTlNEYXRhWE5TT2JqZWN0XxA1Li4vLi4vRG9jdW1lbnRzL0JpYmRlc2svcmVzZW5kZXMyMDExY29tcHJlaGVuc2l2ZS5wZGbSHB0kJaIlIVxOU0RpY3Rpb25hcnkSAAGGoF8QD05TS2V5ZWRBcmNoaXZlcgAIABEAFgAfACgAMgA1ADoAPABFAEsAUgBdAGUAbABvAHEAcwB2AHgAegB8AIYAkwCYAKACVAJWAlsCZAJvAnMCgQKIApECyQLOAtEC3gLjAAAAAAAAAgEAAAAAAAAAKAAAAAAAAAAAAAAAAAAAAvU=}}
END
[test] ruby -KU benchmark.rb
11:40:00 user system total real f(1): 0.000000 0.000000 0.000000 ( 0.002150) g(1): 0.010000 0.000000 0.010000 ( 0.001420) f(11): 0.010000 0.000000 0.010000 ( 0.016743) g(11): 0.020000 0.000000 0.020000 ( 0.017404) f(21): 0.030000 0.000000 0.030000 ( 0.030529) g(21): 0.040000 0.000000 0.040000 ( 0.036849) f(31): 0.040000 0.000000 0.040000 ( 0.046699) g(31): 0.050000 0.000000 0.050000 ( 0.044248) f(41): 0.060000 0.000000 0.060000 ( 0.063777) g(41): 0.070000 0.000000 0.070000 ( 0.064466) f(51): 0.070000 0.000000 0.070000 ( 0.077645) g(51): 0.080000 0.000000 0.080000 ( 0.078850) sum(f): 0.210000 0.000000 0.210000 ( 0.237543) sum(g): 0.270000 0.000000 0.270000 ( 0.243237)
[test] ruby -KU benchmark.rb
11:39:53 user system total real f(1): 0.000000 0.000000 0.000000 ( 0.002288) g(1): 0.000000 0.000000 0.000000 ( 0.001501) f(11): 0.020000 0.000000 0.020000 ( 0.018536) g(11): 0.020000 0.010000 0.030000 ( 0.021100) f(21): 0.030000 0.000000 0.030000 ( 0.033027) g(21): 0.040000 0.000000 0.040000 ( 0.035162) f(31): 0.050000 0.000000 0.050000 ( 0.050134) g(31): 0.050000 0.000000 0.050000 ( 0.050362) f(41): 0.070000 0.000000 0.070000 ( 0.070757) g(41): 0.070000 0.000000 0.070000 ( 0.074642) f(51): 0.090000 0.000000 0.090000 ( 0.087376) g(51): 0.090000 0.000000 0.090000 ( 0.096798) sum(f): 0.260000 0.000000 0.260000 ( 0.262118) sum(g): 0.270000 0.010000 0.280000 ( 0.279564)
Hmm, I think we're getting closer. The difference seems to be triggered by something in your entries. Could you try: stripping the comments and the file attachment fields (and if that does not make a difference) other fields until the execution time comes down?
I suspect the problem has to do with parsing the file attachment entries which are quite long. If you could confirm this, I'll come up with a benchmark file that will help us pinpoint what's causing the problem in parser or lexer.
New input:
input = <<-END %% This BibTeX bibliography file was created using BibDesk. %% http://bibdesk.sourceforge.net/
%% Created for Stian Haklev at 2011-09-20 00:43:44 -0400
%% Saved with string encoding Unicode (UTF-8)
@inproceedings{vladoiu2011open, Author = {Vladoiu, M.}, Booktitle = {Roedunet International Conference (RoEduNet), 2011 10th}, Pages = {1--6}, Title = {Open courseware initiatives-after 10 years}}
@inproceedings{valjataga2011open, Author = {Väljataga, T. and Põldoja, Hans and Laanpere, M.}, Booktitle = {Proceedings of the 4th International Network-Based Education 2011 Conference The Social Media in the Middle of Nowhere}, Editor = {Ruokamo, H. and Eriksson, M. and Pekkala, L. and Vuojärvi, H.}, Organization = {Rovaniemi: University of Lapland}, Pages = {68-7}, Title = {Open Online Courses: Responding to Design Challenges}, Year = {2011}}
@unpublished{resendes2011comprehensive, Author = {Resendes, Monica}, Title = {Comprehensive Exam}, Year = {2011}}
END
new result
[test] macruby benchmark.rb 12:11:26 user system total real f(1): 0.420000 0.000000 0.420000 ( 0.423830) g(1): 0.130000 0.000000 0.130000 ( 0.110229) f(11): 1.260000 0.030000 1.290000 ( 1.171771) g(11): 1.840000 0.040000 1.880000 ( 1.622123) f(21): 2.190000 0.040000 2.230000 ( 2.051199) g(21): 4.630000 0.100000 4.730000 ( 3.795607) f(31): 3.100000 0.060000 3.160000 ( 2.861643) g(31): 9.500000 0.190000 9.690000 ( 7.664800) f(41): 5.180000 0.090000 5.270000 ( 4.807926) g(41): 15.220000 0.270000 15.490000 ( 12.194606) f(51): 6.360000 0.130000 6.490000 ( 5.832457) g(51): 20.350000 0.350000 20.700000 ( 15.984544) sum(f): 18.510000 0.350000 18.860000 ( 17.148826) sum(g): 51.670000 0.950000 52.620000 ( 41.371907)
(with Parser.new.parse)
Another attempt
input: input = <<-END %% This BibTeX bibliography file was created using BibDesk. %% http://bibdesk.sourceforge.net/
%% Created for Stian Haklev at 2011-09-20 00:43:44 -0400
%% Saved with string encoding Unicode (UTF-8)
@inproceedings{vladoiu2011open, Author = {Vladoiu, M.}, Booktitle = {Roedunet International Conference (RoEduNet), 2011 10th}, Pages = {1--6}, Title = {Open courseware initiatives-after 10 years}}
@inproceedings{valjataga2011open, Author = {Väljataga, T. and Põldoja, Hans and Laanpere, M.}, Title = {Open Online Courses: Responding to Design Challenges}, Year = {2011}}
@unpublished{resendes2011comprehensive, Author = {Resendes, Monica}, Title = {Comprehensive Exam}, Year = {2011}}
END
result [test] macruby benchmark.rb 12:12:52 user system total real f(1): 0.390000 0.000000 0.390000 ( 0.396010) g(1): 0.070000 0.010000 0.080000 ( 0.067129) f(11): 0.800000 0.010000 0.810000 ( 0.732907) g(11): 1.060000 0.020000 1.080000 ( 0.963519) f(21): 1.550000 0.030000 1.580000 ( 1.474796) g(21): 3.010000 0.060000 3.070000 ( 2.498192) f(31): 2.220000 0.040000 2.260000 ( 2.080213) g(31): 5.210000 0.100000 5.310000 ( 4.185519) f(41): 3.110000 0.050000 3.160000 ( 2.877646) g(41): 8.230000 0.150000 8.380000 ( 6.489485) f(51): 3.640000 0.060000 3.700000 ( 3.352817) g(51): 11.750000 0.210000 11.960000 ( 9.281206) sum(f): 11.710000 0.190000 11.900000 ( 10.914390) sum(g): 29.330000 0.550000 29.880000 ( 23.485050)
Can you keep trying removing fields from your entries (also you may want to try and remove the comments at the beginning of the file) and rerunning the benchmark? The thing is, when you ran my original test with this entry there was hardly any difference in speed between MRI and MacRuby. Using your entries there is; so there has got to be a difference that is triggered by some detail in your entries and we need to try and locate that detail.
This is normally quite easy with ruby-prof because it gives you a summary of the time spent in individual methods, but for now our best bet is to consecutively remove data from your entries until there is a big jump in execution time.
My test entry was a one book with author, address, date-added, date-modified, keywords, publisher, series, title and year fields. Now the odd thing is that your entries don't seem to contain anything peculiar. Perhaps you start by looking at each entry individually; use my test entry as a reference: each of your entries, when executed individually, should not take considerably longer than mine. If any of them do take longer, we can begin by removing individual fields to see which fields are causing the performance penalty.
I am sorry, I must have made a mistake with the original run of benchmark, which might have set us back a bit. Rerunning it with the original settings in benchmark (as supplied with the gem) shows a huge difference between MRI and MacRuby. Not sure how I managed to get the numbers previously.
First baseline load numbers for a file which only contains "require bibtex"
MRI: 0.11 MacRuby: 2.76
(and for fun, require 'citeproc')
MRI: 0.33 MacRuby: 6.52
Then the benchmarks (using the one Pickaxe publication)
MRI user system total real f(1): 0.000000 0.000000 0.000000 ( 0.000743) g(1): 0.000000 0.000000 0.000000 ( 0.000504) f(11): 0.010000 0.000000 0.010000 ( 0.008548) g(11): 0.010000 0.000000 0.010000 ( 0.008101) f(21): 0.010000 0.000000 0.010000 ( 0.007892) g(21): 0.010000 0.000000 0.010000 ( 0.013536) f(31): 0.020000 0.000000 0.020000 ( 0.017215) g(31): 0.010000 0.000000 0.010000 ( 0.010945) f(41): 0.020000 0.000000 0.020000 ( 0.021488) g(41): 0.020000 0.000000 0.020000 ( 0.017821) f(51): 0.030000 0.000000 0.030000 ( 0.029464) g(51): 0.020000 0.000000 0.020000 ( 0.022280) sum(f): 0.090000 0.000000 0.090000 ( 0.085350) sum(g): 0.070000 0.000000 0.070000 ( 0.073187)
MacRuby: user system total real f(1): 0.390000 0.000000 0.390000 ( 0.369663) g(1): 0.050000 0.000000 0.050000 ( 0.046697) f(11): 0.560000 0.010000 0.570000 ( 0.520116) g(11): 0.590000 0.010000 0.600000 ( 0.532049) f(21): 1.020000 0.020000 1.040000 ( 0.965179) g(21): 1.200000 0.030000 1.230000 ( 1.031975) f(31): 1.510000 0.020000 1.530000 ( 1.427880) g(31): 1.860000 0.040000 1.900000 ( 1.525376) f(41): 2.040000 0.030000 2.070000 ( 1.904556) g(41): 2.590000 0.060000 2.650000 ( 2.085736) f(51): 2.590000 0.050000 2.640000 ( 2.436998) g(51): 3.250000 0.080000 3.330000 ( 2.611425) sum(f): 8.110000 0.130000 8.240000 ( 7.624393) sum(g): 9.540000 0.220000 9.760000 ( 7.833259)
Ha, so whilst this sidetracked us a little bit, at least it makes a lot more sense (would have been weird because there was nothing peculiar about your entries really). Anyway, so next things to try are:
Parser.new(:parse_names => false)
lexer = BibTeX::Lexer.new
and in the two benchmarking functions uncomment the two lexer lines.Sorry for the sidetracking! :)
With name parser
user system total real
f(1): 0.320000 0.000000 0.320000 ( 0.312330)
g(1): 0.040000 0.000000 0.040000 ( 0.040179)
f(11): 0.540000 0.010000 0.550000 ( 0.506932)
g(11): 0.580000 0.020000 0.600000 ( 0.525592)
f(21): 0.940000 0.010000 0.950000 ( 0.906421)
g(21): 1.100000 0.030000 1.130000 ( 0.952472)
f(31): 1.480000 0.020000 1.500000 ( 1.412313)
g(31): 1.710000 0.040000 1.750000 ( 1.406194)
f(41): 1.930000 0.030000 1.960000 ( 1.850456)
g(41): 2.330000 0.050000 2.380000 ( 1.901826)
f(51): 2.360000 0.040000 2.400000 ( 2.257148)
g(51): 2.940000 0.070000 3.010000 ( 2.367208)
sum(f): 7.570000 0.110000 7.680000 ( 7.245601)
sum(g): 8.700000 0.210000 8.910000 ( 7.193470)
With lexer only:
user system total real
f(1): 0.180000 0.000000 0.180000 ( 0.167365)
g(1): 0.040000 0.000000 0.040000 ( 0.044349)
f(11): 0.530000 0.010000 0.540000 ( 0.486000)
g(11): 0.540000 0.020000 0.560000 ( 0.490285)
f(21): 0.940000 0.010000 0.950000 ( 0.908252)
g(21): 1.060000 0.030000 1.090000 ( 0.920382)
f(31): 1.430000 0.020000 1.450000 ( 1.361777)
g(31): 1.680000 0.040000 1.720000 ( 1.379683)
f(41): 1.900000 0.030000 1.930000 ( 1.836740)
g(41): 2.240000 0.060000 2.300000 ( 1.829948)
f(51): 2.330000 0.030000 2.360000 ( 2.254141)
g(51): 2.900000 0.060000 2.960000 ( 2.346938)
sum(f): 7.310000 0.100000 7.410000 ( 7.014275)
sum(g): 8.460000 0.210000 8.670000 ( 7.011585)
(I take it you meant to say, without name parsing above, right?)
Right, we're getting closer now. Judging on the basis of these results it seems that the lexical analysis is responsible for slow performance on MacRuby. In a way, that's great, because the lexer needs to rewritten anyway (see #26).
I'm not sure I'll be able to write the single pass lexer in the next few days, but I'll try to take a quick stab at it and also come up with a better benchmarking script to allow us to see what part of the lexical analysis consumes so much time.
I've never used Instruments, and there is no documentation available. I posted on StackOverflow asking for help.
Hi,
Come here from the MacRuby mail list.
To profile MacRuby code, use DTrace, http://www.macruby.org/blog/2009/11/17/macruby05b2.html shows an example.
MacRuby source tree is needed, and DTrace scripts can be found in sample-macruby/DTrace/
.
Open two terminal windows, in the first one type,
$ your-benchmark-script.rb &
,
remember the PID, and then in second window type,
$ sudo dtrace -s path-to-macruby-source/sample-macruby/DTrace/methods_duration.d -p pid
.
You may need to add several seconds pause at the beginning of the benchmark script, so you have time to type in the second window.
The output is long, but "Regexp#initialize" still can be singled out very easily.
It seems that MRI "caches" compiled regexp, while MacRuby doesn't and re-creates every regexp literal constants again and again. Try timing following script, the difference is also huge.
1_000_000.times do
"123" =~ /^[0-9]+$/
end
So caching all regexp literals to ivars might be a viable solution to #37.
Awesome @dche, thanks a lot!
I suggest we try moving the regular expressions to a class instance variable; this should give us a minimal overhead on MRI but simulate caching for MacRuby.
@houshuang could you run your benchmarks against master and give us the numbers?
Doesn't seem like it really helped :(
ruby benchmark.rb 21:33:33 ☁ master ☀ user system total real f(1): 0.000000 0.000000 0.000000 ( 0.000640) g(1): 0.000000 0.000000 0.000000 ( 0.000421) f(21): 0.010000 0.000000 0.010000 ( 0.011741) g(21): 0.010000 0.000000 0.010000 ( 0.007213) f(41): 0.020000 0.000000 0.020000 ( 0.025018) g(41): 0.020000 0.000000 0.020000 ( 0.017973) f(61): 0.030000 0.000000 0.030000 ( 0.029438) g(61): 0.030000 0.000000 0.030000 ( 0.027517) f(81): 0.040000 0.000000 0.040000 ( 0.043976) g(81): 0.050000 0.000000 0.050000 ( 0.047991) f(101): 0.050000 0.000000 0.050000 ( 0.048558) g(101): 0.050000 0.000000 0.050000 ( 0.051856) f(121): 0.060000 0.000000 0.060000 ( 0.063068) g(121): 0.070000 0.000000 0.070000 ( 0.063596) f(141): 0.060000 0.000000 0.060000 ( 0.058472) g(141): 0.060000 0.000000 0.060000 ( 0.062874) f(161): 0.080000 0.000000 0.080000 ( 0.079034) g(161): 0.070000 0.000000 0.070000 ( 0.076821) f(181): 0.090000 0.000000 0.090000 ( 0.084116) g(181): 0.090000 0.000000 0.090000 ( 0.091101) f(201): 0.090000 0.000000 0.090000 ( 0.093750) g(201): 0.100000 0.000000 0.100000 ( 0.100371) f(221): 0.100000 0.000000 0.100000 ( 0.097935) g(221): 0.110000 0.000000 0.110000 ( 0.110750) f(241): 0.100000 0.010000 0.110000 ( 0.105077) g(241): 0.130000 0.000000 0.130000 ( 0.125688) f(261): 0.120000 0.000000 0.120000 ( 0.122992) g(261): 0.140000 0.000000 0.140000 ( 0.138203) f(281): 0.140000 0.000000 0.140000 ( 0.139820) g(281): 0.150000 0.000000 0.150000 ( 0.148650) f(301): 0.150000 0.000000 0.150000 ( 0.153027) g(301): 0.170000 0.000000 0.170000 ( 0.173788) f(321): 0.150000 0.000000 0.150000 ( 0.147638) g(321): 0.180000 0.000000 0.180000 ( 0.183194) f(341): 0.160000 0.000000 0.160000 ( 0.157463) g(341): 0.200000 0.000000 0.200000 ( 0.202536)
[test] macruby benchmark.rb 21:34:36 ☁ master ☂ ⚡ user system total real f(1): 0.380000 0.000000 0.380000 ( 0.374224) g(1): 0.040000 0.000000 0.040000 ( 0.044956) f(21): 1.060000 0.020000 1.080000 ( 0.962675) g(21): 1.100000 0.030000 1.130000 ( 0.990855) f(41): 2.030000 0.040000 2.070000 ( 1.916846) g(41): 2.510000 0.060000 2.570000 ( 2.068640) f(61): 2.960000 0.050000 3.010000 ( 2.783313) g(61): 4.100000 0.100000 4.200000 ( 3.287772) f(81): 4.650000 0.090000 4.740000 ( 4.485962) g(81): 5.630000 0.140000 5.770000 ( 4.453925) f(101): 5.280000 0.090000 5.370000 ( 4.918456) g(101): 7.020000 0.180000 7.200000 ( 5.580300) f(121): 6.240000 0.100000 6.340000 ( 5.785481) g(121): 8.710000 0.220000 8.930000 ( 6.843820) f(141): 7.560000 0.140000 7.700000 ( 6.965423) g(141): 10.810000 0.260000 11.070000 ( 8.324514) f(161): 8.520000 0.160000 8.680000 ( 7.783367) g(161): 12.760000 0.320000 13.080000 ( 9.747104) f(181): 9.820000 0.180000 10.000000 ( 8.946036) g(181): 14.630000 0.440000 15.070000 ( 11.234789) f(201): 11.190000 0.210000 11.400000 ( 10.065997) g(201): 16.940000 0.660000 17.600000 ( 13.150330) f(221): 12.310000 0.230000 12.540000 ( 11.014733) g(221): 18.410000 0.830000 19.240000 ( 14.411930) f(241): 13.650000 0.250000 13.900000 ( 12.084216) g(241): 20.820000 1.080000 21.900000 ( 16.397737) f(261): 15.020000 0.270000 15.290000 ( 13.133269) g(261): 23.820000 1.350000 25.170000 ( 19.101470) f(281): 16.660000 0.300000 16.960000 ( 14.486580) g(281): 25.540000 1.710000 27.250000 ( 20.518307) f(301): 17.600000 0.320000 17.920000 ( 15.103449) g(301): 27.690000 2.090000 29.780000 ( 22.433962) f(321): 19.280000 0.360000 19.640000 ( 16.325466) g(321): 29.380000 2.450000 31.830000 ( 24.070024) f(341): 20.880000 0.390000 21.270000 ( 17.473757) g(341): 33.470000 3.160000 36.630000 ( 28.516775)
I did a sample of benchmark running MacRuby and uploaded it here, not sure if you can make anything out of it. http://dl.dropbox.com/u/1341682/out.zip
I also did the two other DTrace functions collected_objects and function_count in this file: http://dl.dropbox.com/u/1341682/more_dtrace.zip
This is just over an arbitrary period of time running Benchmark. If any of this is useful and you want me to run it again over a specific time, with specific settings etc, let me know.
These are the three DTrace scripts that are included with MacRuby.
Thanks a lot for your input by the way, @dche!
According to your results Regexp#initialize is most definitely the culprit. So why didn't this help?
@patterns
cache defined at the top.lexer.analyse
.I confirmed that I'm on rev 59ccee962ed497a6648bfc97930d65d862aef46a, and that the @patterns are there.
I changed to benchmark only the lexer. Again widely different results:
[bibtex-ruby] ruby test/benchmark.rb 7:26:42 ☁ master ☂ ⚡ user system total real f(1): 0.000000 0.000000 0.000000 ( 0.000313) g(1): 0.000000 0.000000 0.000000 ( 0.000231) f(21): 0.000000 0.000000 0.000000 ( 0.003490) g(21): 0.010000 0.000000 0.010000 ( 0.006004) f(41): 0.010000 0.000000 0.010000 ( 0.006657) g(41): 0.000000 0.000000 0.000000 ( 0.008503) f(61): 0.020000 0.000000 0.020000 ( 0.015250) g(61): 0.020000 0.000000 0.020000 ( 0.016912) f(81): 0.010000 0.000000 0.010000 ( 0.015483) g(81): 0.020000 0.000000 0.020000 ( 0.017838) f(101): 0.020000 0.000000 0.020000 ( 0.020783) g(101): 0.020000 0.000000 0.020000 ( 0.020176)
[bibtex-ruby] macruby test/benchmark.rb 7:27:01 ☁ master ☂ ⚡ user system total real f(1): 0.140000 0.000000 0.140000 ( 0.142504) g(1): 0.040000 0.000000 0.040000 ( 0.038974) f(21): 0.910000 0.020000 0.930000 ( 0.840123) g(21): 0.940000 0.030000 0.970000 ( 0.859722) f(41): 1.680000 0.020000 1.700000 ( 1.615430) g(41): 2.160000 0.060000 2.220000 ( 1.764287) f(61): 2.570000 0.040000 2.610000 ( 2.467326) g(61): 3.560000 0.080000 3.640000 ( 2.895678) f(81): 3.630000 0.060000 3.690000 ( 3.485160) g(81): 4.610000 0.120000 4.730000 ( 3.720300) f(101): 4.530000 0.070000 4.600000 ( 4.375221) g(101): 6.060000 0.160000 6.220000 ( 4.877677)
Just to make sure this wasn't an artefact of the benchmarking script, I wrote a simple script that requires (the modified) bibtex, parses a file with about 60 publications, and prints the first one. MRI: 0.21, MacRuby: 24.16
If I read these results correctly (compared to the lexer-only ones you posted originally) it is now even slower than before. Can you confirm that?(Just MacRuby lexer – release / master). Weird!
True, weird. These two are identical, except in the first I required 'bibtex', and in the second 'require File.expand_path('../../lib/bibtex.rb', FILE)'
[test] macruby benchmark.rb 8:14:19 ☁ master ☂ ⚡ ✭ user system total real f(1): 0.130000 0.000000 0.130000 ( 0.136986) g(1): 0.040000 0.000000 0.040000 ( 0.038374) f(21): 0.880000 0.010000 0.890000 ( 0.813287) g(21): 0.950000 0.030000 0.980000 ( 0.849334) f(41): 1.640000 0.020000 1.660000 ( 1.570212) g(41): 2.110000 0.060000 2.170000 ( 1.719145) f(61): 2.450000 0.030000 2.480000 ( 2.348409) g(61): 3.300000 0.080000 3.380000 ( 2.665291) f(81): 3.270000 0.050000 3.320000 ( 3.121622) g(81): 4.520000 0.120000 4.640000 ( 3.619892) f(101): 4.100000 0.060000 4.160000 ( 3.900480) g(101): 5.730000 0.150000 5.880000 ( 4.561672) f(121):^C [test] macruby benchmark.rb 8:14:55 ☁ master ☂ ⚡ ✭ user system total real f(1): 0.140000 0.000000 0.140000 ( 0.140768) g(1): 0.040000 0.000000 0.040000 ( 0.038383) f(21): 0.890000 0.010000 0.900000 ( 0.813669) g(21): 0.940000 0.030000 0.970000 ( 0.847099) f(41): 1.650000 0.020000 1.670000 ( 1.586052) g(41): 2.110000 0.050000 2.160000 ( 1.724443) f(61): 2.460000 0.040000 2.500000 ( 2.347957) g(61): 3.220000 0.080000 3.300000 ( 2.619925) f(81): 3.280000 0.050000 3.330000 ( 3.114385) g(81): 4.480000 0.110000 4.590000 ( 3.596411) f(101): 4.080000 0.060000 4.140000 ( 3.895985) g(101): 5.760000 0.150000 5.910000 ( 4.611965)
Can you try this again but having uninstalled the gem first? Alternatively, you can also print out (before the benchmark starts) the value of:
BibTeX::Lexer.patterns
These should exist on master. If they don't we're somehow pulling in an old version.
On Mar 5, 2012, at 2:16 PM, Stian Håklev wrote:
True, weird. These two are identical, except in the first I required 'bibtex', and in the second 'require File.expand_path('../../lib/bibtex.rb', FILE)'
[test] macruby benchmark.rb 8:14:19 ☁ master ☂ ⚡ ✭ user system total real f(1): 0.130000 0.000000 0.130000 ( 0.136986) g(1): 0.040000 0.000000 0.040000 ( 0.038374) f(21): 0.880000 0.010000 0.890000 ( 0.813287) g(21): 0.950000 0.030000 0.980000 ( 0.849334) f(41): 1.640000 0.020000 1.660000 ( 1.570212) g(41): 2.110000 0.060000 2.170000 ( 1.719145) f(61): 2.450000 0.030000 2.480000 ( 2.348409) g(61): 3.300000 0.080000 3.380000 ( 2.665291) f(81): 3.270000 0.050000 3.320000 ( 3.121622) g(81): 4.520000 0.120000 4.640000 ( 3.619892) f(101): 4.100000 0.060000 4.160000 ( 3.900480) g(101): 5.730000 0.150000 5.880000 ( 4.561672) f(121):^C [test] macruby benchmark.rb 8:14:55 ☁ master ☂ ⚡ ✭ user system total real f(1): 0.140000 0.000000 0.140000 ( 0.140768) g(1): 0.040000 0.000000 0.040000 ( 0.038383) f(21): 0.890000 0.010000 0.900000 ( 0.813669) g(21): 0.940000 0.030000 0.970000 ( 0.847099) f(41): 1.650000 0.020000 1.670000 ( 1.586052) g(41): 2.110000 0.050000 2.160000 ( 1.724443) f(61): 2.460000 0.040000 2.500000 ( 2.347957) g(61): 3.220000 0.080000 3.300000 ( 2.619925) f(81): 3.280000 0.050000 3.330000 ( 3.114385) g(81): 4.480000 0.110000 4.590000 ( 3.596411) f(101): 4.080000 0.060000 4.140000 ( 3.895985) g(101): 5.760000 0.150000 5.910000 ( 4.611965)
Reply to this email directly or view it on GitHub: https://github.com/inukshuk/bibtex-ruby/issues/37#issuecomment-4322426
Man, I'm a horrible software tester :) I went into bibtex.rb and realized that the commands require 'bibtex/lexer' etc probably worked through RubyGems, even though I called the local bibtex.rb. I uninstalled the gem, then I got all kinds of errors from path mistakes, which I tried to fix. There were also several of the requires that didn't work, and I didn't find the matching files in lib/bibtex, parser and name_parser. Slightly rewritten to get around this I get
release:
user system total real
f(1): 0.140000 0.000000 0.140000 ( 0.136864)
g(1): 0.050000 0.000000 0.050000 ( 0.040198)
f(21): 0.880000 0.020000 0.900000 ( 0.828929)
g(21): 0.950000 0.020000 0.970000 ( 0.849521)
f(41): 1.690000 0.030000 1.720000 ( 1.619234)
g(41): 2.110000 0.050000 2.160000 ( 1.740376)
f(61): 2.470000 0.040000 2.510000 ( 2.362543)
g(61): 3.270000 0.080000 3.350000 ( 2.652986)
master
user system total real
f(1): 0.150000 0.000000 0.150000 ( 0.143398)
g(1): 0.070000 0.000000 0.070000 ( 0.052197)
f(21): 0.910000 0.020000 0.930000 ( 0.853703)
g(21): 0.940000 0.020000 0.960000 ( 0.853563)
f(41): 1.690000 0.030000 1.720000 ( 1.622108)
g(41): 2.130000 0.050000 2.180000 ( 1.741724)
f(61): 2.440000 0.040000 2.480000 ( 2.343833)
g(61): 3.290000 0.080000 3.370000 ( 2.669190)
On Mar 5, 2012, at 3:40 PM, Stian Håklev wrote:
Man, I'm a horrible software tester :) I went into bibtex.rb and realized that the commands require 'bibtex/lexer' etc probably worked through RubyGems, even though I called the local bibtex.rb. I uninstalled the gem, then I got all kinds of errors from path mistakes, which I tried to fix. There were also several of the requires that didn't work, and I didn't find the matching files in lib/bibtex, parser and name_parser. Slightly rewritten to get around this I get
How did you get around this? I'm sorry, I forgot to mention this earlier, but on master, you will have to generate the parsers (that's why parser and name_parser were missing). You can generate the parsers like this:
$ bundle install
$ bundle exec rake racc
After this, there should be the parser.rb and name_parser.rb files in the lib directory.
This is really useful! Well I'm learning more and more about BibTex-ruby, but it doesn't seem to be getting us anywhere... Here's the latest master, with gem uninstalled, and parser generated. This is using parser:
user system total real
f(1): 0.510000 0.010000 0.520000 ( 0.492472)
g(1): 0.060000 0.000000 0.060000 ( 0.055886)
f(21): 1.370000 0.030000 1.400000 ( 1.282683)
g(21): 1.370000 0.030000 1.400000 ( 1.243531)
f(41): 2.530000 0.050000 2.580000 ( 2.428568)
g(41): 3.150000 0.080000 3.230000 ( 2.638907)
f(61): 3.550000 0.060000 3.610000 ( 3.359150)
g(61): 4.740000 0.120000 4.860000 ( 3.822768)
and this using lexer
user system total real
f(1): 0.510000 0.000000 0.510000 ( 0.521181)
g(1): 0.050000 0.000000 0.050000 ( 0.056028)
f(21): 2.390000 0.040000 2.430000 ( 2.246512)
g(21): 1.420000 0.040000 1.460000 ( 1.261098)
f(41): 4.510000 0.080000 4.590000 ( 4.288429)
g(41): 3.290000 0.080000 3.370000 ( 2.695208)
f(61): 7.040000 0.130000 7.170000 ( 6.704729)
g(61): 4.650000 0.110000 4.760000 ( 3.736637)
I just installed a macruby-nightly to finally get to the bottom of this, but get an error in the racc gem about a missing symbol (_rb_catch); have you run into this before?
No, but I guess I ran racc using MRI Ruby. You only run it once to generate right, you don't require it when executing bibtex?
Thanks for all the time you spend on this!
Stian
It's true that we don't necessarily have to generate the parser with MacRuby (in fact, the gem ships with the generated parser). Problem is, I get the error when executing the parser.
What MacRuby version are you using precisely?
0.12 - I compiled it from nightly about a week ago.
I finally had time for more detailed profiling and I believe that the problem lies with MacRuby's StringScanner implementation: they create a new regular expression object every time #scan is called. That's why our explicit pattern cache does not make big difference at all.
Closing since we don't support macruby anymore
There seems to be a huge difference in speed between MacRuby (v. 0.12) and Ruby (1.9.2). In this case, it is simply parsing a file with 67 publications (a small subset of my large publication file, used for testing). Ruby parses the file in 0.16 seconds, and MacRuby uses almost 25 seconds (100 times slower).
I'd be happy to help you debug this if you let me know what to do. I am thinking about switching to MacRuby in my project, so I am quite interested in improving this.
script: