inukshuk / bibtex-ruby

A BibTeX library, parser, and converter for Ruby.
http://inukshuk.github.com/bibtex-ruby
GNU General Public License v3.0
156 stars 35 forks source link

MacRuby performance when parsing file #37

Closed houshuang closed 5 years ago

houshuang commented 12 years ago

There seems to be a huge difference in speed between MacRuby (v. 0.12) and Ruby (1.9.2). In this case, it is simply parsing a file with 67 publications (a small subset of my large publication file, used for testing). Ruby parses the file in 0.16 seconds, and MacRuby uses almost 25 seconds (100 times slower).

I'd be happy to help you debug this if you let me know what to do. I am thinking about switching to MacRuby in my project, so I am quite interested in improving this.

[macruby] time macruby bibtextest.rb                                                                                   
#<BibTeX::Bibliography data=[67]> 
macruby bibtextest.rb  24.45s user 0.66s system 119% cpu 20.964 total

[macruby] time ruby bibtextest.rb                                                                                      
#<BibTeX::Bibliography data=[67]>
ruby bibtextest.rb  0.16s user 0.03s system 98% cpu 0.198 total

[macruby] macruby --version                                                                                           
MacRuby 0.12 (ruby 1.9.2) [universal-darwin10.0, x86_64]

[macruby] wc ../folders2web/bibliography-short.bib                                                                      
 725    3358   89338 ../folders2web/bibliography-short.bib

script:

require 'rubygems'
require 'bibtex'

b=BibTeX.open("../folders2web/bibliography-short.bib") 
p b
inukshuk commented 12 years ago

There have been some issues with MacRuby before, so I find it encouraging that the parsing seems to work, albeit at this slow speed.

I've been using two scripts for benchmarking and profiling that I'll add to the repository in a minute.

Meanwhile, can you check whether or not the following gems are compatible with you version of MacRuby:

Also, do you have the dot executable installed? (check via which dot) If you're using macports or homebrew you should be able to get it by installing the graphviz package.

inukshuk commented 12 years ago

Alright, the benchmark and profiling are already in the repository.

Unfortunately, ruby-prof does not seem to be available for MacRuby yet: this will make it more difficult to find out what the problem is, but we'll try to make do with the benchmark script.

FIrst, see whether or not you can get benchmark.rb to run on your system. You may have to install gnuplot (homebrew or macports and the ruby gem), or, alternatively, comment out everything after the line require 'gnuplot'. Basically, what the script does, is to execute and benchmark two functions f and g. The idea is that f works on an individual BibTeX entry repeatedly, while g works on a larger bibliography. We would expect the performance of f and g to be extremely similar (because it should not make a difference whether you parse a thousand entries consecutively or a bibliography containing a thousand entries). Furthermore, both f and g should be linear: to see that it's convenient to look at the gnuplot graphs.

Anyway, there are two options n and k currently set to 1001 and 20. This means that the script will execute with 20 entries, then 40, then 60 and so on up to 1001. This typically gives you a nice plot. Since your test with 67 entries already took half a minute, you will probably want to use a smaller number for n first (for example, start with 100). It will still take a few minutes, but do try to run it and let me know the results.

inukshuk commented 12 years ago

Here is the output when I run the benchmark with MRI 1.9.2 on my system:

$ ruby test/benchmark.rb 
                 user     system      total        real
      f(1):  0.000000   0.000000   0.000000 (  0.001007)
      g(1):  0.000000   0.000000   0.000000 (  0.000678)
     f(21):  0.020000   0.000000   0.020000 (  0.020126)
     g(21):  0.020000   0.010000   0.030000 (  0.017425)
     f(41):  0.040000   0.000000   0.040000 (  0.036070)
     g(41):  0.030000   0.000000   0.030000 (  0.034123)
     f(61):  0.050000   0.000000   0.050000 (  0.055657)
     g(61):  0.050000   0.000000   0.050000 (  0.047118)
     f(81):  0.060000   0.000000   0.060000 (  0.057583)
     g(81):  0.050000   0.000000   0.050000 (  0.055533)
    f(101):  0.070000   0.000000   0.070000 (  0.068648)
    g(101):  0.070000   0.000000   0.070000 (  0.074981)
    f(121):  0.080000   0.000000   0.080000 (  0.081537)
    g(121):  0.090000   0.000000   0.090000 (  0.087914)
    f(141):  0.100000   0.000000   0.100000 (  0.105223)
    g(141):  0.100000   0.000000   0.100000 (  0.100114)
    f(161):  0.110000   0.000000   0.110000 (  0.111240)
    g(161):  0.110000   0.000000   0.110000 (  0.111883)
    f(181):  0.140000   0.010000   0.150000 (  0.134396)
    g(181):  0.130000   0.000000   0.130000 (  0.138483)
    f(201):  0.140000   0.000000   0.140000 (  0.137307)
    g(201):  0.160000   0.000000   0.160000 (  0.158308)
    f(221):  0.150000   0.000000   0.150000 (  0.156125)
    g(221):  0.180000   0.000000   0.180000 (  0.175792)
    f(241):  0.170000   0.000000   0.170000 (  0.172716)
    g(241):  0.190000   0.000000   0.190000 (  0.200111)
    f(261):  0.170000   0.000000   0.170000 (  0.180065)
    g(261):  0.220000   0.010000   0.230000 (  0.222491)
    f(281):  0.190000   0.000000   0.190000 (  0.191242)
    g(281):  0.250000   0.000000   0.250000 (  0.244426)
    f(301):  0.200000   0.000000   0.200000 (  0.207241)
    g(301):  0.270000   0.000000   0.270000 (  0.270698)
    f(321):  0.220000   0.000000   0.220000 (  0.223200)
    g(321):  0.290000   0.010000   0.300000 (  0.290713)
    f(341):  0.230000   0.000000   0.230000 (  0.238174)
    g(341):  0.320000   0.000000   0.320000 (  0.324253)
    f(361):  0.250000   0.000000   0.250000 (  0.249256)
    g(361):  0.320000   0.010000   0.330000 (  0.326222)
    f(381):  0.250000   0.000000   0.250000 (  0.275628)
    g(381):  0.340000   0.000000   0.340000 (  0.366691)
    f(401):  0.270000   0.000000   0.270000 (  0.280956)
    g(401):  0.350000   0.010000   0.360000 (  0.358961)
    f(421):  0.290000   0.000000   0.290000 (  0.294395)
    g(421):  0.370000   0.000000   0.370000 (  0.382586)
    f(441):  0.310000   0.000000   0.310000 (  0.310237)
    g(441):  0.410000   0.010000   0.420000 (  0.468336)
    f(461):  0.320000   0.000000   0.320000 (  0.352464)
    g(461):  0.440000   0.000000   0.440000 (  0.459599)
    f(481):  0.320000   0.000000   0.320000 (  0.324716)
    g(481):  0.470000   0.010000   0.480000 (  0.480357)
    f(501):  0.340000   0.000000   0.340000 (  0.366143)
    g(501):  0.500000   0.010000   0.510000 (  0.521129)
    f(521):  0.360000   0.000000   0.360000 (  0.382934)
    g(521):  0.540000   0.010000   0.550000 (  0.585427)
    f(541):  0.380000   0.000000   0.380000 (  0.409161)
    g(541):  0.560000   0.010000   0.570000 (  0.569871)
    f(561):  0.380000   0.000000   0.380000 (  0.392196)
    g(561):  0.610000   0.000000   0.610000 (  0.647884)
    f(581):  0.390000   0.010000   0.400000 (  0.401241)
    g(581):  0.560000   0.000000   0.560000 (  0.566974)
    f(601):  0.420000   0.010000   0.430000 (  0.455462)
    g(601):  0.560000   0.010000   0.570000 (  0.574165)
    f(621):  0.410000   0.000000   0.410000 (  0.419186)
    g(621):  0.590000   0.000000   0.590000 (  0.608821)
    f(641):  0.420000   0.010000   0.430000 (  0.426153)
    g(641):  0.630000   0.000000   0.630000 (  0.635338)
    f(661):  0.440000   0.010000   0.450000 (  0.449898)
    g(661):  0.650000   0.000000   0.650000 (  0.654837)
    f(681):  0.440000   0.000000   0.440000 (  0.446963)
    g(681):  0.680000   0.010000   0.690000 (  0.688274)
    f(701):  0.470000   0.000000   0.470000 (  0.474683)
    g(701):  0.710000   0.010000   0.720000 (  0.719676)
    f(721):  0.480000   0.000000   0.480000 (  0.484895)
    g(721):  0.740000   0.000000   0.740000 (  0.748554)
    f(741):  0.490000   0.010000   0.500000 (  0.491958)
    g(741):  0.790000   0.000000   0.790000 (  0.790580)
    f(761):  0.480000   0.010000   0.490000 (  0.488490)
    g(761):  0.820000   0.000000   0.820000 (  0.826844)
    f(781):  0.510000   0.010000   0.520000 (  0.513018)
    g(781):  0.850000   0.000000   0.850000 (  0.859994)
    f(801):  0.520000   0.010000   0.530000 (  0.534045)
    g(801):  0.890000   0.000000   0.890000 (  0.899764)
    f(821):  0.550000   0.010000   0.560000 (  0.575607)
    g(821):  0.990000   0.010000   1.000000 (  1.046289)
    f(841):  0.560000   0.010000   0.570000 (  0.588986)
    g(841):  0.970000   0.010000   0.980000 (  0.988280)
    f(861):  0.580000   0.000000   0.580000 (  0.585552)
    g(861):  1.010000   0.010000   1.020000 (  1.051506)
    f(881):  0.580000   0.000000   0.580000 (  0.590940)
    g(881):  1.070000   0.010000   1.080000 (  1.084322)
    f(901):  0.610000   0.010000   0.620000 (  0.644608)
    g(901):  1.100000   0.010000   1.110000 (  1.121152)
    f(921):  0.600000   0.000000   0.600000 (  0.604363)
    g(921):  1.140000   0.010000   1.150000 (  1.153237)
    f(941):  0.610000   0.010000   0.620000 (  0.618322)
    g(941):  1.200000   0.000000   1.200000 (  1.203676)
    f(961):  0.640000   0.010000   0.650000 (  0.656267)
    g(961):  1.250000   0.010000   1.260000 (  1.277296)
    f(981):  0.650000   0.000000   0.650000 (  0.658052)
    g(981):  1.100000   0.020000   1.120000 (  1.127124)
   f(1001):  0.660000   0.000000   0.660000 (  0.675562)
   g(1001):  1.240000   0.010000   1.250000 (  1.250755)
    sum(f): 17.120000   0.140000  17.260000 ( 17.599892)
    sum(g): 26.980000   0.230000  27.210000 ( 27.599565)

And here you can see the plotted output. As you can see both functions are (roughly) linear (that's good); but the function g executes considerably slower as the input size increases. This means that for large numbers of N, it takes longer to parse a bibliography containing N entries than to parse N individual entries. This is because the Bibliography class analyses stuff like resolving cross-references etc. Anyway, looking at the graph shows us that this should be optimized going forward. ;-)

houshuang commented 12 years ago

I just tried doing this. First, I had to modify lib/bibtex.rb to add require 'rubygems', otherwise it chokes on require 'multi_json' (I guess macruby doesn't have automatic rubygems support.

Afterwards, it runs fine. However, the "problem" is that the output for MRI (1.9.2p180) and MacRuby (0.12, Ruby 1.9.2) are extremely similar. Here is the output for MRI:

                 user     system      total        real
      f(1):  0.000000   0.000000   0.000000 (  0.000622)
      g(1):  0.000000   0.000000   0.000000 (  0.000403)
     f(21):  0.010000   0.000000   0.010000 (  0.011540)
     g(21):  0.010000   0.000000   0.010000 (  0.007310)
     f(41):  0.020000   0.000000   0.020000 (  0.018733)
     g(41):  0.020000   0.000000   0.020000 (  0.017633)
     f(61):  0.030000   0.000000   0.030000 (  0.029408)
     g(61):  0.020000   0.000000   0.020000 (  0.024942)
     f(81):  0.040000   0.000000   0.040000 (  0.037122)
     g(81):  0.040000   0.000000   0.040000 (  0.039202)
    f(101):  0.040000   0.000000   0.040000 (  0.046273)
    g(101):  0.050000   0.000000   0.050000 (  0.047190)
    f(121):  0.060000   0.000000   0.060000 (  0.056533)
    g(121):  0.060000   0.000000   0.060000 (  0.060450)
    f(141):  0.060000   0.000000   0.060000 (  0.064366)
    g(141):  0.070000   0.000000   0.070000 (  0.073773)
    f(161):  0.080000   0.000000   0.080000 (  0.078557)
    g(161):  0.090000   0.000000   0.090000 (  0.088499)
    f(181):  0.080000   0.000000   0.080000 (  0.081382)
    g(181):  0.090000   0.000000   0.090000 (  0.089241)
    f(201):  0.090000   0.000000   0.090000 (  0.087167)
    g(201):  0.090000   0.000000   0.090000 (  0.097179)
    f(221):  0.100000   0.000000   0.100000 (  0.092566)
    g(221):  0.110000   0.000000   0.110000 (  0.111960)
    f(241):  0.100000   0.010000   0.110000 (  0.105829)
    g(241):  0.120000   0.000000   0.120000 (  0.121754)
    f(261):  0.120000   0.000000   0.120000 (  0.113211)
    g(261):  0.130000   0.000000   0.130000 (  0.135470)
    f(281):  0.120000   0.000000   0.120000 (  0.118759)
    g(281):  0.150000   0.000000   0.150000 (  0.146476)
    f(301):  0.130000   0.000000   0.130000 (  0.134218)
    g(301):  0.170000   0.000000   0.170000 (  0.168745)
    f(321):  0.140000   0.000000   0.140000 (  0.140407)
    g(321):  0.180000   0.000000   0.180000 (  0.183246)
    f(341):  0.150000   0.000000   0.150000 (  0.147753)
    g(341):  0.190000   0.000000   0.190000 (  0.189896)
    f(361):  0.150000   0.000000   0.150000 (  0.153986)
    g(361):  0.210000   0.000000   0.210000 (  0.204915)
    f(381):  0.160000   0.000000   0.160000 (  0.166869)
    g(381):  0.200000   0.010000   0.210000 (  0.202102)
    f(401):  0.180000   0.000000   0.180000 (  0.182246)
    g(401):  0.230000   0.000000   0.230000 (  0.229675)
    f(421):  0.190000   0.000000   0.190000 (  0.186329)
    g(421):  0.230000   0.000000   0.230000 (  0.234171)
    f(441):  0.200000   0.000000   0.200000 (  0.196082)
    g(441):  0.260000   0.000000   0.260000 (  0.266406)
    f(461):  0.200000   0.000000   0.200000 (  0.203138)
    g(461):  0.260000   0.000000   0.260000 (  0.258192)
    f(481):  0.210000   0.010000   0.220000 (  0.208168)
    g(481):  0.270000   0.000000   0.270000 (  0.271574)
    f(501):  0.210000   0.000000   0.210000 (  0.212898)
    g(501):  0.290000   0.000000   0.290000 (  0.294501)
    f(521):  0.230000   0.000000   0.230000 (  0.224249)
    g(521):  0.330000   0.000000   0.330000 (  0.332794)
    f(541):  0.230000   0.010000   0.240000 (  0.236670)
    g(541):  0.310000   0.000000   0.310000 (  0.307345)
    f(561):  0.240000   0.000000   0.240000 (  0.245936)
    g(561):  0.330000   0.000000   0.330000 (  0.329643)
    f(581):  0.260000   0.000000   0.260000 (  0.254318)
    g(581):  0.330000   0.000000   0.330000 (  0.339464)
    f(601):  0.260000   0.000000   0.260000 (  0.261145)
    g(601):  0.380000   0.010000   0.390000 (  0.376544)
    f(621):  0.260000   0.000000   0.260000 (  0.263120)
    g(621):  0.390000   0.000000   0.390000 (  0.390627)
    f(641):  0.270000   0.000000   0.270000 (  0.271359)
    g(641):  0.410000   0.000000   0.410000 (  0.407785)
    f(661):  0.280000   0.000000   0.280000 (  0.283797)
    g(661):  0.410000   0.010000   0.420000 (  0.410276)
    f(681):  0.290000   0.000000   0.290000 (  0.299164)
    g(681):  0.430000   0.000000   0.430000 (  0.422964)
    f(701):  0.290000   0.000000   0.290000 (  0.299465)
    g(701):  0.440000   0.000000   0.440000 (  0.440954)
    f(721):  0.310000   0.010000   0.320000 (  0.312829)
    g(721):  0.480000   0.000000   0.480000 (  0.474714)
    f(741):  0.320000   0.000000   0.320000 (  0.327411)
    g(741):  0.500000   0.000000   0.500000 (  0.494715)
    f(761):  0.320000   0.000000   0.320000 (  0.323732)
    g(761):  0.500000   0.010000   0.510000 (  0.505016)
    f(781):  0.340000   0.000000   0.340000 (  0.343639)
    g(781):  0.520000   0.000000   0.520000 (  0.522267)
    f(801):  0.350000   0.000000   0.350000 (  0.350007)
    g(801):  0.550000   0.010000   0.560000 (  0.552032)
    f(821):  0.350000   0.000000   0.350000 (  0.352463)
    g(821):  0.580000   0.000000   0.580000 (  0.579625)
    f(841):  0.370000   0.000000   0.370000 (  0.374373)
    g(841):  0.600000   0.000000   0.600000 (  0.600615)
    f(861):  0.360000   0.010000   0.370000 (  0.364205)
    g(861):  0.620000   0.000000   0.620000 (  0.617375)
    f(881):  0.380000   0.000000   0.380000 (  0.380935)
    g(881):  0.620000   0.000000   0.620000 (  0.632298)
    f(901):  0.400000   0.010000   0.410000 (  0.394550)
    g(901):  0.660000   0.000000   0.660000 (  0.661010)
    f(921):  0.400000   0.000000   0.400000 (  0.406453)
    g(921):  0.690000   0.000000   0.690000 (  0.688062)
    f(941):  0.400000   0.010000   0.410000 (  0.405330)
    g(941):  0.690000   0.000000   0.690000 (  0.691564)
    f(961):  0.430000   0.000000   0.430000 (  0.429863)
    g(961):  0.740000   0.010000   0.750000 (  0.750301)
    f(981):  0.420000   0.000000   0.420000 (  0.418148)
    g(981):  0.760000   0.000000   0.760000 (  0.770694)
   f(1001):  0.440000   0.000000   0.440000 (  0.436495)
   g(1001):  0.800000   0.010000   0.810000 (  0.805949)
    sum(f): 11.070000   0.070000  11.140000 ( 11.133818)
    sum(g): 16.610000   0.070000  16.680000 ( 16.669537)

and here is the output for MacRuby

                 user     system      total        real
      f(1):  0.000000   0.000000   0.000000 (  0.000597)
      g(1):  0.000000   0.000000   0.000000 (  0.000403)
     f(21):  0.010000   0.000000   0.010000 (  0.008394)
     g(21):  0.010000   0.000000   0.010000 (  0.010537)
     f(41):  0.020000   0.000000   0.020000 (  0.018743)
     g(41):  0.020000   0.010000   0.030000 (  0.017602)
     f(61):  0.030000   0.000000   0.030000 (  0.026597)
     g(61):  0.020000   0.000000   0.020000 (  0.028474)
     f(81):  0.040000   0.000000   0.040000 (  0.037569)
     g(81):  0.040000   0.000000   0.040000 (  0.034700)
    f(101):  0.040000   0.000000   0.040000 (  0.047319)
    g(101):  0.050000   0.000000   0.050000 (  0.046107)
    f(121):  0.060000   0.000000   0.060000 (  0.055845)
    g(121):  0.060000   0.000000   0.060000 (  0.062644)
    f(141):  0.070000   0.000000   0.070000 (  0.070522)
    g(141):  0.070000   0.000000   0.070000 (  0.067645)
    f(161):  0.070000   0.000000   0.070000 (  0.079994)
    g(161):  0.090000   0.000000   0.090000 (  0.081004)
    f(181):  0.080000   0.000000   0.080000 (  0.082702)
    g(181):  0.080000   0.000000   0.080000 (  0.086279)
    f(201):  0.090000   0.000000   0.090000 (  0.087983)
    g(201):  0.100000   0.000000   0.100000 (  0.102436)
    f(221):  0.100000   0.000000   0.100000 (  0.097282)
    g(221):  0.110000   0.000000   0.110000 (  0.108936)
    f(241):  0.110000   0.000000   0.110000 (  0.107562)
    g(241):  0.120000   0.000000   0.120000 (  0.123231)
    f(261):  0.120000   0.000000   0.120000 (  0.117738)
    g(261):  0.130000   0.000000   0.130000 (  0.138497)
    f(281):  0.130000   0.000000   0.130000 (  0.121723)
    g(281):  0.160000   0.000000   0.160000 (  0.163618)
    f(301):  0.140000   0.010000   0.150000 (  0.146252)
    g(301):  0.170000   0.000000   0.170000 (  0.167022)
    f(321):  0.150000   0.000000   0.150000 (  0.149100)
    g(321):  0.160000   0.000000   0.160000 (  0.166145)
    f(341):  0.150000   0.000000   0.150000 (  0.147195)
    g(341):  0.190000   0.000000   0.190000 (  0.187156)
    f(361):  0.150000   0.000000   0.150000 (  0.155126)
    g(361):  0.190000   0.000000   0.190000 (  0.190667)
    f(381):  0.170000   0.000000   0.170000 (  0.166908)
    g(381):  0.210000   0.000000   0.210000 (  0.211386)
    f(401):  0.170000   0.000000   0.170000 (  0.176037)
    g(401):  0.220000   0.010000   0.230000 (  0.218084)
    f(421):  0.180000   0.000000   0.180000 (  0.184280)
    g(421):  0.230000   0.000000   0.230000 (  0.227744)
    f(441):  0.190000   0.000000   0.190000 (  0.193176)
    g(441):  0.260000   0.000000   0.260000 (  0.258051)
    f(461):  0.210000   0.000000   0.210000 (  0.206932)
    g(461):  0.250000   0.000000   0.250000 (  0.254047)
    f(481):  0.210000   0.000000   0.210000 (  0.207435)
    g(481):  0.290000   0.000000   0.290000 (  0.291690)
    f(501):  0.210000   0.010000   0.220000 (  0.218241)
    g(501):  0.290000   0.000000   0.290000 (  0.284853)
    f(521):  0.220000   0.000000   0.220000 (  0.225706)
    g(521):  0.320000   0.000000   0.320000 (  0.321912)
    f(541):  0.230000   0.000000   0.230000 (  0.232489)
    g(541):  0.310000   0.000000   0.310000 (  0.308900)
    f(561):  0.250000   0.000000   0.250000 (  0.247742)
    g(561):  0.350000   0.010000   0.360000 (  0.347809)
    f(581):  0.250000   0.000000   0.250000 (  0.251443)
    g(581):  0.420000   0.000000   0.420000 (  0.430320)
    f(601):  0.290000   0.000000   0.290000 (  0.287576)
    g(601):  0.450000   0.000000   0.450000 (  0.450228)
    f(621):  0.270000   0.000000   0.270000 (  0.274842)
    g(621):  0.390000   0.010000   0.400000 (  0.394948)
    f(641):  0.280000   0.000000   0.280000 (  0.279406)
    g(641):  0.420000   0.000000   0.420000 (  0.417165)
    f(661):  0.290000   0.000000   0.290000 (  0.300725)
    g(661):  0.450000   0.000000   0.450000 (  0.451651)
    f(681):  0.300000   0.010000   0.310000 (  0.297297)
    g(681):  0.470000   0.000000   0.470000 (  0.476887)
    f(701):  0.340000   0.000000   0.340000 (  0.337852)
    g(701):  0.480000   0.000000   0.480000 (  0.482324)
    f(721):  0.310000   0.010000   0.320000 (  0.314285)
    g(721):  0.540000   0.000000   0.540000 (  0.546801)
    f(741):  0.340000   0.000000   0.340000 (  0.337292)
    g(741):  0.490000   0.010000   0.500000 (  0.497936)
    f(761):  0.380000   0.000000   0.380000 (  0.381007)
    g(761):  0.610000   0.000000   0.610000 (  0.611881)
    f(781):  0.350000   0.010000   0.360000 (  0.348679)
    g(781):  0.580000   0.000000   0.580000 (  0.585341)
    f(801):  0.340000   0.000000   0.340000 (  0.345329)
    g(801):  0.610000   0.000000   0.610000 (  0.620062)
    f(821):  0.360000   0.010000   0.370000 (  0.355276)
    g(821):  0.540000   0.000000   0.540000 (  0.545332)
    f(841):  0.360000   0.000000   0.360000 (  0.366477)
    g(841):  0.630000   0.000000   0.630000 (  0.631926)
    f(861):  0.440000   0.010000   0.450000 (  0.448708)
    g(861):  0.650000   0.000000   0.650000 (  0.648266)
    f(881):  0.380000   0.000000   0.380000 (  0.377770)
    g(881):  0.580000   0.010000   0.590000 (  0.591761)
    f(901):  0.390000   0.000000   0.390000 (  0.390795)
    g(901):  0.650000   0.000000   0.650000 (  0.653867)
    f(921):  0.390000   0.010000   0.400000 (  0.392912)
    g(921):  0.710000   0.000000   0.710000 (  0.709803)
    f(941):  0.390000   0.000000   0.390000 (  0.396474)
    g(941):  0.710000   0.000000   0.710000 (  0.711333)
    f(961):  0.410000   0.010000   0.420000 (  0.411303)
    g(961):  0.750000   0.000000   0.750000 (  0.756501)
    f(981):  0.420000   0.000000   0.420000 (  0.424169)
    g(981):  0.800000   0.010000   0.810000 (  0.793700)
   f(1001):  0.410000   0.000000   0.410000 (  0.419748)
   g(1001):  0.760000   0.010000   0.770000 (  0.762972)
    sum(f): 11.290000   0.090000  11.380000 ( 11.356555)
    sum(g): 17.190000   0.080000  17.270000 ( 17.278584)

I reran my testscript, thinking that perhaps there had been an update to MacRuby, bibtex-ruby, or one of the other libraries used, which caused a speedup, but the time difference between parsing a bibliography of 65 publications is still more than 150x.

inukshuk commented 12 years ago

Stian, thanks for getting back to this!

It is generally frowned upon in the Ruby community to force require rubygems in a library. If you don't want to edit the file, you can just run your script with ruby -rrubygems or, alternatively, add the require to the benchmark script.

Regarding the results I can see that a) you have a much faster computer than I ;-) and b) you're, right that the results are extremely similar. This means that the parser is not significantly slower in MacRuby and therefore we're losing performance somewhere else.

Just to make sure, in your test script above you're printing the bibliography (the p b); could you change that to something simpler, for instance, just print b.length or something like that. This is just to make sure that it is opening the bibliography which is in fact slower and not printing it.

Since we cannot use ruby-prof we'll have to adapt the benchmark script to our needs.

houshuang commented 12 years ago

The rubygems thing is no biggie. I saw somewhere that macruby removed it to remove overhead. Anyway I can fix that easily.

I removed the print line entirely, and still

[folders2web] time ruby speedtest.rb 4:16:02 ☁ master ☂ ⚡ ✭ ruby speedtest.rb 0.17s user 0.05s system 70% cpu 0.303 total [folders2web] time macruby speedtest.rb 4:16:38 ☁ master ☂ ⚡ ✭ macruby speedtest.rb 23.05s user 0.66s system 118% cpu 20.008 total

I have a 2.7 GhZ i7 MacBook Pro 13" :)

Stian

houshuang commented 12 years ago

More numbers:

Only parsing bibtex-file, not printing MRI: 0.17 MacRuby: 23.05

Only parsing bibtex-file with 3 entries MRI: 0.15 MacRuby: 3.45

Only requiring rubygems and bibtex-ruby, not doing anything else MRI: 0.13 MacRuby: 2.74

Only printing "Hi", with no libraries required MRI: 0.06 MacRuby: 0.01

I did the last to check that there wasn't a longer startup time required for MacRuby, which it wasn't. It does take significantly longer to load bibtex-ruby, but even then, the time it spends on even just 3 entries is significantly longer.

inukshuk commented 12 years ago

Please try the following:

In benchmark.rb replace the entry with the contents of your bibtex file (the 3 entries). That is:

input = <<-END
  < ... the 3 entries ... >
END

Next, change n, k to:

n, k = 51, 10

And rerun the test, make a note of the Results.

Then, change the two lines inside the f/g tests from BibTeX::Parser.new.parse(input) to BibTeX.parse(input), and try again.

houshuang commented 12 years ago

Did as you asked. Output before I changed the parse lines:

[test] macruby ./benchmark.rb 4:41:17 user system total real f(1): 0.550000 0.010000 0.560000 ( 0.521162) g(1): 0.200000 0.000000 0.200000 ( 0.174278) f(11): 2.130000 0.050000 2.180000 ( 1.855182) g(11): 6.640000 0.120000 6.760000 ( 5.462189) f(21): 4.390000 0.090000 4.480000 ( 3.594807) g(21): 21.710000 0.700000 22.410000 ( 17.545464) f(31): 6.860000 0.130000 6.990000 ( 5.537885) g(31): 45.450000 2.160000 47.610000 ( 36.967703) f(41): 9.000000 0.170000 9.170000 ( 7.142162) g(41): 77.760000 4.220000 81.980000 ( 62.952831) f(51): 11.680000 0.230000 11.910000 ( 9.023000) g(51):121.490000 6.940000 128.430000 ( 99.794021) sum(f): 34.610000 0.680000 35.290000 ( 27.674197) sum(g):273.250000 14.140000 287.390000 (222.896485)

And here it is with the two lines changed

                 user     system      total        real
      f(1):  0.590000   0.000000   0.590000 (  0.569724)
      g(1):  0.210000   0.010000   0.220000 (  0.176379)
     f(11):  2.330000   0.050000   2.380000 (  2.051605)
     g(11):  6.690000   0.130000   6.820000 (  5.505299)
     f(21):  4.580000   0.090000   4.670000 (  3.763134)
     g(21): 21.850000   0.700000  22.550000 ( 17.717141)
     f(31):  6.750000   0.130000   6.880000 (  5.419445)
     g(31): 45.630000   2.170000  47.800000 ( 37.105453)
     f(41):  9.050000   0.170000   9.220000 (  7.159262)
     g(41): 77.960000   4.240000  82.200000 ( 63.075123)
     f(51): 11.890000   0.240000  12.130000 (  9.149539)
     g(51):119.660000   6.780000 126.440000 ( 96.518764)
    sum(f): 35.190000   0.680000  35.870000 ( 28.112709)
    sum(g):272.000000  14.030000 286.030000 (220.098160)
inukshuk commented 12 years ago

Interesting. Could you post the entries you're using as input? Are there by any chance cross references?

inukshuk commented 12 years ago

Plus, as a reference, could you run the tests above with MRI too?

houshuang commented 12 years ago

input = <<-END %% This BibTeX bibliography file was created using BibDesk. %% http://bibdesk.sourceforge.net/

%% Created for Stian Haklev at 2011-09-20 00:43:44 -0400

%% Saved with string encoding Unicode (UTF-8)

@inproceedings{vladoiu2011open, Author = {Vladoiu, M.}, Booktitle = {Roedunet International Conference (RoEduNet), 2011 10th}, Date-Added = {2011-09-13 13:34:16 +0000}, Date-Modified = {2011-09-13 13:34:46 +0000}, Keywords = {cites-me; OER/open ed}, Organization = {IEEE}, Pages = {1--6}, Title = {Open courseware initiatives-after 10 years}, Bdsk-File-1 = {YnBsaXN0MDDUAQIDBAUIJidUJHRvcFgkb2JqZWN0c1gkdmVyc2lvblkkYXJjaGl2ZXLRBgdUcm9vdIABqAkKFRYXGyIjVSRudWxs0wsMDQ4RFFpOUy5vYmplY3RzV05TLmtleXNWJGNsYXNzog8QgASABqISE4ACgAOAB1lhbGlhc0RhdGFccmVsYXRpdmVQYXRo0hgNGRpXTlMuZGF0YU8RAYgAAAAAAYgAAgAABEhvbWUAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAMmm5vlIKwAAABLivBN2bGFkb2l1MjAxMW9wZW4ucGRmAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAALu7lypTT2gAAAAAAAAAAAAIAAwAACQAAAAAAAAAAAAAAAAAAAAAHQmliZGVzawAAEAAIAADJpx85AAAAEQAIAADKlQwaAAAAAQAMABLivAAAAF8AAABXAAIAM0hvbWU6c3RpYW46AERvY3VtZW50czoAQmliZGVzazoAdmxhZG9pdTIwMTFvcGVuLnBkZgAADgAoABMAdgBsAGEAZABvAGkAdQAyADAAMQAxAG8AcABlAG4ALgBwAGQAZgAPAAoABABIAG8AbQBlABIALC9zdGlhbi9Eb2N1bWVudHMvQmliZGVzay92bGFkb2l1MjAxMW9wZW4ucGRmABMADS9Wb2x1bWVzL0hvbWUAABUAAgAT//8AAIAF0hwdHh9YJGNsYXNzZXNaJGNsYXNzbmFtZaMfICFdTlNNdXRhYmxlRGF0YVZOU0RhdGFYTlNPYmplY3RfECsuLi8uLi9Eb2N1bWVudHMvQmliZGVzay92bGFkb2l1MjAxMW9wZW4ucGRm0hwdJCWiJSFcTlNEaWN0aW9uYXJ5EgABhqBfEA9OU0tleWVkQXJjaGl2ZXIACAARABYAHwAoADIANQA6ADwARQBLAFIAXQBlAGwAbwBxAHMAdgB4AHoAfACGAJMAmACgAiwCLgIzAjwCRwJLAlkCYAJpApcCnAKfAqwCsQAAAAAAAAIBAAAAAAAAACgAAAAAAAAAAAAAAAAAAALD}}

@inproceedings{valjataga2011open, Author = {Väljataga, T. and Põldoja, Hans and Laanpere, M.}, Booktitle = {Proceedings of the 4th International Network-Based Education 2011 Conference The Social Media in the Middle of Nowhere}, Date-Added = {2011-08-09 10:54:02 +0000}, Date-Modified = {2011-08-09 11:20:02 +0000}, Editor = {Ruokamo, H. and Eriksson, M. and Pekkala, L. and Vuojärvi, H.}, Keywords = {open courses}, Organization = {Rovaniemi: University of Lapland}, Pages = {68-7}, Title = {Open Online Courses: Responding to Design Challenges}, Year = {2011}, Bdsk-File-1 = {YnBsaXN0MDDUAQIDBAUIJidUJHRvcFgkb2JqZWN0c1gkdmVyc2lvblkkYXJjaGl2ZXLRBgdUcm9vdIABqAkKFRYXGyIjVSRudWxs0wsMDQ4RFFpOUy5vYmplY3RzV05TLmtleXNWJGNsYXNzog8QgASABqISE4ACgAOAB1lhbGlhc0RhdGFccmVsYXRpdmVQYXRo0hgNGRpXTlMuZGF0YU8RAZAAAAAAAZAAAgAABEhvbWUAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAMmm5vlIKwAAABLivBV2YWxqYXRhZ2EyMDExb3Blbi5wZGYAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAALLykymaMLgAAAAAAAAAAAAIAAwAACQAAAAAAAAAAAAAAAAAAAAAHQmliZGVzawAAEAAIAADJpx85AAAAEQAIAADKZsRuAAAAAQAMABLivAAAAF8AAABXAAIANUhvbWU6c3RpYW46AERvY3VtZW50czoAQmliZGVzazoAdmFsamF0YWdhMjAxMW9wZW4ucGRmAAAOACwAFQB2AGEAbABqAGEAdABhAGcAYQAyADAAMQAxAG8AcABlAG4ALgBwAGQAZgAPAAoABABIAG8AbQBlABIALi9zdGlhbi9Eb2N1bWVudHMvQmliZGVzay92YWxqYXRhZ2EyMDExb3Blbi5wZGYAEwANL1ZvbHVtZXMvSG9tZQAAFQACABP//wAAgAXSHB0eH1gkY2xhc3Nlc1okY2xhc3NuYW1lox8gIV1OU011dGFibGVEYXRhVk5TRGF0YVhOU09iamVjdF8QLS4uLy4uL0RvY3VtZW50cy9CaWJkZXNrL3ZhbGphdGFnYTIwMTFvcGVuLnBkZtIcHSQloiUhXE5TRGljdGlvbmFyeRIAAYagXxAPTlNLZXllZEFyY2hpdmVyAAgAEQAWAB8AKAAyADUAOgA8AEUASwBSAF0AZQBsAG8AcQBzAHYAeAB6AHwAhgCTAJgAoAI0AjYCOwJEAk8CUwJhAmgCcQKhAqYCqQK2ArsAAAAAAAACAQAAAAAAAAAoAAAAAAAAAAAAAAAAAAACzQ==}}

@unpublished{resendes2011comprehensive, Author = {Resendes, Monica}, Date-Added = {2011-08-08 01:10:03 +0000}, Date-Modified = {2011-08-08 01:10:38 +0000}, Title = {Comprehensive Exam}, Year = {2011}, Bdsk-File-1 = {YnBsaXN0MDDUAQIDBAUIJidUJHRvcFgkb2JqZWN0c1gkdmVyc2lvblkkYXJjaGl2ZXLRBgdUcm9vdIABqAkKFRYXGyIjVSRudWxs0wsMDQ4RFFpOUy5vYmplY3RzV05TLmtleXNWJGNsYXNzog8QgASABqISE4ACgAOAB1lhbGlhc0RhdGFccmVsYXRpdmVQYXRo0hgNGRpXTlMuZGF0YU8RAbAAAAAAAbAAAgAABEhvbWUAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAMmm5vlIKwAAABLivB1yZXNlbmRlczIwMTFjb21wcmVoZW5zaXZlLnBkZgAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAALKEbymSvkAAAAAAAAAAAAAIAAwAACQAAAAAAAAAAAAAAAAAAAAAHQmliZGVzawAAEAAIAADJpx85AAAAEQAIAADKZOfQAAAAAQAMABLivAAAAF8AAABXAAIAPUhvbWU6c3RpYW46AERvY3VtZW50czoAQmliZGVzazoAcmVzZW5kZXMyMDExY29tcHJlaGVuc2l2ZS5wZGYAAA4APAAdAHIAZQBzAGUAbgBkAGUAcwAyADAAMQAxAGMAbwBtAHAAcgBlAGgAZQBuAHMAaQB2AGUALgBwAGQAZgAPAAoABABIAG8AbQBlABIANi9zdGlhbi9Eb2N1bWVudHMvQmliZGVzay9yZXNlbmRlczIwMTFjb21wcmVoZW5zaXZlLnBkZgATAA0vVm9sdW1lcy9Ib21lAAAVAAIAE///AACABdIcHR4fWCRjbGFzc2VzWiRjbGFzc25hbWWjHyAhXU5TTXV0YWJsZURhdGFWTlNEYXRhWE5TT2JqZWN0XxA1Li4vLi4vRG9jdW1lbnRzL0JpYmRlc2svcmVzZW5kZXMyMDExY29tcHJlaGVuc2l2ZS5wZGbSHB0kJaIlIVxOU0RpY3Rpb25hcnkSAAGGoF8QD05TS2V5ZWRBcmNoaXZlcgAIABEAFgAfACgAMgA1ADoAPABFAEsAUgBdAGUAbABvAHEAcwB2AHgAegB8AIYAkwCYAKACVAJWAlsCZAJvAnMCgQKIApECyQLOAtEC3gLjAAAAAAAAAgEAAAAAAAAAKAAAAAAAAAAAAAAAAAAAAvU=}}

END

[test] ruby -KU benchmark.rb

11:40:00 user system total real f(1): 0.000000 0.000000 0.000000 ( 0.002150) g(1): 0.010000 0.000000 0.010000 ( 0.001420) f(11): 0.010000 0.000000 0.010000 ( 0.016743) g(11): 0.020000 0.000000 0.020000 ( 0.017404) f(21): 0.030000 0.000000 0.030000 ( 0.030529) g(21): 0.040000 0.000000 0.040000 ( 0.036849) f(31): 0.040000 0.000000 0.040000 ( 0.046699) g(31): 0.050000 0.000000 0.050000 ( 0.044248) f(41): 0.060000 0.000000 0.060000 ( 0.063777) g(41): 0.070000 0.000000 0.070000 ( 0.064466) f(51): 0.070000 0.000000 0.070000 ( 0.077645) g(51): 0.080000 0.000000 0.080000 ( 0.078850) sum(f): 0.210000 0.000000 0.210000 ( 0.237543) sum(g): 0.270000 0.000000 0.270000 ( 0.243237)

[test] ruby -KU benchmark.rb

11:39:53 user system total real f(1): 0.000000 0.000000 0.000000 ( 0.002288) g(1): 0.000000 0.000000 0.000000 ( 0.001501) f(11): 0.020000 0.000000 0.020000 ( 0.018536) g(11): 0.020000 0.010000 0.030000 ( 0.021100) f(21): 0.030000 0.000000 0.030000 ( 0.033027) g(21): 0.040000 0.000000 0.040000 ( 0.035162) f(31): 0.050000 0.000000 0.050000 ( 0.050134) g(31): 0.050000 0.000000 0.050000 ( 0.050362) f(41): 0.070000 0.000000 0.070000 ( 0.070757) g(41): 0.070000 0.000000 0.070000 ( 0.074642) f(51): 0.090000 0.000000 0.090000 ( 0.087376) g(51): 0.090000 0.000000 0.090000 ( 0.096798) sum(f): 0.260000 0.000000 0.260000 ( 0.262118) sum(g): 0.270000 0.010000 0.280000 ( 0.279564)

inukshuk commented 12 years ago

Hmm, I think we're getting closer. The difference seems to be triggered by something in your entries. Could you try: stripping the comments and the file attachment fields (and if that does not make a difference) other fields until the execution time comes down?

I suspect the problem has to do with parsing the file attachment entries which are quite long. If you could confirm this, I'll come up with a benchmark file that will help us pinpoint what's causing the problem in parser or lexer.

houshuang commented 12 years ago

New input:

input = <<-END %% This BibTeX bibliography file was created using BibDesk. %% http://bibdesk.sourceforge.net/

%% Created for Stian Haklev at 2011-09-20 00:43:44 -0400

%% Saved with string encoding Unicode (UTF-8)

@inproceedings{vladoiu2011open, Author = {Vladoiu, M.}, Booktitle = {Roedunet International Conference (RoEduNet), 2011 10th}, Pages = {1--6}, Title = {Open courseware initiatives-after 10 years}}

@inproceedings{valjataga2011open, Author = {Väljataga, T. and Põldoja, Hans and Laanpere, M.}, Booktitle = {Proceedings of the 4th International Network-Based Education 2011 Conference The Social Media in the Middle of Nowhere}, Editor = {Ruokamo, H. and Eriksson, M. and Pekkala, L. and Vuojärvi, H.}, Organization = {Rovaniemi: University of Lapland}, Pages = {68-7}, Title = {Open Online Courses: Responding to Design Challenges}, Year = {2011}}

@unpublished{resendes2011comprehensive, Author = {Resendes, Monica}, Title = {Comprehensive Exam}, Year = {2011}}

END

new result

[test] macruby benchmark.rb 12:11:26 user system total real f(1): 0.420000 0.000000 0.420000 ( 0.423830) g(1): 0.130000 0.000000 0.130000 ( 0.110229) f(11): 1.260000 0.030000 1.290000 ( 1.171771) g(11): 1.840000 0.040000 1.880000 ( 1.622123) f(21): 2.190000 0.040000 2.230000 ( 2.051199) g(21): 4.630000 0.100000 4.730000 ( 3.795607) f(31): 3.100000 0.060000 3.160000 ( 2.861643) g(31): 9.500000 0.190000 9.690000 ( 7.664800) f(41): 5.180000 0.090000 5.270000 ( 4.807926) g(41): 15.220000 0.270000 15.490000 ( 12.194606) f(51): 6.360000 0.130000 6.490000 ( 5.832457) g(51): 20.350000 0.350000 20.700000 ( 15.984544) sum(f): 18.510000 0.350000 18.860000 ( 17.148826) sum(g): 51.670000 0.950000 52.620000 ( 41.371907)

(with Parser.new.parse)

houshuang commented 12 years ago

Another attempt

input: input = <<-END %% This BibTeX bibliography file was created using BibDesk. %% http://bibdesk.sourceforge.net/

%% Created for Stian Haklev at 2011-09-20 00:43:44 -0400

%% Saved with string encoding Unicode (UTF-8)

@inproceedings{vladoiu2011open, Author = {Vladoiu, M.}, Booktitle = {Roedunet International Conference (RoEduNet), 2011 10th}, Pages = {1--6}, Title = {Open courseware initiatives-after 10 years}}

@inproceedings{valjataga2011open, Author = {Väljataga, T. and Põldoja, Hans and Laanpere, M.}, Title = {Open Online Courses: Responding to Design Challenges}, Year = {2011}}

@unpublished{resendes2011comprehensive, Author = {Resendes, Monica}, Title = {Comprehensive Exam}, Year = {2011}}

END

result [test] macruby benchmark.rb 12:12:52 user system total real f(1): 0.390000 0.000000 0.390000 ( 0.396010) g(1): 0.070000 0.010000 0.080000 ( 0.067129) f(11): 0.800000 0.010000 0.810000 ( 0.732907) g(11): 1.060000 0.020000 1.080000 ( 0.963519) f(21): 1.550000 0.030000 1.580000 ( 1.474796) g(21): 3.010000 0.060000 3.070000 ( 2.498192) f(31): 2.220000 0.040000 2.260000 ( 2.080213) g(31): 5.210000 0.100000 5.310000 ( 4.185519) f(41): 3.110000 0.050000 3.160000 ( 2.877646) g(41): 8.230000 0.150000 8.380000 ( 6.489485) f(51): 3.640000 0.060000 3.700000 ( 3.352817) g(51): 11.750000 0.210000 11.960000 ( 9.281206) sum(f): 11.710000 0.190000 11.900000 ( 10.914390) sum(g): 29.330000 0.550000 29.880000 ( 23.485050)

inukshuk commented 12 years ago

Can you keep trying removing fields from your entries (also you may want to try and remove the comments at the beginning of the file) and rerunning the benchmark? The thing is, when you ran my original test with this entry there was hardly any difference in speed between MRI and MacRuby. Using your entries there is; so there has got to be a difference that is triggered by some detail in your entries and we need to try and locate that detail.

This is normally quite easy with ruby-prof because it gives you a summary of the time spent in individual methods, but for now our best bet is to consecutively remove data from your entries until there is a big jump in execution time.

My test entry was a one book with author, address, date-added, date-modified, keywords, publisher, series, title and year fields. Now the odd thing is that your entries don't seem to contain anything peculiar. Perhaps you start by looking at each entry individually; use my test entry as a reference: each of your entries, when executed individually, should not take considerably longer than mine. If any of them do take longer, we can begin by removing individual fields to see which fields are causing the performance penalty.

houshuang commented 12 years ago

I am sorry, I must have made a mistake with the original run of benchmark, which might have set us back a bit. Rerunning it with the original settings in benchmark (as supplied with the gem) shows a huge difference between MRI and MacRuby. Not sure how I managed to get the numbers previously.

First baseline load numbers for a file which only contains "require bibtex"

MRI: 0.11 MacRuby: 2.76

(and for fun, require 'citeproc')

MRI: 0.33 MacRuby: 6.52

Then the benchmarks (using the one Pickaxe publication)

MRI user system total real f(1): 0.000000 0.000000 0.000000 ( 0.000743) g(1): 0.000000 0.000000 0.000000 ( 0.000504) f(11): 0.010000 0.000000 0.010000 ( 0.008548) g(11): 0.010000 0.000000 0.010000 ( 0.008101) f(21): 0.010000 0.000000 0.010000 ( 0.007892) g(21): 0.010000 0.000000 0.010000 ( 0.013536) f(31): 0.020000 0.000000 0.020000 ( 0.017215) g(31): 0.010000 0.000000 0.010000 ( 0.010945) f(41): 0.020000 0.000000 0.020000 ( 0.021488) g(41): 0.020000 0.000000 0.020000 ( 0.017821) f(51): 0.030000 0.000000 0.030000 ( 0.029464) g(51): 0.020000 0.000000 0.020000 ( 0.022280) sum(f): 0.090000 0.000000 0.090000 ( 0.085350) sum(g): 0.070000 0.000000 0.070000 ( 0.073187)

MacRuby: user system total real f(1): 0.390000 0.000000 0.390000 ( 0.369663) g(1): 0.050000 0.000000 0.050000 ( 0.046697) f(11): 0.560000 0.010000 0.570000 ( 0.520116) g(11): 0.590000 0.010000 0.600000 ( 0.532049) f(21): 1.020000 0.020000 1.040000 ( 0.965179) g(21): 1.200000 0.030000 1.230000 ( 1.031975) f(31): 1.510000 0.020000 1.530000 ( 1.427880) g(31): 1.860000 0.040000 1.900000 ( 1.525376) f(41): 2.040000 0.030000 2.070000 ( 1.904556) g(41): 2.590000 0.060000 2.650000 ( 2.085736) f(51): 2.590000 0.050000 2.640000 ( 2.436998) g(51): 3.250000 0.080000 3.330000 ( 2.611425) sum(f): 8.110000 0.130000 8.240000 ( 7.624393) sum(g): 9.540000 0.220000 9.760000 ( 7.833259)

inukshuk commented 12 years ago

Ha, so whilst this sidetracked us a little bit, at least it makes a lot more sense (would have been weird because there was nothing peculiar about your entries really). Anyway, so next things to try are:

houshuang commented 12 years ago

Sorry for the sidetracking! :)

With name parser

                 user     system      total        real
      f(1):  0.320000   0.000000   0.320000 (  0.312330)
      g(1):  0.040000   0.000000   0.040000 (  0.040179)
     f(11):  0.540000   0.010000   0.550000 (  0.506932)
     g(11):  0.580000   0.020000   0.600000 (  0.525592)
     f(21):  0.940000   0.010000   0.950000 (  0.906421)
     g(21):  1.100000   0.030000   1.130000 (  0.952472)
     f(31):  1.480000   0.020000   1.500000 (  1.412313)
     g(31):  1.710000   0.040000   1.750000 (  1.406194)
     f(41):  1.930000   0.030000   1.960000 (  1.850456)
     g(41):  2.330000   0.050000   2.380000 (  1.901826)
     f(51):  2.360000   0.040000   2.400000 (  2.257148)
     g(51):  2.940000   0.070000   3.010000 (  2.367208)
    sum(f):  7.570000   0.110000   7.680000 (  7.245601)
    sum(g):  8.700000   0.210000   8.910000 (  7.193470)

With lexer only:

                 user     system      total        real
      f(1):  0.180000   0.000000   0.180000 (  0.167365)
      g(1):  0.040000   0.000000   0.040000 (  0.044349)
     f(11):  0.530000   0.010000   0.540000 (  0.486000)
     g(11):  0.540000   0.020000   0.560000 (  0.490285)
     f(21):  0.940000   0.010000   0.950000 (  0.908252)
     g(21):  1.060000   0.030000   1.090000 (  0.920382)
     f(31):  1.430000   0.020000   1.450000 (  1.361777)
     g(31):  1.680000   0.040000   1.720000 (  1.379683)
     f(41):  1.900000   0.030000   1.930000 (  1.836740)
     g(41):  2.240000   0.060000   2.300000 (  1.829948)
     f(51):  2.330000   0.030000   2.360000 (  2.254141)
     g(51):  2.900000   0.060000   2.960000 (  2.346938)
    sum(f):  7.310000   0.100000   7.410000 (  7.014275)
    sum(g):  8.460000   0.210000   8.670000 (  7.011585)
inukshuk commented 12 years ago

(I take it you meant to say, without name parsing above, right?)

Right, we're getting closer now. Judging on the basis of these results it seems that the lexical analysis is responsible for slow performance on MacRuby. In a way, that's great, because the lexer needs to rewritten anyway (see #26).

I'm not sure I'll be able to write the single pass lexer in the next few days, but I'll try to take a quick stab at it and also come up with a better benchmarking script to allow us to see what part of the lexical analysis consumes so much time.

houshuang commented 12 years ago

I've never used Instruments, and there is no documentation available. I posted on StackOverflow asking for help.

dche commented 12 years ago

Hi,

Come here from the MacRuby mail list.

To profile MacRuby code, use DTrace, http://www.macruby.org/blog/2009/11/17/macruby05b2.html shows an example.

MacRuby source tree is needed, and DTrace scripts can be found in sample-macruby/DTrace/.

Open two terminal windows, in the first one type, $ your-benchmark-script.rb &, remember the PID, and then in second window type, $ sudo dtrace -s path-to-macruby-source/sample-macruby/DTrace/methods_duration.d -p pid. You may need to add several seconds pause at the beginning of the benchmark script, so you have time to type in the second window.

The output is long, but "Regexp#initialize" still can be singled out very easily.

It seems that MRI "caches" compiled regexp, while MacRuby doesn't and re-creates every regexp literal constants again and again. Try timing following script, the difference is also huge.

1_000_000.times do
  "123" =~ /^[0-9]+$/
end

So caching all regexp literals to ivars might be a viable solution to #37.

inukshuk commented 12 years ago

Awesome @dche, thanks a lot!

I suggest we try moving the regular expressions to a class instance variable; this should give us a minimal overhead on MRI but simulate caching for MacRuby.

inukshuk commented 12 years ago

@houshuang could you run your benchmarks against master and give us the numbers?

houshuang commented 12 years ago

Doesn't seem like it really helped :(

ruby benchmark.rb 21:33:33 ☁ master ☀ user system total real f(1): 0.000000 0.000000 0.000000 ( 0.000640) g(1): 0.000000 0.000000 0.000000 ( 0.000421) f(21): 0.010000 0.000000 0.010000 ( 0.011741) g(21): 0.010000 0.000000 0.010000 ( 0.007213) f(41): 0.020000 0.000000 0.020000 ( 0.025018) g(41): 0.020000 0.000000 0.020000 ( 0.017973) f(61): 0.030000 0.000000 0.030000 ( 0.029438) g(61): 0.030000 0.000000 0.030000 ( 0.027517) f(81): 0.040000 0.000000 0.040000 ( 0.043976) g(81): 0.050000 0.000000 0.050000 ( 0.047991) f(101): 0.050000 0.000000 0.050000 ( 0.048558) g(101): 0.050000 0.000000 0.050000 ( 0.051856) f(121): 0.060000 0.000000 0.060000 ( 0.063068) g(121): 0.070000 0.000000 0.070000 ( 0.063596) f(141): 0.060000 0.000000 0.060000 ( 0.058472) g(141): 0.060000 0.000000 0.060000 ( 0.062874) f(161): 0.080000 0.000000 0.080000 ( 0.079034) g(161): 0.070000 0.000000 0.070000 ( 0.076821) f(181): 0.090000 0.000000 0.090000 ( 0.084116) g(181): 0.090000 0.000000 0.090000 ( 0.091101) f(201): 0.090000 0.000000 0.090000 ( 0.093750) g(201): 0.100000 0.000000 0.100000 ( 0.100371) f(221): 0.100000 0.000000 0.100000 ( 0.097935) g(221): 0.110000 0.000000 0.110000 ( 0.110750) f(241): 0.100000 0.010000 0.110000 ( 0.105077) g(241): 0.130000 0.000000 0.130000 ( 0.125688) f(261): 0.120000 0.000000 0.120000 ( 0.122992) g(261): 0.140000 0.000000 0.140000 ( 0.138203) f(281): 0.140000 0.000000 0.140000 ( 0.139820) g(281): 0.150000 0.000000 0.150000 ( 0.148650) f(301): 0.150000 0.000000 0.150000 ( 0.153027) g(301): 0.170000 0.000000 0.170000 ( 0.173788) f(321): 0.150000 0.000000 0.150000 ( 0.147638) g(321): 0.180000 0.000000 0.180000 ( 0.183194) f(341): 0.160000 0.000000 0.160000 ( 0.157463) g(341): 0.200000 0.000000 0.200000 ( 0.202536)

[test] macruby benchmark.rb 21:34:36 ☁ master ☂ ⚡ user system total real f(1): 0.380000 0.000000 0.380000 ( 0.374224) g(1): 0.040000 0.000000 0.040000 ( 0.044956) f(21): 1.060000 0.020000 1.080000 ( 0.962675) g(21): 1.100000 0.030000 1.130000 ( 0.990855) f(41): 2.030000 0.040000 2.070000 ( 1.916846) g(41): 2.510000 0.060000 2.570000 ( 2.068640) f(61): 2.960000 0.050000 3.010000 ( 2.783313) g(61): 4.100000 0.100000 4.200000 ( 3.287772) f(81): 4.650000 0.090000 4.740000 ( 4.485962) g(81): 5.630000 0.140000 5.770000 ( 4.453925) f(101): 5.280000 0.090000 5.370000 ( 4.918456) g(101): 7.020000 0.180000 7.200000 ( 5.580300) f(121): 6.240000 0.100000 6.340000 ( 5.785481) g(121): 8.710000 0.220000 8.930000 ( 6.843820) f(141): 7.560000 0.140000 7.700000 ( 6.965423) g(141): 10.810000 0.260000 11.070000 ( 8.324514) f(161): 8.520000 0.160000 8.680000 ( 7.783367) g(161): 12.760000 0.320000 13.080000 ( 9.747104) f(181): 9.820000 0.180000 10.000000 ( 8.946036) g(181): 14.630000 0.440000 15.070000 ( 11.234789) f(201): 11.190000 0.210000 11.400000 ( 10.065997) g(201): 16.940000 0.660000 17.600000 ( 13.150330) f(221): 12.310000 0.230000 12.540000 ( 11.014733) g(221): 18.410000 0.830000 19.240000 ( 14.411930) f(241): 13.650000 0.250000 13.900000 ( 12.084216) g(241): 20.820000 1.080000 21.900000 ( 16.397737) f(261): 15.020000 0.270000 15.290000 ( 13.133269) g(261): 23.820000 1.350000 25.170000 ( 19.101470) f(281): 16.660000 0.300000 16.960000 ( 14.486580) g(281): 25.540000 1.710000 27.250000 ( 20.518307) f(301): 17.600000 0.320000 17.920000 ( 15.103449) g(301): 27.690000 2.090000 29.780000 ( 22.433962) f(321): 19.280000 0.360000 19.640000 ( 16.325466) g(321): 29.380000 2.450000 31.830000 ( 24.070024) f(341): 20.880000 0.390000 21.270000 ( 17.473757) g(341): 33.470000 3.160000 36.630000 ( 28.516775)

houshuang commented 12 years ago

I did a sample of benchmark running MacRuby and uploaded it here, not sure if you can make anything out of it. http://dl.dropbox.com/u/1341682/out.zip

I also did the two other DTrace functions collected_objects and function_count in this file: http://dl.dropbox.com/u/1341682/more_dtrace.zip

This is just over an arbitrary period of time running Benchmark. If any of this is useful and you want me to run it again over a specific time, with specific settings etc, let me know.

These are the three DTrace scripts that are included with MacRuby.

Thanks a lot for your input by the way, @dche!

inukshuk commented 12 years ago

According to your results Regexp#initialize is most definitely the culprit. So why didn't this help?

houshuang commented 12 years ago

I confirmed that I'm on rev 59ccee962ed497a6648bfc97930d65d862aef46a, and that the @patterns are there.

I changed to benchmark only the lexer. Again widely different results:

[bibtex-ruby] ruby test/benchmark.rb 7:26:42 ☁ master ☂ ⚡ user system total real f(1): 0.000000 0.000000 0.000000 ( 0.000313) g(1): 0.000000 0.000000 0.000000 ( 0.000231) f(21): 0.000000 0.000000 0.000000 ( 0.003490) g(21): 0.010000 0.000000 0.010000 ( 0.006004) f(41): 0.010000 0.000000 0.010000 ( 0.006657) g(41): 0.000000 0.000000 0.000000 ( 0.008503) f(61): 0.020000 0.000000 0.020000 ( 0.015250) g(61): 0.020000 0.000000 0.020000 ( 0.016912) f(81): 0.010000 0.000000 0.010000 ( 0.015483) g(81): 0.020000 0.000000 0.020000 ( 0.017838) f(101): 0.020000 0.000000 0.020000 ( 0.020783) g(101): 0.020000 0.000000 0.020000 ( 0.020176)

[bibtex-ruby] macruby test/benchmark.rb 7:27:01 ☁ master ☂ ⚡ user system total real f(1): 0.140000 0.000000 0.140000 ( 0.142504) g(1): 0.040000 0.000000 0.040000 ( 0.038974) f(21): 0.910000 0.020000 0.930000 ( 0.840123) g(21): 0.940000 0.030000 0.970000 ( 0.859722) f(41): 1.680000 0.020000 1.700000 ( 1.615430) g(41): 2.160000 0.060000 2.220000 ( 1.764287) f(61): 2.570000 0.040000 2.610000 ( 2.467326) g(61): 3.560000 0.080000 3.640000 ( 2.895678) f(81): 3.630000 0.060000 3.690000 ( 3.485160) g(81): 4.610000 0.120000 4.730000 ( 3.720300) f(101): 4.530000 0.070000 4.600000 ( 4.375221) g(101): 6.060000 0.160000 6.220000 ( 4.877677)

Just to make sure this wasn't an artefact of the benchmarking script, I wrote a simple script that requires (the modified) bibtex, parses a file with about 60 publications, and prints the first one. MRI: 0.21, MacRuby: 24.16

inukshuk commented 12 years ago

If I read these results correctly (compared to the lexer-only ones you posted originally) it is now even slower than before. Can you confirm that?(Just MacRuby lexer – release / master). Weird!

houshuang commented 12 years ago

True, weird. These two are identical, except in the first I required 'bibtex', and in the second 'require File.expand_path('../../lib/bibtex.rb', FILE)'

[test] macruby benchmark.rb 8:14:19 ☁ master ☂ ⚡ ✭ user system total real f(1): 0.130000 0.000000 0.130000 ( 0.136986) g(1): 0.040000 0.000000 0.040000 ( 0.038374) f(21): 0.880000 0.010000 0.890000 ( 0.813287) g(21): 0.950000 0.030000 0.980000 ( 0.849334) f(41): 1.640000 0.020000 1.660000 ( 1.570212) g(41): 2.110000 0.060000 2.170000 ( 1.719145) f(61): 2.450000 0.030000 2.480000 ( 2.348409) g(61): 3.300000 0.080000 3.380000 ( 2.665291) f(81): 3.270000 0.050000 3.320000 ( 3.121622) g(81): 4.520000 0.120000 4.640000 ( 3.619892) f(101): 4.100000 0.060000 4.160000 ( 3.900480) g(101): 5.730000 0.150000 5.880000 ( 4.561672) f(121):^C [test] macruby benchmark.rb 8:14:55 ☁ master ☂ ⚡ ✭ user system total real f(1): 0.140000 0.000000 0.140000 ( 0.140768) g(1): 0.040000 0.000000 0.040000 ( 0.038383) f(21): 0.890000 0.010000 0.900000 ( 0.813669) g(21): 0.940000 0.030000 0.970000 ( 0.847099) f(41): 1.650000 0.020000 1.670000 ( 1.586052) g(41): 2.110000 0.050000 2.160000 ( 1.724443) f(61): 2.460000 0.040000 2.500000 ( 2.347957) g(61): 3.220000 0.080000 3.300000 ( 2.619925) f(81): 3.280000 0.050000 3.330000 ( 3.114385) g(81): 4.480000 0.110000 4.590000 ( 3.596411) f(101): 4.080000 0.060000 4.140000 ( 3.895985) g(101): 5.760000 0.150000 5.910000 ( 4.611965)

inukshuk commented 12 years ago

Can you try this again but having uninstalled the gem first? Alternatively, you can also print out (before the benchmark starts) the value of:

BibTeX::Lexer.patterns

These should exist on master. If they don't we're somehow pulling in an old version.

On Mar 5, 2012, at 2:16 PM, Stian Håklev wrote:

True, weird. These two are identical, except in the first I required 'bibtex', and in the second 'require File.expand_path('../../lib/bibtex.rb', FILE)'

[test] macruby benchmark.rb 8:14:19 ☁ master ☂ ⚡ ✭ user system total real f(1): 0.130000 0.000000 0.130000 ( 0.136986) g(1): 0.040000 0.000000 0.040000 ( 0.038374) f(21): 0.880000 0.010000 0.890000 ( 0.813287) g(21): 0.950000 0.030000 0.980000 ( 0.849334) f(41): 1.640000 0.020000 1.660000 ( 1.570212) g(41): 2.110000 0.060000 2.170000 ( 1.719145) f(61): 2.450000 0.030000 2.480000 ( 2.348409) g(61): 3.300000 0.080000 3.380000 ( 2.665291) f(81): 3.270000 0.050000 3.320000 ( 3.121622) g(81): 4.520000 0.120000 4.640000 ( 3.619892) f(101): 4.100000 0.060000 4.160000 ( 3.900480) g(101): 5.730000 0.150000 5.880000 ( 4.561672) f(121):^C [test] macruby benchmark.rb 8:14:55 ☁ master ☂ ⚡ ✭ user system total real f(1): 0.140000 0.000000 0.140000 ( 0.140768) g(1): 0.040000 0.000000 0.040000 ( 0.038383) f(21): 0.890000 0.010000 0.900000 ( 0.813669) g(21): 0.940000 0.030000 0.970000 ( 0.847099) f(41): 1.650000 0.020000 1.670000 ( 1.586052) g(41): 2.110000 0.050000 2.160000 ( 1.724443) f(61): 2.460000 0.040000 2.500000 ( 2.347957) g(61): 3.220000 0.080000 3.300000 ( 2.619925) f(81): 3.280000 0.050000 3.330000 ( 3.114385) g(81): 4.480000 0.110000 4.590000 ( 3.596411) f(101): 4.080000 0.060000 4.140000 ( 3.895985) g(101): 5.760000 0.150000 5.910000 ( 4.611965)


Reply to this email directly or view it on GitHub: https://github.com/inukshuk/bibtex-ruby/issues/37#issuecomment-4322426

houshuang commented 12 years ago

Man, I'm a horrible software tester :) I went into bibtex.rb and realized that the commands require 'bibtex/lexer' etc probably worked through RubyGems, even though I called the local bibtex.rb. I uninstalled the gem, then I got all kinds of errors from path mistakes, which I tried to fix. There were also several of the requires that didn't work, and I didn't find the matching files in lib/bibtex, parser and name_parser. Slightly rewritten to get around this I get

release:

         user     system      total        real
      f(1):  0.140000   0.000000   0.140000 (  0.136864)
      g(1):  0.050000   0.000000   0.050000 (  0.040198)
     f(21):  0.880000   0.020000   0.900000 (  0.828929)
     g(21):  0.950000   0.020000   0.970000 (  0.849521)
     f(41):  1.690000   0.030000   1.720000 (  1.619234)
     g(41):  2.110000   0.050000   2.160000 (  1.740376)
     f(61):  2.470000   0.040000   2.510000 (  2.362543)
     g(61):  3.270000   0.080000   3.350000 (  2.652986)

master

                 user     system      total        real
      f(1):  0.150000   0.000000   0.150000 (  0.143398)
      g(1):  0.070000   0.000000   0.070000 (  0.052197)
     f(21):  0.910000   0.020000   0.930000 (  0.853703)
     g(21):  0.940000   0.020000   0.960000 (  0.853563)
     f(41):  1.690000   0.030000   1.720000 (  1.622108)
     g(41):  2.130000   0.050000   2.180000 (  1.741724)
     f(61):  2.440000   0.040000   2.480000 (  2.343833)
     g(61):  3.290000   0.080000   3.370000 (  2.669190)
inukshuk commented 12 years ago

On Mar 5, 2012, at 3:40 PM, Stian Håklev wrote:

Man, I'm a horrible software tester :) I went into bibtex.rb and realized that the commands require 'bibtex/lexer' etc probably worked through RubyGems, even though I called the local bibtex.rb. I uninstalled the gem, then I got all kinds of errors from path mistakes, which I tried to fix. There were also several of the requires that didn't work, and I didn't find the matching files in lib/bibtex, parser and name_parser. Slightly rewritten to get around this I get

How did you get around this? I'm sorry, I forgot to mention this earlier, but on master, you will have to generate the parsers (that's why parser and name_parser were missing). You can generate the parsers like this:

$ bundle install
$ bundle exec rake racc

After this, there should be the parser.rb and name_parser.rb files in the lib directory.

houshuang commented 12 years ago

This is really useful! Well I'm learning more and more about BibTex-ruby, but it doesn't seem to be getting us anywhere... Here's the latest master, with gem uninstalled, and parser generated. This is using parser:

                 user     system      total        real
      f(1):  0.510000   0.010000   0.520000 (  0.492472)
      g(1):  0.060000   0.000000   0.060000 (  0.055886)
     f(21):  1.370000   0.030000   1.400000 (  1.282683)
     g(21):  1.370000   0.030000   1.400000 (  1.243531)
     f(41):  2.530000   0.050000   2.580000 (  2.428568)
     g(41):  3.150000   0.080000   3.230000 (  2.638907)
     f(61):  3.550000   0.060000   3.610000 (  3.359150)
     g(61):  4.740000   0.120000   4.860000 (  3.822768)

and this using lexer

                 user     system      total        real
      f(1):  0.510000   0.000000   0.510000 (  0.521181)
      g(1):  0.050000   0.000000   0.050000 (  0.056028)
     f(21):  2.390000   0.040000   2.430000 (  2.246512)
     g(21):  1.420000   0.040000   1.460000 (  1.261098)
     f(41):  4.510000   0.080000   4.590000 (  4.288429)
     g(41):  3.290000   0.080000   3.370000 (  2.695208)
     f(61):  7.040000   0.130000   7.170000 (  6.704729)
     g(61):  4.650000   0.110000   4.760000 (  3.736637)
inukshuk commented 12 years ago

I just installed a macruby-nightly to finally get to the bottom of this, but get an error in the racc gem about a missing symbol (_rb_catch); have you run into this before?

houshuang commented 12 years ago

No, but I guess I ran racc using MRI Ruby. You only run it once to generate right, you don't require it when executing bibtex?

Thanks for all the time you spend on this!

Stian

inukshuk commented 12 years ago

It's true that we don't necessarily have to generate the parser with MacRuby (in fact, the gem ships with the generated parser). Problem is, I get the error when executing the parser.

What MacRuby version are you using precisely?

houshuang commented 12 years ago

0.12 - I compiled it from nightly about a week ago.

inukshuk commented 12 years ago

I finally had time for more detailed profiling and I believe that the problem lies with MacRuby's StringScanner implementation: they create a new regular expression object every time #scan is called. That's why our explicit pattern cache does not make big difference at all.

mapreal19 commented 5 years ago

Closing since we don't support macruby anymore