Add scan and search benchmark

naitoh commented 1 week ago

Why?

To improve the parsing process, I would like to add benchmarks for all of the parsing operations:

scan_full(regexp, false, true) == StringScanner#check
scan_full(regexp, false, false) == StringScanner#match?
search_full(regexp, false, true) == StringScanner#check_until
search_full(regexp, false, false) == StringScanner#exist?
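One way to sanity-check these four equivalences is to call both forms on the same scanner and compare their results. The input string and patterns below are illustrative only. They are not the reg1/reg2/str1/str2 used in the benchmark file.

```ruby
require "strscan"

# Compare each public method with the scan_full/search_full call it maps to.
# scan_full(pattern, advance_pointer_p, return_string_p) matches only at the
# current position; search_full(...) searches forward from it.
scanner = StringScanner.new("test string")

pairs = {
  "check"       => [scanner.check(/\w+/),       scanner.scan_full(/\w+/, false, true)],
  "match?"      => [scanner.match?(/\w+/),      scanner.scan_full(/\w+/, false, false)],
  "check_until" => [scanner.check_until(/str/), scanner.search_full(/str/, false, true)],
  "exist?"      => [scanner.exist?(/str/),      scanner.search_full(/str/, false, false)],
}

pairs.each do |name, (public_result, full_result)|
  puts "#{name}: #{public_result.inspect} == #{full_result.inspect}"
end

# advance_pointer_p is false in every call, so the scan pointer never moves.
puts "pos: #{scanner.pos}"
```

With return_string_p false, both methods return a length (match length for match?, distance to the end of the match for exist?) instead of the matched string.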

CRuby

$ benchmark-driver benchmark/full.yaml
Warming up --------------------------------------
         check(reg1)    10.512M i/s -     11.004M times in 1.046730s (95.13ns/i)
         check(str1)    13.246M i/s -     13.499M times in 1.019092s (75.49ns/i)
   check_until(reg2)     9.325M i/s -      9.454M times in 1.013793s (107.24ns/i)
   check_until(str2)    11.235M i/s -     11.278M times in 1.003815s (89.01ns/i)
        match?(reg1)    15.833M i/s -     16.234M times in 1.025334s (63.16ns/i)
        match?(str1)    23.012M i/s -     23.238M times in 1.009821s (43.46ns/i)
        exist?(reg2)    13.196M i/s -     13.378M times in 1.013741s (75.78ns/i)
        exist?(str2)    17.775M i/s -     17.968M times in 1.010822s (56.26ns/i)
Calculating -------------------------------------
         check(reg1)    11.634M i/s -     31.537M times in 2.710745s (85.95ns/i)
         check(str1)    15.131M i/s -     39.739M times in 2.626267s (66.09ns/i)
   check_until(reg2)    10.156M i/s -     27.976M times in 2.754491s (98.46ns/i)
   check_until(str2)    12.666M i/s -     33.705M times in 2.661027s (78.95ns/i)
        match?(reg1)    18.606M i/s -     47.500M times in 2.552881s (53.75ns/i)
        match?(str1)    29.154M i/s -     69.035M times in 2.367971s (34.30ns/i)
        exist?(reg2)    15.087M i/s -     39.589M times in 2.624016s (66.28ns/i)
        exist?(str2)    21.372M i/s -     53.326M times in 2.495113s (46.79ns/i)

Comparison:
        match?(str1):  29153541.2 i/s
        exist?(str2):  21372089.0 i/s - 1.36x  slower
        match?(reg1):  18606308.7 i/s - 1.57x  slower
         check(str1):  15131214.8 i/s - 1.93x  slower
        exist?(reg2):  15087129.4 i/s - 1.93x  slower
   check_until(str2):  12666165.0 i/s - 2.30x  slower
         check(reg1):  11634224.5 i/s - 2.51x  slower
   check_until(reg2):  10156410.0 i/s - 2.87x  slower
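The "x slower" column appears to be the fastest job's i/s divided by each job's i/s. Recomputing it from the CRuby numbers above reproduces the printed ratios:

```ruby
# i/s values copied from the CRuby "Comparison:" block.
ips = {
  "match?(str1)"      => 29_153_541.2,
  "exist?(str2)"      => 21_372_089.0,
  "match?(reg1)"      => 18_606_308.7,
  "check(str1)"       => 15_131_214.8,
  "exist?(reg2)"      => 15_087_129.4,
  "check_until(str2)" => 12_666_165.0,
  "check(reg1)"       => 11_634_224.5,
  "check_until(reg2)" => 10_156_410.0,
}

# Ratio = fastest i/s / this job's i/s, rounded to two decimals.
fastest = ips.values.max
ratios = ips.transform_values { |v| format("%.2f", fastest / v) }

ratios.each { |name, ratio| puts "#{name.rjust(18)}: #{ratio}x" }
```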

JRuby

$ benchmark-driver benchmark/full.yaml
Warming up --------------------------------------
         check(reg1)     8.543M i/s -      8.507M times in 0.995755s (117.05ns/i)
         check(str1)    19.132M i/s -     19.060M times in 0.996223s (52.27ns/i)
   check_until(reg2)     6.528M i/s -      6.527M times in 0.999821s (153.19ns/i)
   check_until(str2)    15.120M i/s -     15.157M times in 1.002440s (66.14ns/i)
        match?(reg1)     9.675M i/s -      9.722M times in 1.004873s (103.36ns/i)
        match?(str1)    25.797M i/s -     25.678M times in 0.995385s (38.76ns/i)
        exist?(reg2)     8.192M i/s -      8.219M times in 1.003290s (122.06ns/i)
        exist?(str2)    20.494M i/s -     20.344M times in 0.992690s (48.80ns/i)
Calculating -------------------------------------
         check(reg1)    11.145M i/s -     25.630M times in 2.299636s (89.73ns/i)
         check(str1)    33.422M i/s -     57.395M times in 1.717318s (29.92ns/i)
   check_until(reg2)    10.132M i/s -     19.584M times in 1.932875s (98.70ns/i)
   check_until(str2)    23.542M i/s -     45.359M times in 1.926731s (42.48ns/i)
        match?(reg1)    13.625M i/s -     29.026M times in 2.130248s (73.39ns/i)
        match?(str1)    48.596M i/s -     77.392M times in 1.592559s (20.58ns/i)
        exist?(reg2)    11.399M i/s -     24.577M times in 2.156138s (87.73ns/i)
        exist?(str2)    33.873M i/s -     61.481M times in 1.815073s (29.52ns/i)

Comparison:
        match?(str1):  48595754.4 i/s
        exist?(str2):  33872665.7 i/s - 1.43x  slower
         check(str1):  33421510.3 i/s - 1.45x  slower
   check_until(str2):  23542122.9 i/s - 2.06x  slower
        match?(reg1):  13625479.7 i/s - 3.57x  slower
        exist?(reg2):  11398842.2 i/s - 4.26x  slower
         check(reg1):  11145102.7 i/s - 4.36x  slower
   check_until(reg2):  10131993.8 i/s - 4.80x  slower

kou commented 1 week ago

I agree that we should add this benchmark, but can we use a more meaningful file name than full? "full" is too implementation-specific and meaningless for this case. (scan_full()/search_full() are named "full" because they accept the full set of options; the name doesn't describe any feature.)
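To illustrate what the "full" options are: the two booleans choose whether to advance the scan pointer and whether to return the matched string or its length. Setting both to true or advancing without returning the string gives the familiar scan/skip family. This is a sketch on an arbitrary input string, run in lockstep on two scanners.

```ruby
require "strscan"

# scan_full(pattern, advance_pointer_p, return_string_p)
# search_full(pattern, advance_pointer_p, return_string_p)
a = StringScanner.new("test string")
b = StringScanner.new("test string")

results = [
  [a.scan_full(/\w+/, true, true),  b.scan(/\w+/)],      # advance + string: like scan
  [a.scan_full(/\s+/, true, false), b.skip(/\s+/)],      # advance + length: like skip
  [a.search_full(/r/, true, true),  b.scan_until(/r/)],  # like scan_until
  [a.search_full(/g/, true, false), b.skip_until(/g/)],  # like skip_until
]

results.each { |full, plain| puts "#{full.inspect} == #{plain.inspect}" }
puts "pos: #{a.pos} == #{b.pos}"
```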

naitoh commented 1 week ago

@kou

but can we use a more meaningful file name than full?

How about the following file name?

scan_search.yaml

kou commented 1 week ago

It's better than "full".

Do we need to use one file for all cases? If we use 2 files, one for check/match? and one for check_until/exist?, can we use a better name than "scan_search"?

naitoh commented 1 week ago

I have renamed it to scan_and_search.yaml.

Do we need to use one file for all cases?

In JRuby, the results vary from run to run, so I would like to run everything in one file.

JRuby

$ benchmark-driver benchmark/scan_and_search.yaml
Warming up --------------------------------------
         check(reg1)     7.249M i/s -      7.207M times in 0.994265s (137.95ns/i)
         check(str1)    19.899M i/s -     19.847M times in 0.997392s (50.25ns/i)
   check_until(reg2)     7.918M i/s -      7.918M times in 0.999911s (126.29ns/i)
   check_until(str2)    15.581M i/s -     15.463M times in 0.992425s (64.18ns/i)
        match?(reg1)     8.717M i/s -      8.742M times in 1.002787s (114.71ns/i)
        match?(str1)    25.952M i/s -     25.689M times in 0.989893s (38.53ns/i)
        exist?(reg2)     8.715M i/s -      8.700M times in 0.998353s (114.75ns/i)
        exist?(str2)    20.071M i/s -     19.938M times in 0.993373s (49.82ns/i)
Calculating -------------------------------------
         check(reg1)    12.048M i/s -     21.747M times in 1.804991s (83.00ns/i)
         check(str1)    30.335M i/s -     59.697M times in 1.967916s (32.97ns/i)
   check_until(reg2)     9.803M i/s -     23.755M times in 2.423189s (102.01ns/i)
   check_until(str2)    24.277M i/s -     46.743M times in 1.925398s (41.19ns/i)
        match?(reg1)    13.778M i/s -     26.152M times in 1.898151s (72.58ns/i)
        match?(str1)    51.994M i/s -     77.855M times in 1.497389s (19.23ns/i)
        exist?(reg2)    11.644M i/s -     26.144M times in 2.245393s (85.88ns/i)
        exist?(str2)    32.540M i/s -     60.213M times in 1.850455s (30.73ns/i)

Comparison:
        match?(str1):  51994015.6 i/s 
        exist?(str2):  32539799.8 i/s - 1.60x  slower
         check(str1):  30335149.7 i/s - 1.71x  slower
   check_until(str2):  24277229.9 i/s - 2.14x  slower
        match?(reg1):  13777722.9 i/s - 3.77x  slower
         check(reg1):  12048062.8 i/s - 4.32x  slower
        exist?(reg2):  11643598.9 i/s - 4.47x  slower
   check_until(reg2):   9803153.3 i/s - 5.30x  slower

$ benchmark-driver benchmark/scan_and_search.yaml
Warming up --------------------------------------
         check(reg1)     8.691M i/s -      8.643M times in 0.994453s (115.06ns/i)
         check(str1)    19.273M i/s -     19.116M times in 0.991856s (51.89ns/i)
   check_until(reg2)     7.308M i/s -      7.276M times in 0.995534s (136.83ns/i)
   check_until(str2)    15.596M i/s -     15.458M times in 0.991141s (64.12ns/i)
        match?(reg1)     8.627M i/s -      8.608M times in 0.997741s (115.91ns/i)
        match?(str1)    25.974M i/s -     25.865M times in 0.995829s (38.50ns/i)
        exist?(reg2)     8.417M i/s -      8.475M times in 1.006861s (118.80ns/i)
        exist?(str2)     9.009M i/s -      8.976M times in 0.996368s (111.00ns/i)
Calculating -------------------------------------
         check(reg1)    11.344M i/s -     26.073M times in 2.298316s (88.15ns/i)
         check(str1)    35.082M i/s -     57.819M times in 1.648119s (28.50ns/i)
   check_until(reg2)     9.956M i/s -     21.925M times in 2.202211s (100.44ns/i)
   check_until(str2)    23.878M i/s -     46.788M times in 1.959452s (41.88ns/i)
        match?(reg1)    11.412M i/s -     25.882M times in 2.267979s (87.63ns/i)
        match?(str1)    49.088M i/s -     77.921M times in 1.587374s (20.37ns/i)
        exist?(reg2)    11.628M i/s -     25.252M times in 2.171684s (86.00ns/i)
        exist?(str2)    32.934M i/s -     27.027M times in 0.820653s (30.36ns/i)

Comparison:
        match?(str1):  49087994.1 i/s 
         check(str1):  35081736.9 i/s - 1.40x  slower
        exist?(str2):  32933597.9 i/s - 1.49x  slower
   check_until(str2):  23878238.9 i/s - 2.06x  slower
        exist?(reg2):  11627809.5 i/s - 4.22x  slower
        match?(reg1):  11411983.2 i/s - 4.30x  slower
         check(reg1):  11344492.2 i/s - 4.33x  slower
   check_until(reg2):   9955951.4 i/s - 4.93x  slower

kou commented 1 week ago

Do we need to compare check with check_until (and match? with exist?)? They are different operations with different use cases: check/match? match only at the current scan pointer, while check_until/exist? search forward from the current scan pointer.
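The difference is easy to see with a pattern that occurs later in the string. The input and pattern here are arbitrary examples.

```ruby
require "strscan"

scanner = StringScanner.new("test string")

# check/match? only try to match at the current scan pointer (position 0),
# so a pattern that appears later in the string does not match.
anchored = [scanner.check(/string/), scanner.match?(/string/)]

# check_until/exist? search forward from the current scan pointer.
searched = [scanner.check_until(/string/), scanner.exist?(/string/)]

p anchored    # => [nil, nil]
p searched    # => ["test string", 11]
p scanner.pos # => 0 (none of these calls advance the pointer)
```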

I think JRuby's unstable results point to a different problem. We may need a longer target string (and/or pattern) so that the operation under test, rather than call overhead, dominates the benchmark's run time.
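A rough way to explore this outside benchmark-driver is a stdlib-only timing loop comparing a short target with a much longer one. This is a sketch, not the benchmark file from this PR: the helper `ns_per_call`, the 10,000-character prefix, and the iteration count are all hypothetical choices.

```ruby
require "strscan"

# Hypothetical helper: average nanoseconds per call of the given block.
def ns_per_call(iterations)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC, :nanosecond)
  iterations.times { yield }
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC, :nanosecond) - start
  elapsed.fdiv(iterations)
end

short = StringScanner.new("test string")
# Assumption: a long prefix before the match makes exist? spend most of its
# time searching, so per-call overhead matters less in the measurement.
long = StringScanner.new("x" * 10_000 + "test string")

[["short", short], ["long", long]].each do |label, scanner|
  ns = ns_per_call(10_000) { scanner.exist?(/string/) }
  puts format("%-5s exist?(/string/): %.1f ns/call", label, ns)
end
```

Absolute timings depend on the machine and Ruby implementation; the point is only that the longer input shifts the measured cost toward the search itself.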

ruby / strscan