ruby / csv

CSV Reading and Writing
https://ruby.github.io/csv/
BSD 2-Clause "Simplified" License
177 stars 113 forks source link

Avoid dynamic parse method dispatch for faster access #311

Closed marshall-lee closed 1 month ago

marshall-lee commented 1 month ago

On some benchmarks it seems to make a difference:

N_ROWS=5000 rake benchmark:parse benchmark:parse_liberal_parsing benchmark:parse_quote_char_nil benchmark:parse_strip
RUBYLIB= BUNDLER_ORIG_RUBYLIB= /Users/vladimirkochnev/.asdf/installs/ruby/3.3.3/bin/ruby -v -S benchmark-driver /Users/vladimirkochnev/code/csv/benchmark/parse.yaml
ruby 3.3.3 (2024-06-12 revision f1c7b6f435) [arm64-darwin23]
Calculating -------------------------------------
                      csv 3.3.0      master
            unquoted     22.147      22.131 i/s -     100.000 times in 4.515187s 4.518589s
              quoted     11.517      12.997 i/s -     100.000 times in 8.682986s 7.694298s
               mixed     14.097      13.964 i/s -     100.000 times in 7.093660s 7.161389s
     include_col_sep      5.214       5.188 i/s -     100.000 times in 19.178537s 19.277059s
     include_row_sep      5.195       5.101 i/s -     100.000 times in 19.250419s 19.605061s
        encode_utf-8     16.030      15.984 i/s -     100.000 times in 6.238449s 6.256427s
         encode_sjis     16.546      16.376 i/s -     100.000 times in 6.043603s 6.106603s

Comparison:
                         unquoted
           csv 3.3.0:        22.1 i/s
              master:        22.1 i/s - 1.00x  slower

                           quoted
              master:        13.0 i/s
           csv 3.3.0:        11.5 i/s - 1.13x  slower

                            mixed
           csv 3.3.0:        14.1 i/s
              master:        14.0 i/s - 1.01x  slower

                  include_col_sep
           csv 3.3.0:         5.2 i/s
              master:         5.2 i/s - 1.01x  slower

                  include_row_sep
           csv 3.3.0:         5.2 i/s
              master:         5.1 i/s - 1.02x  slower

                     encode_utf-8
           csv 3.3.0:        16.0 i/s
              master:        16.0 i/s - 1.00x  slower

                      encode_sjis
           csv 3.3.0:        16.5 i/s
              master:        16.4 i/s - 1.01x  slower
RUBYLIB= BUNDLER_ORIG_RUBYLIB= /Users/vladimirkochnev/.asdf/installs/ruby/3.3.3/bin/ruby -v -S benchmark-driver /Users/vladimirkochnev/code/csv/benchmark/parse_liberal_parsing.yaml
ruby 3.3.3 (2024-06-12 revision f1c7b6f435) [arm64-darwin23]
Calculating -------------------------------------
                                   csv 3.3.0      master
                         unquoted      8.132       8.250 i/s -     100.000 times in 12.297793s 12.121689s
         unquoted_backslash_quote      3.868       3.866 i/s -     100.000 times in 25.849956s 25.869413s
                           quoted      3.642       3.638 i/s -     100.000 times in 27.454032s 27.484247s
quoted_double_quote_outside_quote      2.277       2.202 i/s -     100.000 times in 43.921488s 45.419138s
           quoted_backslash_quote      1.801       1.803 i/s -     100.000 times in 55.522265s 55.464641s
                  include_col_sep      3.644       3.633 i/s -     100.000 times in 27.440353s 27.528626s
                  include_row_sep      3.629       3.614 i/s -     100.000 times in 27.559354s 27.670274s
                     encode_utf-8      8.149       8.136 i/s -     100.000 times in 12.270936s 12.290646s
                      encode_sjis      8.527       8.425 i/s -     100.000 times in 11.727969s 11.868855s

Comparison:
                                      unquoted
                           master:         8.2 i/s
                        csv 3.3.0:         8.1 i/s - 1.01x  slower

                      unquoted_backslash_quote
                        csv 3.3.0:         3.9 i/s
                           master:         3.9 i/s - 1.00x  slower

                                        quoted
                        csv 3.3.0:         3.6 i/s
                           master:         3.6 i/s - 1.00x  slower

             quoted_double_quote_outside_quote
                        csv 3.3.0:         2.3 i/s
                           master:         2.2 i/s - 1.03x  slower

                        quoted_backslash_quote
                           master:         1.8 i/s
                        csv 3.3.0:         1.8 i/s - 1.00x  slower

                               include_col_sep
                        csv 3.3.0:         3.6 i/s
                           master:         3.6 i/s - 1.00x  slower

                               include_row_sep
                        csv 3.3.0:         3.6 i/s
                           master:         3.6 i/s - 1.00x  slower

                                  encode_utf-8
                        csv 3.3.0:         8.1 i/s
                           master:         8.1 i/s - 1.00x  slower

                                   encode_sjis
                        csv 3.3.0:         8.5 i/s
                           master:         8.4 i/s - 1.01x  slower
RUBYLIB= BUNDLER_ORIG_RUBYLIB= /Users/vladimirkochnev/.asdf/installs/ruby/3.3.3/bin/ruby -v -S benchmark-driver /Users/vladimirkochnev/code/csv/benchmark/parse_quote_char_nil.yaml
ruby 3.3.3 (2024-06-12 revision f1c7b6f435) [arm64-darwin23]
Calculating -------------------------------------
                      csv 3.3.0      master
  without_quote_char     22.840      22.844 i/s -     100.000 times in 4.378284s 4.377488s
      quote_char_nil     32.370      43.729 i/s -     100.000 times in 3.089285s 2.286831s
       col_sep_space     12.135      12.106 i/s -     100.000 times in 8.240368s 8.260030s

Comparison:
               without_quote_char
              master:        22.8 i/s
           csv 3.3.0:        22.8 i/s - 1.00x  slower

                   quote_char_nil
              master:        43.7 i/s
           csv 3.3.0:        32.4 i/s - 1.35x  slower

                    col_sep_space
           csv 3.3.0:        12.1 i/s
              master:        12.1 i/s - 1.00x  slower
RUBYLIB= BUNDLER_ORIG_RUBYLIB= /Users/vladimirkochnev/.asdf/installs/ruby/3.3.3/bin/ruby -v -S benchmark-driver /Users/vladimirkochnev/code/csv/benchmark/parse_strip.yaml
ruby 3.3.3 (2024-06-12 revision f1c7b6f435) [arm64-darwin23]
Calculating -------------------------------------
                      csv 3.3.0      master
             default     13.132      13.043 i/s -     100.000 times in 7.615051s 7.667227s
      no_quote_strip      8.955       8.957 i/s -     100.000 times in 11.167272s 11.164189s

Comparison:
                          default
           csv 3.3.0:        13.1 i/s
              master:        13.0 i/s - 1.01x  slower

                   no_quote_strip
              master:         9.0 i/s
           csv 3.3.0:         9.0 i/s - 1.00x  slower
kou commented 1 month ago

Could you benchmark the __send__ approach too? I think that __send__ is faster than method.call but we should measure them.

marshall-lee commented 1 month ago

@kou they appear to perform the same but I agree that __send__ is simpler.

RUBYLIB= BUNDLER_ORIG_RUBYLIB= /Users/vladimirkochnev/.asdf/installs/ruby/3.3.3/bin/ruby -v -S benchmark-driver /Users/vladimirkochnev/code/csv/benchmark/parse.yaml
ruby 3.3.3 (2024-06-12 revision f1c7b6f435) [arm64-darwin23]
Calculating -------------------------------------
                      csv 3.3.0      master
            unquoted     22.147      22.131 i/s -     100.000 times in 4.515187s 4.518589s
              quoted     11.517      12.997 i/s -     100.000 times in 8.682986s 7.694298s
               mixed     14.097      13.964 i/s -     100.000 times in 7.093660s 7.161389s
     include_col_sep      5.214       5.188 i/s -     100.000 times in 19.178537s 19.277059s
     include_row_sep      5.195       5.101 i/s -     100.000 times in 19.250419s 19.605061s
        encode_utf-8     16.030      15.984 i/s -     100.000 times in 6.238449s 6.256427s
         encode_sjis     16.546      16.376 i/s -     100.000 times in 6.043603s 6.106603s

Comparison:
                         unquoted
           csv 3.3.0:        22.1 i/s
              master:        22.1 i/s - 1.00x  slower

                           quoted
              master:        13.0 i/s
           csv 3.3.0:        11.5 i/s - 1.13x  slower

                            mixed
           csv 3.3.0:        14.1 i/s
              master:        14.0 i/s - 1.01x  slower

                  include_col_sep
           csv 3.3.0:         5.2 i/s
              master:         5.2 i/s - 1.01x  slower

                  include_row_sep
           csv 3.3.0:         5.2 i/s
              master:         5.1 i/s - 1.02x  slower

                     encode_utf-8
           csv 3.3.0:        16.0 i/s
              master:        16.0 i/s - 1.00x  slower

                      encode_sjis
           csv 3.3.0:        16.5 i/s
              master:        16.4 i/s - 1.01x  slower
RUBYLIB= BUNDLER_ORIG_RUBYLIB= /Users/vladimirkochnev/.asdf/installs/ruby/3.3.3/bin/ruby -v -S benchmark-driver /Users/vladimirkochnev/code/csv/benchmark/parse_liberal_parsing.yaml
ruby 3.3.3 (2024-06-12 revision f1c7b6f435) [arm64-darwin23]
Calculating -------------------------------------
                                   csv 3.3.0      master
                         unquoted      8.132       8.250 i/s -     100.000 times in 12.297793s 12.121689s
         unquoted_backslash_quote      3.868       3.866 i/s -     100.000 times in 25.849956s 25.869413s
                           quoted      3.642       3.638 i/s -     100.000 times in 27.454032s 27.484247s
quoted_double_quote_outside_quote      2.277       2.202 i/s -     100.000 times in 43.921488s 45.419138s
           quoted_backslash_quote      1.801       1.803 i/s -     100.000 times in 55.522265s 55.464641s
                  include_col_sep      3.644       3.633 i/s -     100.000 times in 27.440353s 27.528626s
                  include_row_sep      3.629       3.614 i/s -     100.000 times in 27.559354s 27.670274s
                     encode_utf-8      8.149       8.136 i/s -     100.000 times in 12.270936s 12.290646s
                      encode_sjis      8.527       8.425 i/s -     100.000 times in 11.727969s 11.868855s

Comparison:
                                      unquoted
                           master:         8.2 i/s
                        csv 3.3.0:         8.1 i/s - 1.01x  slower

                      unquoted_backslash_quote
                        csv 3.3.0:         3.9 i/s
                           master:         3.9 i/s - 1.00x  slower

                                        quoted
                        csv 3.3.0:         3.6 i/s
                           master:         3.6 i/s - 1.00x  slower

             quoted_double_quote_outside_quote
                        csv 3.3.0:         2.3 i/s
                           master:         2.2 i/s - 1.03x  slower

                        quoted_backslash_quote
                           master:         1.8 i/s
                        csv 3.3.0:         1.8 i/s - 1.00x  slower

                               include_col_sep
                        csv 3.3.0:         3.6 i/s
                           master:         3.6 i/s - 1.00x  slower

                               include_row_sep
                        csv 3.3.0:         3.6 i/s
                           master:         3.6 i/s - 1.00x  slower

                                  encode_utf-8
                        csv 3.3.0:         8.1 i/s
                           master:         8.1 i/s - 1.00x  slower

                                   encode_sjis
                        csv 3.3.0:         8.5 i/s
                           master:         8.4 i/s - 1.01x  slower
RUBYLIB= BUNDLER_ORIG_RUBYLIB= /Users/vladimirkochnev/.asdf/installs/ruby/3.3.3/bin/ruby -v -S benchmark-driver /Users/vladimirkochnev/code/csv/benchmark/parse_quote_char_nil.yaml
ruby 3.3.3 (2024-06-12 revision f1c7b6f435) [arm64-darwin23]
Calculating -------------------------------------
                      csv 3.3.0      master
  without_quote_char     22.840      22.844 i/s -     100.000 times in 4.378284s 4.377488s
      quote_char_nil     32.370      43.729 i/s -     100.000 times in 3.089285s 2.286831s
       col_sep_space     12.135      12.106 i/s -     100.000 times in 8.240368s 8.260030s

Comparison:
               without_quote_char
              master:        22.8 i/s
           csv 3.3.0:        22.8 i/s - 1.00x  slower

                   quote_char_nil
              master:        43.7 i/s
           csv 3.3.0:        32.4 i/s - 1.35x  slower

                    col_sep_space
           csv 3.3.0:        12.1 i/s
              master:        12.1 i/s - 1.00x  slower
RUBYLIB= BUNDLER_ORIG_RUBYLIB= /Users/vladimirkochnev/.asdf/installs/ruby/3.3.3/bin/ruby -v -S benchmark-driver /Users/vladimirkochnev/code/csv/benchmark/parse_strip.yaml
ruby 3.3.3 (2024-06-12 revision f1c7b6f435) [arm64-darwin23]
Calculating -------------------------------------
                      csv 3.3.0      master
             default     13.132      13.043 i/s -     100.000 times in 7.615051s 7.667227s
      no_quote_strip      8.955       8.957 i/s -     100.000 times in 11.167272s 11.164189s

Comparison:
                          default
           csv 3.3.0:        13.1 i/s
              master:        13.0 i/s - 1.01x  slower

                   no_quote_strip
              master:         9.0 i/s
           csv 3.3.0:         9.0 i/s - 1.00x  slower
kou commented 1 month ago

Thanks.