weppos / publicsuffix-ruby

Domain name parser for Ruby based on the Public Suffix List.
https://simonecarletti.com/code/publicsuffix
MIT License

Replace name_to_labels(s).last with faster extract_tld #129

Closed casperisfine closed 7 years ago

casperisfine commented 7 years ago

Following https://github.com/weppos/publicsuffix-ruby/pull/128, I looked into the performance of PublicSuffix::Domain.name_to_labels.

The thing is, it's almost always used to get the TLD, so I benchmarked several implementations:

require 'benchmark/ips'

DOMAIN = 'accident-prevention.aero'
DOT = '.'
Benchmark.ips do |x|
  x.report('string.rindex + string.slice') {
    DOMAIN[(DOMAIN.rindex('.')+1..-1)]
  }
  x.report('string.rpartition') {
    DOMAIN.rpartition(DOT)[2]
  }
  x.report('split.last') {
    DOMAIN.split('.').last
  }
  x.report('string.scan') {
    DOMAIN.scan(/[^\.]+\z/).first
  }
  x.report('regexp.match') {
    /[^\.]+\z/.match(DOMAIN)[0]
  }
  x.report('string.slice') {
    DOMAIN.slice(/[^\.]+\z/)
  }
end
string.rindex + string.slice       1.710M (± 6.9%) i/s -      8.539M in   5.020714s
           string.rpartition       2.214M (± 7.7%) i/s -     11.095M in   5.044656s
                  split.last       1.716M (± 4.2%) i/s -      8.638M in   5.044117s
                 string.scan      86.526k (± 7.0%) i/s -    434.673k in   5.053164s
                regexp.match     157.309k (± 8.2%) i/s -    794.035k in   5.084998s
                string.slice     191.887k (± 4.8%) i/s -    972.379k in   5.080122s

So string.rpartition('.')[2] is about 30% faster than string.split('.').last (2.214M vs. 1.716M i/s).

It probably won't make a huge difference, but it might shave a few ms off reindex!.
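
For illustration, here is a minimal sketch of what such a helper could look like. The name extract_tld comes from the issue title; the body is an assumption based on the fastest rpartition variant above, not the code that was eventually merged.

```ruby
# Hypothetical helper: returns the text after the last dot.
def extract_tld(name)
  # rpartition returns [head, separator, tail]. When there is no dot,
  # head and separator are empty strings and tail is the whole name.
  name.rpartition('.')[2]
end

extract_tld('accident-prevention.aero')
extract_tld('aero')
```

One caveat: the two approaches are not strictly equivalent on edge cases. With a trailing dot ('example.com.'), split('.').last returns 'com' because split drops trailing empty strings, while rpartition('.')[2] returns an empty string. For an empty string, split('.').last returns nil and rpartition('.')[2] returns ''.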

@weppos thoughts?

casperisfine commented 7 years ago

Another note: the more dots the name contains, the slower split('.').last gets, whereas rpartition is less affected.
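
A plausible explanation, which is my assumption rather than something measured here: split builds one string per label, so its work grows with the number of dots, while rpartition always returns exactly three strings. A quick sketch of the difference in return shape:

```ruby
NAME = 'foo.bar.plop.accident-prevention.aero'

# split allocates an array with one string per label.
labels = NAME.split('.')
# rpartition always yields three elements: head, separator, tail.
parts = NAME.rpartition('.')

puts labels.size
puts parts.size
puts labels.last == parts[2]
```

Both give the same last label for this input; only the amount of intermediate data differs.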

Same benchmark using: DOMAIN = 'foo.bar.plop.accident-prevention.aero'

Warming up --------------------------------------
string.rindex + string.slice
                       115.205k i/100ms
   string.rpartition   100.624k i/100ms
          split.last    64.506k i/100ms
         string.scan     7.866k i/100ms
        regexp.match    14.208k i/100ms
        string.slice    16.540k i/100ms
Calculating -------------------------------------
string.rindex + string.slice
                          1.859M (± 3.1%) i/s -      9.332M in   5.024148s
   string.rpartition      1.515M (± 3.1%) i/s -      7.647M in   5.054116s
          split.last    871.871k (± 2.5%) i/s -      4.386M in   5.034232s
         string.scan     80.564k (± 4.1%) i/s -    409.032k in   5.086318s
        regexp.match    151.134k (± 1.5%) i/s -    767.232k in   5.077662s
        string.slice    174.542k (± 2.7%) i/s -    876.620k in   5.026302s

weppos commented 7 years ago

See https://github.com/weppos/publicsuffix-ruby/pull/130#issuecomment-274317007