rubocop / rubocop-performance

An extension of RuboCop focused on code performance checks.
https://docs.rubocop.org/rubocop-performance
MIT License
677 stars 79 forks source link

Detect regexes where a string could be used in .split #191

Open tas50 opened 3 years ago

tas50 commented 3 years ago

Is your feature request related to a problem? Please describe.

I've noticed that sometimes regexes are used when a simple string would work fine. The code below could / should be simplified to be just a string

so.split(/:/)

Describe the solution you'd like

Autocorrect to

so.split(':')

This is more clear to anyone reading the code and it's 3x faster:

Warming up --------------------------------------
         Regex Split    94.649k i/100ms
        String Split   290.261k i/100ms
Calculating -------------------------------------
         Regex Split    939.299k (± 2.0%) i/s -      4.732M in   5.040237s
        String Split      2.877M (± 1.7%) i/s -     14.513M in   5.046006s

Comparison:
        String Split:  2877000.8 i/s
         Regex Split:   939298.7 i/s - 3.06x  (± 0.00) slower

Describe alternatives you've considered

Not this time

rrosenblum commented 3 years ago

I think we have handled some code similar to this in the Performance cops. Specifically, I think Performance/StringReplacement does some basic checks to see if we can swap regex for a string. I vaguely remember looking into something like this a while back. I think one of the concerns was that the performance of regex vs string changed based on the Ruby version.

koic commented 3 years ago

@rrosenblum Thanks for the suggestion. I will transfer this issue to rubocop-performance because https://github.com/rubocop-hq/rubocop-performance/pull/190 has been opened.

djberg96 commented 3 years ago

I'm not seeing it:

 user               system    total              real
String#split(//)  0.765812   0.000649   0.766461 (  0.766922)
String#split('')  0.809555   0.000397   0.809952 (  0.810363)
require "benchmark"

MAX = 2000000

Benchmark.bm(30) do |x|
  x.report("String#split(//)"){
    string = "hello"
    MAX.times{ string.split(//) }
  }

  x.report("String#split('')"){
    string = "hello"
    MAX.times{ string.split('') }
  }
end
tas50 commented 3 years ago

@djberg96 what ruby release are you running. Here are my results on 2.7.4 from your benchmark snippet:

                                     user     system      total        real
String#split(//)                11.726928   0.100159  11.827087 ( 11.910536)
String#split('')                 2.162106   0.020655   2.182761 (  2.202049)
djberg96 commented 3 years ago

@tas50 That was 3.0.1.

I just tried 2.7.2 and 2.6.6 as well. The 2.7.x results were similar to yours, whereas the 2.6.x results showed both much slower, but the regex was slightly faster.

rrosenblum commented 3 years ago

We've known for a while that the performance measurements for some of the cops can change across Ruby versions. I don't think we document the performance benefits based on Ruby version. Maybe one solution to this that we start documenting performance across Ruby versions, and we start to use that information for targeting whether a cop is enabled or not.

benchmark-ips is the recommended tool for generating performance reports for snippets of code

Many of the cops are based on snippets from fast-ruby. Unfortunately, this project doesn't appear to be well maintained anymore. At one point in time, they did have a build that would show off performance for different versions of Ruby