rubocop / ruby-style-guide

A community-driven Ruby coding style guide
https://rubystyle.guide
16.47k stars 3.4k forks source link

Fix code in Explicit Use of the Case Equality Operator #927

Closed ydakuka closed 1 year ago

ydakuka commented 1 year ago

https://docs.rubocop.org/rubocop-performance/cops_performance.html#performancerangeinclude

pirj commented 1 year ago

https://github.com/fastruby/fast-ruby#range

Does this still stand now, 8 years later, with modern rubies? How about infinite ranges?

ydakuka commented 1 year ago

Does this still stand now, 8 years later, with modern rubies?

Yes, this is still correct for the latest ruby version.

Here is the code below:

I've created the Dockerfile:

FROM ruby:3.2.2
COPY . /var/www/ruby  
WORKDIR /var/www/ruby  
RUN gem install benchmark-ips
RUN ruby -v
CMD ["ruby", "file.rb"]

Ruby version:

ruby 3.2.2 (2023-03-30 revision e51014f9c0) [x86_64-linux]

And I got the following:

1 case

The code (https://github.com/fastruby/fast-ruby/blob/main/code/range/cover-vs-include.rb):

require "benchmark/ips"
require "date"

BEGIN_OF_JULY = Date.new(2015, 7, 1)
END_OF_JULY = Date.new(2015, 7, 31)
DAY_IN_JULY = Date.new(2015, 7, 15)

Benchmark.ips do |x|
  x.report('range#cover?') { (BEGIN_OF_JULY..END_OF_JULY).cover? DAY_IN_JULY }
  x.report('range#include?') { (BEGIN_OF_JULY..END_OF_JULY).include? DAY_IN_JULY }
  x.report('range#member?') { (BEGIN_OF_JULY..END_OF_JULY).member? DAY_IN_JULY }
  x.report('plain compare') { BEGIN_OF_JULY < DAY_IN_JULY && DAY_IN_JULY < END_OF_JULY }
  x.report('value.between?') { DAY_IN_JULY.between?(BEGIN_OF_JULY, END_OF_JULY) }

  x.compare!
end

The result:

ydakuka@yauhenid:~/ruby-docker-app$ docker run ruby-app
Warming up --------------------------------------
        range#cover?   257.651k i/100ms
      range#include?    12.085k i/100ms
       range#member?    12.506k i/100ms
       plain compare   430.355k i/100ms
      value.between?   543.628k i/100ms
Calculating -------------------------------------
        range#cover?      2.638M (± 6.2%) i/s -     13.140M in   5.001433s
      range#include?    108.201k (± 5.9%) i/s -    543.825k in   5.044048s
       range#member?    109.532k (± 6.9%) i/s -    550.264k in   5.050479s
       plain compare      4.001M (± 6.3%) i/s -     20.227M in   5.077226s
      value.between?      4.698M (± 7.2%) i/s -     23.920M in   5.119302s

Comparison:
      value.between?:  4698304.6 i/s
       plain compare:  4000963.7 i/s - 1.17x  slower
        range#cover?:  2637945.2 i/s - 1.78x  slower
       range#member?:   109531.8 i/s - 42.89x  slower
      range#include?:   108201.5 i/s - 43.42x  slower

2 case

The code:

require "benchmark/ips"
require "date"

END_OF_JULY = Date.new(2015, 7, 31)
DAY_IN_JULY = Date.new(2015, 7, 15)

Benchmark.ips do |x|
  x.report('range#cover?') { (..END_OF_JULY).cover? DAY_IN_JULY }
  x.report('range#include?') { (..END_OF_JULY).include? DAY_IN_JULY }
  x.report('range#member?') { (..END_OF_JULY).member? DAY_IN_JULY }

  x.compare!
end

The result:

ydakuka@yauhenid:~/ruby-docker-app$ docker run ruby-app
Warming up --------------------------------------
        range#cover?   486.750k i/100ms
      range#include?file.rb:9:in `include?': cannot determine inclusion in beginless/endless ranges (TypeError)

  x.report('range#include?') { (..END_OF_JULY).include? DAY_IN_JULY }
                                                        ^^^^^^^^^^^
  from file.rb:9:in `block (2 levels) in <main>'
  from (eval):6:in `call_times'
  from /usr/local/bundle/gems/benchmark-ips-2.12.0/lib/benchmark/ips/job.rb:285:in `block in run_warmup'
  from /usr/local/bundle/gems/benchmark-ips-2.12.0/lib/benchmark/ips/job.rb:268:in `each'
  from /usr/local/bundle/gems/benchmark-ips-2.12.0/lib/benchmark/ips/job.rb:268:in `run_warmup'
  from /usr/local/bundle/gems/benchmark-ips-2.12.0/lib/benchmark/ips/job.rb:253:in `block in run'
  from /usr/local/bundle/gems/benchmark-ips-2.12.0/lib/benchmark/ips/job.rb:252:in `times'
  from /usr/local/bundle/gems/benchmark-ips-2.12.0/lib/benchmark/ips/job.rb:252:in `run'
  from /usr/local/bundle/gems/benchmark-ips-2.12.0/lib/benchmark/ips.rb:52:in `ips'
  from file.rb:7:in `<main>'

3 case

The code:

require "benchmark/ips"
require "date"

FIRST_NUMBER = 100_000
SECOND_NUMBER = 1234

Benchmark.ips do |x|
  x.report('range#cover?') { (..FIRST_NUMBER).cover? SECOND_NUMBER }
  x.report('range#include?') { (..FIRST_NUMBER).include? SECOND_NUMBER }

  x.compare!
end

The result:

ydakuka@yauhenid:~/ruby-docker-app$ docker run ruby-app
Warming up --------------------------------------
        range#cover?   605.669k i/100ms
      range#include?   618.267k i/100ms
Calculating -------------------------------------
        range#cover?      6.074M (± 1.7%) i/s -     30.889M in   5.086611s
      range#include?      6.227M (± 2.0%) i/s -     31.532M in   5.065797s

Comparison:
      range#include?:  6227278.6 i/s
        range#cover?:  6074487.3 i/s - same-ish: difference falls within error
pirj commented 1 year ago

Thanks a lot for detailed research. Some Ruby behaviour here is surprising.

  1. The “cannot determine inclusion in beginless/endless ranges” claim which goes against that it actually can determine inclusion in your case 3
  2. It can determine inclusion with beginless, but not endless ranges

The difference as I can think of it is probably due to the difference between a discrete and a continuous ranges. E.g for a (1..5) and 3.5 ‘include?’ would be ‘false’, but ‘cover?’ would be ‘true’.

For our matter, which would be the right substitute for ‘(1..100)’ and ‘7’, both? Should we emphasize that for 7.5 it won’t be this way, and one would need ‘include?’?

ydakuka commented 1 year ago

It can determine inclusion with beginless, but not endless ranges

No, the behavior for beginless and endless ranges is the same.

The code:

require "benchmark/ips"
require "date"

BEGIN_OF_JULY = Date.new(2015, 7, 1)
DAY_IN_JULY = Date.new(2015, 7, 15)

Benchmark.ips do |x|
  x.report('range#cover?') { (BEGIN_OF_JULY..).cover? DAY_IN_JULY }
  x.report('range#include?') { (BEGIN_OF_JULY..).include? DAY_IN_JULY }
  x.report('range#member?') { (BEGIN_OF_JULY..).member? DAY_IN_JULY }

  x.compare!
end

The result:

ydakuka@yauhenid:~/ruby-docker-app$ docker run ruby-app  
Warming up --------------------------------------
        range#cover?   319.908k i/100ms
      range#include?file.rb:10:in `include?': cannot determine inclusion in beginless/endless ranges (TypeError)

  x.report('range#include?') { (BEGIN_OF_JULY..).include? DAY_IN_JULY }
                                                          ^^^^^^^^^^^
    from file.rb:10:in `block (2 levels) in <main>'
    from (eval):6:in `call_times'
    from /usr/local/bundle/gems/benchmark-ips-2.12.0/lib/benchmark/ips/job.rb:285:in `block in run_warmup'
    from /usr/local/bundle/gems/benchmark-ips-2.12.0/lib/benchmark/ips/job.rb:268:in `each'
    from /usr/local/bundle/gems/benchmark-ips-2.12.0/lib/benchmark/ips/job.rb:268:in `run_warmup'
    from /usr/local/bundle/gems/benchmark-ips-2.12.0/lib/benchmark/ips/job.rb:253:in `block in run'
    from /usr/local/bundle/gems/benchmark-ips-2.12.0/lib/benchmark/ips/job.rb:252:in `times'
    from /usr/local/bundle/gems/benchmark-ips-2.12.0/lib/benchmark/ips/job.rb:252:in `run'
    from /usr/local/bundle/gems/benchmark-ips-2.12.0/lib/benchmark/ips.rb:52:in `ips'
    from file.rb:8:in `<main>'
ydakuka commented 1 year ago

The difference as I can think of it is probably due to the difference between a discrete and a continuous ranges. E.g for a (1..5) and 3.5 ‘include?’ would be ‘false’, but ‘cover?’ would be ‘true’.

I've checked it for different ruby versions, no.

2.2.10 :001 > (1..5).include?(3.5)
 => true 
2.2.10 :002 > (1..5).cover?(3.5)
 => true 
2.2.10 :003 > (1..5).member?(3.5)
 => true 
2.7.6 :001 > (1..5).include?(3.5)
 => true 
2.7.6 :002 > (1..5).cover?(3.5)
 => true 
2.7.6 :003 > (1..5).member?(3.5)
 => true 
3.0.6 :001 > (1..5).include?(3.5)
 => true 
3.0.6 :002 > (1..5).cover?(3.5)
 => true 
3.0.6 :003 > (1..5).member?(3.5)
 => true 
3.2.2 :001 > (1..5).include?(3.5)
 => true 
3.2.2 :002 > (1..5).cover?(3.5)
 => true 
3.2.2 :003 > (1..5).member?(3.5)
 => true 
pirj commented 1 year ago

So both cover? and include? work the same? Why the difference in the implementation that causes such a difference in performance?

According to the doc, all three are different https://ruby-doc.org/3.2.2/Range.html#class-Range-label-Methods+for+Comparing with the following difference:

(‘a'..'d').include?('cc') # => false ('a'..'d').cover?('cc') # => true

Can we swap o e for another in our examples? Can we recommend any of those two for ranges given === promises to serve a slightly different purpose?

koic commented 1 year ago

IMO, RuboCop Performance sometimes requires trade-offs between readability and speed, and at times may produce false positives. In the style guide, it's generally better to showcase options that prioritize readability and don't have false positives.

bbatsov commented 1 year ago

Fully agree with @koic.