rubocop / rubocop-performance

An extension of RuboCop focused on code performance checks.
https://docs.rubocop.org/rubocop-performance
MIT License
677 stars 79 forks source link

Support for Ruby 2.4 casecmp? method #100

Open alejandrovelez7 opened 4 years ago

alejandrovelez7 commented 4 years ago

Is your feature request related to a problem? Please describe.

The suggestion for casecmp can be updated to use the newer Ruby 2.4 method casecmp? instead of casecmp.zero? which requires the user to know the contract for the casecmp method. The ? variation returns truthy or falsy values and is more succinct.

Example: str.casecmp('ABC').zero? vs. str.casecmp?('ABC')

Describe the solution you'd like

Would it be possible to include casecmp? as a potential Good alternative in the casecmp cop?

Describe alternatives you've considered

Removing the .zero? variation completely might not be preferable if we are supporting versions of Ruby before 2.4.

Additional context

N/A

bbatsov commented 4 years ago

Great idea! I guess we'll have to update this cop.

koic commented 4 years ago

I have also considered this method. However this is a performance cop, it is doubtful that casecmp? Is a good case. https://gist.github.com/koic/bb85f03a534edffc6a52fcdc66e47568

casecmp? is still slower than casecmp.zero? in Ruby 2.7.

# bench.rb
require 'benchmark/ips'

Benchmark.ips do |x|
  x.report('upcase')   { 'String'.upcase == 'string' }
  x.report('downcase') { 'String'.downcase == 'string' }
  x.report('casecmp')  { 'String'.casecmp('string').zero? }
  x.report('casecmp?') { 'String'.casecmp?('string') }
  x.compare!
end
% ruby -v
ruby 2.7.0p0 (2019-12-25 revision 647ee6f091) [x86_64-darwin17]
% ruby bench.rb
Warming up --------------------------------------
              upcase   200.689k i/100ms
            downcase   197.741k i/100ms
             casecmp   225.196k i/100ms
            casecmp?   164.628k i/100ms
Calculating -------------------------------------
              upcase      4.208M (± 7.5%) i/s -     21.072M in   5.038656s
            downcase      4.329M (± 5.1%) i/s -     21.752M in   5.039575s
             casecmp      5.091M (± 3.7%) i/s -     25.447M in   5.005460s
            casecmp?      3.014M (± 5.9%) i/s -     15.146M in   5.047086s

Comparison:
             casecmp:  5090889.2 i/s
            downcase:  4328831.1 i/s - 1.18x  slower
              upcase:  4207647.1 i/s - 1.21x  slower
            casecmp?:  3013803.1 i/s - 1.69x  slower
AlexWayfer commented 4 years ago

I've reported about this cop in 2017: https://github.com/rubocop-hq/rubocop/issues/4277#issuecomment-294291258

Ruby >= 2.4: casecmp doesn't work with Unicode, casecmp? is slower than downcase + ==, so downcase + == is good compromise.

AlexWayfer commented 4 years ago

And there are no actions in fast-ruby, sadly: https://github.com/JuanitoFatas/fast-ruby/pull/160

zverok commented 4 years ago

Funnily enough, downcase is just plain wrong in edge cases (and that's why casecmp? is slower, probably):

str = "Straße"
str2 = str.upcase
# => "STRASSE" 
str.downcase == str2.downcase
# => false 
# ...because...
[str.downcase, str2.downcase]
# => ["straße", "strasse"] 
# ...so, you might want to...
[str.downcase(:fold), str2.downcase(:fold)]
# => ["strasse", "strasse"]
# ...but you can just
str.casecmp?(str2)
# => true 

So, it seems that:

  1. this cop shouldn't be in rubocop-performance (the three alternatives aren't equivalent at all)
  2. but suggesting casecmp? instead of comparing downcased is a good candidate for main Rubocop's Lint/ department.
AlexWayfer commented 4 years ago

@zverok good case. Can we use upcase and == instead of downcase? It should work for your examples and be faster than casecmp?.

zverok commented 4 years ago

@AlexWayfer counter-example would be EXACTLY the same :) German's "lowercase sharp s" ("ß") generally uppercased as "SS", but "uppercase sharp s" still exists (used for avoiding ambiguity in uppercase passport data, for example). So, peculiarly...

"STRASSE" == "STRAẞE"
# => false
"STRASSE".upcase == "STRAẞE".upcase
# => false 
"STRASSE".casecmp?("STRAẞE")
# => true 

Generally, case folding is a "right" thing to do for case-insensitive comparison, and it is complicated, and there is no "simple shortcut" around it, that's why we need it.

AlexWayfer commented 4 years ago

@AlexWayfer counter-example would be EXACTLY the same :) German's "lowercase sharp s" ("ß") generally uppercased as "SS", but "uppercase sharp s" still exists (used for avoiding ambiguity in uppercase passport data, for example). So, peculiarly...

Wow, it's interesting. These chars look similar in GitHub, but not in my terminal:

image

Thank you. Then, if Ruby (and other languages, I've checked) works so, let's move this cop as suggested by @zverok.