Add new `Performance/StringBytesize` cop

viralpraxis commented 1 month ago

Before submitting the PR make sure the following are checked:

[x] The PR relates to only one subject with a clear title and description in grammatically correct, complete sentences.
[x] Wrote good commit messages.
[x] Commit message starts with [Fix #issue-number] (if the related issue exists).
[x] Feature branch is up-to-date with master (if not - rebase it).
[x] Squashed related commits together.
[x] Added tests.
[x] Ran bundle exec rake default. It executes all tests and runs RuboCop on its own code.
[x] Added an entry (file) to the changelog folder named {change_type}_{change_description}.md if the new code introduces user-observable changes. See changelog entry format for details.

koic commented 1 month ago

I understand the reasoning, but in Rails, replacing 42.bytes.size with 42.bytesize results in NoMethodError. This makes me concerned about its usefulness compared to the risk of false positives.

viralpraxis commented 1 month ago

Yes, that's an issue. But isn't Integer#size less common than Array#size? Could we simply disable autocorrection, or only check for the length and count methods?

viralpraxis commented 1 month ago

Any thoughts? @koic @dvandersluis

I completely understand that we should avoid false positives (if possible), but in this particular case I believe both true and false positives are quite rate; at the same time unnecessary array allocations are painful in case of huge strings. I was inspired by code like this

class BytesizeValidator < ActiveModel::Validations::LengthValidator
  def validate_each(record, attribute, value)
    super(record, attribute, value&.bytes)
  end
end

which was fixed by me to

class BytesizeValidator < ActiveModel::Validations::LengthValidator
  def validate_each(record, attribute, value)
    super(record, attribute, value&.dup&.force_encoding(Encoding::BINARY))
  end
end

to reduce memory consumption

dvandersluis commented 1 month ago

I don't think 42.bytes.size is very likely in a Rails app (because it's essentially calling Integer#size), but it could be avoided in most cases by having this cop not report if the receiver is a numeric literal. It won't cover 100% of cases (ie. variables/methods), but would probably cover most.

It'd be good to have a reference and/or benchmarks for this performance suggestion. Your example doesn't actually make a good case for changing bytes.size to bytesize on its own, just avoiding using the bytes array (but that would apply to any unnecessary use of an array, not specifically to bytes.size).

Could we simply disable autocorrection, or only check for the length and count methods?

I wouldn't change this to only check for length and count, but that wouldn't solve the issue anyways, since all three are used in many contexts. I think that'd just diminish the utility of this cop. BTW - If this PR was to be accepted it'd be ideal for the tests to include all three methods (size/length/count).

koic commented 1 month ago

@dvandersluis's opinion looks good. @viralpraxis Can you proceed with this? However, for safety, it should use Safe: false instead of SafeAutoCorrect: false due to the risk of false positives.

viralpraxis commented 1 month ago

@koic I agree with @dvandersluis too. I will apply the necessary changes.

koic commented 3 weeks ago

Thanks!

rubocop / rubocop-performance

Add new `Performance/StringBytesize` cop #474