Fix diff output when a fuzzy finder anything is inside an expected hash

KarlHeitmann commented 1 month ago

This is a simplified version of PR #596 , it solves the issue #551 "Diff reports confusing output when used with "fuzzy" matchers like anything "

This PR will fix only anything values associated to top level keys of a hash. It will not work with nested hashes.

Example output BEFORE changes in this PR:

@@ -1,4 +1,4 @@
-:anything_key => anything,
+:anything_key => "bcdd0399-1cfe-4de1-a481-ca6b17d41ed8",
 :fixed => "fixed",
-:trigger => "wrong",
+:trigger => "trigger",

Example output AFTER changes in this PR:

@@ -1,4 +1,4 @@
 :anything_key => anything,
 :fixed => "fixed",
-:trigger => "wrong",
+:trigger => "trigger",

pirj commented 1 month ago

undefined method `flatten' for #<Hash

in 1.8.7

Looks great otherwise, thank you!

pirj commented 1 month ago

For the elevator pitch, may I kindly as you to add before and after example outputs to the pr description?

KarlHeitmann commented 1 month ago

Ready!

I've just added the before/after example outputs on the PR description.
The error undefined method flatten for #<Hash> is really weird. I am not calling that method anywhere on the _spec.rb file (the file that is complaining), and the application file differ.rb calls flatten on line 135. But I have not touched that line on my branch. I made a workaround to bypass the new code if you are using ruby version 1.8.7. That should be enough to solve the issue...

KarlHeitmann commented 1 month ago

I've just simplified no_procs_and_no_numbers? in a new commit by removing the star operator.

I don't know if that's the best solution, the downside is it modifies the original no_procs? and no_numbers? input argument. Usually I don't like to alter the original code that existed before doing my changes.

Other solutions I could have taken that didn't need to alter the no_procs? and no_numbers? methods are:

Alternative 2: copy paste the logic of no_procs? and no_numbers? and joins the expression with && operator

  def no_procs_and_no_numbers?(*args)
    safely_flatten(args).none? { |a| Proc === a } && safely_flatten(args).none? { |a| Numeric === a }
  end

Alternative 3: an optimization of Alternative 2. This one will have better performance

  def no_procs_and_no_numbers?(*args)
    safely_flatten(args).none? { |a| Proc === a || Numeric === a }
  end

Both alternatives makes all test pass on my local machine.

KarlHeitmann commented 1 month ago

UPDATE: yeah... what you are mentioning here is a good idea: get rid of the safely_flatten by just taking the top-level keys of each of those elements: actual & expected. I am doing some benchmarks because I have not found the complexity of Hash methods like .none? or .any? on the ruby hash documentation or ruby enumerable documentation. I hope I'll have a simple proposal ready in the next days to submit it here :)

KarlHeitmann commented 4 weeks ago

I realized the change is simpler than I thought. I just introduced a new commit that reverts the no_procs_and_no_numbers? method and restores no_procs? and no_numbers? to its original state. This new commit introduces Differ#all_hashes?(actual, expected) private method that will check if actual and expected are Hashes. In such cases, it will call the method that checks the presence of RSpec::Mocks::ArgumentMatchers::AnyArgMatcher fuzzy matcher as value on the expected hash, then will mutate its anything value and will fold that diff on the output string.

I am doing this instead of using safely_flatten because the behavior I'm proposing on this PR is a specific case scenario: the user has given an actual and expected vars that both are Hash instances, and the expected var can contain an anything fuzzy matcher, so the output diff will contain less noise.

safely_flatten was introduced in fc01e3e5b4e711a471dde0dd65d5e528439e9fdf, and before this commit, methods such as def no_procs?(*args) were written like this:

def no_procs?(*args)
  args.flatten.none? { |a| Proc === a }
end

But that method was only called once in that old file. You can check it out with this command:

git show fc01e3e5~:lib/rspec/support/differ.rb

This leads me to think that before using safely_flatten, calling args.flatten inside no_procs?(*args) was just a fancy way to check the actual and expected vars were not Proc objects; instead of using something like this:

def no_procs?(actual, expected)
  (Proc != actual) && (Proc != expected)
end

The fancy way is more readable, and it has the benefit if actual or expected are Array objects, they can be recursively flattened and you could diff_as_object objects nested deep down in actual & expected arrays, as long as these nested objects are not Procs. Pretty cool, huh? The problem with this approach is that you could enter into an infinite recursion, that's why later on safely_flatten method was introduced.

But again, the code changes I'm proposing are narrowed down to a specific case scenario, where the user wants to diff two hashes in which one of them an anything fuzzy matcher is being used as a value of a top-level key of the expected var. No need to search recursively. Recursion should be addressed in my previous PR #596

KarlHeitmann commented 4 weeks ago

I've just submitted another commit, because I noticed another issue: on this line, the expected var mutates. So, if you provide Differ with a expected hash that contains an anything fuzzy matcher as a value in one of its top-level keys, after the differ.diff call the expected var will no longer contain the fuzzy matcher: it will contain the corresponding actual var value.

Hopefully the spec I've added will explain what the issue better than how I am explaining it here. Here is the code snippet:

it "checks the 'expected' var continues having the 'anything' fuzzy matcher, it has not mutated" do
  actual = { :fixed => "fixed", :trigger => "trigger", :anything_key => "bcdd0399-1cfe-4de1-a481-ca6b17d41ed8" }
  expected = { :fixed => "fixed", :trigger => "wrong", :anything_key => anything }
  differ.diff(actual, expected)
  expect(expected).to eq({ :fixed => "fixed", :trigger => "wrong", :anything_key => anything })
end

Here is the output before the application code changes (failing test)

     Failure/Error: DEFAULT_FAILURE_NOTIFIER = lambda { |failure, _opts| raise failure }

       expected: {:anything_key=>anything, :fixed=>"fixed", :trigger=>"wrong"}
            got: {:anything_key=>"bcdd0399-1cfe-4de1-a481-ca6b17d41ed8", :fixed=>"fixed", :trigger=>"wrong"}

       (compared using ==)

Here is the output AFTER the application code changes (test succeeds)

pirj commented 4 weeks ago

Nice find. Can this be the root cause of this getting its way to the diff in the first place?

1,4 @@
-:anything_key => anything,
+:anything_key => "bcdd0399-1cfe-4de1-a481-ca6b17d41ed8",

KarlHeitmann commented 3 weeks ago

Yes, the cause of:

1,4 @@
-:anything_key => anything,
+:anything_key => "bcdd0399-1cfe-4de1-a481-ca6b17d41ed8",

is the mutation there: First I take the expected var, I ask it if it is a Hash instance, and if it is a Hash, I'll select its values on the top-level keys that are anything and mutate them into their corresponding actual value. That's the core idea to fold all that noise. On the first PR #596 I applied that to all nested hashes recursively... but that's not very useful because the current differ RSpec::Support::Differ shows a single red line change for any nested hash... it is not very useful and I don't think it is worth doing that (side note: although it is proven possible to do on #596 ).

Before submitting the latest commit c3e9ea926463829a048a131f7b85fef16c186ef5 , I thought the mutation would occur inside the Differ class, but will not "bubble up" to the test scope... but I was wrong, that commit fixes that issue.

KarlHeitmann commented 3 weeks ago

Almost ready @JonRowe . :warning: :warning: :warning: But there is a caveat :warning: :warning: :warning:

The first commit 31f086e addresses your comments in which I could make safe changes.
The second commit dd7a4c7 addresses your comment related to conditionally define my method diff_hashes_as_object or just "alias" it to diff_as_object. But I discovered these changes can break your test suite by throwing a random false positive test failure.

If I run only that test file with the command:

bundle exec rspec ./spec/rspec/support/differ_spec.rb

A failure rises. Because the RSpec.describe Differ sentence :warning: :warning: :warning: will load RSpec::Support::Differ before RSpec::Mocks::ArgumentMatchers::AnyArgMatcher is required on RSpec :warning: :warning: :warning: . So, the test I introduced with the anything matcher will always fail with the command above: because it will define the alias version of my method instead of defining my method that handles anything fuzzy matcher.

But if you run the full test suite with:

bundle exec rspec spec

Chances are all tests will pass, unless differ_spec.rb file IS the 1st file to be run. I've ran the full test suite many many times (20+ and I got bored) and I was always lucky. I've introduced a puts "FLAKEY TEST FUZZY MATCHER RUN" sentence on the flaky test, Below is an example screenshot with 4 trials... but if your CI CD has very bad luck, you may find some false positives.

I don't know how to fix this issue, is there a way to force the test suite to not run differ_spec.rb first? But if we do so, anyone that wants to add something to RSpec::Support::Differ will stumble upon this issue when running rspec ./spec/rspec/support/differ_spec.rb

I've tried to dig in and find where AnyArgMatcher is loaded to try to force differ_spec.rb to load it before the RSpec.describe Differ do sentence. But I didn't had success. the furthest I went is here: rspec-core/lib/rspec/core/dsl.rb#L43 , on that line, I've put a binding.break before the class gets defined. I hope this screenshot can provide some more information:

What should we do about this?

pirj commented 3 weeks ago

What if we quote it?

RSpec.describe "Differ" do

it’s the way we recommend writing specs anyway. Would it solve the flakiness issue?

Alternatively. We autoload classes, and numerous places may trigger AnyArgMatcher to load. If we have no intention to run specs when rspec-mocks are not loaded on purpose, then I don’t see any harm if you require the file from rspec-mocks that defines it explicitly at the bottom of this spec file.

KarlHeitmann commented 3 weeks ago

That's awesome! RSpec.describe "Differ" do fixed the issue! I've run the single test file and the whole test suite and I had no problems at all.

If there are no more changes left to do, I can squash all the commits into a single one.

BTW, What's the difference between RSpec.describe the class and the name of the class as string?

pirj commented 3 weeks ago

Nice! Let me have another quick look.

RSpec.describe the class and the name of the class as string?

This autoloading (this is the first in my practice). And also probably described_class won’t work.

pirj commented 3 weeks ago

uninitialized constant RSpec::Support::HunkGenerator

./spec/rspec/support/differ_spec.rb:169

on 2.7 🤔

KarlHeitmann commented 1 week ago

You're welcome! Is there anything else I can do for you here? @pirj @JonRowe ? I can squash all the commits into a single one if you think this PR is OK.

BTW, in Ruby 2.7 build it is complaining about spec/rspec/support/differ_spec.rb:169

  1) Differ #diff handles any encoding error that occurs with a helpful error message
     Failure/Error:
       expect(RSpec::Support::HunkGenerator).to receive(:new).
         and_raise(Encoding::CompatibilityError)

     NameError:
       uninitialized constant RSpec::Support::HunkGenerator

I don't know how can I reproduce this error on my local machine. I have not touched this line 169 before, is it a flaky test? or is it related to changing:

RSpec.describe Differ do

to

RSpec.describe "Differ" do

?

pirj commented 1 week ago

2.7 passed on re-run. Must be order-dependent load order flakiness. Let’s see if it bugs us again. But I don’t think it should stop from merging.

Thanks again. @JonRowe wdyt?

JonRowe commented 1 week ago

Ugh I hate to say this at this late juncture, but I think @pirj's earlier point about large nested hashes has a point, if our point here is to make the diff smaller by making things equal, and the matcher is matching on anything then why aren't we doing the inverse of this? e.g. Creating a actual_to_diff and setting the ignored keys to "anything"

KarlHeitmann commented 1 week ago

That's thinking outside of the box! I tried your suggestion and it works! I didn't thought about it... but using the actual_to_diff approach is better than using the expected_to_diff approach :+1:

pirj commented 1 week ago

HunkGenerator again, flaky

JonRowe commented 1 week ago

Thanks!

rspec / rspec-support

Fix diff output when a fuzzy finder anything is inside an expected hash #599

./spec/rspec/support/differ_spec.rb:169