rspec / rspec-support

Common code needed by the other RSpec gems. Not intended for direct use.
https://rspec.info
MIT License
95 stars 103 forks source link

Fixes diff when fuzzy finder `anything` is used in a Hash object (proof of concept) #596

Closed KarlHeitmann closed 1 day ago

KarlHeitmann commented 2 months ago

Hi

I'd like to give you some context about this PR.

TL; DR

See #551 , it is annoying to see the noise added by anything matcher on diffs and I wanted to fix that. I initially thought the solution was trivial, but I quickly found out I was wrong when I began reading the Diff class: there is a lot happening under the hood. I know there are some limitations to the Differ, and I understand there is a lot to change. This PR is a proposal, I don't know if my solution is too hacky, and I'd like to receive some feedback in order to solve this problem.

Before:

@@ -1,4 +1,4 @@
-:an_key => anything,
+:an_key => "dummy",
 :fixed => "fixed",
-:nested => {:anything_key=>anything, :name=>"foo", :trigger=>"wrong"},
+:nested => {:anything_key=>"bcdd0399-1cfe-4de1-a481-ca6b17d41ed8", :name=>"foo", :trigger=>"trigger"},

After:

@@ -1,4 +1,4 @@
 :an_key => "dummy",
 :fixed => "fixed",
-:nested => {:anything_key=>"bcdd0399-1cfe-4de1-a481-ca6b17d41ed8", :name=>"foo", :trigger=>"wrong"},
+:nested => {:anything_key=>"bcdd0399-1cfe-4de1-a481-ca6b17d41ed8", :name=>"foo", :trigger=>"trigger"},

Context

I am currently working on a RoR project that uses Jbuilder to render JSON data that I am delivering on the endpoints of my API.

I am testing my .jbuilder files with type: view RSpec tests, and I have a lot of .json fixture files I am loading and using on my test files as expected values.

The problem this PR addresses.

My plan is to use these fixture files to kill two birds with one stone. On one side, I am comparing those fixture files with the JSON rendered by the .jbuilder files, and on the other side I am using those fixture files as mocks to share with the frontend developers, so they will have the data necessary to write their React application.

The plan is: if somebody in the backend changes a jbuilder, the tests will complain and I will remember to change the corresponding JSON fixture. This way, I will always have the JSON fixtures used by the frontend developers in sync with the backend rendered files. Frontend developers often need to work with data they don't know how to create on their local machines (because they do know React or Vue, but don't know the Rails framework)

In order to achieve the results of my plan, I ~ needed ~ to scrub some fields of my JSON fixture files using the anything fuzzy matcher. An example of fields I often need to scrub are the id, created_at and updated_at values.

I quickly discovered this annoying issue described on #551 : whenever I missed something that should make the test fail and render the diff in STDOUT, I found out the key-value pairs in my expected var with a anything matcher will add noise to the diff message, obfuscating which key actually differs from the expectation.

My approach.

My first idea was: before the place in the source code that renders the diff string between actual and expected vars, I can add a stage where I'll search every key-value pair that has an anything object as value in the expected variable, and replace the anything value with the value found on the actual var I am testing.

Hopefully the commits of this PR will show you what I mean. First commit consists in an implementation that will work with a Hash with no nested hashes, second commit will use recursivity to extend the solution to a hash object with nested hashes.

My reasons.

When I started writing the code to implement my idea, I've got a lot of questions: Can I just mutate the expected var? Shouldn't I get a notice message by the diff remembering that I've used the anything fuzzy matcher in my test? can I just copy the actual value in the expected var? Am I lying when I am doing that?

My answer to these questions was: If I decide to use the fuzzy matcher anything on a key-value pair in a hash object in my test, it implies I do not care about seeing the anything in my diff . Because it is a wild card, because the anything fuzzy matcher should morph into the real value on my actual variable.

I know this solution is hacky.... but it works! It gets rid of the noise generated by the anything fuzzy matcher. I tried to use the super_diff gem, but it has the same problem.

I am sure the code I wrote can be improved and be more concise (as surely could be the text on this PR, lol), and I thought about extending the fix to cover Array objects. But I wanted to know your opinion about this issue before continuing with this. Can I use a recursive function this way? What impact on performance will have my lines of code in this extra stage I am adding? As Aristotle said: every scientific investigation should begin by gathering the opinion of the wise people (ie, the people that earlier on thought and worked on the problem I will begin to work right now).

I always liked the "diff" concept. And I am willing to continue collaborating on enhancing the diff tool. I've seen @mcmire wrote this guide about RSpec, but I have not digged on the article yet. I am doing baby steps. Every feedback is appreciated.

Best.

KarlHeitmann commented 2 months ago

I've submitted 3 new commits to get rid of the CyclomaticComplexity, PerceivedComplexity & MethodLength cops. Each commit fixes one offense. The commits can be squashed into one commit, but since there are maaany ways to solve these offenses, these are only proposals.

KarlHeitmann commented 2 months ago

Latest commit is to fix a problem when using ruby 2.4, 2.3, 2.2. It substitutes prepend method of Array class by unshift. Because prior ruby 2.4 it was not defined.

I can convert that last commit into a fixup to amend: e4ece3df8e411ea4dab6eb08e02269ccf9c3fb14

KarlHeitmann commented 2 months ago

Build is still failing before commit f9a9e9d.

image

I don't know how to reproduce these build environment. I don't know what ruby-head stands for neither I understand the error there. And I don't know how to reproduce the 1.8.7 neither REE (what does this mean?) ruby versions in my computer to dig in and understand why it is failing.

However, I copied and pasted the ruby strings in an irb session and printed them to STDOUT using puts:

image

And found out on version 1.8.7 and REE the keys were not in alphabetical order. I know rspec-support uses pretty print to make the diff of the objects, and my hypothesis is on ruby version 1.8.7, pretty print maybe was not working correctly with some ruby hashes, and its output may be somewhat random.

The only solution I thought of was to cheat and rearrange my tests so the keys are mimicking the output. Hopefully tests will pass now in version 1.8.7 and REE.

Any other ideas are welcome to make those ruby builds pass.

pirj commented 2 months ago

Please accept my apologies for not reviewing this swiftly.

May I ask you to add the output of some spec how it looked before and after this change?

Don’t worry about ruby-head, it is very typical for it to fail. We mostly use it a bit ahead of time to fix issues with upcoming Ruby releases.

Thanks for the effort you’re putting into this.

pirj commented 2 months ago

To me it’s high time to soft-deprecate Ruby 1.8.7. If you feel that fixing it is not something you’d love to dive i to, please make sure no existing spec fail, and just make your new examples pending with a note “broken on old rubies for unknown reasons”.

KarlHeitmann commented 1 month ago

Hey @pirj ! Thanks for reaching out! No worries, it is just my hobby try to fix simple things in github. I have on my bucket list to make a tiny contribution to an important project (like this one). But I understand people may be busier than me :rofl:

I made this gist with the output of these 3 specs BEFORE the changes here, and this gist with the output AFTER the changes. I hope this clarifies what I intend to do with my change.

My intention is whenever you want to diff a Hash with another Hash, if the expected hash contains an anything fuzzy matcher, that anything value will morph into the value of the actual variable, in order to reduce the noise generated by the diff if another key-value pair has a mismatch. As described here #551

The three examples I wrote on my gists shows you one example of two hash comparisons with NO nested hashes, and the other two shows you what happens if there is a nested hash. Related to your first comment, if you think it is too unsafe to perform this recursively, I can tweak my PR so this will work only with plain hashes. The first commit of this PR implements this behavior ONLY on hash comparisons without nested hashes.

KarlHeitmann commented 1 month ago

P.S.1: About support for ruby 1.8.7, I saw you've used if String.method_defined?(:encoding) ... else ... end block to define methods for ruby 1.8.7 and the other ruby versions dynamically. Maybe I can give it a try to fix the ruby build problem.

P.S.2: I noticed something bad while using the debugger... after I mutate the hash on the expected anything value here, when PC returns to the it scope, the expected variable has mutated! As you can see on the screenshot below: image Line 611 has mutated the expected[:an_key] var from anything to dummy. I didn't notice that previously, I think this needs to be addressed.

pirj commented 1 month ago

I recall something has changed in 1.9 or 2.0 related to the stability of keys order on hashes. This may cause the ordering issue.

Please correct me if I’m wrong, but this seems to only affect the order of pair in the output diff, and won’t cause any other behavioural difference between rspec runnin on 1.8.7 and later. If so, I suggest disregarding this problem, and marking the spec as broken on 1.8.7.

KarlHeitmann commented 1 month ago

Yes @pirj , it only affects the order of pair in the output diff. I'll mark the spec as broken on 1.8.7

KarlHeitmann commented 1 month ago

Ok! I think I've solved the issues now. Please tell me if there is anything else I can do. I can also squash all the commits in one or two, but I think it is easier to review the code changes when each commit addresses a code suggestion change.

KarlHeitmann commented 1 month ago

Hey @pirj , what if we just stick with the first commit and improve it with your great suggestion here?

The first commit is a simpler solution that implements the behavior you mentioned here, and I think that behavior although it is simpler is still a good improvement on rspec-support differ. RSpec::Support::Differ already displays nested object diffs in a single line, so the enhancement provided by my suggestion on this PR are marginal in such cases, it may not be worth to implement them.

I recall having seen the broken tests you mentioned while I was working on my last commits, but I am unable to reproduce the broken tests locally. Maybe it is better to simplify the PR, I can even open a new & cleaner PR with the first commit enhanced by your suggestion.

pirj commented 1 month ago

Completely agree, @KarlHeitmann. This pr will be kept if we decide to also include diffing for lines.

KarlHeitmann commented 1 month ago

Ready, I've just submitted #599 with the simplified version of this PR.

JonRowe commented 1 week ago

:wave: Do you want to rebase this taking #599 into account? If you'd prefer to close this thats also fine

KarlHeitmann commented 3 days ago

Hey @JonRowe !

To be honest, I don't know what to do here...

I couldn't rebase the branch, because the merged #599 PR turned out to be completely different to what I initially wrote on the first commit of this branch.

Nevertheless, on my local machine I completely rewrote the current branch taking #599 and had success. The changes I made are simpler than what I did on my initial approach on the current PR #596 .

I could force-push my local branch into my repo so you can see the changes. But I don't know if it is worth to do. Because the diff message for nested hashes is already confusing. I mean, if I made the changes to hide any value that matches the anything fuzzy matcher on nested hashes.... it will not make a significant difference, because I would still see a biiiiiiiig single line diff for a hash object anyways 🤷. I don't know if I'm making myself understood correctly.

OMHO, I think I can close this PR (or keep it "on hold") until RSpec::Support::Differ class can handle diffing nested hashes in a readable and pretty way. Once that task is done, continue with the current PR to hide anything matches on nested hashes.

But I think it is your call, I can force push my local branch here or close this PR. Also, I can try to fix RSpec::Support::Differ to display nested hashes diff in a prettier way.

pirj commented 3 days ago

@KarlHeitmann what is your plan of improving the diff from the description? The first, non-nested, part is done, right?

As I’m looking at before/after, my preference is to keep things as is. There are certainly cases where diffs can be improved. But any improvement is welcome. And I kindly appreciate your thoughtful and thorough approach!

Please feel free to force-push the rebased branch here.

JonRowe commented 1 day ago

I'm going to close this as is, if / when the differ gets improved over in RSpec Support and you'd like to re-open this as a new improvement then feel free to do so :)