Add ability to match parts of data structure for equality

Subject of the issue

I think this is a feature request, as I couldn't find how to do this using rspec (I was, however, able to hack a version of it, that I share at the end of this issue).

I think there's value in being able to use the match operator and composable matchers to achieve a feature that's commonly present in systems that support in pattern matching: asserting whether two parts of a data structure are the same, regardless of their contents. I found a need for this in the context of validating that on a list of recorded messages, all messages reference the same randomly generated ID, but conceptually this is applicable anywhere we care for equality but not for the contents of the data structure.

Here's an example (the important part is the made-up magic_matcher):

require 'rspec'
require 'securerandom'

class Transaction
  def initialize(tracer)
    @id = SecureRandom.uuid
    @tracer = tracer
  end

  def start
    @tracer << { id: @id, event_name: 'start' }
  end

  def end
    @tracer << { id: @id, event_name: 'end' }
  end
end

class Program
  def do_stuff
    tracer = []
    t1 = Transaction.new(tracer)
    t2 = Transaction.new(tracer)
    t3 = Transaction.new(tracer)
    t1.start
    t2.start
    t1.end
    t3.start
    t3.end
    t2.end
    tracer
  end
end

describe Program do
  describe `#.do_stuff` do
    subject { Program.new.do_stuff }
    it {
      is_expected.to match(
        [
          { id: magic_matcher('a label representing t1_id'), event_name: 'start' },
          { id: magic_matcher(:t2_id), event_name: 'start' },
          { id: magic_matcher('a label representing t1_id'), event_name: 'end' },
          { id: magic_matcher(:t3_id), event_name: 'start' },
          { id: magic_matcher(:t3_id), event_name: 'end' },
          { id: magic_matcher(:t2_id), event_name: 'end' }
        ]
      )
    }
  end
end

Conceptually, this is similar to using variables inside pattern matches in some languages (interestingly, Ruby's pattern matching doesn't support this use-case), here it is in elixir:

iex(1)> [1, a, a] = [1, 2, 2]
[1, 2, 2]
iex(2)> [1, a, a] = [1, 3, 3]
[1, 3, 3]
iex(3)> [1, a, a] = [1, 2, 3]
** (MatchError) no match of right hand side value: [1, 2, 3]

Hacked implementation

I was able to hack together something that kind of achieves this, but the implementation requires ugly hacks:

RSpec::Matchers.define :magic_matcher do |match_name|
  match do |actual|
    test = leak_self
    var_name = :@a_unique_var_name_712398
    state = test.instance_variable_get(var_name)
    if state.nil?
      state = {}
      test.instance_variable_set(var_name, state)
    end

    if state.include?(match_name)
      actual == state[match_name]
    else
      state[match_name] = actual
      true
    end
  end

  failure_message do |actual|
    test = leak_self
    var_name = :@a_unique_var_name_712398
    state = test.instance_variable_get(var_name)
    expected = state[match_name]

    "expected that #{actual} would be equal to other instances of #{match_name}, but other instances of #{match_name} contain #{expected}"
  end
end

module TestSelfLeaker
  def leak_self
    self
  end
end

RSpec.configure do |conf|
  conf.include ::TestSelfLeaker
end

Weird cases

Also, it has a few really weird edge-cases:

RSpec.describe 'magic_matcher' do
  it 'works in the simplest cases' do
    expect([1, 2, 2]).to match([a_kind_of(Integer), magic_matcher('a'), magic_matcher('a')])
    expect([1, 3, 2, 2, 3]).to match([a_kind_of(Integer), magic_matcher('b'), magic_matcher('a'), magic_matcher('a'), magic_matcher('b')])
    expect([1, 2, 3]).not_to match([a_kind_of(Integer), magic_matcher('c'), magic_matcher('c')])
  end

  it "doesn't reuse the magic_matchers between different tests" do
    expect([1, 3, 3]).to match([a_kind_of(Integer), magic_matcher('a'), magic_matcher('a')])
  end

  it 'does reuse magic_matchers within the same test, which is confusing' do
    expect([1, 3, 3]).to match([a_kind_of(Integer), magic_matcher('a'), magic_matcher('a')])

    expect([1, 4, 4]).not_to match([a_kind_of(Integer), magic_matcher('a'), magic_matcher('a')])
  end

  it "doesn't chain well" do
    expect([1, 3, 3]).to contain_exactly(a_kind_of(Integer), magic_matcher('a'), magic_matcher('a'))

    expect([1, 4, 4]).not_to contain_exactly(magic_matcher('a'), a_kind_of(Integer), magic_matcher('a'))
  end
end

  it 'works on its own, which may be confusing' do
    expect(1).to magic_matcher('a')
    expect(1).to magic_matcher('a')
    expect(2).not_to magic_matcher('a')
  end

I also can't really tell what would be the preferable behaviour in some of these cases. I think that it shouldn't work on its own outside of a match, and that two matches within the same test should reset the state. Working inside the contain_exactly would be extremely powerful but the implementation would probably be tricky (especially because that matcher's implementation is already complex). Another aspect that I still haven't grokked is the name; magic_matcher isn't exactly descriptive or expressive 😅

Conclusion

So, what do you think? Is there an alternative to achieve similar results? If not, then does this make sense as a feature request?

Thanks for such a nicely done report/feature request! We have some ideas on how to improve issue templates, but it's currently low in the todo list.

Wondering if TestSelfLeaker can be replaced with something existing, e.g.:

RSpec.configure do |rspec|
  rspec.expose_current_running_example_as :example
end

can't tell off the top of my head if example will be available in the matcher though.

Do I follow your idea correctly that this matcher anchors the value on the first match, and matches on subsequent usages with the same label?

'does reuse magic_matchers within the same test, which is confusing'

Sounds like a reasonable behaviour. Some may want to reuse those anchors in the scope of a single example.

"doesn't chain well"

I'm not sure I completely understand, what goes wrong with this example?

'works on its own, which may be confusing'

Confusing for sure. You may take a look at argument matchers' implementation, they don't implement matches?, but do implement ===, and are written as classes. At least it meets your desired behaviour:

expect([1]).to contain_exactly(anything) # works
expect(1).to anything # undefined method `matches?'

I'm delighted by the idea of such a matcher, and if it was mainstream it would for sure find its usages in spec suites. However, at this moment I don't think there's enough demand for it, and I'm more inclined not to accept it to become a first-class citizen.

Still, it's possible to create a custom matcher, and use it in your tests. If you decide to do so, please feel free to summon me for a review or if you get stuck with anything (but as far as I can see you've advanced quite far already!). Just in case, an example of a custom matcher extracted from a monolithic codebase.

Wondering if TestSelfLeaker can be replaced with something existing, e.g.:

Fantastic, this worked great, thanks!

Do I follow your idea correctly that this matcher anchors the value on the first match, and matches on subsequent usages with the same label?

Yes, that's exactly it! Maybe anchor would be a better name? I feel that finding a good name is still one of the biggest challenges with this matcher.

I'm not sure I completely understand, what goes wrong with this example?

Sorry, the example wasn't exactly what I intended it to share. Maybe this is clearer (also switched to anchor):

 it "doesn't chain well" do
   expect([1, 3, 3]).to contain_exactly(a_kind_of(Integer), anchor(:a), anchor(:a))

   expect([1, 4, 4]).not_to contain_exactly(anchor(:b), a_kind_of(Integer), anchor(:b))
 end

The second test should pass, but it doesn't. That happens because 1 gets anchored on b on the first pass of contain_exactly and then the test fails when it tries to anchor 4 to b. If the matcher was implemented within rspec-expectations, we could reset the state of the anchor matcher after the first fail of contain_exactly, and then try an evaluation using a different order, until it worked or we exhausted all possible orders. (I may not be fully grasping how contain_exactly is implemented, though)

You may take a look at argument matchers' implementation

Oh, that's perfect, it works so much better using argument matchers! I posted the new version below.

Sounds like a reasonable behaviour. Some may want to reuse those anchors in the scope of a single example.

I think this is a side-effect of the above: I see these anchors working only within a single call to a top-level matcher, like argument matchers, not within a whole test. I.e. to me, the following test passing feels like the expected behaviour:

it "doesn't reuse anchors across different matches" do
  expect([{id: :random_id1, val: 1}, {id: :random_id1, val: 2}]).to match([{id: anchor(:id), val:1}, {id: achor(:id), val:2}])

  expect([{id: :random_id2, val: 1}, {id: :random_id2, val: 2}]).to match([{id: anchor(:id), val:1}, {id: achor(:id), val:2}])
end

It seems like the perfect solution for this would be for the argument matcher to have access to the chain of matchers that's currently being executed, so that it could navigate it to the root matcher and save the state there. I don't think that possible, though, right?

I'm delighted by the idea of such a matcher, and if it was mainstream it would for sure find its usages in spec suites. However, at this moment I don't think there's enough demand for it, and I'm more inclined not to accept it to become a first-class citizen.

Fair enough. I was convinced this would be essential to anyone taking advantage of match with composable matchers (which is an awesome feature 😍). However, looking at a large codebase (discourse), I found no use-cases for it, as most assertions aren't even done against data structures.

If you decide to do so, please feel free to summon me for a review or if you get stuck with anything

Since I still couldn't solve the problem of combining with contain_exactly, I'm leaning towards not making it into a gem. I'll share here the latest version of the code I have in case someone else is looking for this:

module RSpec
 module Mocks
   module ArgumentMatchers
     def anchor(label)
       Anchor.new(running_example, label)
     end

     class Anchor
       attr_reader :label

       def initialize(running_example, label)
         @running_example = running_example
         @label = label
       end

       def ===(actual)
         var_name = :@a_unique_var_name_712398
         state = @running_example.instance_variable_get(var_name)
         if state.nil?
           state = {}
           @running_example.instance_variable_set(var_name, state)
         end

         if state.include?(label)
           actual == state[label]
         else
           state[label] = actual
           true
         end
       end

       def description
         "equal_to_all_other_values_tagged_with(#{label})"
       end
     end
   end
 end
end

RSpec.configure do |conf|
 conf.expose_current_running_example_as :running_example
end

finding a good name is still one of the biggest challenges with this matcher

Not only for this matcher, but it is also one of the major problems only AI is capable of solving.

If the matcher was implemented within rspec-expectations, we could reset the state of the anchor matcher after the first fail of contain_exactly

Yes, probably, but what if contains_exactly was only part of a bigger expectation, e.g. include(contains_exaclly(...), contains_exactly(...))? Or that wrapping matcher is defined outside of rspec-expectations, in an external matcher library.

That happens because 1 gets anchored on b on the first pass of contain_exactly and then the test fails when it tries to anchor 4 to b

As other argument matchers work, I would expect expect([1, 2]).not_to contain_exactly(be_odd, be_odd) as a whole to pass, even though the first be_odd did not match. Because of the negating the whole expectation with not_to. That's a separate issue from making two expectations.

I can't tell off the top of my head, but probably you could use the call of failure_message as a trigger to reset the state of the anchor matchers for the whole example. Need to check if this method is called just once when to/not_to have figured out that the matcher it was passed as an argument has failed. On the other hand, there's aggregate_failures that can wrap several expectations or the whole example. Not sure when failure_message gets called in this case, might be different.

to me, the following test passing feels like the expected behaviour

I don't have a strong opinion on this, but I would expect anchor to behave like a local variable, and not to be reset in the same scope. Where a statement isn't a scope on its own. Surely, RSpec provides a DSL, and it is very specific and sometimes even hard to understand, but still, if RSpec would have separation/isolation of expectations inside an example as a norm, it would provide a completely different DSL that would enforce that.

argument matcher to have access to the chain of matchers that's currently being executed, so that it could navigate it to the root matcher and save the state there. I don't think that possible, though, right?

Even if it could, it's not a good idea. It's getting late on my end and my sound thinking lets me down, I'll let you make a judgement of your own if the following code would be negatively affected by saving state in the root matcher:

first = be_even
second = first.and(be_positive)
third = first.and(be_zero)
fourth = third.or(eq(nil))

expect(-2).to first
expect(2).to second
expect(0).to third
expect([0, nil]).to all fourth

I find it puzzling to even find what the root matcher is in this case.

as most assertions aren't even done against data structures

Ruby ≈ Rails 🤷‍♂️ I suggest you to check rubocop-ast/parser specs, you might find something interesting there, as they primarily work with data structures.

I still couldn't solve the problem of combining with contain_exactly

Matcher creation from scratch is a tough task, I can't say I've reached mastery level there myself. Still, I'll be happy to help you out with this one. Let's probably switch to https://groups.google.com/forum/#!forum/rspec, or create a dummy repo for the matcher, so we can discuss there, as it's tangential to rspec-expectations.

In elixir what you are doing is supported by assignment / pinned assignment and pattern matching, I'd suggest something like pinned_value(:name) or pin(:name) seeing as we don't have an operator to use. I'd also recommend not using instance variables for this, you could use metadata for this without breaking Ruby norms...

Your initial implementation made very sceptical, but if this could be done less hackily I wouldn't really be opposed to having something like this in rspec-expectations, being able to assert parts of a data structure matches seems useful.

However in Ruby 3 we are getting proper pattern matching, would that not allow you to do what you want without this?

rspec / rspec-expectations