rspec / rspec-expectations

Provides a readable API to express expected outcomes of a code example
https://rspec.info
MIT License
1.26k stars 397 forks source link

match_array is taking too long to finish #1161

Open RicardoTrindade opened 4 years ago

RicardoTrindade commented 4 years ago

Subject of the issue

Was writing some tests in my project, was using match_array to make some expectations and noticed that it causing some tests to never finish.

Your environment

Steps to reproduce

Try and run this test locally and see if it finishes (it will most likely fail nonetheless)

    it do
      a = Array.new(50) { rand(1...9) }
      b = Array.new(50) { rand(1...9) }
      expect(a).to match_array(b)
    end

Expected behavior

The test should finish running (regardless of pass/fail)

Actual behavior

The test never finishes or takes too long to finish (with smaller arrays)

pirj commented 4 years ago

Kind of relates to https://github.com/rspec/rspec-expectations/issues/685 and https://github.com/rspec/rspec-expectations/issues/577 Do you think with those things in mind there a good algorithm that would handle matching in linear time?

diei commented 3 years ago

I ran into same problem. I use Ruby 2.7.1 and rspec-expectations 3.10.1. Even arrays with only 30 items takes much too long to finish.

pirj commented 3 years ago

Would you like to make an analysis of the time complexity of the algorithm, @diei? Does the fact that it fails or matches affect the time to complete? Let's dissect this. Here is the source to get you going https://github.com/rspec/rspec-expectations/blob/43bf64b01f8356979ffbc373b2e81d2ab1389b29/lib/rspec/matchers/built_in/contain_exactly.rb#L83

diei commented 3 years ago

I think the complexity is factorial.

best_solution_for_pairing is called actuals size times: https://github.com/rspec/rspec-expectations/blob/43bf64b01f8356979ffbc373b2e81d2ab1389b29/lib/rspec/matchers/built_in/contain_exactly.rb#L236

And in best_solution_for_pairing a new PairingsMaximizer is created and find_best_solution is called with the actuals and expected arrays reduced by one item. https://github.com/rspec/rspec-expectations/blob/43bf64b01f8356979ffbc373b2e81d2ab1389b29/lib/rspec/matchers/built_in/contain_exactly.rb#L288

In best case match_array finishes in milliseconds, even with arrays having 1000 items.

pirj commented 3 years ago

So it seems that sorting doesn't really help, as e.g. there's not always a correlation between sorting and matching match_array(/ab/, /bc/, /cd/)? Would it be possible to understand when we're comparing literals so sorting would always help (i.e. if a < b < c means that a === b is false)? Or can we check elements one by one, e.g. if the first didn't match - skip the rest?

There might be an algorithm out there in CS even for a complex case where sorting won't help, I'm just personally unaware which one would fit best.

diei commented 3 years ago

I'm sorry, but I have not the capacity to dig into this as deep as needed.

bclayman-sq commented 2 years ago

@pirj I think your intuition is exactly right. I've put out a small proof of concept PR to explore this idea and how it might work in code!