hedgehogqa / fsharp-hedgehog

Release with confidence, state-of-the-art property testing for .NET.
https://hedgehogqa.github.io/fsharp-hedgehog/
Other
271 stars 31 forks source link

Recheck slow when using choice combined with list #432

Closed JohnEffo closed 12 months ago

JohnEffo commented 1 year ago

While trying to create a toy example of using Hedgehog to generate test cases to exercise a FSM, in this case a very simple web journey I noticed that running a re-check with the provided seed was slow (around 6-10 seconds) the code below simulates the problem but the wait is more like a couple of seconds:

using System.Collections.ObjectModel;
using Gen = Hedgehog.Linq.Gen;
using Range = Hedgehog.Linq.Range;
using Linq;
public class ChoiceRecheck
{
    [Fact]
    public void Choice()
    {
        var low = Gen.Int32(Range.Constant(0, 5));
        var mid = Gen.Int32(Range.Constant(10, 50));
        var big = Gen.Int32(Range.Constant(100, 200));
        var large = Gen.Int32(Range.Constant(500, 1000));
        var chioce = Gen.Choice(new Collection<Gen<int>> { low, mid, big, large }).List(Range.Constant(100,200));

        var prop = Property.ForAll(chioce).Select(x => x.Any(Test ));
        //prop.Check();
        prop.Recheck("0_8474167946941752373_7006309460388453401_000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000");

    }

    public bool Test(
        int x) => x == 990;
}

Initial posted in #419 as problem seemed similar.

Note: In the original linked problem , the generator originally had a constant range (as in the example above) on the list with a range around 100..400. With this setting the recheck was even slower. Since moving to an Linear range the speed is better. The difference is that now the list can shrink down to a much smaller size, where as before the minimum size was length 100, so I think that the size of the shrunk result may have an impact on the speed.

JohnEffo commented 1 year ago

Can confirm that when I change the above generator to

 var chioce = Gen.Choice(new Collection<Gen<int>> { low, mid, big, large }).List(Range.LinearInt32(1,300)).Resize(150);

I not only get a far better shrunken example but the recheck runs a lot more quickly. So this is really only an issue when the shrunken value would be expected to be large.

TysonMN commented 1 year ago

In the new line of code, you changed 200 to 300 and added a call to Resize.

Did you mean to make both changes? I was expecting to see one change.

JohnEffo commented 1 year ago

Sorry the 200 was meant to stay the same, it's the lower bound change (100 => 0 is intentional) combined with the Linear range which is important as it means we don't have such a large list when shrunk.

TysonMN commented 1 year ago

A generator with a large sample space will typically take a long time to shrink.

Is there a reason why you expect the shrinking for this one to be faster?

JohnEffo commented 1 year ago

Using the LinearRange is faster (even with the mistakenly pasted higher upper value), and I think it is faster because;

Visually you can tell less work is being done; here is the recheck from the original line

//Recheck from   var chioce = Gen.Choice(new Collection<Gen<int>> { low, mid, big, large }).List(Range.Constant(100,200));
prop.Recheck("0_8474167946941752373_7006309460388453401_000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000");
// Originally from 1 test and 150 odd  shrinks

As opposed to

/// Recheck from   var chioce = Gen.Choice(new Collection<Gen<int>> { low, mid, big, large }).List(Range.LinearInt32(1,300)).Resize(150);
prop.Recheck("0_13894515488252835539_14133286922830653955_00000000");
// Originally 1 test an 9 ish shrinks

The link in the top comment takes you to a file with a better more realistic example, in the above example the speed difference moves from fast; to starting to be irritating. In the more realistic example the correct generator still takes about 8-10 seconds to re-check, where as using the constant list generator takes around 60+ seconds.

For context the linked example is creating a list of commands to exercise a web interaction (a simple check out), I wanted the list to be large to start of with as I'm throwing lots of the commands away because at and point in the checkout process the generated command may be invalid. I realized once I posted the issue that the long shrunken list size was because I was using a constant range and changed to the linear range with large resize value so as to start of with large list. Doing this improved both the shrunken example and recheck speed.

TysonMN commented 1 year ago

...I was using a constant range and changed to the linear range with large resize value so as to start of with large list. Doing this improved both the shrunken example and recheck speed.

That is exactly as intended.

I see no evidence of a bug and weak evidence of a performance issue.

What behavior did you expect?

dharmaturtle commented 1 year ago

It is confusing why recheck is almost 10x slower than check; shouldn't recheck be doing less work? image

This is especially painful because people only use recheck to focus on a test... which makes it slower.

NCrunch says lines 179 and 180 in followShrinkPath are hotspots:

image

image

It's not obvious to me what the problem/fix is,

JohnEffo commented 1 year ago

That is exactly as intended.

You are correct in this case using a Linear shrink has solved the issue.

I see no evidence of a bug and weak evidence of a performance issue.

Correct as long as we always expect users to be able to shrink the result to a small array/object, if we do note and they are using a generator which is more complicated that the ones presented above then, we don't have a bug but there is a performance issue.

What behavior did you expect?

I think one of these would be desirable:

"Rechecking re-runs the property for the failing example Hedgehog has found, in most case property execution is really fast but in some circumstances, it may take a short while before your property is executed, while hedgehog calculates the test data from the re-check string. This usually occurs if the shrunken value is large; often indicated by a large recheck string. This is a known issue and we are looking into it"

TysonMN commented 1 year ago
  • Not to have a recheck value which can mean over a minute wait before you hit your test code

That is not possible. Not matter how fast our code is, a user can always make a generator so complicated that recheck will take over a minute.

I not sure about your other two bullet points.

Instead, I want to focus on what @dharmaturtle found.

It is confusing why recheck is almost 10x slower than check; shouldn't recheck be doing less work?

Definitely. This a performance issue.

I think the only thing that recheck does that check doesn't is parse the recheck string. Otherwise, check

@dharmaturtle, can you share a branch and instructions to reproduce the slow recheck performance?

dharmaturtle commented 1 year ago

https://github.com/hedgehogqa/fsharp-hedgehog/compare/master...dharmaturtle:fsharp-hedgehog:recheckPerf

A basic Stopwatch supports NCrunch's observation that followShrinkPath is slow.

image

TysonMN commented 1 year ago

@dharmaturtle and I think we found the reason for the worse performance. Consider this line https://github.com/hedgehogqa/fsharp-hedgehog/blob/253634bdcf0a64f0b7e61c54d6301a240d50da0f/src/Hedgehog/Property.fs#L178

The call to Seq.toList is doing too much work. If there are a lot of children, then recheck is doing work to enumerate them all while check might only look at one child. I will change the containing recursive function skipPassedChild to a loop.

TysonMN commented 1 year ago

@dharmaturtle, can you test this branch?

The first, second, and fourth commits change check to either be more efficient, more readable, or both, which should widen the performance difference between check and recheck. The third commit is where all the action is. I decided to change the serialization of the shrink path as I mentioned to you. This greatly simplified the code since otherwise I was expecting to be working at the level of Enumerator<>.

dharmaturtle commented 1 year ago

Hm, I disagree that this commit is obviously more efficient. Yes, you delay the construction of Failed, but you create a function/object/closure(?) instead. Assuming F# records are persistent, then creating them will be lightweight anyway. On my branch, NCrunch thinks this line takes 0ms.

image


I added a RecheckIsFasterThanCheck test, and it sometimes fails. Making a RecheckIsFasterThanCheckLoop test to capture more data yields this:

good! check: 1084. recheck: 1001
BAD!! check: 968. recheck: 992
BAD!! check: 880. recheck: 926
good! check: 1155. recheck: 908
good! check: 944. recheck: 904
good! check: 984. recheck: 859
good! check: 915. recheck: 913
good! check: 894. recheck: 865
BAD!! check: 893. recheck: 908
BAD!! check: 933. recheck: 1013
BAD!! check: 954. recheck: 1197
good! check: 1152. recheck: 1050
BAD!! check: 873. recheck: 892
BAD!! check: 867. recheck: 1044
good! check: 1031. recheck: 944
good! check: 1191. recheck: 925
good! check: 881. recheck: 863
BAD!! check: 936. recheck: 960
BAD!! check: 870. recheck: 879
BAD!! check: 926. recheck: 995
good! check: 877. recheck: 867
good! check: 883. recheck: 851
good! check: 897. recheck: 841
good! check: 902. recheck: 865
BAD!! check: 881. recheck: 908

So recheck isn't always faster than check, but it's significantly better than before.

NCrunch thinks Seq.tryHead and Seq.tryFind are the hotspots.

image

image

I eta-expanded tryFind's arg to show it isn't the functions inside of it that are slow (seems like NCrunch can't profile >> on a per-expression basis). Also, for this test, tryFind always returns the first element anyways (idx 0). Finally, we are always skipping 0.

image

My first thought was that this was due to reversing the shrinkPath (you gotta iterate the whole list to get to the head). However, this is on Node's children.... so I've no idea.

I'm not sure if we should try to eliminate these hotspots; just mentioning them in case it sparks something in your mind. Roughly equal check/recheck perf seems fine.

TysonMN commented 1 year ago

I like the tests. We could either keep the first and assert something like recheckTime < checkTime * 1.25 or keep the second test and assert that recheckTime < checkTime a major of the time. I will add the first since that will be much faster.

When you take screenshots of code, always include the line numbers.

That Seq.tryFind takes a long time is expected. It is in shrinkPath, which is called by ckeck. All of the work to actually construct the tree is in there (because the tree is lazily constructed).

That Seq.tryHead take about the same time is also expected. It is in followShrinkPath, which is called by reckeck. It also does all of the work to actually construct the tree.

They are slow for the same reason, so this branch fixes the performance issue of reckeck being (much) slower than check.

TysonMN commented 1 year ago

@dharmaturtle, can you test this branch?

Stupid me. I should have created a PR in which to have this review.