hedgehogqa / fsharp-hedgehog

Release with confidence, state-of-the-art property testing for .NET.
https://hedgehogqa.github.io/fsharp-hedgehog/
Other
271 stars 31 forks source link

Recheck is slow #419

Open AlexeyRaga opened 1 year ago

AlexeyRaga commented 1 year ago

Not sure if it is known or by design, but running a test with recheck is more than 10x slower than having the same test run.

It feels strange knowing that recheck only runs 1 tests while check does a bunch or runs and a bunch of shrinks and it is still faster.

I am takling about figures like:

check: 22 tests, 51 shrink, failed in 0.9+ seconds recheck: 1 test, failed in 9+ seconds

TysonMN commented 1 year ago

Wow! Recheck is supposed to be faster. In particular, recheck is supposed to only run your test once, so any breakpoint in your code is only hit once.

I need more information to understand the problem.

Can you reproduce the behavior with code you can share?

AlexeyRaga commented 1 year ago

It does seem to run once and a breakpoints only hit once, I confirm. But somehow it is much slower to run.

there is this https://github.com/dharmaturtle/fsharp-hedgehog-xunit/issues/7 but it looks like a separate issue.

I will see if I can find time to build an artificial case to reproduce.

dharmaturtle commented 1 year ago

In particular, recheck is supposed to only run your test once, so any breakpoint in your code is only hit once.

Potentially relevant: https://github.com/hedgehogqa/fsharp-hedgehog/issues/420

TysonMN commented 1 year ago

Thanks for the reminder @dharmaturtle.

@AlexeyRaga, does the last line of your CE include return? That is necessary in order to only retest the failed input.

Even so, I don't think this fully explains your experience.

dharmaturtle commented 1 year ago

@AlexeyRaga are you using C# or F#? Additionally, are you using Hedgehog Experimental and/or Hedgehog Xunit?

AlexeyRaga commented 1 year ago

@dharmaturtle I use both. I don't believe that it does (or could) matter though. I do not nave anything in the CE. I add latest versions of all 3: Hedgehog, Hedgehog.Experimental and Hedgehog.Xunit.

Here is an example that reproduces the issue for me, and I found something weird. So I put all the info into this comment, as it is still related. Let's stick with F# only.

A function under test:

module FSModule

let mutable count = 0
let add x y =
    count <- count + 1
    printfn $"Count: {count}"
    let res = x + y
    if res > 50 then x else res

And here is the test:

let ``Test with Property attribute`` (x : int32, y : int32) =
    let res = FSModule.add x y
    res = x + y

When I run it with [<Property>] attribute, it predictably fails and I get the recheck from that failure. For me it now is: "19_11670738321710543822_15170413700225871527_1010101010".

When I add Recheck("19_11670738321710543822_15170413700225871527_1010101010") attribute and re-run the test, I see this in my console:

Count: 1
...
Count: 60

So it confirms that the test runs multiple times.

Now here is where it gets strange.

I add this "Plain" Hedgehog test:

[<Xunit.Fact>]
let ``Test with plain Hedgehog`` () =
    let prop =
        property {
            let! x = Gen.int32 (Range.linearBounded())
            let! y = Gen.int32 (Range.linearBounded())
            return x + y = FSModule.add x y
        }
    prop |> Property.recheckBool "19_11670738321710543822_15170413700225871527_1010101010"

Notice the same recheck data.

And this test passes!. This is totally unexpected to me. But maybe it is a different issue (if it is an issue). Or maybe it is not an issue? Can it be because different bounds were used for int32 in these two cases or something like it?

Anyway, now run this test as prop |> Property.checkBool to grab the failing recheck data. I also notice that the test prints 48 iterations before it fails.

I get the recheck and run it with Property.recheckBool "0_7333683850760263877_13670925995324076957_01010101010101010101010101010101010110110"

Now the test fails printing 338 iterations! So maybe it is an upstream issue! But I am pretty sure that I saw it doing just one iteration before. I am totally confused now.

AlexeyRaga commented 1 year ago

I have created a repo here if it can be helpful:

https://github.com/AlexeyRaga/fsharp-csharp-tests/blob/main/FSharpTest/Tests.fs

TysonMN commented 1 year ago

When I add Recheck("19_11670738321710543822_15170413700225871527_1010101010") attribute and re-run the test, I see this in my console:

Count: 1
...
Count: 60

That is https://github.com/dharmaturtle/fsharp-hedgehog-xunit/issues/7. @dharmaturtle has a fix for it. There will be a new release of the NuGet package soon.

And this test passes!. This is totally unexpected to me. But maybe it is a different issue (if it is an issue). Or maybe it is not an issue? Can it be because different bounds were used for int32 in these two cases or something like it?

Yes, the generators have to be exactly the same, including the bounds. My guess is that this explains this issue.

I get the recheck and run it with Property.recheckBool "0_7333683850760263877_13670925995324076957_01010101010101010101010101010101010110110"

Now the test fails printing 338 iterations! So maybe it is an upstream issue! But I am pretty sure that I saw it doing just one iteration before. I am totally confused now.

Property.recheckBool didn't have efficient rechecking. See https://github.com/hedgehogqa/fsharp-hedgehog/issues/420. This is a limitation imposed by F#.

However, we could minimize the impact of the F# limitation by not exposing functions in our API that encourage users to "run into" this F# limitation.

I am inclined to remove all the Property.*bool* functions.

dharmaturtle commented 1 year ago

However, we could minimize the impact of the F# limitation by not exposing functions in our API that encourage users to "run into" this F# limitation.

I'm in favor. If done, we'll need to be careful though. Tests that return false may "pass" by virtue of not being checked.

JohnEffo commented 1 year ago

This is an issue for me, especially if I combine, choice with list, the test runs fast and the shrink is slow, when I ran into this trying to write demo code the property took so long to recheck I initially thought I'd crashed the IDE, this is not as slow but still noticable.

using System.Collections.ObjectModel;
using Gen = Hedgehog.Linq.Gen;
using Range = Hedgehog.Linq.Range;
using Linq;
public class ChoiceRecheck
{
    [Fact]
    public void Choice()
    {
        var low = Gen.Int32(Range.Constant(0, 5));
        var mid = Gen.Int32(Range.Constant(10, 50));
        var big = Gen.Int32(Range.Constant(100, 200));
        var large = Gen.Int32(Range.Constant(500, 1000));
        var chioce = Gen.Choice(new Collection<Gen<int>> { low, mid, big, large }).List(Range.Constant(100,200));

        var prop = Property.ForAll(chioce).Select(x => x.Any(Test ));
        //prop.Check();
        prop.Recheck("0_8474167946941752373_7006309460388453401_000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000");

    }

    public bool Test(
        int x) => x == 990;
}
TysonMN commented 1 year ago

@JohnEffo, can you open a new issue for that?

That is hedgehogqa/fsharp-hedgehog-xunit#7. @dharmaturtle has a fix for it. There will be a new release of the NuGet package soon.

@dharmaturtle, I was that NuGet package released? If so, what version was it?

dharmaturtle commented 1 year ago

0.5.0

TysonMN commented 1 year ago

@AlexeyRaga, does that version resolve your problem?

AlexeyRaga commented 12 months ago

@TysonMN I am honestly lost by now what workarounds should be applied and in which combination... :(

Should I use or not use return? Am I allowed or not allowed to use checkBool and recheckBool? In which combination with return does it or does not it work? In which case it half-works and in which case it doesn't?

No, this code below still doesn't work as intended and recheckBool still runs tons of iterations and arrives at an entirely different result at the end.

I am using the latest versions of all the Hedgehog and XUnit packages.

let cond() =
    let mutable n = 0
    fun (str : string) ->
        n <- n + 1
        Console.WriteLine $"{n}: {str}"
        str.Contains 'r'

[<Fact>]
let ``My test`` () =
    let cnd = cond()
    property {
        let! str = Gen.string (Range.constant 1 5) Gen.alpha
        return (cnd str)
    }
    |> Property.checkBool
    // |> Property.recheckBool "0_4278704406483328434_14594425247001969359_0000"

I naively hope that there can be the world in which an invalid program would be rejected by the compiler, and an accepted one would work as intended...

I am now very hesitant to use Hedgehog in my team because of this very tricky and unobvious thing that is really hard to communicate across the team members :(

TysonMN commented 12 months ago

I have a longer comment planned in my head, but I can't write it all now. Here is a brief response.

Quoting the OP:

I am takling about figures like:

check: 22 tests, 51 shrink, failed in 0.9+ seconds recheck: 1 test, failed in 9+ seconds

In hindsight, the problem you describe here is identical to what @JohnEffo describes in this comment. I asked @JohnEffo to create a new issue (#432) because I incorrectly thought it was a different issue. Then I fixed that issue in PR #433.

JohnEffo commented 12 months ago

@TysonMN Thanks for looking into this, I have not pulled the latest version and checked it out, I shall do so ASAP and get back to you.

TysonMN commented 12 months ago

Now here is where it gets strange.

I add this "Plain" Hedgehog test:

[<Xunit.Fact>]
let ``Test with plain Hedgehog`` () =
    let prop =
        property {
            let! x = Gen.int32 (Range.linearBounded())
            let! y = Gen.int32 (Range.linearBounded())
            return x + y = FSModule.add x y
        }
    prop |> Property.recheckBool "19_11670738321710543822_15170413700225871527_1010101010"

Notice the same recheck data.

And this test passes!. This is totally unexpected to me. But maybe it is a different issue (if it is an issue). Or maybe it is not an issue? Can it be because different bounds were used for int32 in these two cases or something like it?

Yes, the generators have to be exactly the same, including the bounds. My guess is that this explains this issue.

Now I think I know exactly the problem. You combined the generators monadically (by using let! twice). I expect that @dharmaturtle implemented the behavior in Hedgehog.Xunit applicatively (which in your code would have been to use let! and then and!).

@dharmaturtle, can you confirm that Hedgehog.Xunit implements this behavior applicatively?

@AlexeyRaga, can you try that code after replacing the second let! with and! to see if that matches your expectations (i.e. the test should fail on the same shrunken input)?

TysonMN commented 12 months ago

Property.recheckBool didn't have efficient rechecking. See #420. This is a limitation imposed by F#.

However, we could minimize the impact of the F# limitation by not exposing functions in our API that encourage users to "run into" this F# limitation.

I am inclined to remove all the Property.*bool* functions.

I should be careful when claiming that something is impossible. I was wrong. I still think there is a related limitation imposed by F#, but it doesn't prevent Property.recheckBool from benefitting from efficient rechecking (i.e. only the shrunken input is tested).

PR #437 implements efficient rechecking for Property.recheckBool.

I no longer think there is an efficiency problem with all the Property.*bool* functions, so they can stay.

dharmaturtle commented 12 months ago

@dharmaturtle, can you confirm that Hedgehog.Xunit implements this behavior applicatively?

Yep, done here. (Property.forAll => BindReturn)

dharmaturtle commented 12 months ago

A short explanation of monadic vs appliciative behavior

You combined the generators monadically

Alright let's translate some of TysonMN's FP nerdspeak (a monad is just a monoid in the category of endofunctors what's the problem) to mortalspeak. You can combine generators in two ways: monadically, or applicatively. Let's say you have two gens:

        property {
            let! x = Gen.int32 (Range.linearBounded())
            let! y = Gen.int32 (Range.linearBounded())
            return x + y = FSModule.add x y
        }

The example above is monadic; it uses let! and let!. Which means that x must be calculated before y. In contrast...

        property {
            let! x = Gen.int32 (Range.linearBounded())
            and! y = Gen.int32 (Range.linearBounded())
            return x + y = FSModule.add x y
        }

The example above is applicative; it uses let! and and!. Which means that x and y are calculated simultaneously.

Why does this matter? Applicatives allow for more efficient rechecking (compared to combining monadiclly). Here's an intuitive explanation:

I'm intentionally using language that hints at multithreading to trigger your intuition.

Applicative behavior isn't "better" than monadic behavior. Monadic behavior allows you to declare dependencies between generators, e.g.

        property {
            let! x = Gen.int32 (Range.linearBounded())
            let! y = Gen.constant x
            return x + y = FSModule.add x y
        }

It is not possible to write the above in an applicative manner - if x is shrunk, so must y.

In summary, where possible, choose applicative behavior because it allows for more efficient rechecking. However this is not always possible so you may be forced to choose monadic behavior. It's a tradeoff between power (you can express more with monadic) and rechecking efficiency.

@TysonMN please feel free to tear this apart :)

TysonMN commented 12 months ago

@TysonMN please feel free to tear this apart :)

Very good. Minor quibbles. Here is one correction.

Applicatives allow for more efficient ~rechecking~ shrinking (compared to combining monadiclly).

TysonMN commented 12 months ago

@TysonMN I am honestly lost by now what workarounds should be applied and in which combination... :(

Sorry about that. Thanks for sticking it out though. You are helping to make Hedgehog better! I appreciate it :)

Should I use or not use return?

You should. If you don't, then when rechecking, your test code could be called more than once (instead of only once on the shrunken value). Requiring that every Property CE includes a return statement is still an improvement I have yet to make (or figure out if F# allows me to implement efficient rechecking without a return statement in the Property CE).

Am I allowed or not allowed to use checkBool and recheckBool?

In the current version (which is 0.13.0), recheckBool does not have efficient rechecking. Now that PR #437 has merged, recheckBool does have efficient rechecking. I plan to create a new release soon with this improvement.

In which combination with return does it or does not it work?

Those two issues are independent of each other.

In which case it half-works and in which case it doesn't?

I am unaware of any case in which something half works.

No, this code below still doesn't work as intended and recheckBool still runs tons of iterations and arrives at an entirely different result at the end.

I am using the latest versions of all the Hedgehog and XUnit packages.

Does the code work as intended when using the current version of master (now that PR #437 is merged)?

I naively hope that there can be the world in which an invalid program would be rejected by the compiler, and an accepted one would work as intended...

"Invalid" is a bit strong here. I believe that we are contrasting code that does or does not have efficient rechecking. F# Hedgehog was initially released in 2017. In July of 2021, a user essentially requested (in #332) the feature that I now call efficient rechecking. With great effort, I implemented this feature and merged it in December of 2021 (via PR #336). Like all software, its impact was reduced because of limitations imposed by existing code and by bugs. Since then, this situation has continued to improve. See the change log for a list of these improvements. With your help, this list of improvements is now longer. Thanks for helping to make that possible.

I am now very hesitant to use Hedgehog in my team because of this very tricky and unobvious thing that is really hard to communicate across the team members :(

I don't want Hedgehog to be (very) tricky or unobvious. Please create an issue for each thing that you think is tricky or unobvious.