jqwik-team / jqwik

Property-Based Testing on the JUnit Platform
http://jqwik.net
Eclipse Public License 2.0
551 stars 63 forks source link

Right Way to Integrate jqwik with DataFaker #578

Closed akefirad closed 2 weeks ago

akefirad commented 3 weeks ago

Testing Problem

I'm trying to integrate jqwik with DataFaker to generate data not supported by jqwik (e.g. addresses, etc). Previously I did (DataFaker V1):

fun faker() = Arbitraries.randomValue { Faker(it) }.dontShrink().withoutEdgeCases()

Which gave me an arbitrary containing the faker object and from there I could generate any stream of data supported by DataFaker. After upgrading to DataFaker V2, things got significantly slower, on CICD pipelines more than 10x. The only thing I can think of is some changes in Faker(Random) which made it slower than V1. Now I'm not sure which library I need to reach out.

I tried to cache faker object, but after debugging it looks like there's way too many random objects which makes the cache ineffective.

Suggested Solution

🤷‍♂️

Discussion

Just out of curiosity, why does jqwik have to generate so many random objects? And also I noticed that XORShiftRandom doesn't not pass its seed to its super class Random. Is this a bug or feature?

jlink commented 3 weeks ago

I'm unfamiliar with DataFaker so I can't answer why it became slower in collaboration with jqwik. You may want to try one of the alternative approaches, though:

  1. Map over random values Arbitraries.randoms().map(r -> Faker(Random)....) But maybe the problem is that Faker(Random) is expensive and the faker object should be cached?
  2. Use jqwik's data-driven properties for data created by DataFaker. See https://jqwik.net/docs/current/user-guide.html#data-driven-properties

If you provide a bit more details about how you combine DataFaker and jqwik, maybe I have a few additional ideas about how this could be tackled.

From a general point of view, integrating it the other way round - DataFaker using jqwik arbitraries for its object generation - is probably the better approach. Jqwik's lifecycle is probably more involved, given that shrinking, edge cases and exhaustive generation bring additional requirements. But that means, that the creators of DataFaker would be willing to do that - or at least create an abstraction over the actual random creation process so that jqwik could be plugged-in there. But I'm just speculating here...

jlink commented 3 weeks ago

I noticed that XORShiftRandom doesn't not pass its seed to its super class Random. Is this a bug or feature?

As for XORShiftRandom, it's a different algorithm that needs access to the seed and the Random supertype does not provide the access.

jlink commented 3 weeks ago

Just out of curiosity, why does jqwik have to generate so many random objects?

Random objects as opposed to prefabricated ones? Having a lot of randomness is one of the core ideas of Property-based testing. If that idea provides sufficient value is up to debate (and SE research). The final jury is still out, but my impression from what I've read and seen in the field is that the value of randomization a) strongly depends on the context and the breadth of the domain in which you're generating data b) shows up after quite a while, when the test suite has been running for weeks (or months or years) and suddenly stumbles upon a failing scenario that it has never seen before.

akefirad commented 3 weeks ago

Thanks for the detailed answers.

Map over random values Arbitraries.randoms().map(r -> Faker(Random)....)

I'm not optimistic this would help. Since the issue most definitely is somewhere in Faker(Random) (I took a quick Look but didn't find anything, will reach out to DataFaker!). As for caching, it won't help since the number of Random objects are too many and makes the cache ineffective.

If you provide a bit more details about how you combine DataFaker and jqwik...

I mainly use it to create other arbitraries. For example let's say I have a very complex Customer class which contains name, email, and address, etc. I have an CustomerArbitrary (extending ArbitraryDecorator<Customer>) which uses Arbitrary to generate names, emails, addresses, etc. (I can provide the code snippet if that helps.)

From a general point of view, integrating it the other way round...

I think that makes sense. Native compatibility definitely helps here. Let me discuss this with DataFaker people.

akefirad commented 3 weeks ago

As for XORShiftRandom, it's a different algorithm that needs access to the seed and the Random...

Right, that makes sense. My question was, why don' you pass the seed in the constructor of the super class. I'm guessing this has never been a bug because (from JDocs of next(int nbits):

Generates the next pseudorandom number. Subclasses should override this, as this is used by all other methods.

akefirad commented 3 weeks ago

Random objects as opposed to prefabricated ones?

Not really. My surprise was that (when I used Arbitraries.randomValue { Faker(it) }) that jqwik create way too many Random instances with different seeds (not random data, the Random object itself, it in the above) during a single test suite execution. That's what makes it cache unfriendly. Now here I'm way outside my domain of expertise, but if I were asked, I'd have told it should be possible to do the same thing with a single seed. I'm not sure how changing the seed improve the thoroughness of the test, maybe there's something I'm missing. An example is Kotest which uses a single seed for the whole execution. (Admittedly a different library, but still same concept.)

vlsi commented 3 weeks ago

@akefirad , have you profiled the slow execution? I guess running the test with async-profiler would clarify the reasons for the slowness. Could you share a testcase that reproduces the slowness?

akefirad commented 3 weeks ago

I created a clean project with a very simple test. I can't reproduce the slowness. Not sure. The simple project is not comparable to my code since I've got complex objects (there's not so much complex logic, just many properties and many method calls to faker. I'll try to profile and find more. Will update you.

jlink commented 3 weeks ago

My surprise was that (when I used Arbitraries.randomValue { Faker(it) }) that jqwik create way too many Random instances with different seeds (not random data, the Random object itself, it in the above) during a single test suite execution.

Each try will create its own instance of Random. This is necessary to make each try repeatable in a deterministic way: by creating a random with the same seed.

akefirad commented 3 weeks ago

Managed to reproduce it, and also found the root cause of the slowness (the why is still unknown to me). Interestingly I couldn't reproduce this in a clean project. So below is a part of our real project (almost everything is removed, so it's small enough to take a quick look 🙏) demo-jqwik-datafaker-v2.zip

~As for the root cause, below you can see that I obtain phone numbers using a direct sampling (since I really didn't care about it in terms of property-based testing). Interestingly if you remove the direct sampling it performs like V1:~

data class TestCustomerArb(
    private val addresses: AddressArb = AddressArb(),
) : ArbitraryDecorator<TestCustomer>() {
    override fun arbitrary(): Arbitrary<TestCustomer> {
        val name = FakerArb.compositeNames()
        val email = FakerArb.emails()
        val phones = FakerArb.phoneNumbers().injectNull(0.5)  // <- This is fast in DataFaker V1, but not in V2.

        return Combinators.combine(name, email, addresses)
            .`as` { n, e, a -> TestCustomer(n, e, a, phones.sample()) }
    }
}

UPDATE 1: Not too fast! 😬 The issue is not sampling, but the method itself. For some reason it generates phone-numbers more slowly in V2 when it has injectNull. ~Need to reach out to them for more information. Most probably this is an issue with DF. If you wanna close this feel free to do. Otherwise I'll update this issue with my findings. LMK.~

UPDATE 2: Ugh injectNull is jqwik 🤦‍♂️. OK, can you please take a look.

jlink commented 3 weeks ago

// <- This is fast in DataFaker V2, but not in V2.

I assume you mean fast in V1, but not in V2. Any change of jqwik version you did at the same time?

akefirad commented 3 weeks ago

I assume you mean fast in V1, but not in V2

Right, typo! Fixed.

Any change of jqwik version you did at the same time?

Not really. With the demo I've attached you can reproduce it by just swapping the DF versions.

jlink commented 2 weeks ago

Hm. Not everything is getting slower with V2. To me it looks like that generating Customer without the injectNull() is cocnsiderable faster with V2 than with V1.

jlink commented 2 weeks ago

I could further simplify your example and still seeing a difference of factor 10 between V1.6.0 and V2.2.2. I also checked if creation of Faker object is the problem. But creating a Faker instance does not take a relevant amount of time. The puzzle continues...

jlink commented 2 weeks ago

In version 2.2.2 90% of the tests time is spent in DF's net.datafaker.internal.helper.COWMap.put(..):

Bildschirmfoto 2024-06-13 um 14 46 12

This class doesn't even show up when running the same test on 1.6.0

jlink commented 2 weeks ago

I could dive one layer deeper. It has to do with the random object handed into the Faker constructor. In version 2 of DF this random object will be used to calculate a hash value (through some detour of RandomService) for the COWMap above. If a new instance of random is handed in in each generation step the COWMap will grow and get quadratically slower. This is why adding/removing injectNull(..) makes such a difference.

In DF version 1 objects were not stored in COWMap so this did not happen. IMO it's dubious to cache objects based on a random, but only the maintainers of DF can decide if they have other options, since they introduced the caching for a reason.

There's a workaround I can suggest, though:

object FakerArb {

    var random: Random? = null

    fun faker(): Arbitrary<Faker> {
        val randomValue = Arbitraries.randomValue {
            if (random == null) {
                random = Random(it.nextLong())
            }
            Faker(random)
        }
        val dontShrink = randomValue.dontShrink()
        return dontShrink.withoutEdgeCases()
    }

This uses only the first injected Random object as a seed provider for a cached Random object. This keeps deterministically repeatable property runs (as long as there is no concurrent invocation), but individual tries cannot be repeated by jqwik. If this is something that bothers you, I cannot tell. Additionally, I'd also change FakerArb into a class, so that the shared random instance is not shared across all arbitraries, but only for a single one.

There might be another workaround: Create a new subclass of Random, which allows to change the seed underneath, taking the seed from jqwik's Random instance through it.nextLong().

I guess the only clean solution would be if DF allowed jqwik's arbitraries to be used for the randomization functionality. If they are willing to go that route I'd be happy to help out with my jqwik knowledge.

jlink commented 2 weeks ago

Closing for now.

@akefirad Feel free to re-open if you want to clarify more details.

vlsi commented 2 weeks ago

@jlink , have you verified if overriding equals/hashCode in XORShiftRandom would make it any better?

jlink commented 2 weeks ago

Haven't checked. Would I want a random object that always compares true to any other?

vlsi commented 2 weeks ago

Haven't checked. Would I want a random object that always compares true to any other?

I mean use the internal state for equals/hashCode. Then Random instances with the same internal state would be equal, and DF might reuse its cache.

I agree it is weird to use Random as a cache key provided the default java.util.Random uses identity for equals, however, it might work if jqwik used state for equals.

On the other hand, as the sate changes, the cache in DF would expire, so I am not sure it would make any difference.

jlink commented 2 weeks ago

The internal state must change from call to call, otherwise the faker would always produce the same data.

vlsi commented 2 weeks ago

Yeah, having a mutable key is bad in any case, so it looks like DF2's caching based on Random looks indeed odd.

akefirad commented 2 weeks ago

Thank you both. I'll reach out to DF team then.