Introduce ArgumentsProvider that computes the cartesian product of collections of datapoints

marcphilipp commented 6 years ago

This issue originates from a dicussion in https://github.com/junit-team/junit5/pull/1422#issuecomment-390582297.

While often it suffices to specify each set of arguments for the invocation of a parameterized test explicitly, there are cases where it's simply the combination of lists of potential values for each parameter, i.e. the cartesian product of these collections. While it's certainly already possible to compute these combinations manually, e.g. in a method referenced by @MethodSource, it would be more convenient if there were a declarative way to specify that.

Recently, @sormuras has written a sample CartesianProductProvider extension which does that by implementing TestTemplateInvocationContextProvider. However, by implementing this functionality as a ArgumentsProvider users would not have to learn a new concept and we could reuse the existing infrastructure.

One possibility would be to use fields and methods, like the Theories runner from JUnit 4.

@BaseSet
static final Set<String> strings = Set.of("foo", "bar");

@BaseSet
static Set<Integer> ints() {
    return Set.of(23, 42);
}

@ParameterizedTest
@AllCombinationsSource
void test(String a, int b) {...}

@junit-team/junit-lambda Thoughts?

Deliverables

[ ] Introduce new ArgumentsProvider and related annotations

sbrannen commented 6 years ago

I think that sounds reasonable in terms of a new feature.

The hardest part will be coming up with good names for the annotations. 😉

marcphilipp commented 6 years ago

Maybe, instead of introducing a bunch of new annotations, a simple API to combine sets of arguments from a @MethodSource method would be enough.

Something like

static Collection<Arguments> cartesianProduct() {
    return Arguments.builder(Set.of("foo", "bar"))
        .combineWith(Set.of(23, 42))
        .build();
}

or

static Collection<Arguments> cartesianProduct() {
    return Arguments.cartesianProduct(Set.of("foo", "bar"), Set.of(23, 42));
}

sbrannen commented 6 years ago

True, that may prove more elegant, and it definitely avoids an explosion of annotations (which we should aim to avoid).

So I agree: let's go with a programmatic approach via @MethodSource first and see what people think of that!

sbrannen commented 6 years ago

Though I would have Arguments.cartesianProduct() return a Set instead of a Collection, since it must be a set by definition -- right? 🤔

marcphilipp commented 6 years ago

Well, a Set is a Collection... 😉

While, mathematically speaking, it must be a set, I think we should be more lenient than that. I think we should make it work with arbitrary Collections and just return a Collection or Iterable. So, for instance, if one passes a List including duplicates (according to equals()), we would include the duplicates in the resulting Collection. Does that make sense?

sbrannen commented 6 years ago

Ummmm.... no, not really. 😇

My point was that the framework should actually require/enforce set semantics and avoid duplicated invocations of equivalent products.

Does that make sense? 😉

marcphilipp commented 6 years ago

I think that's debatable because of potentially broken/slow equals implementations... 🤔

sbrannen commented 6 years ago

Hmmmmmmmmmmmmm

sbrannen commented 6 years ago

I can't see your method signature for Arguments.cartesianProduct().

Are you proposing the following?

Collection<Arguments> Arguments.cartesianProduct(Collection<Object>... collections);

... b/c I was thinking of this:

Set<Arguments> Arguments.cartesianProduct(Set<Object>... sets);

marcphilipp commented 6 years ago

I think that's debatable because of potentially broken/slow equals implementations.

I'm of course not referring to built-in Java types, but user-defined ones. E.g. I've often encountered JPA entities where equals() only compared their id field. Or, the best ever equals() implementation I've stumbled upon which unconditionally returned false. Of course, that's nonsense and people shouldn't do that. On the other hand, we allow duplicate invocations based on the Stream/Iterable instances returned from @MethodSource referenced methods so I'm not convinced we should be this strict in this case.

I can't see your method signature for Arguments.cartesianProduct().

Almost! I was thinking of this:

Collection<Arguments> Arguments.cartesianProduct(Collection<?>... collections);

The builder may be an alternative that let's one configure whether or not duplicates should be removed.

sbrannen commented 6 years ago

I knew what you meant regarding equals/hashCode. I was just confirming the overall signature. 😉

Oops... yeah... when I typed <Object> I of course meant <?>.

On the other hand, we allow duplicate invocations based on the Stream/Iterable instances returned from @MethodSource referenced methods

That's true, but... those duplicates would be returned by user code; whereas, Arguments.cartesianProduct() would be framework code that can be more exact.

In any case, I see your point regarding broken equals/hashCode implementations in user code. Hence, my follow-up comment below.

The builder may be an alternative that let's one configure whether or not duplicates should be removed.

I think that would almost be a requirement to support that even if only as an opt-in feature in the builder.

sbrannen commented 6 years ago

the best ever equals() implementation I've stumbled upon which unconditionally returned false.

That's quite a jewel of software engineering, but... I've seen something that tops it (on another level): a method in the service layer annotated with @Ignore from JUnit 4. I'll just leave it at that. 😇

nrllewellyn commented 6 years ago

A couple of random (hopefully useful) notes from a JUnit user:

The annotation-driven approach does provide some (minor) advantages--it could produce some interesting testing options when mixed with nested tests (e.g. "all tests share parameters from the base class, but this nested class provides extra parameters"). I also feel that the annotation form is slightly more readable than the method-driven. (That said, it's not like someone couldn't just write a new ArgumentsSource to use annotations if they really wanted it--just throwing out a couple of opinions! 😄)
Whatever solution you decide on, I would recommend allowing the user to have multiple parameters of the same type. For example: void testAddTwoNumbers(int first, int second). I find that this is a common use case for reducers and other pieces of code that need to combine two values (or more). This ended up being a limiting factor for @ValueSource in a lot of our testing.
While someone is working in this codebase, you might want to fix the docs for parameterized tests--I believe that Section 3.15 was actually intended to be section 3.14.2

sbrannen commented 6 years ago

I believe that Section 3.15 was actually intended to be section 3.14.2

Good catch!

Fixed in 5343323db975f1f1469ea6e7601205f94dc0ab81.

jhorstmann commented 6 years ago

For the programmatic approach, I would suggest that the method should return a lazy Stream<Arguments> instead of an already materialized collection.

I had a similar requirement some time ago. There was a huge number of input combinations generated by a cartesian product, but only a few of them were actually valid combinations for the test scenarios. An example would be that for the german version of a webshop, only certain payment methods and only a shipping address in germany would be valid.

Having this method returning a stream makes it easier to reuse it by filtering elements, without keeping all elements in memory first. I can provide an implementation if needed.

sbrannen commented 6 years ago

I seem to recall hearing/reading something about performance issues when attempting to do something like building up a Cartesian product using Java streams.

So if we truly implement it solely based on streams instead of building a collection upfront, we would then likely need to keep performance in mind in order to support large data sets.

I can provide an implementation if needed.

@jhorstmann, perhaps just sharing your ideas/intentions would be a good start.

jbduncan commented 6 years ago

I seem to recall hearing/reading something about performance issues when attempting to do something like building up a Cartesian product using Java streams.

@sbrannen Guava's Sets.cartesianProduct is relatively lazy by my understanding, so I imagine it wouldn't be impossible to make a (sequential) streams version which is at least as lazy and performant. But I admit that I've not personally read anything about the performance of stream-based Cartesian sets, so I may be speaking nonsense... :)

sbrannen commented 6 years ago

Sure. I'm not saying it's not possible. I'm just saying that I assume it will be considerably more complicated (in terms of the actual implementation) if we accept multiple streams as input and stream the Cartesian product on the fly back to the Jupiter TestEngine for processing. 😉

sbrannen commented 6 years ago

Based on internal team discussions, we are currently leaning toward one of the following programmatic APIs:

List<Arguments> Arguments.cartesianProduct(Set<?>... sets);

Or:

Stream<Arguments> Arguments.cartesianProduct(Set<?>... sets);

In other words, we aim to accept sets as input and generate the Cartesian product as an ordered collection (e.g., a List) or Stream of Argument tuples.

sbrannen commented 6 years ago

Having this method returning a stream makes it easier to reuse it by filtering elements, without keeping all elements in memory first.

That sounds like a reasonable argument to me in favor of streaming results: I had not considered that a user might want to filter the Cartesian product tuples before returning them to the Jupiter TestEngine.

sbrannen commented 6 years ago

@jbduncan, regarding possible performance issues, I think this blog is what I was recalling (or similar at least).

Quoting that blog:

this code computes the entire stream of cartesian product values before processing the first value.

I haven't verified the claim of course, but it at least sounds like the author put a lot of consideration into the presented solution.

p.s. FWIW, most solutions on Stack Overflow and similar channels suggest the use of flatMap() but don't mention any issues with performance.

jbduncan commented 6 years ago

@sbrannen Cool, thank you for pointing me in the direction of that blog. I gave it a scan yesterday, and it looks promising to me!

Sure. I'm not saying it's not possible. I'm just saying that I assume it will be considerably more complicated (in terms of the actual implementation) if we accept multiple streams as input and stream the Cartesian product on the fly back to the Jupiter TestEngine for processing. :wink:

@sbrannen Yeah, okay. Sorry, I wasn't terribly clear earlier: I was sub-consciously hoping that we could easily derive an implementation like Guava's Sets.cartesianProduct for this issue without needing to use the full-blown spliterator-based solution given in that blog, as I vaguely recalled that Sets.cartesianProduct's javadoc gave a citation for the algorithm its implementation was based on. But looking back at said javadoc, I realise that I was mistaken!

Thus, using the blog you linked earlier sounds like the best way forward to me.

P.S. FYI, when I was scanning through the blog, I actually recognised the JDK bug report that the author opened regarding flatMap(), for I saw a similar one being discussed on the OpenJDK mailing lists not too long ago. It turns out it's actually a duplicate of this bug, which is fixed already but only for Java 10+.

If the JUnit 5 team does use that blog as inspiration for resolving this issue, might it be worth opening another issue to test out the simpler flatMap()-based solution in the future if or when JUnit 5+ depends on Java 10+?

jhorstmann commented 6 years ago

For reference, my implementation is available at https://gist.github.com/jhorstmann/a7aba9947bc4926a75f6de8f69560c6e and actually looks very similar to the blog post, except it uses an iterator which is then wrapped into a spliterator. I can add any license if needed.

The cartesianProduct method returning Stream<Arguments> could already be implemented outside of junit. I'm thinking if we could make it even easier to use. Especially with a larger number of dimensions, mapping those to method arguments by index could be error prone. Especially if you'd want different subsets of dimensions for different test methods. Another approach could be to allow the different Source annotations on method parameters

void test(@CsvFileSource(...) @CsvToPerson Person person,
          @EnumSource(...) PaymentMethod paymentMethod,
          @MethodSource(...) Address shippingAddress,
          @MethodSource(...) Address billingAddress)

If I understand it correctly, the above annotations could also be used as meta-annotations, making the example even nicer. Different test methods could so easily use different sets of dimensions. Filtering of valid combinations could be done inside the method by using Assumptions.

marcphilipp commented 6 years ago

Interesting idea! However, it wouldn't support multiple parameters per source at all. For starters I think we should define the cartesianProduct method in Arguments.

@jhorstmann Would you like to submit a PR with your implementation?

matthieu-vergne commented 5 years ago

Is there some advances on that matter? I am just looking for such a feature, actually.

sbrannen commented 5 years ago

Is there some advances on that matter? I am just looking for such a feature, actually.

Nothing official, but if I recall correctly, @sormuras created a proof of concept.

@sormuras ?

sormuras commented 5 years ago

Aye. The POC wants to be integrated into JUnit Pioneer. See https://github.com/junit-pioneer/junit-pioneer/issues/68 for details -- which also links back to the current POC within our JUnit 5 Jupiter Extension Samples project.

Would be cool if s/o took over to make the integration/implementation happen.

marcphilipp commented 5 years ago

I think a simple cartesianProduct method would be easy to implement and provide a way to implement such use cases along with @MethodSource without requiring any additional dependencies.

For example, I needed sth. like that here: https://github.com/marcphilipp/nexus-publish-plugin/blob/master/src/test/kotlin/de/marcphilipp/gradle/nexus/NexusPublishPluginTests.kt#L55-L61

matthieu-vergne commented 5 years ago

I have a professional case (sorry I can't share it here) where I have many tests, including a bunch to be parametrized. These parameters build on the same few sets, but they are often different combinations (different size and order) of those sets. Rather than using a different function for each of them, I would rather provide the few annotations each of them needs, with the right sets in the right order. It would be way better to read than a double method every time.

marcphilipp commented 5 years ago

@matthieu-vergne Could you please explain in more detail what you mean with "a double method every time"?

matthieu-vergne commented 5 years ago

The test method + the argument provider method.

rougou commented 4 years ago

Based on internal team discussions, we are currently leaning toward one of the following programmatic APIs:
List<Arguments> Arguments.cartesianProduct(Set<?>... sets);
Or:
Stream<Arguments> Arguments.cartesianProduct(Set<?>... sets);
In other words, we aim to accept sets as input and generate the Cartesian product as an ordered collection (e.g., a List) or Stream of Argument tuples.

@sbrannen I like this approach, but would prefer cartesianProduct(Collection<?> ... collections) as suggested by @marcphilipp previously. The reason being that when writing tests in Groovy, we'd have to do: [false, true] as Set etc. for each parameter. With Collection we won't have to add the as Set every time.

marcphilipp commented 4 years ago

I'd be okay with a more loose definition based on Collection if it doesn't make the implementation more complex.

matthieu-vergne commented 4 years ago

If you can go this way, an Iterable may be enough.

tommai78101 commented 4 years ago

Is this new feature experimental?

https://github.com/junit-team/junit5-samples/tree/master/junit5-jupiter-extensions/src/main/java/com/example/cartesian

I didn't see it in this list:

https://junit.org/junit5/docs/current/user-guide/#api-evolution-experimental-apis

marcphilipp commented 4 years ago

No, that's just a sample project that shows how to accomplish this in an extension. It's not an officially supported feature (yet) since it's not as extensible as we'd like it to be.

sormuras commented 4 years ago

Citing myself here:

Aye. The POC wants to be integrated into JUnit Pioneer. See junit-pioneer/junit-pioneer#68 for details -- which also links back to the current POC within our JUnit 5 Jupiter Extension Samples project.

Would be cool if s/o took over to make the integration/implementation happen.

It's still up-for-grabs. ;-)

tommai78101 commented 4 years ago

@marcphilipp Thanks for the clarification.

scordio commented 3 years ago

Citing myself here:

Aye. The POC wants to be integrated into JUnit Pioneer. See junit-pioneer/junit-pioneer#68 for details -- which also links back to the current POC within our JUnit 5 Jupiter Extension Samples project. Would be cool if s/o took over to make the integration/implementation happen.

It's still up-for-grabs. ;-)

It seems it was not yet advertised: @Michael1993 brought the feature to life with junit-pioneer/junit-pioneer#321, and I'm proposing some additions in junit-pioneer/junit-pioneer#409 and junit-pioneer/junit-pioneer#416

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. Given the limited bandwidth of the team, it will be automatically closed if no further activity occurs. Thank you for your contribution.

stale[bot] commented 2 years ago

This issue has been automatically closed due to inactivity. If you have a good use case for this feature, please feel free to reopen the issue.

erik35 commented 1 year ago

are there plans to integrate this into JUnit 5 itself?

marcphilipp commented 1 year ago

There are no current plans but we may reconsider if the Pioneer extension gets widely used.

matter-it-does commented 1 year ago

using cartesian test and saved so much complexity

mahozad commented 1 year ago

See this StackOverflow post

nskvortsov commented 3 months ago

@marcphilipp the extension by Pioneer is very nice. I would like to use it with our IDEA/Gradle integration test (test a number of scenarios with a number of Gradle versions), but it does not play well with JUnit5's ValueSources and cannot provide a set of {Gradle versions} x {CSV values}

scordio commented 3 months ago

@nskvortsov when using the Pioneer extension, you're supposed to use the @CartesianTest flavor of sources like @Values, and not the JUnit ones.

For example, from their docs:

@CartesianTest
void myCartesianTestMethod(
        @Values(ints = { 1, 2 }) int x,
        @Values(ints = { 3, 4 }) int y) {
    // passing test code
}

To the best of my knowledge, there is no support for CSV sources yet.

In the past, I raised junit-pioneer/junit-pioneer#416 (targeting the "old" version of the extension) but haven't had the capacity to bring the topic forward. You could chime in with some details about your use case.

junit-team / junit5

Introduce ArgumentsProvider that computes the cartesian product of collections of datapoints #1427

Deliverables