junit-team / junit5

✅ The 5th major version of the programmer-friendly testing framework for Java and the JVM
https://junit.org

`junit-testkit` usability feedback #1907

Open mkobit opened 5 years ago

mkobit commented 5 years ago

Thanks for providing the junit-testkit module - it has made testing extensions more straightforward and better supported. I have been documenting some of my usability concerns with it while it is still in the experimental phase. These could most likely be split into multiple issues to decide what to do about each, but I thought I would put them here in aggregate to see how the team and other users feel about them.

For each section, I tried to demonstrate the problem with a code sample and offer up a proposal for how I would like to see it behave. Take the proposals as initial ideas, or as a way to frame the discussion around whether my feedback is warranted.

assertStatistics() failure output

The failure output for assertStatistics() hasn't been useful, and the feature kind of feels like a trap. Asserting on events produces more helpful output than asserting on the statistics does.

Demonstration

In the example below, let's say that one of the assertions failed:

results.tests().assertStatistics {
    it.failed(0)
    it.succeeded(1)
}

The failure output is then:

Expected :0
Actual   :1
 <Click to see difference>

Expected :1
Actual   :0

That output doesn't help when debugging.

We have found ourselves instead just using the assertThatEvents() chaining, as the events contain more information about why a test succeeded or failed.

results.tests()
    .failed()
    .assertThatEvents()
    .isEmpty()

results.tests()
    .succeeded()
    .assertThatEvents()
    .hasSize(1)

Proposal

For our part, we have tried to remove all usage of assertStatistics() and to use only the events, as demonstrated above.

AssertJ Conditions are cumbersome

junit-testkit is tightly coupled to AssertJ, so the best way to build additional assertions is to use Condition. For example, the existing EventConditions and TestExecutionResultConditions already cover a good chunk of the conditions you may need. However, chaining Conditions together can be cumbersome, and creating new ones can be a bit ugly.

Throwable Demonstration

results.tests()
    .failed()
    .assertThatEvents()
    .hasSize(1)
    .haveExactly(
        1,
        finishedWithFailure(
            cause(
                allOf(
                    message("can't create configuration"),
                    instanceOf(RuntimeException::class.java)
                )
            )
        )
    )

I find using nested Conditions tough. This feels closer to Hamcrest territory, and it misses out on all of the built-in assertions and output that AssertJ already provides.

Report Entry Demonstration

Here is an example for testing for ReportEntry and the published values. There aren't built-in Conditions for ReportEntry yet in junit-testkit, so we have a few of our own.

import java.util.function.Predicate
import org.assertj.core.api.Condition
import org.junit.platform.engine.reporting.ReportEntry
import org.junit.platform.testkit.engine.Event
import org.junit.platform.testkit.engine.Event.byPayload
import org.junit.platform.testkit.engine.Event.byType
import org.junit.platform.testkit.engine.EventType

fun reportEntryPublished(): Condition<Event> {
    return Condition(
        byType(EventType.REPORTING_ENTRY_PUBLISHED),
        "report entry published event"
    )
}

fun reportEntryPublished(condition: Condition<ReportEntry>): Condition<Event> {
    return Condition(
        byPayload<ReportEntry>(ReportEntry::class.java) { condition.matches(it) },
        "report entry published event with result where %s",
        condition
    )
}

fun keyValuePairs(condition: Condition<Map<String, String>>): Condition<ReportEntry> {
    return Condition(
        Predicate { condition.matches(it.keyValuePairs) },
        "key value pairs where %s",
        condition
    )
}

fun hasEntry(key: String, value: String): Condition<Map<String, String>> {
    return Condition(Predicate { it[key] == value }, "has entry (%s, %s)", key, value)
}

Using those Condition implementations on EngineExecutionResults:


results.tests()
    .reportingEntryPublished()
    .assertThatEvents()
    .haveExactly(1, reportEntryPublished(keyValuePairs(hasEntry("thing_happened", expectedValue))))

I think this shows a few things: there is a fair amount of Condition boilerplate for a relatively simple check, the human-readable descriptions have to be written and kept accurate by hand, and none of it benefits from the built-in assertions and failure output that AssertJ already provides.

Proposal

My proposal would be to make most of the types in junit-testkit agnostic of assertion library and to add stronger typing. This would let users pick whatever assertion library they want, reduce the size of the API surface junit-testkit needs to provide (I think), and avoid custom assertion entry points like the assertStatistics() above.

For example, instead of having enum EventType and class Event, they could be combined for stronger typing - e.g. a ReportingEntryPublished event type.
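As a rough illustration, a sealed hierarchy could replace the enum-plus-payload combination. Everything below is hypothetical - no such types exist in junit-testkit today:

import org.junit.platform.engine.TestDescriptor

// Hypothetical sketch: the event kind is carried by the type itself,
// and kind-specific data becomes a strongly typed property instead of
// an untyped payload that needs casting.
sealed class TypedEvent {
    abstract val testDescriptor: TestDescriptor
}

data class Started(override val testDescriptor: TestDescriptor) : TypedEvent()

data class ReportingEntryPublished(
    override val testDescriptor: TestDescriptor,
    val keyValuePairs: Map<String, String>
) : TypedEvent()

With a hierarchy like that, filtering on event kind becomes an ordinary type check rather than an EventType comparison plus a payload lookup.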

Then, AssertJ might look like:

assertThat(result.tests().events())
    .filteredOn { it is ReportingEntryPublished }
    .extracting<ReportingEntryPublished> { it as ReportingEntryPublished }
    .extracting<Map<String, String>> { it.keyValuePairs }

Possibly with some built-in helpers. Or, if AssertJ improves in this area (or I'm missing something), then users can use that version of AssertJ.

Or, when another assertion library such as Strikt (a Kotlin one) is used:

expectThat(result)
  .get { tests() }
  .get { events() }
  .filter { it is ReportingEntryPublished }
  .map { it.keyValuePairs }

EngineExecutionResults method chaining

I find the chaining to be a bit misleading and confusing at times, as the method chaining is not strongly typed. The meaning of chains like the following is unclear:

Demonstration

results.tests().containers().tests().failed().success().all()
results.tests().failed().failed().all()

Proposal

In the same theme as the proposal above: make junit-testkit agnostic of assertion library and make the results more strongly typed.

Mocking dependencies in extensions

We often find ourselves wanting to "inject" behavior into our extensions to exercise different behaviors in both the extensions and the tests that consume them. This comes up especially when our extensions are used with external tools that we don't necessarily want to actually use while testing the extension. This is a little tough to do today because the test engine constructs those extensions itself.

Demonstration

One of the incredibly ugly ways we have been doing this today is serializing/deserializing the behavior we want through configurationParameter(String key, String value). This works for us, so ¯\_(ツ)_/¯.
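A minimal sketch of that approach - the extension, the parameter key, and the tool types below are all hypothetical, but ExtensionContext.getConfigurationParameter() and EngineTestKit's configurationParameter() are the actual APIs being leaned on:

import org.junit.jupiter.api.Test
import org.junit.jupiter.api.extension.BeforeEachCallback
import org.junit.jupiter.api.extension.ExtendWith
import org.junit.jupiter.api.extension.ExtensionContext
import org.junit.platform.engine.discovery.DiscoverySelectors.selectClass
import org.junit.platform.testkit.engine.EngineTestKit

interface ExternalTool { fun prepare() }
class RealExternalTool : ExternalTool { override fun prepare() { /* talks to the real system */ } }
class FakeExternalTool : ExternalTool { override fun prepare() { /* no-op stand-in for tests */ } }

// Hypothetical extension: the "injected" behavior is deserialized from a
// configuration parameter, because nothing richer crosses the engine boundary.
class ExternalToolExtension : BeforeEachCallback {
    override fun beforeEach(context: ExtensionContext) {
        val useFake = context.getConfigurationParameter("my.extension.use-fake-tool")
            .map { it.toBoolean() }
            .orElse(false)
        val tool: ExternalTool = if (useFake) FakeExternalTool() else RealExternalTool()
        tool.prepare()
    }
}

@ExtendWith(ExternalToolExtension::class)
class TestClassUsingTheExtension {
    @Test
    fun somethingThatNeedsTheTool() {
    }
}

// In the extension's own tests, the flag is "serialized" through testkit:
fun runWithFakeTool() =
    EngineTestKit.engine("junit-jupiter")
        .configurationParameter("my.extension.use-fake-tool", "true")
        .selectors(selectClass(TestClassUsingTheExtension::class.java))
        .execute()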

We have also started experimenting with @RegisterExtension, providing some factory that retrieves the "mocked" instance/behavior we want.
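Reusing the hypothetical ExternalTool types from the sketch above, the @RegisterExtension direction lets the fake be handed straight to the extension's constructor:

import org.junit.jupiter.api.Test
import org.junit.jupiter.api.extension.BeforeEachCallback
import org.junit.jupiter.api.extension.ExtensionContext
import org.junit.jupiter.api.extension.RegisterExtension

// Hypothetical variant of the extension that accepts its collaborator directly.
class InjectableToolExtension(private val tool: ExternalTool) : BeforeEachCallback {
    override fun beforeEach(context: ExtensionContext) = tool.prepare()
}

class InjectableToolExtensionTests {
    @JvmField
    @RegisterExtension
    val extension = InjectableToolExtension(FakeExternalTool())

    @Test
    fun `extension prepares the fake tool`() {
        // assertions against the fake's recorded interactions go here
    }
}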

Proposal

I don't have anything concrete here, more just food-for-thought, and open to suggestions for how to make this better.

marcphilipp commented 5 years ago

@mkobit Thanks for providing such detailed feedback! We'll take a closer look once 5.5 is released.

jbduncan commented 5 years ago

One advantage of decoupling the testkit from AssertJ for me is that I could continue to just use Truth, which is my assertion library of choice and what I'm using in https://github.com/jbduncan/jupiter-collection-testers. :)

kriegfrj commented 5 years ago

I've been using testkit as well. I haven't used assertStatistics(), but I second @mkobit's concerns about using the Conditions. I am an AssertJ fan, but I've never been a big fan of conditions - in particular, their diagnostic output is not as friendly as some of the other AssertJ assertions.

An alternative (or a supplement) would be to come up with custom assertion classes for the various JUnit Platform interfaces that are used by engines - probably TestIdentifier, TestPlan, UniqueId, TestDescriptor, TestExecutionResult and ReportEntry would be the most important ones. The engine testkit could then simply generate and return a TestPlan as the result, and you've got most of the infrastructure you need to do whatever custom assertions you'd like (in AssertJ, Truth, or something else).

Conditions may have their place, but if they are kept I'd like the following to be considered:

One of the things that often caught me out was the fact that the built-in conditions like container() would match anywhere in the test's UniqueId string. This caused me endless hours of debugging frustration, wondering why this would fail:

.haveExactly(1, event(container("my.container.name"), finishedSuccessfully()))

The issue was that the clause container("my.container.name") would match not only the container itself, but all of the individual tests that ran inside the container. This made it difficult for me to assert that the named container was only executed once.

On the other hand, I can see that it would also be useful sometimes to assert that a certain test ran inside a particular container, so the current implementation facilitates the following:

.haveExactly(1, event(container("my.container.name"), test("testName"), finishedSuccessfully()))

...which asserts not only that the test testName ran, but that it ran within the container named my.container.name and not in some other container.

If we're going to stick with conditions, I would propose deprecating the ambiguous container() in favor of two less ambiguous conditions, e.g.:

inContainer(x) - asserts that matching test identifiers are sub-tests of a container named x.

isContainer(x) - asserts that matching test identifiers are themselves containers with the name x (i.e., the name is compared against the value of the last segment of the unique ID).
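A rough sketch of how the two proposed conditions might be implemented - an illustration of the idea only, not existing testkit API:

import java.util.function.Predicate
import org.assertj.core.api.Condition
import org.junit.platform.testkit.engine.Event

// Sketch: matches only when the descriptor itself is a container whose
// last UniqueId segment carries the given name.
fun isContainer(name: String): Condition<Event> = Condition(
    Predicate { event ->
        val descriptor = event.testDescriptor
        descriptor.isContainer && descriptor.uniqueId.lastSegment.value == name
    },
    "is itself a container named '%s'", name
)

// Sketch: matches when any ancestor segment (everything but the last)
// carries the container's name.
fun inContainer(name: String): Condition<Event> = Condition(
    Predicate { event ->
        event.testDescriptor.uniqueId.segments
            .dropLast(1)
            .any { it.value == name }
    },
    "runs inside a container named '%s'", name
)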

TestEngine sanity checking

Another area where I think testkit could be improved is the execution recorder, which could be extended into a compliance-testing tool.

A TestEngine implementation is responsible for invoking callbacks on the EngineExecutionListener that is handed to it. There are certain rules that the implementation is supposed to follow when it does so (documented in EngineExecutionListener's Javadoc) - for example, a descriptor must not be started more than once, a descriptor that has been skipped must not subsequently be started, and a parent must be started before any of its children.

The perfect place to do this sanity checking would be in ExecutionRecorder: check that none of these rules have been violated, and throw (or store) an assertion error if one has. For example:

    /**
     * Record an {@link Event} for a container or test that was skipped.
     */
    @Override
    public void executionSkipped(TestDescriptor testDescriptor, String reason) {
        skippedTests.put(testDescriptor.getUniqueId(), testDescriptor);
        // More sanity checking goes here
        this.events.add(Event.executionSkipped(testDescriptor, reason));
    }

    /**
     * Record an {@link Event} for a container or test that started.
     */
    @Override
    public void executionStarted(TestDescriptor testDescriptor) {
        UniqueId uniqueId = testDescriptor.getUniqueId();
        if (startedTests.containsKey(uniqueId)) {
            violations.add(new AssertionError(String.format("Test %s has already been started", uniqueId)));
        }
        if (skippedTests.containsKey(uniqueId)) {
            violations.add(new AssertionError(String.format("Test %s has already been skipped", uniqueId)));
        }
        if (testDescriptor.getParent().isPresent()) {
            UniqueId parent = testDescriptor.getParent().get().getUniqueId();
            if (!startedTests.containsKey(parent)) {
                violations.add(new AssertionError(String.format("Test %s started before its parent %s", uniqueId, parent)));
            }
        }
        startedTests.put(uniqueId, testDescriptor);
        this.events.add(Event.executionStarted(testDescriptor));
    }

With not too much extra tweaking, you could also record the stack trace and arguments passed in for each legitimate invocation and store them in the map, so that when a duplicate invocation occurs, the assertion's error message can refer the tester back to the place where the original invocation was made and the arguments it received (like what Mockito does). Multiple violations could be aggregated into a MultipleFailuresError and thrown at the end of the test.
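A minimal sketch of that idea (the recorder fragment below is hypothetical; MultipleFailuresError itself comes from opentest4j):

import org.junit.platform.engine.UniqueId
import org.opentest4j.MultipleFailuresError

// Hypothetical recorder fragment: capture a stack trace for the first
// executionStarted() per UniqueId so a duplicate can point back to it.
private val firstStarts = mutableMapOf<UniqueId, Throwable>()
private val violations = mutableListOf<Throwable>()

fun recordStarted(uniqueId: UniqueId) {
    val first = firstStarts.putIfAbsent(
        uniqueId, Throwable("first executionStarted($uniqueId) was called here"))
    if (first != null) {
        // The original call site travels along as the cause, Mockito-style.
        violations += AssertionError("Test $uniqueId has already been started", first)
    }
}

// Called once the engine has finished executing.
fun assertNoContractViolations() {
    if (violations.isNotEmpty()) {
        throw MultipleFailuresError("EngineExecutionListener contract violations", violations)
    }
}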

All of this functionality would be extremely useful for debugging TestEngine implementations and guaranteeing compliance.

marcphilipp commented 5 years ago

Thanks for the feedback! I'm on the road right now so I cannot try this out, but I'm pretty sure that container() checks for TestDescriptor.isContainer(). Are you using Type.CONTAINER_AND_TEST in your engine?

kriegfrj commented 5 years ago

> Thanks for the feedback! I'm on the road right now so I cannot try this out, but I'm pretty sure that container() checks for TestDescriptor.isContainer(). Are you using Type.CONTAINER_AND_TEST in your engine?

edited: first paragraph of my response got merged into the comment itself due to lack of newline

I'm terribly sorry - I just checked the testkit source and you are right: container() does call TestDescriptor.isContainer(). I must have remembered incorrectly and described my problem back-to-front - i.e., I was trying to write test conditions of the form event(container("my.test.Clazz"), test("myTestMethod")) and wondering why they weren't matching. So I mixed it up. Which, in an ironic sort of way, further serves to underscore my point about the inherent ambiguity of "container" as the name of a condition. :smile:

I remember now that my difficulty was related to a slightly different problem: the fact that container(String) uses uniqueIdSubstring(String) for matching. That test matches anywhere in the entire uniqueId string, when I was expecting it to match only the last segment of the uniqueId. My engine (see https://github.com/bndtools/bnd/blob/4357d04c68a9a39d5a1cc3c3c3bee2065dd3ce7b/biz.aQute.tester.junit-platform/src/aQute/tester/bundle/engine/BundleEngine.java) was designed to discover tests in OSGi bundles inside a running OSGi container. Each bundle was represented in the test hierarchy as a container, and the classes within the bundle were also containers, descendant nodes of the bundle node. So assuming I had one bundle with symbolic name "bundle.symbolic.name", and the bundle had two test classes in it, my.pkg.TestClass1 and my.pkg.TestClass2, the testkit debug output looked something like this (the uniqueIds of each TestDescriptor listed):

1. [engine:bnd-bundle-engine]/[bundle:bundle.symbolic.name]
2. [engine:bnd-bundle-engine]/[bundle:bundle.symbolic.name]/[class:my.pkg.TestClass1]
3. [engine:bnd-bundle-engine]/[bundle:bundle.symbolic.name]/[class:my.pkg.TestClass1]/[test:myTestMethod]
4. [engine:bnd-bundle-engine]/[bundle:bundle.symbolic.name]/[class:my.pkg.TestClass2]

Now, in order to verify my engine was doing the right thing, I wanted to assert that there was exactly one TestDescriptor generated for the bundle with BSN "bundle.symbolic.name", which I naively tried to do as follows:

haveExactly(1, event(container("bundle.symbolic.name"), finishedSuccessfully()))

The problem is that container(String) is composed of two tests: 1. that the node is a container, and 2. that the node's uniqueId contains the specified string. This means that the haveExactly() I crafted above actually matched all of the class containers in the bundle as well as the bundle container itself (i.e., lines 2 & 4 as well as line 1), because they all have "bundle.symbolic.name" in their uniqueId. This was a "gotcha" that took me a while to figure out.
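In other words, the condition behaves roughly like this composition of the EventConditions factories (my paraphrase, assuming the no-arg container() and uniqueIdSubstring(String) variants):

import org.assertj.core.api.Assertions.allOf
import org.junit.platform.testkit.engine.EventConditions.container
import org.junit.platform.testkit.engine.EventConditions.uniqueIdSubstring

// Roughly what container("bundle.symbolic.name") checks: the descriptor is
// a container, AND the substring appears anywhere in its uniqueId.
val bundleContainer = allOf(container(), uniqueIdSubstring("bundle.symbolic.name"))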

To my mind, the intuitive behavior is for the string parameter passed to container(String) to match against the last segment of the uniqueId only, and not against the whole uniqueId string. The same applies to test(String) and others. I think the current implementation is a bit counterintuitive - after all, the "is a container" half of the test doesn't return true just because one of the node's parents is a container; it checks only the node itself. I ended up writing a custom predicate lastSegmentMatches() to do this (a possible reconstruction is sketched below), but it meant that container(String) was of limited usefulness out-of-the-box.
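A possible reconstruction of such a predicate (the name comes from the comment above; the body is an assumption):

import java.util.function.Predicate
import org.junit.platform.testkit.engine.Event

// Matches only when the final UniqueId segment has the given value,
// ignoring the rest of the hierarchy.
fun lastSegmentMatches(value: String): Predicate<Event> =
    Predicate { event ->
        event.testDescriptor.uniqueId.lastSegment.value == value
    }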

Maybe that was just me, but I'd have thought my intuitive understanding would be what most would have expected?

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. Given the limited bandwidth of the team, it will be automatically closed if no further activity occurs. Thank you for your contribution.

jbduncan commented 3 years ago

I think this feedback is still worth acting upon.

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. Given the limited bandwidth of the team, it will be automatically closed if no further activity occurs. Thank you for your contribution.

stale[bot] commented 2 years ago

This issue has been automatically closed due to inactivity. If you have a good use case for this feature, please feel free to reopen the issue.

jbduncan commented 2 years ago

This feedback is still worth acting upon, in my opinion. JUnit team, would you mind re-opening this issue?