data driven test - Githubissues

Max-Goebel commented 9 years ago

There are many situations where you want to test one piece of code with several different data. For that purpose, you need

test code that invokes the code under test and compares its results with predefined expected results.
test data, consisting of input and expected results.

Of course, it would be a bad test design to duplicate the test code for each record of test. With a clean test design you would separate the test code from the test data and invoke the first one for each record. There are different solutions for this design approach:

Soultion 1 Write a driver method (this.execute) that invokes the code under test and compares its results with the expected results. Write a test method that invokes the driver method for each record.

@test
Public void testMethod() {
    …
    this.execute(input1, expOutput1);
    this.execute (input2, expOutput2);
    this.execute (input3, expOutput3);
    …
}

Pro: Debugging is easy, because you can set a breakpoint at any record.

Contra:

Duplicated code for invoking the driver method (better would be a loop).
Bad test reporting. If one record failed, the whole test method failed.

Solution 2 With Serenity, you can separate test code from test data by using @RunWith(SerenityParameterizedRunner.class) and @TestData (see http://thucydides.info/docs/serenity-staging/#_data_driven_tests ).

Pro:

No duplicated code
Slow running tests can be executed concurrently
Probably there will be a meaningful report (one test result for each data record). This is only an assumption, because I Could not find a statement in the serenity documentation about this)

Contra: Debugging is difficult, because there cannot be set a breakpoint to a special data record.

Solution 3: Move the test data in an external CSV file (see http://thucydides.info/docs/serenity-staging/#_using_test_data_from_csv_files ).

Pro:

No duplicated code
Probably slow running tests can be executed concurrently (could not find a statement about it in the serenity documentation)
There will be a meaningful report(one test result for each data record)
Customer can read and understand the CSV file and may provide own CSV test data

Contra:

Debugging is difficult, because there cannot be set a breakpoint to a special data record. Furthermore, the developer has to switch between the java code and the csv file when debugging.

The purpose of this ticket is to decide:

Are there more solution alternatives we should consider?
Which should be recommended?

Max-Goebel commented 9 years ago

I vote for solution 3.

hohwille commented 9 years ago

Solution 2 is IMHO only suitable for situations where you want to run a test-case with many test-methods in different technical setups (e.g. with spring and without spring). For running with different data there are more cons than pros - esp. debugging and error reporting is painful.

In easy situations solution 1 is fine, but when it comes to more complex data then 3 is IMHO better.

Max-Goebel commented 9 years ago

When using solution 3 (csv file), it is important to have only few columns in the csv file. Consequently, the csv file should contain only data that determines the behavior under test. There should be no data in the csv file, that is needed for running the test but does not influence the specific behavior under test (e.g. required fields that must not be null but theirs value doesn't matter). Such unimportant data should be initialized with dummy values in the java code.

We should have an example for solution 3 that implements these recommendations. Maybe there is a proper pattern.

maybeec commented 9 years ago

:+1: for solution 2

Especially in agile development context, maintaining semi-structured documents like csv files for test data management can easily end up in a black whole for efforts. (This may be different e.g. for planned acceptance tests testing a predefined delivery state).

We already used such solution in our internally developed Tk-Unit test framework, which more and more loses its importance due to bad maintainability.

Max-Goebel commented 9 years ago

@May-bee: Do you mean Proven? We should distinguish between two variants of external-file-driven tests: key-word-driven test and data-driven-test. The first one has control statements like "click button X" in the csv file. With the second approach, csv files contain data only. The java test code keeps all control what to do with the data.

Are your bad experiences caused by key-word-driven test or by data-driven-test?

maybeec commented 9 years ago

I meant data-driven-tests. In the tool mentioned above we declare the filling of large data structures in excel sheets with the original approach of resulting in maintenance improvements due to the fact, that the filling can be understood more easier especially also for customers if they want to maintain the software later on by themselves.

The bad about this is, that during development you will end up in even more maintenance efforts, when e.g. your data model changes. You cannot easily perform mass updates on several csv or binary excel files in a smart manner and you have a strong technology break. The latter will slow down test maintenance as there is no code assist or link such that you have to understand the framework filling the input data and then find the right sheet documenting the input object structures.

maybeec commented 9 years ago

But that is my opinion, maybe there are different ones also. @maihacke or @ahoerning, anybody? I think this is an important topic.

maihacke commented 9 years ago

I think data driven tests is something that only works in very simple use cases. In real world you often have complex data structures for input data and expected results. Trying to externalize this data in csv, xls etc often leads to high maintaince efforts. In my projects there were very few cases where ddt was benetifial. If tests are developed by developers and not business people, I think ddt, bdd and so no good approaches. You should structure you tests well and use a good infrastructure to actually create testdata (builder pattern for example). But you should not hide was actually is happening in complex technical frameworks, dsls and so on.

Max-Goebel commented 9 years ago

@may-bee: How would you perform a mass-update with solution 2? I think this is more difficult than in CSV, because with CSV you may use Excel for refactorings. If refactoring-support is important, all data should be created in the code by constructor arguments or by setter methods. This may be realized best with solution 1.

maybeec commented 9 years ago

As I have seen in serenity you easily can create the test data in code (@TestData annotated methods). Creating your test data on code level e.g. by using a builder pattern, you can easily adapt and maintain changes on database level at a single point: the builder.

In my understanding refactorings are somehow automatic and globally. I do not see how this can be assured by formats like csv files.

hohwille commented 9 years ago

I think you are talking with different views. There are scenarios that can be refactored easier with a builder pattern approach (e.g. default values or type conversions) while there are other scenarios that can be refactored easier with CSV data (e.g. bulk replace in a specific column).

hohwille commented 9 years ago

In general I have the impression that we are discussing too much on abstract levels in the area of testing. If we create real examples by implementing such test-cases for our sample it would be a lot easier to talk together and come to conclusions. Otherwise we are also tempted to picture the perfect solution with only pros and no cons that unfortunately never exists.

aburghcap commented 9 years ago

Coding test data is as coding logic: following the DRY principle helps to keep the result maintainable. With any non-trivial table of data we soon will run into doubling entries. Builders allow to abstract from similarities and keep an eye on the differences in various test cases.

I would suggest to start with builders. @may-bee has shown a nice pattern for them. From there any necessary data format - database inserts, XML strings, csv files could be generated. That way the impact of changed requirements can be limited to a minimum of required changes in the builders and generators.

aburghcap commented 9 years ago

Regarding the contra point to solution 1: The problem of failing for different test method calls can be solved with http://joel-costigliola.github.io/assertj/assertj-core-features-highlight.html#soft-assertions

hohwille commented 9 years ago

@aburghcap thanks for the hint :+1:

Max-Goebel commented 9 years ago

It seems, that soft-assertions can solve the problem with incomplete error messages in solution 1. But still the whole test method will be reported as "failed", even if just one data record was failed.

hohwille commented 9 years ago

It seems, that soft-assertions can solve the problem with incomplete error messages in solution 1. But still the whole test method will be reported as "failed", even if just one data record was failed.

Completely true. The same applies for parameterized tests: https://github.com/junit-team/junit/wiki/Parameterized-tests

Instead you should create multiple test methods:

@Test
public void testMethod1() {
    doTest(input1, expOutput1);
}
@Test
public void testMethod2() {
    doTest(input2, expOutput2);
}
@Test
public void testMethod3() {
    doTest(input3, expOutput3);
}
    …

Max-Goebel commented 9 years ago

As far as I know, the Parameterized runner reports one result for each data record. As described at https://github.com/junit-team/junit/wiki/Parameterized-tests, you even can define a name for each record in the report:

@Parameters(name = "{index}: fib({0})={1}") ... In the example given above, the Parameterized runner creates names like [1: fib(3)=2].

Of course, one test method for each data record also is a solution, but increases lines of code.

By the way: What about combining the builder pattern with solution 3 (csv)? The builder could be fed with data from csv. The csv contains test-relevant fields only, default values can be set by the builder.

@hohwille: You are right, we should have an example. But do we have a suitable functionality in the restaurant application? I didn't find one.

oasp / oasp4j

data driven test #207