rubberduck-vba / Rubberduck

Every programmer needs a rubberduck. COM add-in for the VBA & VB6 IDE (VBE).
https://rubberduckvba.com
GNU General Public License v3.0

Difficulties unit testing with loops #5055

Open Irubataru opened 5 years ago

Irubataru commented 5 years ago

Rubberduck version information
Version 2.4.1.4796
OS: Microsoft Windows NT 10.0.17763.0, x64
Host Product: Microsoft Office x64
Host Version: 16.0.11727.20244
Host Executable: EXCEL.EXE

Description

Not entirely sure whether this is a feature request, a change request, or a bug report, but it generally concerns an issue I keep running into when writing tests that use loops to check many cases systematically. First, a bit of explanation of how I am used to Assert functioning.

I've mostly been doing unit testing in C++ before this, more specifically using the googletest unit testing suite, and it has two very sensible types of test macros:

  1. ASSERT_* macros, which are fatal: the test aborts on the first failure.
  2. EXPECT_* macros, which are non-fatal: a failure is recorded, but the test keeps running.

In most scenarios I use the first of these, but it is nice having the distinction; it makes testing cleaner. For example, if you are testing the value of a Variant, you could first assert that it is not empty, and then expect what its value should be given that it is not empty. That way it is easy to write code that doesn't throw errors even when the tests fail.
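
To make that concrete, a minimal sketch of the Variant example in Rubberduck terms might look like this (GetValue is a hypothetical function under test; Rubberduck has only one Assert type, so the fatal/non-fatal distinction is lost):

'@TestMethod
Public Sub ValueIsPopulated()
    Dim value As Variant
    value = GetValue() ' hypothetical function under test

    ' with googletest, this first check would be a fatal ASSERT...
    Assert.IsFalse IsEmpty(value), "value should not be empty"

    ' ...and this one a non-fatal EXPECT; Rubberduck only has Assert
    Assert.AreEqual 42, value
End Sub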

This choice only really manifests itself in Rubberduck if you are debug stepping through the execution of your test because it only reports the first failure anyway.

This is not really my issue, more the reason why I encounter it. My real issue is the existence of a failure category called "Catastrophic failure". When I searched the Rubberduck source for references, it seems to be an easter egg status (?). As far as I can tell it triggers when there are more than 11 failures in a test, but I might be wrong.

This is fine in most situations, but if you are testing with a loop and you hit this, you are left with no information about what went wrong or in which iteration it happened. The way to find that information is to step through the test code, or, as I do quite frequently, to add the following line to my loop:

If Result <> Expected Then Exit For

This latter solution is of course just code duplication which has all the issues of copy-pasting code we all know and love.
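
In context, the workaround looks something like the sketch below (Expected and FunctionUnderTest are hypothetical stand-ins); the duplicated comparison stops the loop on the first mismatch, before the failure count reaches the catastrophic threshold:

'@TestMethod
Public Sub OutputsMatchExpected()
    Dim Expected As Variant
    Expected = Array(1, 2, 3) ' hypothetical expected results
    Dim i As Long
    For i = LBound(Expected) To UBound(Expected)
        Dim Result As Variant
        Result = FunctionUnderTest(i) ' hypothetical system under test
        Assert.AreEqual Expected(i), Result
        ' duplicated comparison: stop at the first failing iteration
        If Result <> Expected(i) Then Exit For
    Next
End Sub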

I have a couple of requests and possible solutions:

  1. Get rid of (or at least relax) the "Catastrophic failure" status, so a loop of asserts reports the actual failures.
  2. Expose Assert.State, so a test can check whether a previous assertion has already failed.
  3. Add an Expect counterpart to Assert, like googletest's non-fatal EXPECT_* macros.

Testing with loops can be really useful for covering a lot of expected scenarios that follow a certain logic, but with the existence of Catastrophic Failure it is really hard to find out what went wrong so that you can fix it.

retailcoder commented 5 years ago

When I searched the Rubberduck source for references, it seems to be an easter egg status (?). As far as I can tell it triggers when there are more than 11 failures in a test, but I might be wrong.

That is correct... the idea being that a useful unit test should Assert one thing, i.e. have no more than a single reason to fail.

I'm not against discarding it, but I would think implementing data-driven tests (ref. #1229) would be a better approach than using Assert in a loop. Of course, without DDT implemented, the "catastrophic failure" easter egg is kind of spoiled. The current best idea would look something like this (leveraging a new @TestCase annotation to supply the parameter values):

'@TestMethod
'@TestCase("foo", True)
'@TestCase("bar", False)
Public Sub TestMethod1(parameter As String, result As Boolean)
    ' consume the supplied parameter, assert the expected result
End Sub

I think having @TestCaseSource(path) and an external file with a specific format (as presented in the linked issue) is probably way overkill, and a lot of work too (there's a reason it hasn't been implemented yet), but supporting multiple explicit @TestCase annotations seems like a good compromise.

Exposing Assert.State feels wrong though, and I fear Expect would create confusion and complicate the framework/tooling.

Note that a failed Assert call cannot stop test execution (it's... complicated), so reporting the first failure / failing a test given a single failed Assert call is pretty much the only way we could simulate the behavior of an AssertFailedException.

Any thoughts on proper data-driven tests? DDT wouldn't require any changes to the Assert logic, would remove the need for explicit loops in the tests themselves, would run all iterations, and report success/failure state of each individual one.

Irubataru commented 5 years ago

Hmm, well, having some way of supporting DDT would obviously help, but it seems to me like you are adding unnecessary restrictions that dictate how people use your tool.

I agree that in a perfect world with perfect code a unit test should test one thing, hence the name, but sometimes, when your code isn't perfect, you would rather test one concept that feels small enough for the test to make sense. Say there are 10 conditions you feel should be fulfilled in one scenario, and these 10 things depend on each other. Splitting them up might be the more correct thing to do, but it could also bloat your testing environment. Testing something in a non-optimal way is still better than testing nothing.

Exposing Assert.State feels wrong though, and I fear Expect would create confusion and complicate the framework/tooling.

I totally agree, I was just trying to think of ways of circumventing my issue.

But yeah, improved support for DDT would obviously help; I still don't really see the value of catastrophic failure, though. To test imperfect code you sometimes need imperfect tests, and I'd rather have imperfect tests covering old code before I refactor than have to refactor my code so that it works with the unit testing framework I am using.

As for ideas around DDT, I don't have that many. It seems to me like the easiest would be to implement a '@TestFunction annotation that tells the system that the following is a function used for testing, and then have a syntax for calling this function multiple times with different data. I'm not all that familiar with DDT; I mostly write unit tests where my unit is the smallest independent concept I can think of, without restricting myself to a single point of failure, because in some situations I don't think that necessarily improves anything.

Edit: But yes, I agree that DDT is probably always preferable to unit tests with loops.

retailcoder commented 5 years ago

Ok, so dev chat recap:

Irubataru commented 5 years ago

Sounds great. I did a bit of reading and couldn't find any good arguments for "single test == single assert"; isn't it better to test concepts than semantics? Sure, you should test one thing, but that one thing will in many cases be much more readable if it is done with multiple asserts. E.g. testing whether a value is within a range is much more readable with two less-than asserts than with one big logic blob. Oh well, this is not the place for these discussions, and if you make it an inspection I can always turn it off :> I think that is the best solution, because then the users can choose how to use the tool, which should mostly strive to be helpful, not restrictive.
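
To illustrate the range example, a sketch along these lines (ComputeValue and the bounds are hypothetical) reads better with two asserts, and a failure pinpoints which bound was violated:

'@TestMethod
Public Sub ValueIsWithinRange()
    Const lowerBound As Double = 0#
    Const upperBound As Double = 1#
    Dim value As Double
    value = ComputeValue() ' hypothetical function under test

    ' two separate asserts instead of one combined condition
    Assert.IsTrue value >= lowerBound, "value is below the lower bound"
    Assert.IsTrue value <= upperBound, "value is above the upper bound"
End Sub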

Really looking forward to seeing how DDT could be implemented, that would clean up some of my tests for sure.

retailcoder commented 5 years ago
Irubataru commented 5 years ago

But you also have this pretty popular Stack Exchange post where almost every answer agrees that they use multiple asserts in an effort to test one feature/behaviour. One assert just seems like a really unnecessary restriction that doesn't necessarily correlate with what you want to achieve, which is testing one behaviour.

Again, it seems like something to strive for, because striving for it might give you some clarity about what it is you are actually doing, but the goal itself is nonsense.

Just saying that if you implement that inspection, a lot of people are going to get a lot of useless inspection results, because they are testing one behaviour with multiple asserts.

But anyway, I still think it is a good choice, because I can't really think of a better way than to write your own best practices and let the users do what they will anyway. As I said, it can always be disabled if one disagrees.

retailcoder commented 5 years ago

I think the Stackify article says it best: "Unit testing newbies commonly make a mistake of testing all of the things in one test method." -- basically the "one test, one assert" guideline is there to remind beginners that the Single Responsibility Principle also applies to test methods, so that when a test fails, it's immediately apparent why.

The accepted answer on that SE thread also puts it nicely:

but I do think we should strive towards only having single asserts in our tests.

Static code analysis really feels like the best way to educate users new to unit testing about this, and a test could always be annotated with @Ignore MultipleAsserts (that would be the quickfix for the inspection), ignoring a specific test without turning off the inspection.
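
If such an inspection existed, suppressing it for a single test might look like this sketch (the MultipleAsserts inspection name is the one proposed here, not an existing one; GetSortedItems is hypothetical):

'@TestMethod
'@Ignore MultipleAsserts
Public Sub ListIsSortedAndComplete()
    Dim items As Variant
    items = GetSortedItems() ' hypothetical function under test

    ' multiple asserts, deliberately; the annotation would suppress
    ' the proposed inspection for this one test only
    Assert.AreEqual 3, UBound(items) - LBound(items) + 1
    Assert.IsTrue items(LBound(items)) <= items(UBound(items))
End Sub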

One assert [...] doesn't necessarily correlate with what you want to achieve, testing one behaviour.

True, not necessarily - but the correlation is strong enough in beginner code that flagging multiple asserts would be useful in most situations, given @TestCase parameterized test support.

Irubataru commented 5 years ago

I agree; sorry, I didn't mean to argue. I can get quite passionate discussing rules for best practices, as I often feel they should be guidelines rather than rules.

Anyway, reading through what I wrote, I totally agree that the inspection should be there; I was in essence mostly arguing about whether it should be turned on by default or not. This is obviously not something I should have strong opinions on, as it is very easy for me, an (overly) opinionated user, to flip that switch anyway. You know much better than I do what is best for your users, as you are developing a tool used by thousands and I am not. And the fact that you let me change the inspections myself, even on single tests/lines with an annotation, shows me that you respect my opinions, and I am grateful.

Thank you for all you do and for looking at my at times unreasonable requests :smile:

retailcoder commented 5 years ago

@Irubataru thanks for your invaluable input, and keep that passion burning!

You know much better than I do what is best for your users

Not necessarily. The balance is very delicate between promoting best practices, and shoving them down users' throats :wink:

rubberduck203 commented 5 years ago

Rubberduck has always been a pretty opinionated tool. I suspect it will stay that way until it becomes absolutely clear that no one is going to step up and create an alternative. That is life.

@retailcoder in regards to data driven tests, can we think about how we might implement them so the user has compiler assistance? I like the annotation as a first step, but it's not as friendly as it could be. I'll be honest, I don't really know how we'd do that. I recently helped a friend implement DDT in Elixir, but Elixir has hygienic macros for metaprogramming.

DDT is usually implemented by generating tests prior to executing them. I know we’re capable of permanently writing code to the project, but can we temporarily create tests?

If it were me, I would want to write something like this:

'@TestMethod(BarData)
Public Sub BarTest(a, b)
    Assert.AreEqual b, Bar(a)
End Sub

'@TestData
Public Function BarData() As Variant
    ' Excel's bracket notation evaluates to a 2D Variant array of test cases
    BarData = [{1,2}; {3,4}; {5,6}]
End Function

Which would then generate tests like this.

'@TestMethod
Public Sub BarTest_1_2()
    BarTest 1, 2
End Sub

This gets complicated fast though, because not only do we have to parse the data function, we might also need to execute it from the C# backend. I don't know how feasible that is. Just thinking out loud.

retailcoder commented 5 years ago

@rubberduck203 the @TestMethod annotation is already parameterized and takes an optional "category" argument, so DDT will require a new annotation; I like @TestCase, with the arguments matching the signature of the test method.

It could look like this:

'@TestMethod("test category")
'@TestCase(1, "foo")
'@TestCase(0, "foo")
'@TestCase(-1, "foo")
'@TestCase(1, "bar")
'@TestCase(0, "bar")
'@TestCase(-1, "bar")
Public Sub SomeTestMethod(ByVal arg1 As Long, ByVal arg2 As String)
   'setup SUT, consume provided parameters, assert once
End Sub

RD would simply iterate all @TestCase annotations, validate them, and use the annotation parameters to supply the arguments for a test run. Test results could be grouped by default, and expanded per individual test case as needed.

Irubataru commented 5 years ago

My initial thought is: what if you have a prime generator and a lookup table of, say, the first 1000 primes, and want the test to just run through all of them as a sanity check? Or, in my case, I have a random user-input generator to try to catch scenarios I can't think of myself, and want to run it 10000 times to see if I am missing something.
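
A loop-based version of the prime scenario might look like the sketch below (KnownPrimes and NthPrime are hypothetical, assuming matching indexing); spelling this out as a thousand @TestCase annotations would clearly be impractical:

'@TestMethod
Public Sub GeneratorReproducesKnownPrimes()
    Dim primes As Variant
    primes = KnownPrimes() ' hypothetical lookup of the first 1000 primes
    Dim i As Long
    For i = LBound(primes) To UBound(primes)
        ' compare the generator's output against the lookup table
        Assert.AreEqual primes(i), NthPrime(i)
    Next
End Sub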

bclothier commented 5 years ago

I would say that the question of a random data generator is a separate question from the question of data driven test. That definitely needs its own issue & discussion, apart from this one here.

retailcoder commented 5 years ago

I think allowing parameterized tests, with the parameter values supplied by an annotation, is a good first step towards enabling data-driven tests. Further enhancements could include a @TestCaseSource annotation that stands in for a plethora of @TestCase annotations and loads the test cases from a file in a specified format - but the first step is really just to enable parameterized tests.
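
Hypothetically, that further enhancement could look something like this (the annotation and the file path are illustrative only; none of it is implemented):

'@TestMethod("test category")
'@TestCaseSource("C:\tests\cases.csv")
Public Sub SomeTestMethod(ByVal arg1 As Long, ByVal arg2 As String)
   'setup SUT, consume provided parameters, assert once
End Sub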

The only way to eat an elephant is one bite at a time.

bclothier commented 5 years ago

@retailcoder in regards to data driven tests, can we think about how we might implement them so the user has compiler assistance?

Looking at NUnit as an example, the only compile-time validation we get is whether the attributes are well-formed. I doubt something like this would get flagged as a compile-time error:

[TestCase("ha ha I am a number!", 42)]
...
public void derp(int someNumber, string someString)

But I appreciate the desire to have compile-time validation of the TestCase annotation in VB. Here's what I think, though: it's actually easier to just hijack the existing Compile [project name] menu option and provide our own Compile command, which can then "compile" all the found annotations, so that we can validate that they are well-formed, just as NUnit's attributes are in Visual Studio.

retailcoder commented 5 years ago

hm, I'd just make it an inspection - it's not really different than any other IllegalAnnotation IMO.

Irubataru commented 5 years ago

I would say that the question of a random data generator is a separate question from the question of data driven test. That definitely needs its own issue & discussion, apart from this one here.

Agreed, but wouldn't it be useful, to some extent, if you could find a solution that covers both DDT and some sort of computer-generated testing? (I'm not entirely sure what this paradigm is called, where you have computer-generated input with bounds for validity.)

Irubataru commented 5 years ago

Also (sorry for double posting), I am looking at solving the titular issue, i.e. "Unit testing with for loops", and there are generally four scenarios where I find myself doing that:

  1. When I have a large lookup table of the results (sure, I could run a regex on it to generate test cases, but that isn't exactly ideal)
  2. When I am simply replacing a SequenceEquals, because it is easier to check the elements individually than to build the arrays such a method would accept, or because I only want to check a subrange, etc.
  3. When I have random test data
  4. When I have two algorithms that are supposed to do the same thing, or at least close to the same thing, where e.g. the slow algorithm is a tried-and-tested method that I am fairly sure :tm: works, and I want one test that checks that the fast algorithm at least gives the same result as the slow one for a range of inputs. Again, I am trying to test results, not methods, and if I have a way of generating results with another safe method, this is a good test (see the sketch below).

I am not sure how I would implement any of these with an annotation system, except scenario 2, which would be solved by removing the easter egg result.
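
For illustration, scenario 4 might look like the following sketch, where SlowButTrusted and FastUnderTest are hypothetical implementations that should agree on every input:

'@TestMethod
Public Sub FastAlgorithmMatchesSlowReference()
    Dim n As Long
    For n = 0 To 500
        ' the trusted-but-slow implementation supplies the expected value
        ' for the fast implementation under test
        Assert.AreEqual SlowButTrusted(n), FastUnderTest(n)
    Next
End Sub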

retailcoder commented 5 years ago

PR #4956 was merged just now, and removes the easter egg result - the AppVeyor CI build and subsequent pre-release tag will be available imminently.