Is there a style guide for Pester tests?

alx9r commented 9 years ago

I've been using Richard Berg's style guide for normal powershell code. It has helped a great deal in making my code more consistent, terse, idiomatic, and readable. Pester tests are so different from normal powershell code that that style guide doesn't help a bit for Pester tests.

I've learned some pretty time-consuming lessons with Pester that probably could have been avoided with an opinionated style guide. The main topics where I think I could have benefited from a strong opinion are as follows:

that multiple Assert-MockCalled's in a single It block ought to be avoided
how to deal with powershell's unreliable module auto-loading
scope and repetition of code across Describe blocks
scope and variables in -MockWith script blocks
scope and variables in -ParameterFilter script blocks
when to start a new Describe block vs continuing an existing Describe block
how to deal with the material differences in InModuleScope scope behavior when running a .Test.ps1 using F5 from ISE vs Invoke-Pester
tradeoffs between using plain-but-repetitive clearly-readable code, and terse-but-error-prone code in It blocks
when to use -Scope It instead of a Context block when using Assert-MockCalled

I'm sure someone with a greater corpus of tests has many more lessons learned that could be shared in a style guide.

Is there a style guide for Pester tests? If so where is it? If not, is this project the right place to create one?

dlwyatt commented 9 years ago

I'm not sure where the "Single Should per It" guideline comes from. I don't see a problem with multiple assertions, so long as they combine to test a single concept. So, I might say that $variable should not be $null, then call a method on the $variable object, and compare the result to some expected value. I'm fine with that being in the same It block.

That said, I think it's a cool idea, and we could put it on the wiki here.

alx9r commented 9 years ago

@dlwyatt Oops, that should have read "multiple Assert-MockCalled's per It" in order to avoid what @nohwnd calls "assertion roulette". I've changed it in the list.

alx9r commented 9 years ago

@dlwyatt I expect that some aspects of a style guide would become topics of debate. Do you think those debates should occur here as issues?

dlwyatt commented 9 years ago

Yep, that's fine. We could also do it on the Google group, but I think it's okay to have the conversation here in an Issue log for now.

nohwnd commented 9 years ago

First of all testing is really subjective. What seems readable to you may seem awfully complicated to me. And the other way around. Setting a one-size-fit-all style is likely not possible, and I am not trying to do that. What I may try on the other hand is to set the foundation, and explain why I do things the way I do it. But at the same time don't take me as an authority, I am not one by any means. If you find your way to do things, and it makes sense for you for some reason, please argue with me.

Let's talk about why we test. We test because we need a simple set of boundaries that define a more complicated system. Coming up with simple tests and gradually refining them to define more complex systems is easy for us humans. Definitely easier than defining a complex system in a single swoop.

Saying that 1 + 1 = 2 is simple, we want it to be true because that is how we think the system should behave, and we can express what we want with a simple test:

1 + 1 | Should Be 2

We can run the test and if it succeeds, then the system likely works as defined.

In other tests we could also define that 3 + 5 = 8, 10 + 3 = 13 and many more. We are using other test cases to further define the expected behavior of the + operator. But notice one thing, every test is roughly as complex as the previous. Adding together three and five is pretty much as complicated as adding one and one. The tests are not becoming more complex, only the system under test (in this case the + operator) is becoming more complex as it needs to accomodate more use cases.

Why I am so sure that the one test case is not more complex than the other? Well, we technicians like to measure things, and so of course we can measure complexity as well, in this case we call it the cyclomatic complexity. In simple word, code has cyclomatic complexity of 1 if there is only single path through it. In such code there must not be any if or other construct like loop or switch. Cyclomatic complexity of 1 is what we are aiming for in test code. Such code is easy to understand, and easy to reason about because there are no ifs and whens to distract us.

Another ingredient of reliable test suite is to make the tests fail. The so called RED, GREEN, REFACTOR cycle. But failing the test just willy nilly is not enough. To demonstrate that, let's replace our example with this test, of a function that should return '1' no matter what:


Describe "Write-One" {
    It 'Outputs 1' { 
        # -- Arrange
        $expected = 1

        # -- Act
        $actual = Get-One

        # -- Assert
        $actual | Should Be $expected
    }
}

Currently it fails with CommandNotFoundException. Does that count as failed? No it does not! The assertion did not fail, a prerequisite of the test failed. To make it fail correctly. Add the function, make it return 200 and run the test again. Now it fails in the assertion, and you can proceed to implement the function.

When the test finally succeeds, you might consider trying to alter the SUT code in a such way that makes the test fail again. If it does not fail you are probably not 100 % sure why it worked in the first place. Take some time to investigate and likely add more test.

If you do that and you write a test that succeeded on first run don't panic, you just called a so called characterization test. Just make sure that you go back to the code and you change that one line of code that make the test (the assertion in that test!) fail. Then change your code back and run the tests again. When you perform this check, notice if you knew exactly which line to change, and how. If you did not, and you constantly have to try few lines before making the test fail, your code or your test are probably too complicated and might need a bit of improvement.

Lastly, a reliable test must be deterministic. If given the same input without changing anything else it should always fail (or succeed), but never alter between those two states. Making a test as simple as possible, and testing just a single aspect of the SUT at a time helps us achieve that. There is nothing more annoying than a test that occasionally fails and nobody knows why.

So to sum up: We are writing tests, because they are simple pieces of code that we saw fail for a single reason. And we also saw them succeed. At that point we were pretty sure why they failed and why they succeeded. This little piece of trust in the code then make us trust the whole system, because if every single piece of the system works correctly, the whole system must also work correctly.

So now to your actual questions:

that multiple Assert-MockCalled's in a single It block ought to be avoided A unit test should test only single aspect of the problem, for the reasons mentioned earlier, so any number of assertions larger than one should bu used with caution, not just Mock assertions. In any case try to avoid the not being sure why your test failed. A symptom of this is having 20 tests, and knowing that test 3 fails when any of tests 2, 7 or 10 fail. But with test 2 the first assertion in test 3 fails (that assertion tests presence of a file), and with test 7 the second assertion in test 3 fails, that one tests if there is a service running and similarly another assertion in test 3 fails when test 10 fails. That is assertion roulette. You can't simply tell what happened from looking at your failed tests.
how to deal with powershell's unreliable module auto-loading those are integration tests, those will always be difficult as there are so many unknowns, so you might expect some failed test suite runs. Identifying why the test run failed and making it easier for yourself to identify the same cause again is one of the challenges of engineering. I don't many co-dependent modules so I don't have any specific recommendation
scope and repetition of code across Describe blocks You can run any code in It just by defining a function, so I'd use that. But still keep the complexity of the code 1. In other words use the functions as nicely named containers for the code, but do not perform any logic there. What we should do is to provide you with mechanism that guarantees to run code after failed test. Although you can do it in a limited fashion like this:

    function SetupDatabase () {}
    function TeardownDatabase () {}

    SetupDatabase
    It "something db related" {}
    TeardownDatabase

scope and variables in -MockWith script blocks Not sure what are your use cases. But anything more complex than returning a value/object should be avioided.
scope and variables in -ParameterFilter script blocks This should contain a simple boolean condition using variables named as parameters of the mocked function (e.g. $Path for Get-ChildItem, if you do anything complex here you might rethink your strategy.
when to start a new Describe block vs continuing an existing Describe block I usually use a single describe for a single function
how to deal with the material differences in InModuleScope scope behavior when running a .Test.ps1 using F5 from ISE vs Invoke-Pester Not sure what you mean.
tradeoffs between using plain-but-repetitive clearly-readable code, and terse-but-error-prone code in It this is too subjective, but we can discuss it over few examples if you have some
when to use -Scope It instead of a Context block when using Assert-MockCalled I do it pretty much all the time, it isolates one test case from another, this is one of the features that should go in Pester 3 but we decided not to put it there because it seemed too limiting for our complicated test suites

nohwnd commented 9 years ago

@alx9r Any remarks?

alx9r commented 9 years ago

@nohwnd I've rethought this over the last few days. I think the challenges I had getting Pester to work effectively were caused by, broadly-speaking, two categories of problems:

Unreported/undiscussed/undocumented pitfalls, bugs, or limitations of Pester and/or Powershell. -- It's probably premature to bake workarounds to these into a style guide. I think I should just open github issues for these problems so we can establish what the best solutions for these are.
My unawareness of the implications of how Pester behaves for the kinds of testing I'm using it for. -- I'm sure Pester can be used effectively for a multitude of different purposes that result in a multitude of different styles. More examples of clean, readable, maintainable tests for different testing situations would help here, but I'm not that confident a style guide would.

Perhaps somewhere in the the Pester tutorials there could be links to real-world projects that have real well-writeen Pester tests that newbies can read to get a sense for what works. Reading through the tests in @dlwyatt's PSGenericMethods helped me a bit. The tests in Pester itself might be an example of this, although the self-referential nature of those tests was a bit confusing for me at first.

nohwnd commented 9 years ago

@alx9r That's a great idea! I've asked people on Twitter to share links to their projects using Pester. I remember doing this while ago, but at that point only Pester and Chocolatey were probably using Pester :))

You are right about the tests in the framework, they might tend to be difficult to understand, and we have to do different kinds of hacks to test what we need to test.

dlwyatt commented 9 years ago

I have quite a few examples of Pester test scripts in my personal repositories (and several more in DSC-related stuff that's mostly on PowerShell.org):

Some of these are older scripts that could use some cleaning up. For example, I'm in the habit now of assigning the result of the command to a variable, then using the variable in the call to Should, but many of my original Pester scripts didn't do that. This is particularly ugly with Should Throw, where the input can be a long script block.

it-praktyk commented 7 years ago

I assume that the issue can be closed now. I've labeled it 'Documentation' too - maybe in the future, someone moves parts of information from the thread/issue to the Pester wiki.

@adbertram, something for your book is here?

pester / Pester

Is there a style guide for Pester tests? #317