Terraform unit testing framework

hashicorp / terraform

Terraform enables you to safely and predictably create, change, and improve infrastructure. It is a source-available tool that codifies APIs into declarative configuration files that can be shared amongst team members, treated as code, edited, reviewed, and versioned.

https://www.terraform.io/

Other

41.69k stars 9.41k forks source link

Terraform unit testing framework #21628

Closed alexharv074 closed 2 months ago

alexharv074 commented 5 years ago

This may be a duplicate of https://github.com/hashicorp/terraform/issues/5059 but the release of Terraform 0.12 with iteration features has made the need for a real Terraform unit testing framework more urgent. In other words, there should be a Terraform equivalent of Puppet's rspec-puppet.

shamsalmon commented 5 years ago

I recommend https://github.com/gruntwork-io/terratest

alexharv074 commented 5 years ago

@shamsalmon , that's not a unit testing framework is it? As you have to create, test, destroy - i.e. slow tests?

rismoney commented 5 years ago

I'm looking for the same. My hope would be not to instantiate any infrastructure. That would be expensive. I am trying to ensure plan outputs are identical versus introducing breaking/any changes. Ideally no plan output should ever change between upgrading TF versions, say during a code refactor is performed. (ie my infrastructure did not change)

apparentlymart commented 5 years ago

Hi all,

So far efforts to do true unit testing with Terraform (that is, to test the effect of a Terraform configuration without actually running it) seem to have reached the conclusion that such a thing ends up just being a redundant re-statement of the same information that's in the configuration.

If your goal is to evaluate test assertions against the plan then the terraform show -json <planfile> command added in Terraform 0.12 could be a useful building block. That contains all of the information in the normal terraform plan output in a machine-readable format, and so captures everything Terraform can figure out without actually performing any side-effects.

I'm not sure I'd consider assertions against the plan output to be "unit testing" in the usual sense of the word, but it can be used to automate some of the work of reviewing a proposed plan in order to catch certain mistakes without direct human involvement. Certainly if your goal is to assert that the plan is empty (no actions are proposed) then this JSON representation of the plan is a reasonable way to do it.

When I'm developing modules, what I usually find myself wanting is not unit testing but rather integration testing such that I am creating real resources, but I'm doing so in an isolated, self-contained way that I can easily repeat as I make changes. Since the functionality implied by a Terraform configuration is more about the behavior of what it creates rather than the behavior of Terraform itself, testing just the configuration often can't tell you much.

For that sort of integration testing, my usual approach is to make a test subdirectory in the module and put in there a Terraform configuration that instantiates the module one or more times with different arguments. Then I can do my integration test by just running terraform apply in that directory and then reviewing the result.

I recently created an experimental new Terraform provider for representing testing assertions in a simple way using data sources in such a configuration. It's just a side-project for me right now so I wouldn't suggest embracing it for "production" use, but I like that it's just a logical extension of normal Terraform usage, rather than an entirely different workflow.

alexharv074 commented 5 years ago

@apparentlymart the goal is to write real unit tests. Just like has been possible in Puppet for 10 years.

apparentlymart commented 5 years ago

No need to be defensive, @alexharv074! I was just attempting to explain why no such thing already exists. I want to use this issue to discuss the problem and figure out what exactly is needed here

As I mentioned, previous attempts at this have created things that their instigators didn't consider very useful when they were done. Each of these had a slightly different interpretation of what they meant by "unit testing", so I'm not sure how closely they align with what you have in mind here. Sharing some examples of the sorts of real-world tests you'd like to write would help to frame the problem such that potential solutions can be evaluated against it.

Now that we have the terraform show <planfile> -json command I mentioned, it should be easier to prototype against some real Terraform runs and see what makes sense. (Previous prototypes were working either against the latest state snapshot or by decoding Terraform's internal plan file format.) If we can find a reasonable definition for what "unit testing Terraform" actually means, then it'll be much easier to talk about how to actually do it.

alexharv074 commented 5 years ago

@apparentlymart well, Terraform now has over 80 functions that can do list and map and string transformations and for expressions that can arbitrarily transform lists and maps into other lists and maps and for_each expressions that do code generation of dynamic nested blocks and a ternary operation that allows for rudimentary conditional logic and more features coming soon.

The end result is that Terraform modules can generate different resources (outputs) in response to different variables (inputs) being fed into them - so they should be unit testable like any other programming.

I don't really know how to spell it out much further:

Test case #1.

                 +------+
                 | TF   |
 data a,b,c ==>  |module| ==> resources p,q,r
                 |      |
                 +------+

Test case #2.

                 +------+
                 | TF   |
 data d,e,f ==>  |module| ==> resources s,t,u
                 |      |
                 +------+

Just the same as if I had a function in Python:

def square(x):
    return x * x

I might write unit tests

def test_case1():
  assert square(2) == 4

def test_case2():
  assert square(3) == 9

nbering commented 5 years ago

@alexharv074 What you suggest is a novel proposal. I like the idea very much.

I think I agree with @apparentlymart on the point that integration tests are the most useful thing you could make.

Having implemented a very simple provider myself, I'm pretty familiar with the separation of concerns between Terraform Core and a provider. Core handles building the dependency graph, and walking the graph to build a plan for the order of operation for changes. It then executes the plan, interpolating values from outputs of past changes as needed.

To unit test a config without making changes, providers would need a "dry run" type functionality. Some APIs can provide that, others may not.

If you want to unit-test some complex expression in Terraform, you could isolate the expression as an output, and read the outputs to validate them. But you'd want to avoid interacting with resources for something like that - and I'm not sure that provides any real value.

If your configuration has expressions that are so complex that this kind of testing sounds like it's worth the effort, you might want to consider writing a very simple custom provider that accepts all your inputs, provides an output, and then you could unit test the provider.

alexharv074 commented 5 years ago

@apparentlymart , @nbering - I would add that the design of Terraform and its TF State file means that TF modules react not only to data fed into the modules but also data in the TF state files and data fetched from the underlying Cloud. Thus in an ideal world this would also be possible:

Test case #3.

                 +------+
                 | TF   |
 data a,b,c ==>  |module| ==> resources p,q,r
  tfstate S ==>  |      |
                 +------+

Test case #4.

                 +------+
                 | TF   |
 data d,e,f ==>  |module| ==> resources s,t,u
  tfstate T ==>  |      |
 mocked AWS ==>  |      |
                 +------+

Maybe that's too hard to implement but even the simplest test cases (assume an initial state file, and ignore the real Cloud) is way better than nothing.

nbering commented 5 years ago

I'm still not entirely convinced this is something that's reasonable. Providers are not required to produce any specific set of results when a change is made.

They register a schema, the schema tells Terraform what inputs would cause a resource to be recreated, but aside from that, whether data fields on a resource produce one output or another is not necessarily deterministic like you would expect for a unit test.

apparentlymart commented 5 years ago

I am starting to get a sense of what exactly we'd be looking to test here: seems like the goal is to test the logic of expressions embedded in the configuration against a fixed set of mock values for variables, resources, etc.

One way we could get there is to have a mechanism that takes a directory containing a Terraform module, a big data structure containing values for all of the objects that can potentially be referenced in expressions, and walks over all of the blocks in the configuration and evaluates them against that hard-coded scope. In that model, we would be testing literally only the expressions in the configuration, though I expect creating that mock data structure would be quite an arduous task for most modules because of how big the schemas for most resource types are.

I expect there are some optimisations we could make on top of that baseline if that seems like the right general idea. For example, maybe a tool to construct a mock data structure automatically by actually applying a configuration and then capturing everything it created before destroying it again.

Is that the sort of thing you had in mind, @alexharv074? I'm trying to think about what might allow us to get quickly to a rough prototype so that we could try writing some real test suites against it to see what they look like.

alexharv074 commented 5 years ago

@apparentlymart , certainly the goal would be to test the logic of TF modules in isolation from the behaviour of the actual Cloud. I don't fully understand what you meant but it sounds like we're on the same page.

rismoney commented 5 years ago

Let's come out of the tech weeds:

When foo is w, I expect 10 nodes to be stood up When foo is x, I expect 5.nodes When foo is w and bar is y I expect 10 nodes with 100 gigs When foo is w and bar is z I expect 10 nodes with 200 gigs

This is type of expectations I'd want. Cloud, provider, not relevant. The framework for testing whatever is.

When I unit test a puppet catalog, I'm checking to make sure under certain conditions a package is to be installed. I'm not checking the package provider behavior, or whether an install will be completed clean. That's dev work for those tools creators. I'm checking the expectation that my endpoint will be built to spec. Then when I refactor or something changes I can ensure no behavioral change occurs, or my expectation is met. My change here didn't break the logic there.

In a declarative world I have no care what changes during applies, only what the final outcome will be. That's what the tests should be about.

Its fundamentally about variables and logic. Anytime there is a variable (something that can change) you want a high degree of confidence in the result. You can pass a value into a module. The module can have a default. Your caller can have a default. There could be a ternary. It might use a tfvars. What actually gets built? What happened on .11. why is .12 breaking this?

I argue it's IMPOSSIBLE to refactor code without unit tests. Otherwise you are just making random code changes. If you dive too deep you miss the behavior and regurgitate. Ex If you unit test a calculator's add function, you are only concerned with the sum. The addends are the inputs ( say mocks). You are not concerned with what the add function does internally with the addends as long as the sum is right. The suite of tests becomes a lists of addends, that includes positives, negatives, non integers, floats, equations, etc. Then you know your add function works under all circumstances.

Not to criticize the core project - but I'd argue if terraform project itself had a really solid understanding of unit test coverage, breaking changes between versions could be minimalized and feature rollout could be steadfast and not a big bang .12 breaking megarelease. By using test driven development you can write your expectations and create the code that meets it.

alexharv074 commented 5 years ago

Yes @rismoney is right:

... it's IMPOSSIBLE to refactor code without unit tests.

apparentlymart commented 5 years ago

Okay, so let me try to state and summarize what seems to be the hypothesis so far:

Our unit of decomposition for the sake of testing is a whole Terraform module.
The inputs for testing are a Terraform module directory, values for each of its declared variables, and mock implementations of each of the providers, resource types, and data sources the module uses.
Assertions are written against a data structure representing the new state that resulted from running a Terraform "apply" on that Terraform module using only the inputs from the previous point.

I want to keep this real rather than theoretical, so let's look at an actual example. For the module under test, I chose terraform-aws-vpc-region just because it's fresh in my mind from recent work. This module has a reasonable set of input variables and resources that I think make it a good, meaty example to think about.

Here's an example test spec I wrote for it:

class AWSProviderMock(object):

    def __init__(self, config):
        self.region = config.region

    def apply_aws_vpc(self, obj):
        obj.id = "vpc-mock:"+str(obj.cidr_block)
        # Everything else is set to values from the config/plan already
        return obj

    def apply_aws_subnet(self, obj):
        obj.id = "subnet-mock:"+str(self.object_cidr_block)
        # Everything else is set to values from the config/plan already
        return obj

    def apply_aws_internet_gateway(self, obj):
        return obj

    def apply_aws_default_route_table(self, obj):
        return obj

    def apply_aws_route_table(self, obj):
        return obj

    def apply_aws_route(self, obj):
        return obj

    def read_data_aws_region(self, obj):
        obj.name = self.region
        return obj

def test_single_subnet_exists():
    result = terraform.test(
        "../", # module directory
        variables={
            "network_plan": {
                "regions": {
                    "us-west-1": {
                        "cidr_block": "10.1.0.0/16",
                        "subnets": {
                            "a": {
                                "cidr_block": "10.1.64.0/24",
                                "subnet_name": "",
                                "zone_name": "a",
                            },
                        },
                    },         
                },
            },
            "tags": {
               "Name": "test",
            },
        },
        providers={
            "aws": AWSProviderMock,
        },
    )
    assert(len(result.resources.aws_subnet.this) == 1)
    assert(result.resources.aws_subnet.this.cidr_block == "10.1.64.0/24")
    assert(result.resources.aws_subnet.this.vpc_id == "vpc-mock:10.1.0.0/16")

I use Python testing style just because that's what we were discussing earlier in the thread, but the main point here isn't about the host language or the exact way tests are written in that language but more about what a unit test is "made of":

Each provider used by the module needs a mock implementation. For the sake of example here I just wrote something in Python, but let's not focus on exactly how that mock is written for now. The point is that we have some logic for producing a placeholder result for each resource type based on its configuration.
Each test case specifies the module to test (which, in a real testing harness, might be implied), the input variables to set on that module, and which mock providers to use.
The assertions are made against a representation of what was in the state after applying the module against the mock providers.

There are some notable things that this initial sketch doesn't cover:

Applying updates against an existing state. I assumed here that because we're focused on testing only the expressions in the configuration, we can treat every test case as an initial create against an empty state.
Whether certain values would be available at plan time vs. apply time in practice. (This is often a usability concern for a module, but can also be of practical concern if e.g. a value is used to populate a count meta-argument where it's required to be known at plan time.)
What to do about nested module calls? In a conventional programming language unit test you'd either consider them part of the unit under test or do a dependency inversion refactoring to extract that nested call into a separate unit. Such a refactoring would be possible in Terraform, and would be consistent with our recommendation to keep the module tree flat.

Does that seem like a good baseline set of functionality to start from, if we assume that then higher-level helper functions could presumably be built in terms of this in the language that the tests are written in?

To help validate this hypothesis, it would be helpful to see some other real examples of an existing module that does something useful and what one or more test cases for it might look like. If you have some specific modules you'd be interested in writing unit tests for, please have a go at writing something like I wrote above. Feel free to use RSpec style or any other testing style you are familiar with; Python is not an important part of this, and I want to focus on what are the inputs to a test case, what sorts of behaviors we might see inside mock providers, and what sort of test assertions seem interesting.

Again, I'd prefer to keep this practical and talk about real examples rather than theory, because otherwise it's hard to judge whether we've selected the right set of functionality to enable useful tests to be written with a reasonable amount of effort.

Let's keep the discussion about what unit testing in Terraform should include, and not get into arguments about the pros and cons of unit testing itself. For the sake of this issue, let's assume that we're all agreed that unit testing is valuable in principle and focus on how to apply unit testing principles to Terraform in a practical, useful way; in previous discussions, that "how" has always been the sticking point.

rismoney commented 5 years ago

Applying updates against an existing state. I assumed here that because we're focused on testing only > the expressions in the configuration, we can treat every test case as an initial create against an empty state.

Absolutely agree. Its a declarative world. So we only care actual=expected. Providers are assumed to do the right things as we don't care how the apply gets done, just what the end result is in-line with what the code says it does.

Whether certain values would be available at plan time vs. apply time in practice. (This is often a usability concern for a module, but can also be of practical concern if e.g. a value is used to populate a count meta-argument where it's required to be known at plan time.)

This is the state of mocking and using doubles. In puppet, for example you might have to mock a slew of hiera data lookups, unbeknownst to you on inception, but obvious on errors, missing values.

what to do about nested module calls its a recursive answer. the result is the declared state. what the subcomponents do is blackbox. i only care about nested calls yield, if my test scenario makes expectations about them.

Each provider used by the module needs a mock implementation. This sounds like a foundational problem. I wouldn't want to beholden to each provider for testability. This has to be a terraform core driven thing, that is provider independent. Not sure how to remedy that. Perhaps the initial suggestion whereby terraform plan against an empty state sounds more sane. Is this then just a really tricky json parsing excercise?

I'd envision running these tests in tiny docker containers, that have no aws, azure or vsphere access. So literally everything is self-contained with no knowledge of the outside world.

apparentlymart commented 5 years ago

In think I was unclear in what I said about every provider needing to be mocked. In my example, all of these mocks are provided by the test author. If you fail to provide a mock or your mock is incorrect then the test will not function.

The guarantee core would be providing here is that it will call the test-provided mocks instead of real providers. I expected that the real providers wouldn't even be available in this mode, because the provided mocks replace them.

This is a key question we need to address here: is it reasonable to require a test author to provide a test double for every provider and provisioner a module uses? If not, what is the alternative?

nbering commented 5 years ago

That's an interesting solution. One of my practical concerns was how to get providers to provide a dry-run or mock mode for the sometimes hundreds of resources they expose. Making it possible to push that on the test author would be a good way to get a prototype working.

alexharv074 commented 5 years ago

@apparentlymart , I appreciate all your hard work on this and sorry for prejudging your initial response.

Proposal above

This is a key question we need to address here: is it reasonable to require a test author to provide a test double for every provider and provisioner a module uses?

I think it is not unreasonable although it also is not ideal. But perhaps the process of creating the doubles could be automated? If so, perhaps it would be no issue. Your code example certainly heads in the right direction.

My proposal

I am not familiar enough with the implementation of Terraform to fully understand what is different about it to Puppet, although the two DSLs appear similar on the surface.

I think that Terraform's plan is analogous to Puppet's catalog. When I run puppet apply, Puppet receives static file data and the "facts" sent to it by an agent and Puppet manifest code and it "compiles" all of that into a JSON document called a "catalog" which is like a "plan".

What Puppet's unit test framework then does (i.e. Ruby's Rspec + the Rspec-puppet extension) is it hooks itself into Puppet's "compiler", optionally feeds in any fake data, fake facts & test doubles that the test author has provided, calls the Puppet compiler to compile a catalog, and then evaluates the assertions made about the catalog. All of this happens offline, without any input at all from a real system-under-test.

I would expect that Terraform itself is at least similar to Puppet. It also must collect data from static files, from the tfstate file, as well as data fetched from providers, then "compile" all that into a plan.

The big question

Could a Golang-based unit test framework be written that copies what Rspec-puppet does: consume the logic of Terraform's implementation as a library, merge 1/ static file data e.g. tfvars ; 2/ an example tfstate file (which would default to an initial tfstate file) ; and 3/ the Terraform module code ; and then build a partial plan (which will, of course, contain no information about state generated by providers) allowing the test author to make assertions about the partial plan?

nbering commented 5 years ago

Terraform's plan works something like this (I might miss a few steps, I'm not a core developer, but this should give you an idea how it works):

Parse configuration files to determine which plugins are used (ie. aws_instance indicates aws plugin was used)
Start a subprocess for the plugin, and connect to it over unix socket or tcp.
Ask the plugin for its schema.
Validate the user-supplied configuration against the schema provided by the plugin.
Refresh the state in-memory by asking the providers to get the current state of all existing resources (optional, can be disabled)
Compare the current state of all resources to the user-submitted configuration.
Build a graph of changes needed to achieve the user's configuration (Terraform Core does this based on the schema alone, plugins are not consulted any further to determine if a change is needed, or possible)
Before the plan is executed Terraform walks the graph and flattens it into a series of steps that can be run in sequence based on dependencies implied by interpolated values, or explicitly declared with depends_on.

I'm not very clear on whether the last step is down before or after a plan file is created with plan + apply workflow, but I assume it's done before. And I'm probably missing some complexity here around data sources, since they are evaluated before resources . But when changes are at play, Terraform Core doesn't actually know what values it will be plugging into interpolated expressions until earlier changes have already been executed.

The complicated bits from what I understand it:

Terraform doesn't just read the schema from a flat file somewhere. It has to execute the plugin binary to even know how to read the schema.
Terraform Core can only validate the configuration in as much as the schema can describe. With some providers, the compiled configuration may not actually be valid and Terraform doesn't know that until the plugin throws an error during execution (I've found this to be pretty common on Azure especially).

That's not to say that isolating resource for test is impossible. There are some values you can know within a reasonable margin of error for your test's purposes. For example, count fields must be known before building the resource graph. So the test should at least know how many copies of that resource will exist since that value cannot be the output of a provider.

So your assertions could reliably say "this is the count of this resource that I expect to see in the result" - I think @apparentlymart's suggestion of using the plan output might also get this for you already, though some tooling would need to be built on top of it.

Using test doubles could be used to make some assertions against the properties you're interpolating for, but the accuracy of these tests will be pretty variable depending on the underlying API and how representative your doubles are.

I can definitely see some value to unit tests. when you need test times to be low for rapid iteration while trying to make changes. I am, however, skeptical that the overhead of maintaining test doubles would be a time saver in the long-run over running integration tests. It would require some pretty large scale for that to pay off. If providers could include some kind of mock mode, that would reduce the burden on those writing tests - but across all providers, making such a commitment would be huge. I think you'd at least want to demonstrate a prototype before bringing it to the provider teams.

nbering commented 5 years ago

The more I think about it, the more it occurs to me there's sort of two things you'd want to test:

The "shape" of the output, which I what I think @alexharv074 is getting at. Given these inputs, what resources will actually be created. In the end, I think this comes down to evaluating for count. I kind of think analyzing a plan for a fresh deploy without actually running it might be enough to solve that problem.

Then there's what I perceive as the more complex part of Terraform module development, and that's complex expressions. I wonder if test-doubles are more complex than we need in that case. Could a framework be built that isolates just the expression? Perhaps in the form of a configuration block that gives an identifier that locates the expression, ad provides all the dependencies for it's evaluation?

Here's a pseudo-config for something I'm playing with in my head.

resource "cloudflare_record" "exampe" {
  name = "${var.hostname}${var.subdomain != "" ? ".${var.subdomain}" : ""}"
}

test "optional_subdomain_present" {
  expression = "cloudflare_record.example.name"

  dependencies {
    var.hostname = "foo"
    var.subdomain = "bar"
  }

  assert_result = "foo.bar"
}

test "optional_subdomain_absent" {
  expression = "cloudflare_record.example.name"

  dependencies {
    var.hostname = "foo"
    var.subdomain = ""
  }

  assert_result = "foo"
}

apparentlymart commented 5 years ago

@nbering's example is an interesting alternative take on how to frame a test. I'm going to restate what I understood of it just to make sure I'm not misinterpreting:

The general idea here would be to select some evaluatable sub-portion of the module (which could be a whole resource block, or an individual argument in a resource block, depending on what we think is useful) and evaluate it against a fake static data scope to get the value that Terraform would normally pass to the provider as the "configuration object".

I previously was thinking about doing this at the whole-module level, but I think in practice that would lead us back to my more recent idea of writing test doubles for all of the providers, because I think fake static data would not be sufficient in most real-world cases.

However, if we were to think of this as applying on a per-resource-block basis then the problem is a little simpler: resources can't refer to themselves (provisioner and connection blocks notwithstanding) and so we only need to worry about providing fake data for other objects in the module. For example, taking the same subnet resource in the module I used in my last example:

def test_single_subnet():
    result = terraform.test(
        "../", # module directory
        "aws_subnet.this", # individual resource to test
        mock_data={ # Must contain suitable values for everything referenced in the config block
          "local": {
            "name_tag_base": "foo",
            "region_subnets": [
              {
                "cidr_block": "10.1.64.0/24",
                "availability_zone": "us-west-2a",
                "subnet_name": "",
              },
            ],
          },
          "var": {
            "tags": {
              "Name": "foo",
            },
          },
          "aws_vpc": {
            "this": {
              "id": "vpc-abc123",
            },
          },
        },
    )

    # "result" here is a list of objects representing the counted instances
    assert(len(result) == 1)
    assert(result.cidr_block == "10.1.64.0/24")
    assert(result.vpc_id == "vpc-abc123")
    assert(result.tags == {
      "Name": "foo (us-west-2a)",
    })

(I know @nbering's example was working at the individual argument level, while I switched to whole-resource-block level here. The above could reasonably apply to that too, with the result just being the individual argument value; the general mechanism of providing fake data to use to resolve references would still apply.)

Testing at the granularity of individual resource blocks or expressions within them simplifies the problem considerably because we can avoid running Terraform's plan or apply processes at all and focus just on evaluating expressions against static data. We don't need any provider test doubles in this case because we're not trying to model the flow of data from one resource to another: I just hard-coded an example VPC id in the fake test data and asserted that it appeared in the right place in the resulting configuration object.

Does writing tests at resource-level granularity seem reasonable? I imagine you'd still have the test suite cover an entire module, but the individual tests inside would each be for specific resource/output/local/module definitions and assertions against just the resulting configuration object, rather than a result of applying that configuration via a provider double.

apparentlymart commented 5 years ago

Sorry I replied to these out of order: I was composing in a separate editor and I sent the last one first by accident. This one is about @alexharv074's proposal and a little about @nbering's response to it.

It does indeed sound like Puppet and Terraform have a similar design, though I think there is one significant difference between Puppet and Terraform based on that description: Terraform uses logic implemented in the provider to implement planning, and the provider is allowed to reach out to remote systems if it needs to in order to produce an accurate plan. An "accurate" plan is one where any known value in the plan exactly equals the corresponding value in the final result and unknown value placeholders are provided for anything the provider can't determine until it actually applies the change.

The Resource Change Lifecycle docs describe this process at a high level from Terraform Core's perspective. The crux of the matter, though, is that if we want to perform validation, planning, or applying during the test run then some form of provider double would be required. Validation is a local-only operation (the API contract forbids the provider from accessing external resources, and thus we could potentially run this logic "for real" in tests) but the others are "online" operations.

In principle we could try to provide "automatic" test doubles that just do some default behaviors, such as leaving any attribute not explicitly set in the config set to an unknown value. However, in that case it would not be possible to provide placeholder values to test data flow between resources, as I did with the vpc_id in the earlier examples.

Given that the main purpose of Terraform logic is to describe the flow of values between objects, I think any unit testing system must support testing that somehow in order to be useful. Perhaps a module as a whole is too big and complex a unit to test at once.

nbering commented 5 years ago

@apparentlymart I think you've captured the idea I was attempting to convey.

I selected the scope of a single expression because I've historically found that most expressions are a simple passing of a single field like an ID. Those types of expressions would hardly need a unit test as they are a simple substitution.

I'll admit that I haven't gotten too much into the additional dynamic block options available in Terraform 0.12, yet. From what I've seen of that, testing with a resource as the base unit of test would probably make perfect sense.

apparentlymart commented 5 years ago

Hi again, all.

@nbering's suggestion (adjusted to whole-resource granularity as I mentioned above) seemed like a reasonably simple idea to prototype with, so I spent a few hours today working on a little helper command terraform testing eval which can be launched like this:

terraform testing eval ../ aws_subnet.this mockdata.json

It then prints out a JSON representation of the configuration object that resulted from evaluating the body of the given resource block against the given mock data. This specific mechanism for doing that isn't the point of this prototype, so let's put these implementation details aside and focus on what kinds of tests this allows us to write.

Continuing my theme, I wrote some actual executable tests for that terraform-aws-vpc-region module, specifically for its subnet resource as before.

Again, the specific API for actually running the evaluation step and making assertions against the result isn't important for this prototype -- higher-level API wrappers could always be written in Python if desired -- but this illustrates what testing the expressions in a specific resource block might entail: define mock data for everything else in the module, evaluate against it, and then assert on the result.

..F
======================================================================
FAIL: test_empty_subnet_name (test_subnets.TestSubnets)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "unittests/test_subnets.py", line 42, in test_empty_subnet_name
    self.assertEqual(r.cidr_block, "10.1.3.0/24")
AssertionError: u'10.1.2.0/24' != '10.1.3.0/24'

----------------------------------------------------------------------
Ran 3 tests in 1.179s

This looks like a promising start to me, but before we go any further I'd like to put this particular formulation to the test by seeing what tests for some other real-world modules might look like. Therefore I'd encourage you all to give this a try yourself and see if this allows you to test the sorts of things you were hoping to test (even if this specific Python API isn't the most ergonomic way to write them).

To try it:

Build a terraform executable from the f-testing-eval-prototype branch, where the prototype eval command is implemented.
Copy my unit test boilerplate into a similar subdirectory under the module you want to prototype with.
Bootstrap a suitable Python virtualenv to work in:
- cd unittests
- virtualenv env
- source env/bin/activate
- pip install -U nose2
Replace my TestSubnets test class with one or more classes relevant to your own module.
Run the tests from the unittests directory by running: python -m nose2

I'd really love to see what tests for some different modules look like, so if you are working in a module whose source code is public it'd be great if you could push up a prototype branch similar to how I did so we can all look at the examples and see how this testing model feels across as many different real examples as possible.

The Terraform team at HashiCorp won't be able to turn this into a real, shippable feature in the near future due to priorities being elsewhere. My implementation in this prototype was optimized for speed of implementation; in a real implementation we'd want to do some refactoring inside Terraform Core so that the test eval command doesn't duplicate so much logic, and I expect we'd want to try a few different interaction models between the test code and the Terraform CLI too. However, if we can gather a nice set of examples here that'll help figure out if this is a good high-level approach to move forward with.

Thanks for the great discussion so far!

alexharv074 commented 5 years ago

@apparentlymart , it is going to take me some time to get my head around all this but thank you so much for putting it all together and I'll do my best to try it out on a module as you say.

alexharv074 commented 5 years ago

For anyone else who wants to try this and like me didn't know how to build Terraform, I did this on my Mac OS X:

brew install golang
git clone https://github.com/hashicorp/terraform.git
cd terraform
git checkout f-testing-eval-prototype
make bin

Then I waited for about 1 hour (note the parallel builds step completely crippled my Mac for about 10 minutes as all CPU is taken up!) and voila I have a Terraform dev binary:

▶ ./pkg/darwin_amd64/terraform --version
Terraform v0.12.2-dev

apparentlymart commented 5 years ago

Hi @alexharv074! Sorry I didn't give more detail on that step.

make bin is, as I'm sure you saw, for producing release binaries across all of our supported architectures. If you run make dev instead then it should build just the one for your current architecture in (by default) ~/go/bin. Of course you have one now so that's not particularly useful in retrospect, but if you find you need to rebuild it again for some reason then hopefully that makes it faster.

alexharv074 commented 5 years ago

Ah thanks @apparentlymart . At least I know what to do now next time I do a major release of Terraform. 😁

rismoney commented 5 years ago

https://www.contino.io/insights/top-3-terraform-testing-strategies-for-ultra-reliable-infrastructure-as-code

alexharv074 commented 5 years ago

@apparentlymart , I am playing with this framework and wondering if I am doing something really dumb or if something's broken. I created a repo for my PoC here and my issue is documented in the README there. In summary, I am getting Blocks of type "dynamic" are not expected here when testing eval for that simple module yet it works fine when I run terraform apply. I also note that testing eval works fine if I remove the dynamic block, so it makes me think that the testing eval won't support dynamic blocks yet?

apparentlymart commented 5 years ago

Oh yeah that's a good point: because of the quick and hacky way I implemented terraform testing eval, it's not going through all of the usual Terraform evaluation codepaths, and in particular it's not running the dynamic block expansion logic.

I'm not near my Terraform dev environment right now, but if you're game to try some changes on your local copy I think it could be made to work by changing the following two lines:

https://github.com/hashicorp/terraform/blob/760ec68a5c587340abb87e95b42b4cc56e0f7ab4/command/testing_eval.go#L196 https://github.com/hashicorp/terraform/blob/760ec68a5c587340abb87e95b42b4cc56e0f7ab4/command/testing_eval.go#L209

If you change both of these lines to the following extra steps then I think dynamic blocks should work:

body, moreDiags := scope.ExpandBlock(rc.Config, schema)
diags = diags.Append(moreDiags)
if moreDiags.HasErrors() {
    return
}
result, moreDiags := scope.EvalBlock(body, schema)

This extra scope.ExpandBlock is what allows dynamic blocks to work, by expanding into the real blocks they represent.

Hopefully that works... I don't have a Go compiler handy to test, so I'm sorry if there are some typos/etc in there.

alexharv074 commented 5 years ago

@apparentlymart , your patch (attached) worked to fix the dynamic blocks. terraform.patch.gz

alexharv074 commented 5 years ago

@apparentlymart I have a working PoC.

I rewrote the boilerplate in Ruby so that I could use Rspec. I think Rspec is more appropriate for Terraform than Python, because other familiar DevOps testing tools like Serverspec, Test Kitchen etc use Rspec.

Project README here: https://github.com/alexharv074/terraform-unit-testing-poc Supporting code in here: https://github.com/alexharv074/terraform-unit-testing-poc/blob/master/spec/spec_helper.rb Test cases in here: https://github.com/alexharv074/terraform-unit-testing-poc/blob/master/spec/aws_ec2_instance_spec.rb Module under test in the same repo.

Output

▶ bundle exec rake
/Users/alexharvey/.rvm/rubies/ruby-2.4.1/bin/ruby -I/Users/alexharvey/.rvm/gems/ruby-2.4.1/gems/rspec-core-3.8.1/lib:/Users/alexharvey/.rvm/gems/ruby-2.4.1/gems/rspec-support-3.8.2/lib /Users/alexharvey/.rvm/gems/ruby-2.4.1/gems/rspec-core-3.8.1/exe/rspec --pattern spec/\*\*\{,/\*/\*\*\}/\*_spec.rb

aws_instance.this
  with instance_count 0
    should have be an empty list
  with no EBS volumes
    should have AMI ami-08589eca6dcc9b39c
    should have instance_type t2.micro
  with an EBS volume
    should have an ebs_block_device list
    should have one ebs_block_device
    device_name should be /dev/sdg

Finished in 5.38 seconds (files took 0.15504 seconds to load)
6 examples, 0 failures

My initial feeling is it's on the right track.

apparentlymart commented 5 years ago

Thanks for working on that, @alexharv074!

Based on my examples and yours so far it's seeming like we've got a reasonable general model for tests here: given a configuration block and static mocks for everything the block depends on, compare the resulting configuration object value to an expected object.

I'd intentionally implemented this as a helper in the CLI because I'd like (for the moment at least) to keep that evaluation mechanism separate from any higher-level testing API built on top of it. As you said, rSpec is comfortable for some folks coming from other tools like Puppet, but any particular choice here has some tradeoffs.

With that said, now that we have two different prototype languages to play with, I hope we can continue to gather examples, perhaps by authoring example unit tests for some of the verified modules from Terraform Registry to keep us honest about not writing contrived modules that naturally fit the model we have in mind.

On the separate subject of test implementation language, though:

One thing I was interested to see is whether any of our real-world examples would include any "non-declarative" tests that warrant using an imperative programming language to write the tests. So far our two (small) examples haven't, and I have a hypothesis that because the language being tested is declarative therefore most reasonable tests should be expressible in a declarative language too. If that hypothesis holds (I'd love to see real-world counterexamples!),

If a declarative language for the tests meets our needs, making that language be built on HCL too would have the advantage of not having to translate between type systems. We can already see one example of that awkwardness in the rspec tests where there is a statement expect(r.ebs_block_device).to be_an Array, which makes sense to a Ruby developer because Array is a Ruby type, but the underlying Terraform type kinds that map to it are list, tuple, and set, so it's annoying (though not the end of the world) that the tests are expressed in a type system that doesn't match Terraform's.

Earlier on in the discussion, before we started looking at terraform testing eval use-cases, I sketched a possible HCL-based testing language, in which I imagined it making some assumptions to make the test definitions more concise and (subjectively) more readable:

Because all of the mock inputs are static data, therefore the output should be deterministic too, and so it's reasonable to just assert whole instance configs instead of testing individual attributes.
We can specify mock data using similarly-shaped configuration constructs as the concepts they are mocking. For example, the mock_resource block echoes the resource block from the main language.
Because this is a specialized language for Terraform, we know that the subject for every describe block will be a referenceable Terraform object, and thus we can just use the normal Terraform reference syntax for it, and not have a bunch of extra punctuation/noise around it to adapt to a different host language.

With all of that said, this is definitely a tradeoff:

It's assuming that being able to directly use the type system and concepts from Terraform makes for a user experience that is better enough to outweigh the familiarity of using an existing test framework. That's a tradeoff Terraform is already making for its main language though, and so I think it's reasonable to project that assumption onto the test system.
It's assuming that a declarative language is sufficient to describe test cases for another declarative language. Looking for counter-examples to evaluate is the best way for us to come to a conclusion on this one: can we find something reasonable (meaning: a real situation, not something contrived specifically to disprove this hypothesis) that we can't find a declarative representation of?

My early thought here (subject to change as we gather more examples/information, of course) is that either we should have a single declarative Terraform-native test syntax like this, or we should offer a building block like terraform testing eval (possibly with a different interface that is more efficient) to allow for an ecosystem of test system adapters for a variety of different general-purpose languages.

I find myself leaning slightly towards the "single declarative language" angle right now, because it aligns with Terraform's own philosophy and avoids cross-language type/value translation issues, but I'd like to gather more data to test my "declarative language is sufficient" hypothesis.

alexharv074 commented 5 years ago

@apparentlymart , in my experience the flexibility of Ruby's data munging capabilities does make it a good fit for automated testing - it's a powerful and featureful langauge - but for readability, I also do like the look of your HCL-based testing language. Puppet have also talked about introducing a Puppet DSL-based testing language although I think the lack of demand for it has never really made it make sense. But I'm thinking, couldn't we just have both? If someone wants to use Python, they could use Python and "testing eval" and if others want to use HCL they could use HCL and so on? It seems to me that the "testing eval" command you made was easy to implement and if you refactored a bit shouldn't introduce a maintenance burden? and I can see it being useful even as a standalone diagnostics tool.

alexharv074 commented 5 years ago

Concerning:

I hope we can continue to gather examples, perhaps by authoring example unit tests for some of the verified modules from Terraform Registry to keep us honest about not writing contrived modules that naturally fit the model we have in mind.

Sure. The example I chose wasn't contrived really. I began by thinking that the first thing I want to be able to test is given inputs X, Y, Z can I check that code generation via for_each can be tested - and I found it could so that was a good start.

Here are other things I am assuming/hoping that the "testing eval" can do:

Template generation - e.g. how can I test that templates are generated as expected after interpolation?
Functions - e.g. if I have code with a bazillion legacy nested replace functions to do conditional logic, could I write unit tests to test the conditional logic and then refactor on Terraform 0.12 standards safely?

If it can test all of the evaluation logic in Terraform code then I can totally say I don't personally need to see too many examples to know that it's going to be massively useful. That's not to say people will actually use it! DevOps engineers mostly hate testing and there's no way most of them will test no matter how hard or easy it is.

But for me, this feature is the difference between Terraform being something that I think is safe to use in Production- and something that isn't.

apparentlymart commented 5 years ago

Hi @alexharv074... just wanted to clarify that I wasn't meaning to suggest that your recent test case was contrived, but just that we only have two examples so far and that trying it against some other modules that already exist (written by other people ideally, so we can be sure we aren't writing the configs to be easy to test, even though I know we're not trying to do that) will hopefully allow us to find the boundaries of what is possible with this testing strategy and decide if we are comfortable with them.

My main concern here is what you were alluding to at the end of your comment: for people to actually use this functionality requires that the effort required is outweighed by the benefit. If this particular testing strategy makes writing tests too hard/complicated/time-consuming in real-world situations, then that might be an indication that we should look for a different framing of the problem that makes it easier to write practical tests.

Our prototype so far assumes that it's reasonable to hand-write static mocks for everything a resource depends on, and so that's the main thing I'd like to put to the test in practice: in real-world configurations, is the set of dependencies for a resource generally small enough for this to not be overly burdensome? Are there resources where writing a mock for them is particularly complicated, due to how complicated the resource itself is? Might we need some additional tooling to help generate these mocks automatically from real-world infrastructure to reduce that burden? Rather than assuming answers to these questions, I'd prefer to try real examples and see.

alexharv074 commented 5 years ago

@apparentlymart , yes I mostly agree.

While I was writing the tests I did observe that it is harder to write tests than in Rspec-puppet because in Puppet I could by default provide no mock data at all and Puppet would use the defaults in the modules. But at the same time I realised it would be easy enough to write helpers to automate the generation of the mocks too. It looks like you have the same problem in the HCL-based approach?

I disagree that the benefit of the tests could possibly be outweighed by the effort though. If I inherited 10,000 lines of Terraform 0.7 code how could I possibly safely refactor that without this feature? No, I'd have no issue with using "testing eval" just as it is now to solve that problem. Yes, I'd automate generation of a lot of the mock data and so on.

Rather than assuming answers to these questions, I'd prefer to try real examples and see.

Well as I say I definitely think code to autogenerate the mocks is required. Should I go ahead anyway and try and add some tests to a public module?

alexharv074 commented 5 years ago

@apparentlymart , is there a robust HCL => JSON conversion yet? If so, I think it should be trivial to write a test helper that fills in default mocks isn't it?

apparentlymart commented 5 years ago

I think for generating mocks you'd more likely want to take data from a state snapshot rather than from configuration, since mocks need to include values for the attributes of the object that are decided by the provider at apply time as well as the configuration input.

terraform show -json can print out a JSON version of the state that is intended as a public interface. I suppose in principle the test library could have a helper function that can read a saved copy of that output and pull mock objects directly from it, so you could seed your bank of mocks by redirecting that JSON serialization to a file and loading from that file in the tests, then overriding only what the test needs to override to model a specific scenario.

(The stored state snapshot format is also JSON so that could instead be parsed directly, but the show command's output is intended to be easier to consume and less likely to see breaking changes as Terraform evolves in future releases.)

alexharv074 commented 5 years ago

@apparentlymart , so, firstly, wanted to check in to see if you still see value in adding tests to a public module- or, was your reason for proposing that to find out if the auto generated mocks is a requirement?

apparentlymart commented 5 years ago

I am just generally interested in seeing how this looks for some more complex modules, and where the limitations of this approach are. To be clear, I wasn't meaning to imply that I expect you in particular to do this... I'm just thinking aloud about how best to evaluate this prototype design.

alexharv074 commented 5 years ago

@apparentlymart , that's ok. I seem to be the only soul demanding this feature so happy to do some of the work! I had another response up here earlier and then got your point above. I still think HCL => JSON solves some of this problem? Noting that terraform show -json shows me nothing given an initial state.

apparentlymart commented 5 years ago

Indeed, terraform show -json shows the current state, so if nothing is created yet there will be nothing in it. My intent with that suggestion was the idea that perhaps mock generation would consist of actually applying the module (in a similar way as folks currently do to make "integration tests" as I was discussing earlier in this thread), grabbing the state via terraform show -json, and then destroying the temporary resources that the mock dataset was built from.

In principle we could get a subset of that data by generating a saved plan using terraform plan -out=tfplan and then terraform show -json tfplan and retrieving the values from there, but that would not then include any values that are only assigned at apply time, such as the mock aws_vpc id I used in my earlier example to test that the created subnets were being attached to the expected VPC.

I don't think there's enough information in the configuration for an average resource block to generate a useful mock, since it will only contain the subset of the data set by the configuration author: usually when we're composing objects together we're doing it with values assigned by the provider or by the remote API, rather than values specified directly in the configuration.

wyardley commented 5 years ago

+1000 on this. And agree with something like rspec-puppet as a model.

The only approaches I've seen thus far to testing Terraform are either integration tests, or just building tests using rspec, pytest, or similar frameworks on top of a generated plan.

It's true that there is no substitute for integration tests. We use them (kitchen-terraform and inspec in our case). But not only are they time consuming and expensive to run (and occasionally flaky), but there are certain things they don't work well with for, for example, creating a resource that can't easily be destroyed. Being able to catch some errors earlier in the process is always a win, even if we need to do additional testing to be confident that things still work. We run one against some of our main modules maybe once a day, and run them manually if we're doing extensive work on some of those modules.

For me, the point of unit testing is, as suggested above, making refactoring easier, and confirming that later changes don't accidentally introduce regressions. Also letting you test certain side effects or conditions without having to simulate them all, making sure that a specific error is thrown when presented with certain input, and so on.

Yes, sometimes there is a bit of an element of "Terraform, give me 8 things" that it can seem a bit redundant to do a test of "there are 8 things", not to mention you'll have to change that test if you change it to "give me 10 things". But in terms of doing more complex stuff with modules, and especially as Terraform adds at least some support for templating, iteration, not to mention the existing programming type constructs, I'd agree that there are a lot of things worth testing. Not only that, but since (even with 0.12) there are cases where you have to get "creative" to do some more complicated things to make resources based on structured data, it would be great to have some human readable confirmation of what the actual intended result should be.

I agree that a lot could be done simply by having some helpful wrappers allowing tests (hopefully written in fairly easy to understand expectation syntax, similar to rspec or mocha) to be done based on a generated plan in JSON format.

The closest things I've seen to the kind of thing I'm looking for would probably be https://github.com/bsnape/rspec-terraform or https://github.com/eerkunt/terraform-compliance (though the latter takes the BDD style a bit too far IMHO 😆).

alexharv074 commented 5 years ago

@apparentlymart it looks like the way testing eval is implemented it would be impossible to test this?

locals {
  key_name = "default"
  user_data = <<EOT
#!/usr/bin/env bash
%{for e in var.ebs_block_device ~}
mkfs -t xfs ${e.device_name}
mkdir -p ${e.mount_point}
mount ${e.device_name} ${e.mount_point}
%{endfor}
EOT
}

That is to say, it seems I can only provide a static value for the local variables as inputs to the tests, and thus logic inside the locals would be untestable?

alexharv074 commented 5 years ago

On the proposal above to do the unit testing framework in HCL, the tests I've written to validate template files suggest to me that it's probably not a good idea, e.g. I can't imagine it would be easy to rewrite this:

    context 'user_data' do
      before do
        @lines = r.user_data.split("\n")
      end
      it "should have a mkfs line" do
        expect(@lines[1]).to match %r{mkfs -t xfs /dev/.*}
      end
      it "should have a mkdir line" do
        expect(@lines[2]).to match %r{mkdir -p /.*}
      end
      it "should have a mount line" do
        expect(@lines[3]).to match %r{mount /.* /.*}
      end
    end

rismoney commented 5 years ago

For me, a single declarative language, while in theory seems sound, is inferior to an existing language like ruby that allows boundless constructs with familiar imperative readability.

Being tethered to an ongoing developing language like HCL will leave the community awaiting features that would rank low. The types in HCL are not overly complex and they can be understood by an outside lang. Sticking with familiar here, instead of breaking new ground is a sane approach and allows the devs and maintainers to progress HCL to a more mature state.

I would image the TF team at hashicorp itself is swamped and struggling against a backlog.

As an aside, puppet started out allowing .rb and/or it's dsl in its manifests. It found the dsl experience better and it blocked non idemptotent scenarios from arising , and dropped rb in classes. For testing it used rspec and not its dsl. Another framework called cucumber can add abstractions. The point here is building a new ecosystem instead of leveraging what's already out there seems premature until existing tools don't meet the needs. I struggle to substantiate a TF tool for TF testing as being a better way.

I could be rspec biased.

wyardley commented 5 years ago

I agree with above posters that HCL would probably not be a good fit for writing the actual tests. I don't have a strong preference for ruby / rspec specifically, but I do think an interface or library that would allow writing tests using one or more expectation based frameworks using a more normal TDD / BDD style would be better.

Unrelated note: it would be harder to implement than just parsing a plan, but a super cool option would be a dummy test driver for the various providers.