Open nabladev opened 3 years ago
Indeed, infrastructure CI/CD is an area of active development, and the right approach is still being worked out. Our latest thinking on this is summarized in https://gruntwork.io/guides/automations/how-to-configure-a-production-grade-ci-cd-setup-for-apps-and-infrastructure-code/, which has some answers to your questions (but not all).
The key idea here is that module releases should be completely independent of actual infrastructure releases, and you should be able to test and validate your infrastructure modules without deploying to live environments in your infrastructure-live repo. Specifically, all the module infrastructure CI should be handled by "unit tests" using a framework like https://terratest.gruntwork.io/.
In that world, the flow would be more like:
You can then configure automation pipelines for releasing this code depending on your tolerance. For example, a common workflow would be:
plan for the new PRs to deploy the infra modules and report them to each PR.
@yorinasub17 Thanks for your answer. However, in our experience our modules very often have a lot of (expensive) dependencies. For example, we use Azure user-managed identities to handle passwordless authentication, but they need a special role assignment scoped to our Kubernetes cluster (AKS) to do their job and to be allowed to talk to our applications. So if we wanted to test this (otherwise very simple) module in isolation, we would need to spawn and configure an entire AKS cluster just for the test and tear it down afterwards. Also, considering pipeline agents and the fact that multiple teams actively work on all the modules, it seems very hard to follow the setup you described.
> For example, we use Azure user-managed identities to handle passwordless authentication, but they need a special role assignment scoped to our Kubernetes cluster (AKS) to do their job and to be allowed to talk to our applications. So if we wanted to test this (otherwise very simple) module in isolation, we would need to spawn and configure an entire AKS cluster just for the test and tear it down afterwards.
Yes, as mentioned above, infrastructure CI/CD is an area of active development and research. You are not going to find a single solution that works for all use cases. It all comes down to which tradeoffs you are willing to make based on your tolerance for risk.
Given that, for the situation you described, you have a few options:
You can opt for plan-based testing with policy checks, using a framework like OPA. You can use terratest to drive these checks using the plan functions. See this example for the available plan functions. The disadvantage is that you don't actually deploy the configuration, so it won't be a true test of your module: there are many things you can't check with just the plan (e.g., you risk a test tautology, where the test function runs the same checks as your terraform configuration).
You can pack your tests together to reuse resources across tests. E.g., your AKS functionality is released together with your core AKS module, and you launch a single AKS cluster at test time to test all the modules that depend on AKS.
You can have a permanent AKS cluster in your test environment that you use for modules that depend on AKS. You can inject this cluster into the terraform vars of the modules that require it. Note that the disadvantage is that it is hard to have true test isolation when you are sharing Kubernetes clusters across tests.
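As a rough illustration of the plan-based option, the sketch below checks the JSON form of a plan without deploying anything. It is written in Python rather than OPA or terratest purely for brevity. The function name `find_bad_role_assignments`, the sample plan, and the `EXPECTED_SCOPE` value are hypothetical; the only things assumed about the real plan JSON are its `resource_changes` entries with `address`, `type`, and `change.actions` / `change.after`, plus the `scope` argument of `azurerm_role_assignment`.

```python
# Hypothetical resource ID of the AKS cluster the identity should be scoped to.
EXPECTED_SCOPE = (
    "/subscriptions/0000/resourceGroups/test/providers/"
    "Microsoft.ContainerService/managedClusters/test-aks"
)


def find_bad_role_assignments(plan, expected_scope):
    """Return addresses of planned role assignments whose scope is not expected_scope."""
    bad = []
    for rc in plan.get("resource_changes", []):
        if rc.get("type") != "azurerm_role_assignment":
            continue
        change = rc.get("change") or {}
        actions = change.get("actions") or []
        # Only look at resources the plan would create or update.
        if "create" not in actions and "update" not in actions:
            continue
        # Values only known after apply are absent from "after",
        # so an unknown scope is flagged here as well.
        after = change.get("after") or {}
        if after.get("scope") != expected_scope:
            bad.append(rc.get("address"))
    return bad


# Hand-written stand-in for a parsed plan (illustrative data only).
SAMPLE_PLAN = {
    "resource_changes": [
        {
            "address": "azurerm_role_assignment.aks",
            "type": "azurerm_role_assignment",
            "change": {"actions": ["create"], "after": {"scope": EXPECTED_SCOPE}},
        },
        {
            "address": "azurerm_role_assignment.too_broad",
            "type": "azurerm_role_assignment",
            "change": {"actions": ["create"], "after": {"scope": "/subscriptions/0000"}},
        },
        {
            "address": "azurerm_user_assigned_identity.app",
            "type": "azurerm_user_assigned_identity",
            "change": {"actions": ["create"], "after": {"name": "app"}},
        },
    ]
}

if __name__ == "__main__":
    print(find_bad_role_assignments(SAMPLE_PLAN, EXPECTED_SCOPE))
```

In a pipeline, the dictionary would come from parsing the output of `terraform show -json` on a saved plan file instead of `SAMPLE_PLAN`. A check like this only confirms what the configuration says it will do, which is exactly the tautology risk mentioned above.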
> Also, considering pipeline agents and the fact that multiple teams actively work on all the modules, it seems very hard to follow the setup you described.
This setup actually works much better for larger-scale teams, since module testing can happen in isolation (because the tests don't conflict). The more you share resources, the more contention you are going to have across teams, so the advantage of always deploying new resources (even if expensive) is that teams can work in isolation (in "unit") without worrying about test contention and conflicts that pollute their environments when developing the modules.
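One small ingredient of that isolation is giving every test run its own resource names, so parallel runs from different teams never collide. A minimal sketch, assuming a hypothetical helper `unique_name` and an arbitrary length limit (real limits depend on the resource type):

```python
import uuid


def unique_name(prefix, max_len=24):
    """Build a per-run resource name: a trimmed prefix plus a random 6-character suffix."""
    suffix = uuid.uuid4().hex[:6]
    # Trim the prefix, not the suffix, so the random part always survives.
    trimmed = prefix[: max_len - len(suffix) - 1]
    return f"{trimmed}-{suffix}"


if __name__ == "__main__":
    # Each call yields a different name, e.g. to pass as a terraform variable.
    print(unique_name("aks-test"))
    print(unique_name("a-very-long-module-name-for-testing"))
```

The generated name would then be passed into the module under test as an input variable, so that each run creates and destroys only its own resources.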
I think this is still an open problem today.
Given that modules are stored in another repo, integrating all of this with Jenkins would look like:
terraform plan on the PR
We can suppose that on 3. terragrunt plan is run under /non-prod/us-east-1/stage because that's where we deploy stuff under development. So we know the environment but not the module: is it mysql or webserver-cluster?
Furthermore, if we want to deploy stuff in /non-prod/us-east-1/qa, how are we going to do that via Jenkins? Should I have a separate job just to run the plan/apply command? The same issue arises: how does Jenkins know which module to plan/apply?
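One possible way to approach the "which module?" question, sketched under assumptions rather than offered as an established answer: derive the module directories from the files changed in the PR. The function name `affected_modules` and the directory layout below are hypothetical; the list of changed files would come from the CI job (for example from a git diff against the target branch), and the list of module directories from wherever the live repo keeps its terragrunt configuration.

```python
from pathlib import PurePosixPath


def affected_modules(changed_files, module_dirs):
    """Map each changed file to its nearest enclosing module directory."""
    modules = {PurePosixPath(d) for d in module_dirs}
    hit = set()
    for changed in changed_files:
        # parents is ordered from the closest directory outwards.
        for parent in PurePosixPath(changed).parents:
            if parent in modules:
                hit.add(str(parent))
                break
    return sorted(hit)


# Illustrative layout only; the paths mirror the ones mentioned above.
MODULE_DIRS = [
    "non-prod/us-east-1/stage/mysql",
    "non-prod/us-east-1/stage/webserver-cluster",
    "non-prod/us-east-1/qa/webserver-cluster",
]

CHANGED_FILES = [
    "non-prod/us-east-1/stage/mysql/terragrunt.hcl",
    "non-prod/us-east-1/qa/webserver-cluster/terragrunt.hcl",
    "README.md",
]

if __name__ == "__main__":
    for module in affected_modules(CHANGED_FILES, MODULE_DIRS):
        print(module)
```

A Jenkins job could then run plan/apply once per returned directory, which would cover both the stage and qa cases without a separate job per environment. Files outside any module directory (such as the README here) are simply ignored.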