hashicorp / terraform-aws-consul-ecs

Consul Service Mesh on AWS ECS (Elastic Container Service)
https://www.consul.io/docs/ecs
Mozilla Public License 2.0

Validation framework for terraform examples #250

Closed: Ganeshrockz closed this 9 months ago

Ganeshrockz commented 9 months ago

Changes proposed in this PR:

TL;DR: a framework to ease validation of the examples in this repository.

Background

  1. Each time a new release of Consul ECS goes out, we rely on the acceptance tests in this repository to determine the health of the release. The acceptance tests provide only limited coverage and don't exercise many server-oriented features such as peering, WAN federation, and service sameness.
  2. We have started adding many examples that help users deploy various Consul features on ECS, but we currently have no automated way to detect when one of them breaks. Someone has to run these example Terraform configs manually to determine whether an end-to-end (E2E) setup is healthy. The same problem arises when we release a new version of Consul ECS: we have no way to run automated sanity tests before releasing the Terraform module.

Framework

  1. This PR introduces a single Go test, TestScenario, that takes the name of a scenario (a Terraform example config), applies the Terraform, runs validations, and destroys the deployment.
  2. The PR also adds a workflow file that can be triggered on demand from the main branch to run these validations. More details are in the README.md file added in this PR.
  3. The validations use wrappers around the Consul and ECS SDKs to interact with the server and cluster, which provides reliable feedback.

Alternatives

  1. I came up with a simpler version of this validation using bash scripts (here), but bash imposed a lot of restrictions on performing reliable validations, so I decided to write everything in Go instead.

Future work

  1. Currently, each scenario runs as a separate parallel job in CI, one job per scenario (sample CI run). This could be extended to run multiple test scenarios in parallel within the same job, which should reduce our runner costs.
  2. Extend this to other release branches (preferably 0.7.x).

How I've tested this PR:

CI, Local testing

How I expect reviewers to test this PR:

  1. examples/main_test.go contains the core logic for testing these scenarios, with hooks that need to be implemented by each scenario.
  2. The actual scenarios live in examples/scenarios/, with each scenario defined in its own folder and the common folder holding all the helper functions.
  3. The Validate() function for each scenario holds custom logic that varies with the deployment topology of the workloads. Ultimately, every example deploys a client and a server application and verifies that traffic between them flows through the mesh. This logic lives in ValidateFakeServiceResponse, which calls the client app's load balancer and verifies that the expected upstream was reached. To review the individual Validate functions, you may need to read the README.md for the corresponding example in the examples/ folder. For example, to understand how validation works for service-sameness, see examples/service-sameness/README.md.


absolutelightning commented 9 months ago

In the RegisterScenario function, the error returned by r.Register(scenarios.ScenarioRegistration{...}) is not handled.

Ganeshrockz commented 9 months ago


Handled in https://github.com/hashicorp/terraform-aws-consul-ecs/pull/250/commits/6eecf8428e3f6df3fee50fcfebb4a00ac12bacf8

Ganeshrockz commented 9 months ago

Merging this without running the acceptance tests, because these changes should not affect them in any way. I will trigger the workflow added in this PR for running the examples once it is merged to main.