hashicorp / terraform-provider-aws

The AWS Provider enables Terraform to manage AWS resources.
https://registry.terraform.io/providers/hashicorp/aws
Mozilla Public License 2.0

EC2/Athena/DocDB/ElastiCache/RDS Test Improvement: Dynamic Version and Instance Class Lookup #35742

Open YakDriver opened 7 months ago

YakDriver commented 7 months ago

Description

Collaborate on this initiative and contribute any insights or suggestions for refinement!

TL;DR: Tests that hardcode values such as engine versions, AMIs, and compute types break whenever AWS changes them, creating constant maintenance work and concealing genuine issues. To mitigate this, tests should look these values up dynamically with data sources instead of relying on hardcoded values.

Our acceptance tests have become a source of ongoing maintenance challenges, and they obscure underlying issues. At present, numerous tests depend on hardcoded but mutable, AWS-defined parameters, including versions and instance classes. These hardcoded values frequently appear in arguments such as engine_type, host_type, instance_class, instance_type, node_type, ami, and engine_version.

  1. Maintenance Burden: Hardcoding in our tests leads to constant updates as AWS makes changes, such as changing version availability, deprecating compute types, or dropping support for a particular combination of the two. Updating tests consumes unnecessary time and resources, and stale values increase the likelihood of tests failing simply because a version or class is no longer supported.
  2. Blindness to Problems: Over time, tests that frequently break because of hardcoded dependencies can desensitize us to failures. We may overlook genuine problems, assuming they are merely the result of outdated versions or classes.

We need to mitigate these challenges by implementing dynamic lookup mechanisms using data sources within our tests. By dynamically retrieving information, such as versions and instance classes, during test execution, we can ensure that our tests remain resilient to changes in the environment and are not reliant on hardcoded dependencies.
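
For example, in the EC2 context, a test that hardcodes an AMI ID and an instance type can instead look both up at plan time. Here is a minimal sketch; the AMI name filter and the candidate instance types are illustrative and would vary by test:

data "aws_ami" "amzn2" {
  most_recent = true
  owners      = ["amazon"]

  # Illustrative name pattern; adjust to the image family the test needs.
  filter {
    name   = "name"
    values = ["amzn2-ami-minimal-hvm-*-x86_64-ebs"]
  }
}

data "aws_ec2_instance_type_offering" "small" {
  # Restrict the lookup to inexpensive candidates, then take whichever
  # one is actually offered in the current region.
  filter {
    name   = "instance-type"
    values = ["t3.micro", "t3a.micro", "t2.micro"]
  }

  preferred_instance_types = ["t3.micro", "t3a.micro", "t2.micro"]
}

resource "aws_instance" "test" {
  ami           = data.aws_ami.amzn2.id
  instance_type = data.aws_ec2_instance_type_offering.small.instance_type
}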

This approach offers several benefits:

  1. Reduced Maintenance: By decoupling tests from specific mutable hardcoded information, we minimize the need for constant updates and maintenance, leading to more efficient testing workflows.
  2. Improved Test Reliability: Dynamic lookup ensures that tests adapt to changes in the environment, reducing the likelihood of false negatives caused by outdated dependencies.
  3. Enhanced Visibility: By addressing the underlying reasons tests fail, rather than repeatedly patching hardcoded values after each AWS change, we gain a clearer picture of the failures that really matter and can address emerging problems proactively.

Incorporating dynamic lookup mechanisms into our testing procedures will require initial investment and adjustments to our existing workflows. However, the long-term benefits in terms of improved test reliability and reduced maintenance overhead far outweigh the initial effort.

Fixing

Follow these general steps to fix hardcoded values in acceptance tests that are based on AWS-changeable information, such as compute, version, or AMI values. Some commonly hardcoded arguments are engine_type, host_type, instance_class, instance_type, node_type, ami, and engine_version, but there may be others.

  1. Verify Terraform AWS Provider Data Sources: Begin by inspecting the Terraform AWS provider acceptance tests for any hardcoded values (see the list below for some places to look). If encountered, the initial step involves checking if a corresponding data source exists to dynamically fetch the required information. For instance, within the RDS context, explore options such as aws_rds_engine_version and aws_rds_orderable_db_instance, which facilitate searching for versions and classes. If a suitable data source is found, proceed to step 4. If not, proceed to step 2. If a data source exists but lacks the necessary information or search customization, proceed to step 3.
  2. Review AWS Functionality: Consult the relevant AWS API to identify operations that furnish details about available options. For instance, the AWS RDS Go SDK offers a function called DescribeOrderableDBInstanceOptions, which furnishes dynamic and current information about RDS instances. If the AWS API doesn't provide the dynamic information we need, escalate the matter by opening an AWS support ticket and engaging with AWS engineers to incorporate the required functionality.
  3. Enhance Terraform AWS Provider Data Sources: Develop or enhance Terraform AWS provider data sources to provide the search functionality the tests require. This may entail both AWS-side and Terraform-side filtering. For instance, consider the aws_rds_orderable_db_instance data source within the Terraform AWS provider, which leverages the DescribeOrderableDBInstanceOptions SDK function. While AWS offers certain filtering options, like restricting results to a specific database engine, additional filtering requirements, such as identifying instance classes that support RDS clusters, must be implemented on the provider side.
  4. Update Tests: Proceed with updating acceptance tests to utilize the newly configured data sources and eliminate hardcoded values. This typically involves integrating data source configuration into existing test setups and specifying configurations to refine data source results to align with the test scenario. For instance, in modifying an acceptance test to remove a hardcoded instance class in an RDS cluster test, include data source configuration with appropriate filters to ensure compatibility with cluster-supported instance classes.

For example, this data source configuration finds an orderable RDS DB instance (an engine version and instance class combination) that uses io1 storage, supports IOPS, and supports clusters.

NOTE: This configuration would be even better if it didn't have a hardcoded list of preferred instance classes. Since instance classes vary significantly in price, we don't want to let the data source choose one at random, since that could cause huge bills for running tests. An ideal solution would be another data source that provides a list of instance class options sorted by price.

data "aws_rds_orderable_db_instance" "test" {
  engine                     = "mysql"
  engine_latest_version      = true
  preferred_instance_classes = ["db.t4g.micro", "db.t3.micro", "db.t4g.small"]
  storage_type               = "io1"
  supports_iops              = true
  supports_clusters          = true
}

Then we use the information from the data source to inform the creation of resources:

resource "aws_rds_cluster" "test" {
  db_cluster_instance_class = data.aws_rds_orderable_db_instance.test.instance_class
  engine                    = data.aws_rds_orderable_db_instance.test.engine
  engine_version            = data.aws_rds_orderable_db_instance.test.engine_version
  storage_type              = data.aws_rds_orderable_db_instance.test.storage_type
  iops                      = 1000

  # additional configuration...
}
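
The version side of step 1 works the same way: an aws_rds_engine_version data source can be chained into the orderable lookup so that neither the engine version nor the instance class is hardcoded. A minimal sketch (the engine and the preferred class list are illustrative, and as noted above the preferred class list is still needed to keep costs predictable):

data "aws_rds_engine_version" "default" {
  engine = "mysql"
}

data "aws_rds_orderable_db_instance" "test" {
  engine                     = data.aws_rds_engine_version.default.engine
  engine_version             = data.aws_rds_engine_version.default.version
  preferred_instance_classes = ["db.t4g.micro", "db.t3.micro", "db.t4g.small"]
}

resource "aws_db_instance" "test" {
  allocated_storage   = 10
  engine              = data.aws_rds_orderable_db_instance.test.engine
  engine_version      = data.aws_rds_orderable_db_instance.test.engine_version
  instance_class      = data.aws_rds_orderable_db_instance.test.instance_class
  skip_final_snapshot = true

  # additional configuration...
}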

Potentially affected services

Based on a quick look through the provider, here is a starting point for services with potential problems. (Some services use resources from other services in their tests.) AWS updates some services more than others, for example by changing version and compute options. Regardless, we should attempt to avoid all AWS-mutable hardcoded values.

| Service | Version | Instance |
|---|---|---|
| Athena | engine_version | |
| AppAutoScaling | | instance_type |
| AppStream | | instance_type |
| Auto Scaling | | instance_type |
| Auto Scaling Plans | | instance_type |
| Batch | | instance_type |
| Cloud9 | | instance_type |
| CloudSearch | | instance_type |
| CloudWatch | | instance_type |
| CodeCatalyst | | instance_type |
| DataSync | | instance_type |
| Deploy | | instance_type |
| AppFlow | | node_type |
| DAX | | node_type |
| DMS | engine_version | instance_class, node_type |
| DocDB | engine_version | instance_class |
| EC2 | ami | instance_type |
| ECS | | instance_type |
| EKS | | instance_type |
| ElastiCache | engine_version | node_type |
| ELB | | instance_type |
| ELBv2 | | instance_type |
| EMR | | instance_type |
| Events | | instance_type, node_type |
| Finspace | | host_type, node_type |
| Firehose | | instance_type, node_type |
| GameLift | | instance_type |
| GlobalAccelerator | | instance_type |
| Glue | engine_version | instance_class, node_type |
| ImageBuilder | | instance_type |
| Kafka | | instance_type |
| KafkaConnect | | instance_type |
| Lambda | engine_version | engine_type, instance_type |
| LicenseManager | | instance_type |
| LightSail | engine_version | |
| MemoryDB | engine_version | node_type |
| MQ | engine_version | engine_type, instance_type |
| Neptune | engine_version | instance_class |
| OpenSearch | engine_version | instance_type |
| OpsWorks | engine_version | instance_class, instance_type |
| Outposts | | instance_type |
| Pipes | engine_version | engine_type, instance_type |
| Pricing | | instance_type, node_type |
| RDS | engine_version | instance_class |
| Redshift | | node_type |
| Redshift Data | | node_type |
| SageMaker | | instance_type |
| SecurityHub | | instance_type |
| ServiceDiscovery | | instance_type |
| SSM | | instance_type |
| Storage Gateway | | instance_type |
| VPC Lattice | | instance_type |



triggan commented 2 weeks ago

@YakDriver - I'm curious whether there's been any development on this topic. I was just updating some acceptance tests for Neptune this week and noticed a few places where we hardcoded engine versions and instance types.

As it pertains to our needs, we have some tests that validate in-place upgrades for both minor and major engine version upgrades. While there's an API to get currently available engine versions (), there's no API for getting a list of major or minor engine versions. And some of the tests we run need to provide engine versions that are sequential (going from major engine version 1.2 to 1.3, as an example). So I'm trying to think through the best way to dynamically address those particular tests. I can certainly create a data source that pulls all available versions and perhaps returns a latest_major_version parameter, but I'm wondering if this has been implemented elsewhere before proceeding.
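
For example, I could imagine something like the following, assuming the Neptune engine version data source exposed a valid_upgrade_targets attribute like its RDS counterpart (that attribute name is an assumption on my part, not something I've confirmed in the current schema):

data "aws_neptune_engine_version" "initial" {
  # With no version arguments this should resolve to a current default
  # version instead of a hardcoded one.
}

data "aws_neptune_engine_version" "upgrade" {
  # Hypothetical: valid_upgrade_targets is assumed by analogy with
  # aws_rds_engine_version. Sorting a set of version strings is only a
  # rough preference order; a version-aware sort would be better.
  preferred_versions = sort(data.aws_neptune_engine_version.initial.valid_upgrade_targets)
}

resource "aws_neptune_cluster" "test" {
  engine_version = data.aws_neptune_engine_version.initial.version

  # The test would start on the initial version, and a later step would
  # switch engine_version to data.aws_neptune_engine_version.upgrade.version
  # to exercise the in-place upgrade.

  # additional configuration...
}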