hashicorp / terraform-provider-aws

The AWS Provider enables Terraform to manage AWS resources.
https://registry.terraform.io/providers/hashicorp/aws
Mozilla Public License 2.0

EC2/Athena/DocDB/ElastiCache/RDS Test Improvement: Dynamic Version and Instance Class Lookup #35742

Open YakDriver opened 7 months ago

YakDriver commented 7 months ago

Description

Collaborate on this initiative and contribute any insights or suggestions for refinement!

TL;DR: Tests that hardcode values such as engine versions, AMIs, and compute types break whenever AWS changes them, creating constant maintenance work and concealing genuine issues. To mitigate this, tests should look these values up dynamically with data sources instead of relying on hardcoded values.

Our acceptance tests have become a source of ongoing maintenance challenges, and they obscure underlying issues. At present, numerous tests depend on hardcoded but mutable, AWS-defined parameters, including versions and instance classes. These hardcoded values frequently appear in arguments such as engine_type, host_type, instance_class, instance_type, node_type, ami, and engine_version.

  1. Maintenance Burden: Hardcoding in our tests leads to constant updates as AWS makes changes, such as changing version availability, deprecating compute types, or dropping support for a particular combination of the two. Updating tests consumes unnecessary time and resources, and stale values increase the likelihood of tests failing simply because a version or class is no longer supported.
  2. Blindness to Problems: Over time, tests that frequently break because of hardcoded dependencies can desensitize us to failures. We may overlook genuine problems, assuming they are merely the result of outdated versions or classes.

We need to mitigate these challenges by implementing dynamic lookup mechanisms using data sources within our tests. By dynamically retrieving information, such as versions and instance classes, during test execution, we can ensure that our tests remain resilient to changes in the environment and are not reliant on hardcoded dependencies.
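
For example, in the EC2 context, a test that hardcodes an AMI ID and an instance type can instead look both up at plan time. Here is a minimal sketch; the AMI name filter and the candidate instance types are illustrative and would vary by test:

data "aws_ami" "amzn2" {
  most_recent = true
  owners      = ["amazon"]

  # Illustrative name pattern; adjust to the image family the test needs.
  filter {
    name   = "name"
    values = ["amzn2-ami-minimal-hvm-*-x86_64-ebs"]
  }
}

data "aws_ec2_instance_type_offering" "small" {
  # Restrict the lookup to inexpensive candidates, then take whichever
  # one is actually offered in the current region.
  filter {
    name   = "instance-type"
    values = ["t3.micro", "t3a.micro", "t2.micro"]
  }

  preferred_instance_types = ["t3.micro", "t3a.micro", "t2.micro"]
}

resource "aws_instance" "test" {
  ami           = data.aws_ami.amzn2.id
  instance_type = data.aws_ec2_instance_type_offering.small.instance_type
}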

This approach offers several benefits:

  1. Reduced Maintenance: By decoupling tests from specific mutable hardcoded information, we minimize the need for constant updates and maintenance, leading to more efficient testing workflows.
  2. Improved Test Reliability: Dynamic lookup ensures that tests adapt to changes in the environment, reducing the likelihood of false negatives caused by outdated dependencies.
  3. Enhanced Visibility: By addressing the underlying reasons tests fail, rather than repeatedly patching hardcoded values after each AWS change, we gain a clearer picture of the failures that really matter and can address emerging problems proactively.

Incorporating dynamic lookup mechanisms into our testing procedures will require initial investment and adjustments to our existing workflows. However, the long-term benefits in terms of improved test reliability and reduced maintenance overhead far outweigh the initial effort.

Fixing

Follow these general steps to fix hardcoded values in acceptance tests that are based on AWS-changeable information, such as compute, version, or AMI values. Some commonly hardcoded arguments are engine_type, host_type, instance_class, instance_type, node_type, ami, and engine_version, but there may be others.

  1. Verify Terraform AWS Provider Data Sources: Begin by inspecting the Terraform AWS provider acceptance tests for any hardcoded values (see the list below for some places to look). If encountered, the initial step involves checking if a corresponding data source exists to dynamically fetch the required information. For instance, within the RDS context, explore options such as aws_rds_engine_version and aws_rds_orderable_db_instance, which facilitate searching for versions and classes. If a suitable data source is found, proceed to step 4. If not, proceed to step 2. If a data source exists but lacks the necessary information or search customization, proceed to step 3.
  2. Review AWS Functionality: Consult the relevant AWS API to identify operations that furnish details about available options. For instance, the AWS RDS Go SDK offers a function called DescribeOrderableDBInstanceOptions, which furnishes dynamic and current information about RDS instances. If the AWS API doesn't provide the dynamic information we need, escalate the matter by opening an AWS support ticket and engaging with AWS engineers to incorporate the required functionality.
  3. Enhance Terraform AWS Provider Data Sources: Develop or enhance Terraform AWS provider data sources to provide the search functionality the tests require. This may entail both AWS-side and Terraform-side filtering. For instance, consider the aws_rds_orderable_db_instance data source within the Terraform AWS provider, which leverages the DescribeOrderableDBInstanceOptions SDK function. While AWS offers certain filtering options, like restricting results to a specific database engine, additional filtering requirements, such as identifying instance classes that support RDS clusters, must be implemented on the provider side.
  4. Update Tests: Proceed with updating acceptance tests to utilize the newly configured data sources and eliminate hardcoded values. This typically involves integrating data source configuration into existing test setups and specifying configurations to refine data source results to align with the test scenario. For instance, in modifying an acceptance test to remove a hardcoded instance class in an RDS cluster test, include data source configuration with appropriate filters to ensure compatibility with cluster-supported instance classes.

For example, this data source configuration finds an orderable RDS DB instance (an engine version and instance class combination) that uses io1 storage, supports IOPS, and supports clusters.

NOTE: This configuration would be even better if it didn't have a hardcoded list of preferred instance classes. Since instance classes vary significantly in price, we don't want to let the data source choose one at random, since that could cause huge bills for running tests. An ideal solution would be another data source that provides a list of instance class options sorted by price.

data "aws_rds_orderable_db_instance" "test" {
  engine                     = "mysql"
  engine_latest_version      = true
  preferred_instance_classes = ["db.t4g.micro", "db.t3.micro", "db.t4g.small"]
  storage_type               = "io1"
  supports_iops              = true
  supports_clusters          = true
}

Then we use the information from the data source to inform the creation of resources:

resource "aws_rds_cluster" "test" {
  db_cluster_instance_class = data.aws_rds_orderable_db_instance.test.instance_class
  engine                    = data.aws_rds_orderable_db_instance.test.engine
  engine_version            = data.aws_rds_orderable_db_instance.test.engine_version
  storage_type              = data.aws_rds_orderable_db_instance.test.storage_type
  iops                      = 1000

  # additional configuration...
}
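
The version side of step 1 works the same way: an aws_rds_engine_version data source can be chained into the orderable lookup so that neither the engine version nor the instance class is hardcoded. A minimal sketch (the engine and the preferred class list are illustrative, and as noted above the preferred class list is still needed to keep costs predictable):

data "aws_rds_engine_version" "default" {
  engine = "mysql"
}

data "aws_rds_orderable_db_instance" "test" {
  engine                     = data.aws_rds_engine_version.default.engine
  engine_version             = data.aws_rds_engine_version.default.version
  preferred_instance_classes = ["db.t4g.micro", "db.t3.micro", "db.t4g.small"]
}

resource "aws_db_instance" "test" {
  allocated_storage   = 10
  engine              = data.aws_rds_orderable_db_instance.test.engine
  engine_version      = data.aws_rds_orderable_db_instance.test.engine_version
  instance_class      = data.aws_rds_orderable_db_instance.test.instance_class
  skip_final_snapshot = true

  # additional configuration...
}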

Potentially affected services

Based on a quick look through the provider, here is a starting point for services with potential problems. (Some services use resources from other services in their tests.) AWS updates some services more than others, for example by changing version and compute options. Regardless, we should attempt to avoid all AWS-mutable hardcoded values.

| Service | Version | Instance |
|---|---|---|
| Athena | engine_version | |
| AppAutoScaling | | instance_type |
| AppStream | | instance_type |
| Auto Scaling | | instance_type |
| Auto Scaling Plans | | instance_type |
| Batch | | instance_type |
| Cloud9 | | instance_type |
| CloudSearch | | instance_type |
| CloudWatch | | instance_type |
| CodeCatalyst | | instance_type |
| DataSync | | instance_type |
| Deploy | | instance_type |
| AppFlow | | node_type |
| DAX | | node_type |
| DMS | engine_version | instance_class, node_type |
| DocDB | engine_version | instance_class |
| EC2 | ami | instance_type |
| ECS | | instance_type |
| EKS | | instance_type |
| ElastiCache | engine_version | node_type |
| ELB | | instance_type |
| ELBv2 | | instance_type |
| EMR | | instance_type |
| Events | | instance_type, node_type |
| Finspace | | host_type, node_type |
| Firehose | | instance_type, node_type |
| GameLift | | instance_type |
| GlobalAccelerator | | instance_type |
| Glue | engine_version | instance_class, node_type |
| ImageBuilder | | instance_type |
| Kafka | | instance_type |
| KafkaConnect | | instance_type |
| Lambda | engine_version | engine_type, instance_type |
| LicenseManager | | instance_type |
| LightSail | engine_version | |
| MemoryDB | engine_version | node_type |
| MQ | engine_version | engine_type, instance_type |
| Neptune | engine_version | instance_class |
| OpenSearch | engine_version | instance_type |
| OpsWorks | engine_version | instance_class, instance_type |
| Outposts | | instance_type |
| Pipes | engine_version | engine_type, instance_type |
| Pricing | | instance_type, node_type |
| RDS | engine_version | instance_class |
| Redshift | | node_type |
| Redshift Data | | node_type |
| SageMaker | | instance_type |
| SecurityHub | | instance_type |
| ServiceDiscovery | | instance_type |
| SSM | | instance_type |
| Storage Gateway | | instance_type |
| VPC Lattice | | instance_type |



triggan commented 2 weeks ago

@YakDriver - I'm curious whether there's been any development on this topic. I was just updating some acceptance tests for Neptune this week and noticed a few places where we hardcoded engine versions and instance types.

As it pertains to our needs, we have some tests that validate in-place upgrades for both minor and major engine version upgrades. While there's an API to get currently available engine versions (), there's no API for getting a list of major or minor engine versions. And some of the tests we run need to provide engine versions that are sequential (going from major engine version 1.2 to 1.3, as an example). So I'm trying to think through the best way to dynamically address those particular tests. I can certainly create a data source that pulls all available versions and perhaps returns a latest_major_version parameter, but I'm wondering if this has been implemented elsewhere before proceeding.
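
For example, I could imagine something like the following, assuming the Neptune engine version data source exposed a valid_upgrade_targets attribute like its RDS counterpart (that attribute name is an assumption on my part, not something I've confirmed in the current schema):

data "aws_neptune_engine_version" "initial" {
  # With no version arguments this should resolve to a current default
  # version instead of a hardcoded one.
}

data "aws_neptune_engine_version" "upgrade" {
  # Hypothetical: valid_upgrade_targets is assumed by analogy with
  # aws_rds_engine_version. Sorting a set of version strings is only a
  # rough preference order; a version-aware sort would be better.
  preferred_versions = sort(data.aws_neptune_engine_version.initial.valid_upgrade_targets)
}

resource "aws_neptune_cluster" "test" {
  engine_version = data.aws_neptune_engine_version.initial.version

  # The test would start on the initial version, and a later step would
  # switch engine_version to data.aws_neptune_engine_version.upgrade.version
  # to exercise the in-place upgrade.

  # additional configuration...
}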