Open tomelliff opened 4 years ago
Hey @tomelliff, thank you for taking the time to file this issue! Given that there have been a number of AWS provider releases since you initially filed it, can you confirm whether you're still experiencing this behavior?
@justinretzolk I just reproduced it with Terraform AWS provider 3.75.1 (the latest pre-4.0 version).
With my branch, the failure is reported as follows:
Error: error waiting for Elasticsearch Domain Upgrade (arn:aws:es:eu-west-1:614455314739:domain/logs) to succeed: Upgrade from 6.8 to 7.10 FAILED: PRE_UPGRADE_CHECK
I am still working on adding appropriate tests and more insights, as well as running the regression tests.
With a new commit in my branch above I was also able to retrieve more detailed information as follows:
Error: error waiting for Elasticsearch Domain Upgrade (arn:aws:es:eu-west-1:614455314739:domain/logs) to succeed: Upgrade from 6.8 to 7.10 FAILED: PRE_UPGRADE_CHECK
Cluster has 1491 shards per node which exceeds the setting cluster.max_shards_per_node value 1000
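The more detailed message above looks like the original failure line with the reported issue text appended on its own line. A minimal sketch of building such a message is shown here; `upgradeError` and its parameters are hypothetical names chosen for illustration, not the code in the branch or in the provider.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// upgradeError is a hypothetical helper, not the provider's actual code.
// It builds the failed-step message and appends any reported issues,
// one per line, so the reason for the failure is visible to the user.
func upgradeError(arn, upgradeName, failedStep string, issues []string) error {
	msg := fmt.Sprintf(
		"error waiting for Elasticsearch Domain Upgrade (%s) to succeed: %s FAILED: %s",
		arn, upgradeName, failedStep,
	)
	if len(issues) > 0 {
		msg += "\n" + strings.Join(issues, "\n")
	}
	return errors.New(msg)
}

func main() {
	err := upgradeError(
		"arn:aws:es:eu-west-1:614455314739:domain/logs",
		"Upgrade from 6.8 to 7.10",
		"PRE_UPGRADE_CHECK",
		[]string{"Cluster has 1491 shards per node which exceeds the setting cluster.max_shards_per_node value 1000"},
	)
	fmt.Println(err)
}
```

When no issues are reported, the helper falls back to the shorter single-line message.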
Hi HashiCorp / AWS TF provider core team,
In the past I have submitted some patches against the master repo, but my fix branch is currently based on tag 3.75.1.
What would be the appropriate way to submit my fix for this issue?
Should I try to cherry-pick the changes onto master? Many thanks for any insight.
So far I have not been able to run the regression tests successfully against the us-west-1 region:
=== CONT TestAccElasticsearchDomainDataSource_Data_basic
=== CONT TestAccElasticsearchDomain_AdvancedSecurityOptions_userDB
--- PASS: TestAccElasticsearchDomainDataSource_Data_basic (1524.16s)
=== CONT TestAccElasticsearchDomain_policyIgnoreEquivalent
--- PASS: TestAccElasticsearchDomain_AdvancedSecurityOptions_userDB (1542.69s)
=== CONT TestAccElasticsearchDomain_disappears
--- PASS: TestAccElasticsearchDomain_policyIgnoreEquivalent (1450.18s)
=== CONT TestAccElasticsearchDomain_Update_version
--- PASS: TestAccElasticsearchDomain_disappears (1515.88s)
=== CONT TestAccElasticsearchDomain_WithVolumeType_missing
--- PASS: TestAccElasticsearchDomain_WithVolumeType_missing (1192.05s)
=== CONT TestAccElasticsearchDomain_UpdateVolume_type
--- PASS: TestAccElasticsearchDomain_Update_version (4165.17s)
=== CONT TestAccElasticsearchDomain_update
--- PASS: TestAccElasticsearchDomain_UpdateVolume_type (3254.57s)
=== CONT TestAccElasticsearchDomain_tags
--- PASS: TestAccElasticsearchDomain_tags (2097.99s)
=== CONT TestAccElasticsearchDomain_nodeToNodeEncryption
--- PASS: TestAccElasticsearchDomain_update (2706.42s)
=== CONT TestAccElasticsearchDomain_EncryptAtRestSpecify_key
--- PASS: TestAccElasticsearchDomain_nodeToNodeEncryption (1289.54s)
=== CONT TestAccElasticsearchDomain_EncryptAtRestDefault_key
--- PASS: TestAccElasticsearchDomain_EncryptAtRestSpecify_key (1253.09s)
=== CONT TestAccElasticsearchDomain_Cluster_zoneAwareness
domain_test.go:146: Step 1/5 error: Error running apply: exit status 1
2022/03/30 14:35:29 [DEBUG] Using modified User-Agent: Terraform/0.12.31 HashiCorp-terraform-exec/0.15.0
Error: Error creating Elasticsearch domain: DisabledOperationException: You don't have permission to select three availability zones
on terraform_plugin_test.tf line 2, in resource "aws_elasticsearch_domain" "test":
2: resource "aws_elasticsearch_domain" "test" {
--- FAIL: TestAccElasticsearchDomain_Cluster_zoneAwareness (9.07s)
=== CONT TestAccElasticsearchDomain_AutoTuneOptions
--- PASS: TestAccElasticsearchDomain_EncryptAtRestDefault_key (1299.12s)
=== CONT TestAccElasticsearchDomain_internetToVPCEndpoint
--- PASS: TestAccElasticsearchDomain_AutoTuneOptions (1623.00s)
=== CONT TestAccElasticsearchDomain_VPC_update
panic: test timed out after 4h0m0s
I raised the test timeout from 3h to 4h without more success (with parallelism set to 2 because of my laptop's constraints). I will increase it and give it another try.
As for the three-zone error, I just found out that us-west-1 only has 2 zones, so I will change to us-west-2 (4 zones).
:-( Just a little bit more luck after 8h on us-west-2:
At least the previously failing test passed successfully.
=== CONT TestAccElasticsearchDomainDataSource_Data_basic
=== CONT TestAccElasticsearchDomain_Update_version
--- PASS: TestAccElasticsearchDomainDataSource_Data_basic (1741.86s)
=== CONT TestAccElasticsearchDomain_AutoTuneOptions
--- PASS: TestAccElasticsearchDomain_AutoTuneOptions (1723.72s)
=== CONT TestAccElasticsearchDomain_WithVolumeType_missing
--- PASS: TestAccElasticsearchDomain_Update_version (4243.93s)
=== CONT TestAccElasticsearchDomain_UpdateVolume_type
--- PASS: TestAccElasticsearchDomain_WithVolumeType_missing (1181.96s)
=== CONT TestAccElasticsearchDomain_update
--- PASS: TestAccElasticsearchDomain_update (2638.81s)
=== CONT TestAccElasticsearchDomain_tags
--- PASS: TestAccElasticsearchDomain_UpdateVolume_type (3716.47s)
=== CONT TestAccElasticsearchDomain_nodeToNodeEncryption
--- PASS: TestAccElasticsearchDomain_tags (1403.58s)
=== CONT TestAccElasticsearchDomain_EncryptAtRestSpecify_key
--- PASS: TestAccElasticsearchDomain_EncryptAtRestSpecify_key (1374.62s)
=== CONT TestAccElasticsearchDomain_EncryptAtRestDefault_key
--- PASS: TestAccElasticsearchDomain_nodeToNodeEncryption (2155.89s)
=== CONT TestAccElasticsearchDomain_policyIgnoreEquivalent
--- PASS: TestAccElasticsearchDomain_policyIgnoreEquivalent (1289.11s)
=== CONT TestAccElasticsearchDomain_policy
--- PASS: TestAccElasticsearchDomain_EncryptAtRestDefault_key (1399.39s)
=== CONT TestAccElasticsearchDomain_cognitoOptionsUpdate
--- PASS: TestAccElasticsearchDomain_policy (1249.13s)
=== CONT TestAccElasticsearchDomain_cognitoOptionsCreateAndRemove
--- PASS: TestAccElasticsearchDomain_cognitoOptionsUpdate (2470.77s)
=== CONT TestAccElasticsearchDomain_LogPublishingOptions_auditLogs
--- PASS: TestAccElasticsearchDomain_cognitoOptionsCreateAndRemove (2913.92s)
=== CONT TestAccElasticsearchDomain_LogPublishingOptions_esApplicationLogs
--- PASS: TestAccElasticsearchDomain_LogPublishingOptions_auditLogs (1943.01s)
=== CONT TestAccElasticsearchDomain_LogPublishingOptions_searchSlowLogs
--- PASS: TestAccElasticsearchDomain_LogPublishingOptions_esApplicationLogs (1630.03s)
=== CONT TestAccElasticsearchDomain_disappears
--- PASS: TestAccElasticsearchDomain_LogPublishingOptions_searchSlowLogs (1641.53s)
=== CONT TestAccElasticsearchDomain_LogPublishingOptions_indexSlowLogs
--- PASS: TestAccElasticsearchDomain_disappears (1311.56s)
=== CONT TestAccElasticsearchDomain_AdvancedSecurityOptions_disabled
--- PASS: TestAccElasticsearchDomain_LogPublishingOptions_indexSlowLogs (1597.73s)
=== CONT TestAccElasticsearchDomain_AdvancedSecurityOptions_userDB
--- PASS: TestAccElasticsearchDomain_AdvancedSecurityOptions_disabled (1828.39s)
=== CONT TestAccElasticsearchDomain_customEndpoint
--- PASS: TestAccElasticsearchDomain_AdvancedSecurityOptions_userDB (1547.48s)
=== CONT TestAccElasticsearchDomain_internetToVPCEndpoint
--- PASS: TestAccElasticsearchDomain_customEndpoint (3026.41s)
=== CONT TestAccElasticsearchDomain_AdvancedSecurityOptions_iam
--- PASS: TestAccElasticsearchDomain_internetToVPCEndpoint (3265.56s)
=== CONT TestAccElasticsearchDomainSamlOptions_disappears_Domain
--- PASS: TestAccElasticsearchDomain_AdvancedSecurityOptions_iam (1680.43s)
=== CONT TestAccElasticsearchDomain_requireHTTPS
--- PASS: TestAccElasticsearchDomainSamlOptions_disappears_Domain (1436.80s)
=== CONT TestAccElasticsearchDomain_basic
--- PASS: TestAccElasticsearchDomain_basic (1493.76s)
=== CONT TestAccElasticsearchDomainSamlOptions_Disabled
--- PASS: TestAccElasticsearchDomain_requireHTTPS (2650.19s)
=== CONT TestAccElasticsearchDomainSamlOptions_Update
--- PASS: TestAccElasticsearchDomainSamlOptions_Disabled (1682.28s)
=== CONT TestAccElasticsearchDomain_VPC_update
panic: test timed out after 8h0m0s
Any insights on this, please?
Can someone help with this, please?
Anyone?
Terraform Version
Terraform v0.12.10
Affected Resource(s)
Terraform Configuration Files
Debug Output
The relevant part of the debug log is small so posting it directly here:
Expected Behavior
My cluster is failing the upgrade eligibility checks but I'd expect to see the error correctly reported by Terraform with something like the following:
Actual Behavior
Steps to Reproduce
terraform apply
Important Factoids
I've moved from a 2 AZ ES cluster to a 3 AZ ES cluster in place, then immediately moved to 6.8, and then attempted to upgrade again to 7.2, but this is causing the above error on the AWS side. That bit is fine, but I'd expect Terraform to properly show the error instead of %!s(<nil>)
I wrote this in-place upgrade code but didn't have a good way of inducing an upgrade failure, so I couldn't really test what happened in that case, but it looks like AWS's API doesn't return an error, just the FAILED StepStatus field. The GetUpgradeHistory API endpoint will show the results of any attempted upgrades in reverse chronological order, so it's possible we could retrieve the first failed result from that for the domain and return the list of UpgradeStepItem.Issues.
I am wary that I don't know a good way to force an ES cluster into a bad state, though, so this might be tricky to test once my ES cluster is back in a good place.
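The GetUpgradeHistory idea described above could be sketched roughly as follows. The types here are simplified, hypothetical stand-ins for the API's response shapes rather than the AWS SDK types, `firstFailedIssues` is an illustrative name, and the sample issue text is a placeholder. The sketch assumes the history has already been fetched and is ordered newest first.

```go
package main

import (
	"fmt"
	"strings"
)

// UpgradeStepItem and UpgradeHistory are simplified, hypothetical stand-ins
// for the shapes returned by GetUpgradeHistory; they are not the SDK types.
type UpgradeStepItem struct {
	UpgradeStep       string
	UpgradeStepStatus string
	Issues            []string
}

type UpgradeHistory struct {
	UpgradeName   string
	UpgradeStatus string
	StepsList     []UpgradeStepItem
}

// firstFailedIssues walks the history in order (assumed newest first) and
// returns the name and issues of the first step whose status is FAILED.
func firstFailedIssues(history []UpgradeHistory) (string, []string, bool) {
	for _, h := range history {
		for _, s := range h.StepsList {
			if s.UpgradeStepStatus == "FAILED" {
				return s.UpgradeStep, s.Issues, true
			}
		}
	}
	return "", nil, false
}

func main() {
	history := []UpgradeHistory{
		{
			UpgradeName:   "Upgrade from 6.8 to 7.2",
			UpgradeStatus: "FAILED",
			StepsList: []UpgradeStepItem{
				{
					UpgradeStep:       "PRE_UPGRADE_CHECK",
					UpgradeStepStatus: "FAILED",
					Issues:            []string{"placeholder issue text"},
				},
			},
		},
	}
	if step, issues, ok := firstFailedIssues(history); ok {
		fmt.Printf("%s: %s\n", step, strings.Join(issues, "; "))
	}
}
```

A real implementation would probably want to check only the most recent history entry, so that an older failure is not reported for an upgrade that later succeeded.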
References