Open ryan-dyer-sp opened 2 years ago
Hello, and thank you for reporting an issue!
I'm having a hard time understanding the intended outcome and how that intended outcome is not being met. Would it be possible for you to provide some clarity on the use case? This will help us build an acceptance test that fails, which we can then use for remediation.
An example:
1) Starting with a single aws_msk_cluster, with a single EBS drive with multiple volumes, and no provisioned_throughput:
<sample HCL>
2) Attempt to add provisioned_throughput to all volumes:
<sample HCL>
3) Thing X happens. Expected Y to happen.
<errors>
Ideally, the following 3 configs would all behave the same way:
<no config>
provisioned_throughput {
  enabled = false
}
and
provisioned_throughput {
  enabled           = false
  volume_throughput = X
}
All three would force disablement of provisioned throughput, and subsequent terraform plans using any of the three configurations (while throughput is disabled on the MSK cluster) would result in no changes being necessary.
Right now, that is not the case. In particular, the second should not constantly result in terraform plan showing needed changes.
And ideally the third would know to filter the throughput out of the API request, so that people can simply enable/disable throughput with basic code like p_t { enabled = var.enabled, v_t = var.v_t } and not need the double dynamic block I put in the ticket to capture the different syntaxes that block requires depending on whether it is enabled or disabled.
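For reference, the double dynamic block pattern mentioned above looks roughly like the following sketch (variable names are illustrative, not copied from the original ticket): one dynamic block renders the enabled form with volume_throughput, the other renders the disabled form without it, and the for_each expressions ensure only one of them is generated.

# Sketch only; var.provisioned_throughput_enabled and var.volume_throughput are illustrative names.
dynamic "provisioned_throughput" {
  # Rendered only when throughput is enabled; volume_throughput must be present.
  for_each = var.provisioned_throughput_enabled ? [1] : []
  content {
    enabled           = true
    volume_throughput = var.volume_throughput
  }
}

dynamic "provisioned_throughput" {
  # Rendered only when throughput is disabled; volume_throughput must be omitted.
  for_each = var.provisioned_throughput_enabled ? [] : [1]
  content {
    enabled = false
  }
}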
My team is also facing this issue. We support enabling provisioned_throughput but set provisioned_throughput.enabled to false by default, with no additional parameters specified, so our default configuration looks like example 2 outlined in the above comment by @ryan-dyer-sp:
provisioned_throughput {
  enabled = false
}
This default configuration seems to work fine with new clusters that are stood up after we added support in our internal msk module for setting the provisioned_throughput block. But using this newer version of our msk module against an existing cluster results in never-ending drift, since terraform expects the above configuration block to be present on the target cluster and it is missing after every apply.
We can address this in our msk module by reverting to a version of the module from before we supported this parameter, but I would like to understand how the provider expects a disabled provisioned_throughput configuration block to be constructed so that it works with both new and existing clusters. It could be that the answer is it doesn't support this at this time; if that is the case, I would like to understand when support for this use case will be added.
Starting from AWS provider 5.0, this bug is no longer silently ignored as it was in AWS provider v4, but instead produces the error below:
  ~ update in-place

Terraform will perform the following actions:

  # aws_msk_cluster.msk will be updated in-place
  ~ resource "aws_msk_cluster" "msk" {
        id   = "arn:aws:kafka:us-east-1:433464930109:cluster/spoton-msk/dafaafc8-297a-410c-8acc-7903dfd668a7-12"
        tags = {}
        # (10 unchanged attributes hidden)

      ~ broker_node_group_info {
            # (4 unchanged attributes hidden)

          ~ storage_info {
              ~ ebs_storage_info {
                    # (1 unchanged attribute hidden)

                  - provisioned_throughput {
                      - enabled           = false -> null
                      - volume_throughput = 0 -> null
                    }
                }
            }
            # (1 unchanged block hidden)
        }

        # (4 unchanged blocks hidden)
    }

Plan: 0 to add, 1 to change, 0 to destroy.
aws_msk_cluster.msk: Modifying... [id=arn:aws:kafka:us-east-1:433464930109:cluster/spoton-msk/dafaafc8-297a-410c-8acc-7903dfd668a7-12]
╷
│ Error: updating MSK Cluster (arn:aws:kafka:us-east-1:433464930109:cluster/spoton-msk/dafaafc8-297a-410c-8acc-7903dfd668a7-12) broker storage: BadRequestException: The request does not include any updates to the EBS volumes of the cluster. Verify the request, then try again.
│ {
│ RespMetadata: {
│ StatusCode: 400,
│ RequestID: "469ec42e-adb6-4733-871e-0a0c10b37939"
│ },
│ Message_: "The request does not include any updates to the EBS volumes of the cluster. Verify the request, then try again."
│ }
│
│ with aws_msk_cluster.msk,
│ on msk.tf line 100, in resource "aws_msk_cluster" "msk":
│ 100: resource "aws_msk_cluster" "msk" {
│
╵
I tried manually fixing the state file, and removing and re-importing the resource, but nothing helped.
This issue with provisioned_throughput was entirely blocking for me until I ignored the section:
lifecycle {
  ignore_changes = [
    broker_node_group_info[0].storage_info[0].ebs_storage_info[0].provisioned_throughput
  ]
}
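For completeness, a sketch of where this sits, assuming the resource address used earlier in the thread:

resource "aws_msk_cluster" "msk" {
  # ... existing cluster configuration ...

  lifecycle {
    ignore_changes = [
      broker_node_group_info[0].storage_info[0].ebs_storage_info[0].provisioned_throughput
    ]
  }
}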
YMMV, but I got this working without lifecycle/ignore_changes by removing the dynamic block that conditionally creates it and instead just doing:
provisioned_throughput {
  enabled           = local.provisioned_throughput_enabled
  volume_throughput = local.provisioned_throughput_enabled ? local.volume_throughput : null
}
Note that for me it didn't work with 0 in place of null, because that fails the provider's schema validation. Explicitly setting provisioned_throughput.enabled = false seems to do the trick; omitting the block caused issues, as others have noted.
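For completeness, the locals referenced above could be defined along these lines (values are purely illustrative):

locals {
  # Toggle provisioned throughput; volume_throughput is only sent when enabled.
  provisioned_throughput_enabled = false
  # MiB/s target, used only when provisioned throughput is enabled.
  volume_throughput = 250
}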
I also hit this issue when introducing this feature. I ended up with the following variable declaration:
variable "provisioned_storage_volume_throughput" {
description = <<EOF
Specifies whether provisioned throughput is turned on and the volume throughput target.
For backwards compatibility, it set to null by default.
The feature can be enabled by setting enabled flag to true and specifying volume throughput.
The volume throughput rate between EC2 instance and EBS volume is specified in MiB per second.
To use this configuration brokers must be of type kafka.m5.4xlarge or larger and
storage volume must be 10 GiB or greater.
The minimum throughput value is 250, the maximum depends on the broker type and can be found in AWS MSK documentation.
Once enabled, it is possible to disable the feature by setting enabled flag to false and volume throughput to null.
Note that changing the provisioned_storage_volume_throughput from null to object with enabled set to false or the other way around
results in runtime error.
EOF
type = object({
enabled = bool
throughput = number
})
validation {
condition = (var.provisioned_storage_volume_throughput == null ||
(try(var.provisioned_storage_volume_throughput.enabled, false) == false && try(var.provisioned_storage_volume_throughput.throughput, null) == null) ||
(coalesce(try(var.provisioned_storage_volume_throughput.enabled, false), false) && coalesce(try(var.provisioned_storage_volume_throughput.throughput, 0), 0) >= 250)
)
error_message = "The provisioned_storage_volume_throughput.throughput requires a minimum value of 250 if provisioned_storage_volume_throughput.enabled is set to true, it should be set to 0 when provisioned_storage_volume_throughput.enabled is set to false."
}
default = null
}
and the code:
...
ebs_storage_info {
  volume_size = var.broker_node_ebs_volume_size_gb

  dynamic "provisioned_throughput" {
    iterator = provision_throughput_iterator
    for_each = var.provisioned_storage_volume_throughput == null ? [] : [var.provisioned_storage_volume_throughput]

    content {
      enabled           = provision_throughput_iterator.value.enabled
      volume_throughput = provision_throughput_iterator.value.throughput
    }
  }
}
...
In order to smoothly migrate existing clusters, the default variable value is null; I tried an object with enabled set to false, but it fails at runtime.
The nasty thing is explaining the valid transitions for "provisioned_storage_volume_throughput": some transitions work, while others fail at runtime, in particular going from null to an object with enabled set to false, or the other way around (as noted in the variable description above).
I could probably experiment with pre- and postconditions to improve it, but I am on Terraform 1.0.x at the moment, so this is not yet an option.
All in all, I agree that the provider should be more forgiving and treat null the same way as {enabled = false, throughput = null}; this would make it possible to simplify the migration and minimise the valid states/transitions.
P.S. The reason why I am setting throughput to 0 when disabling is to keep the Terratest code simple; from the TF perspective it could be null or missing when enabled is set to false.
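To illustrate how a caller of this module might set the variable (hypothetical values, based on the interface above):

# Enable provisioned throughput on a capable broker type:
provisioned_storage_volume_throughput = {
  enabled    = true
  throughput = 250
}

# To keep it disabled (and avoid the failing transitions), leave the variable unset so it
# stays at its default of null, rather than passing { enabled = false, throughput = null }.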
"The request does not include any updates to the EBS volumes of the cluster. Verify the request, then try again."
My workaround for this was to increase volume_size marginally (the minimum change is 10GB), which meant there was at least one change under the MSK cluster resource (resource "aws_msk_cluster" "msk") alongside the false -> null no-op. This helped me avoid using lifecycle/ignore_changes, since we needed those changes to be updatable in the future.
Update: Unfortunately, despite being able to apply the {enabled = false, throughput = 0} => null, null change with a 10GB bump to volume_size, the same diff is still coming up. So I'm going back to using lifecycle/ignore_changes.
We are seeing this in our plans, and after doing a search it looks like there is no fix yet other than ignoring changes?
Terraform CLI and Terraform AWS Provider Version
Terraform v1.1.4 on linux_amd64
Affected Resource(s)
aws_msk_cluster
Terraform Configuration Files
Actual Behavior
The existing behavior seems to only support two configurations.
Attempts to disable provisioned throughput require that the block no longer contain the volume_throughput field; hence the weird double dynamic blocks in the example above.
Once disabled, terraform plan constantly detects a change in the plan, as the p_t block doesn't appear to exist as part of the state. Also, simple removal of the block is not sufficient to disable p_t in the case where it is already enabled on the cluster.
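For illustration (this is my reading of the two configurations described above, not something confirmed by the provider documentation), the only block shapes that appear to be accepted are roughly:

# Enabled: volume_throughput must be present.
provisioned_throughput {
  enabled           = true
  volume_throughput = 250
}

# Disabled: volume_throughput must be omitted, yet this form still shows
# perpetual drift against existing clusters, as described above.
provisioned_throughput {
  enabled = false
}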
Steps to Reproduce
Important Factoids
References