kklipsch closed this issue 7 years ago
Confirmed. We have run into this issue as well. I think it has to do with dependencies not taking the "count" into account.
I think this comes down to the fact that the state does not track which specific instance another resource depends on, only the resource as a whole. Here is an example:
"aws_volume_attachment.db_persistent_volume_attachment.0": {
"type": "aws_volume_attachment",
"depends_on": [
"aws_ebs_volume.db_volume",
"aws_instance.db_instance"
],
"primary": {
"id": "vai-1795886726",
"attributes": {
"device_name": "/dev/sdb",
"id": "vai-1795886726",
"instance_id": "i-bb16c319",
"volume_id": "vol-cca36821"
}
}
}
When removing an aws_instance, you would have to find all aws_volume_attachments which happen to share the same "instance_id" attribute. But that would be provider-specific, and perhaps even resource-specific.
However, this is not specific to AWS. It will occur any time you have two resources with count parameters, where one resource depends on the other. The right abstraction would be to depend on "aws_instance.db_instance.0" in this case. I don't know what the implications of that would be, though.
Turns out I was wrong. The "depends_on" attribute in the state file has nothing to do with this. Consider this diff:
-/+ aws_volume_attachment.persistent_volume_attachment.0
device_name: "/dev/sdb" => "/dev/sdb"
force_detach: "" => "<computed>"
instance_id: "i-deb76479" => "${element(aws_instance.my_instance.*.id, count.index)}" (forces new resource)
volume_id: "vol-9ba36878" => "vol-9ba36878"
It seems like changing one element of the aws_instance.my_instance.*.id causes the entire "element" expression to be considered changed.
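The diff above presumably comes from a pattern like the following (a sketch; the volume resource name and count variable are hypothetical):

resource "aws_volume_attachment" "persistent_volume_attachment" {
  count       = "${var.nodes}"
  device_name = "/dev/sdb"
  instance_id = "${element(aws_instance.my_instance.*.id, count.index)}"
  # hypothetical companion volume resource
  volume_id   = "${element(aws_ebs_volume.volume.*.id, count.index)}"
}

Replacing any single aws_instance.my_instance makes the whole splat list computed, so every attachment sees its instance_id as "(forces new resource)".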
Our current workaround is to duplicate the "aws_volume_attachment" resources, rather than using the element function.
I dug further into this. It seems the expected behaviour broke with commit 7735847579e777160664088b830624d0cde876e6, which was introduced to fix issue #2744.
To me, it seems like you want the treatment of unknown values in splats to behave differently depending on the interpolation context. When you use formatlist, you want to treat the entire list as unknown if it contains any unknown value, but for element, you only care about if a specific value in the list is unknown or not.
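To illustrate the two contexts (a sketch with hypothetical names):

# formatlist() consumes the whole list, so a single unknown
# element should make the entire result unknown:
output "urls" {
  value = "${formatlist("https://%s/", aws_instance.web.*.public_dns)}"
}

# element() only reads one item, so only that item's
# computedness should matter:
user_data = "${element(template_file.config.*.rendered, count.index)}"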
I did a test where I introduced a new splat operator with the only difference being how it is treated if the list contains unknown values. It solves the problem, but having two splat operators is kind of confusing.
@mitchellh: Ideas?
Thanks for the report @kklipsch and thanks for taking the time to do a deep dive, @danabr!
> To me, it seems like you want the treatment of unknown values in splats to behave differently depending on the interpolation context. When you use formatlist, you want to treat the entire list as unknown if it contains any unknown value, but for element, you only care about if a specific value in the list is unknown or not.
Yep I think this is the key insight. Switching this to core since it's not a provider-level bug.

In the meantime, duplicating aws_volume_attachments to avoid the usage of splat / element() is a valid workaround.
Ok thanks. Unfortunately, for our use case that very quickly becomes unwieldy, as we are running tens of nodes currently but want to be able to scale up to hundreds.
@kklipsch: If you are OK with running a patched terraform for a while, and you don't rely on the formatlist behavior anywhere, you can just comment out the three lines at https://github.com/hashicorp/terraform/blob/master/terraform/interpolate.go#L466, and compile terraform yourself.
@danabr and @phinze I tried the workaround using the Terraform constructs below, but it did not help. Could you please share more details on the workaround mentioned above ("In the meantime, duplicating aws_volume_attachments to avoid the usage of splat / element() is a valid workaround")?
resource "aws_instance" "appnodes" {
instance_type = "${var.flavor_name}"
ami = "${var.image_name}"
key_name = "${var.key_name}"
security_groups = ["${split(",", var.security_groups)}"]
availability_zone = "${var.availability_zone}"
user_data = "${file("mount.sh")}"
tags {
Name = "${var.app_name}-${format("%02d", 1)}"
}
}
resource "aws_volume_attachment" "ebsatt" {
device_name = "/dev/sdh"
volume_id = "${aws_ebs_volume.ebsvolumes.id}"
instance_id = "${aws_instance.appnodes.id}"
}
resource "aws_ebs_volume" "ebsvolumes" {
availability_zone = "${var.availability_zone}"
size = "${var.ebs_size}"
type = "${var.ebs_type}"
}
resource "aws_instance" "app-nodes" {
instance_type = "${var.flavor_name}"
ami = "${var.image_name}"
key_name = "${var.key_name}"
security_groups = ["${split(",", var.security_groups)}"]
availability_zone = "${var.availability_zone}"
user_data = "${file("mount.sh")}"
tags {
Name = "${var.app_name}-${format("%02d", 1)}"
}
}
resource "aws_volume_attachment" "ebs_att" {
device_name = "/dev/sdh"
volume_id = "${aws_ebs_volume.ebs-volumes.id}"
instance_id = "${aws_instance.app-nodes.id}"
}
resource "aws_ebs_volume" "ebs-volumes" {
availability_zone = "${var.availability_zone}"
size = "${var.ebs_size}"
type = "${var.ebs_type}"
}
@pdakhane: Just take kklipsch's example, but instead of using a "count" attribute on the aws_volume_attachment resource, create multiple aws_volume_attachment resources referring directly to the instances and volumes. For example, if you have three instances:
resource "aws_volume_attachment" "persistent_volume_attachment_0" {
device_name = "/dev/sdb"
instance_id = "${aws_instance.instance.0.id}"
volume_id = "${aws_ebs_volume.volume.0.id}"
}
resource "aws_volume_attachment" "persistent_volume_attachment_1" {
device_name = "/dev/sdb"
instance_id = "${aws_instance.instance.1.id}"
volume_id = "${aws_ebs_volume.volume.1.id}"
}
resource "aws_volume_attachment" "persistent_volume_attachment_2" {
device_name = "/dev/sdb"
instance_id = "${aws_instance.instance.2.id}"
volume_id = "${aws_ebs_volume.volume.2.id}"
}
This only works if you have a small number of nodes, though, and are OK with using the same number of instances in all environments.
@phinze pointed to this issue as potentially related to mine.
Here is my config (redacted for readability):
resource "aws_instance" "cockroach" {
tags {
Name = "${var.key_name}-${count.index}"
}
count = "${var.num_instances}"
...
}
resource "null_resource" "cockroach-runner" {
count = "${var.num_instances}"
connection {
...
host = "${element(aws_instance.cockroach.*.public_ip, count.index)}"
}
triggers {
instance_ids = "${element(aws_instance.cockroach.*.id, count.index)}"
}
provisioner "remote-exec" {
....
}
}
The basic idea is that every instance gets a "runner" attached that does binary deployment and other things. I'm using a null_resource to break a dependency cycle with ELB addresses used by the runner.
The first time I bring up an instance, everything works fine: each instance gets created, then the null_resource runs properly on each. However, when I terminate an arbitrary instance through the EC2 console (e.g. destroying instance 1), all null_resources get rerun.
Here's the log of terraform plan after terminating an instance:
~ aws_elb.elb
instances.#: "" => "<computed>"
+ aws_instance.cockroach.1
ami: "" => "ami-1c552a76"
availability_zone: "" => "us-east-1b"
ebs_block_device.#: "" => "<computed>"
ephemeral_block_device.#: "" => "<computed>"
instance_type: "" => "t2.medium"
key_name: "" => "cockroach-marc"
placement_group: "" => "<computed>"
private_dns: "" => "<computed>"
private_ip: "" => "<computed>"
public_dns: "" => "<computed>"
public_ip: "" => "<computed>"
root_block_device.#: "" => "<computed>"
security_groups.#: "" => "1"
security_groups.2129892981: "" => "cockroach-marc-security-group"
source_dest_check: "" => "1"
subnet_id: "" => "<computed>"
tags.#: "" => "1"
tags.Name: "" => "cockroach-marc-1"
tenancy: "" => "<computed>"
vpc_security_group_ids.#: "" => "<computed>"
-/+ null_resource.cockroach-runner.0
triggers.#: "1" => "<computed>" (forces new resource)
triggers.instance_ids: "i-21867290" => ""
-/+ null_resource.cockroach-runner.1
triggers.#: "1" => "<computed>" (forces new resource)
triggers.instance_ids: "i-fd85714c" => ""
-/+ null_resource.cockroach-runner.2
triggers.#: "1" => "<computed>" (forces new resource)
triggers.instance_ids: "i-20867291" => ""
I was expecting only "null_resource.cockroach-runner.1" to be updated, but it seems that 0 and 2 changed as well.
Re-titling this to indicate the nature of the core issue here. We'll get this looked at soon!
Just pinging here since we just ran into this issue as well.
Okay just consolidated a few other issue threads that were expressions of this bug into this one.
My apologies to all the users who have been hitting this - this is now in my list of top priority core bugs to get fixed soon.
As I alluded to with the re-title, this issue comes down to the fact that Terraform core is currently unaware that ${element(some.splat.*.reference)} is a dependency on a single element from the splat. It simply sees "there is a splat reference in this interpolation" and therefore believes--incorrectly--that it needs to consider every element in the list when calculating whether the overall value of the interpolation is computed.

The most direct solution would be to "just make it work for element()". In other words, add special-casing into Terraform's interpolation handling of splats that would look for a surrounding element() and use different logic for the computed calculation if one is found. This is probably not the right way to go, as it is (a) difficult to implement "context-awareness" in that part of the codebase, and (b) a brittle solution that sets a bad precedent of special-casing certain functions in core.

Because of this, the core team thinks the best way forward is to add first-class list indexing to the interpolation language. This would promote the behavior of element() to a language feature (likely square-bracket notation) and give the core interpolation code rich enough context to implement the correct computed-value scoping we need to fix this bug.
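For illustration, the two forms below would be equivalent, but the bracket form gives core enough syntactic context to scope the computed check to a single element (names hypothetical):

# today: a function call core can't see into
instance_id = "${element(aws_instance.db.*.id, count.index)}"

# proposed: first-class indexing
instance_id = "${aws_instance.db.*.id[count.index]}"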
I've got a spike of first-class-indexing started, and I'll keep this thread updated with my progress.
:100:
@phinze thank you so much for the detailed response and the ongoing effort! :tada:
Thanks for the report @phinze - is there a WIP branch available to follow along?
Keen to see this one resolved. Quite limiting for those of us using count with instances and template_file to generate user data.
Does anyone know of a workaround?
@jen20 has WIP to address this issue. Stay tuned! :grinning:
:+1:
@jkinred I haven't tried, but off the top of my head, the only workaround for now is to use provisioners.
@jkinred it's case-specific, but we've been working around this issue by not using count on the template_file, and instead embedding node-specific vars in metadata and querying them at launch time using scripts that curl the cloud provider's metadata service. A lot of this sort of thing:
runcmd:
  - curl -s http://169.254.169.254/latest/meta-data/role >> /etc/environment
@jkinred I've come up with a workaround. We set the count on our user-data.cfg template files to a much higher number than we expect to have instances, and then only change the count of the aws_instance resource, not the count of the template files. The count of the template files never changes, so the dependent aws_instances are not recreated.
resource "aws_instance" "web" {
instance_type = "${var.instance_type}"
ami = "${var.aws_ami}"
count = "${var.instance_count}"
user_data = "${element(template_file.user-data-web.*.rendered, count.index)}"
}
resource "template_file" "user-data-web" {
template = "${file("templates/user-data.cfg")}"
vars { fqdn = "${var.short-env}-web-${count.index}.${var.vpc_domain}" }
count = "20"
}
This may or may not help in your situation, depending on what's in your user_data.
@billputer I think your workaround is awesome!
@jen20 if possible, could you please keep backward compatibility in mind? I mean: after your patch is applied, will it be possible to increase/decrease the count without resources being destroyed and created again?
Thanks for all the suggestions! @billputer, great workaround and suitable for our use case.
Workaround, TODO and bug reference added!
Hi @phinze, any progress on this issue? The workaround is fine for template_file but not for EBS volumes. It doesn't make sense to overbuild storage as a workaround. We need to be able to use count to build instances and attach EBS volumes.
It is very possible this is fixed in 0.7 (as of writing, 0.7 is the "master" branch of Terraform). At the very least, we're a lot closer. 0.7 introduces first-class lists so you no longer need element() (we're keeping it for backwards compat though). In this case, you can now directly do listvar[count.index].

The internals of list access are very different from element(), which is a function that simply mimicked list access. Because it was a function, core was unable to realize you're referencing a single list element, as @phinze pointed out above. It's still very possible we don't make that distinction and this is still a bug, but the right architecture is in place now where we can fix it.
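For example, the attachment pattern from earlier in this thread could be written with direct indexing (a sketch; names hypothetical):

resource "aws_volume_attachment" "persistent_volume_attachment" {
  count       = "${var.nodes}"
  device_name = "/dev/sdb"
  instance_id = "${aws_instance.db_instance.*.id[count.index]}"
  volume_id   = "${aws_ebs_volume.db_volume.*.id[count.index]}"
}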
I'm going to keep this open, as we should now be in a strong place to add a core test case verifying whether or not this works, to prevent regressions. For anyone who wants to help: we'd need a test in context_apply_test.go that reproduces this bug on 0.6 (check out a tag to get a confirmed failure). Whether it passes or fails, having the test would be helpful. We can write it in time otherwise.

With the test in place, the actual work can begin to fix it if it is failing; if it's passing, it will prevent any regressions in the future.
Apologies this has taken so long; bringing first-class lists/maps into Terraform has been a huge undertaking and has taken months of full-time work.
Hello,
I'm trying to apply the workaround provided by @billputer. It seems very logical, but I don't know why it doesn't work for me.

When I try to scale by increasing the nb_instance variable, all my instances are recreated.

Do you know why?
Thank you, Pierre
variable "nb_instance" {default = "3"}
variable "build_number" {default = "16"}
resource "template_file" "config_filebeat" {
template = "${file("${path.module}/templates/provisioner/filebeat.cfg.tpl")}"
vars {
laas_project = "123456789"
}
}
resource "template_file" "config" {
count = 10
template = "${file("${path.module}/templates/provisioner/chef.cfg.tpl")}"
vars {
instance_name = "${format("myappone_terraform_pierre_%s_%02d",var.build_number,count.index + 1)}"
#for yaml convention, we need to add some tabulation (4 spaces) in front of each line of the validation key.
validationkey = "${replace(replace(file("${path.module}/keys/validation.pem"), "/^/" , " "), "\n", "\n ")}"
secret_key = "${file("${path.module}/keys/chef_secret")}"
filebeat_config_file = "${base64encode(template_file.config_filebeat.rendered)}"
}
}
resource "template_cloudinit_config" "init_server" {
count = 10
gzip = true
base64_encode = false
part {
filename = "init.cfg"
content_type = "text/cloud-config"
content = "${element(template_file.config.*.rendered, count.index)}"
}
}
# Create a web server
resource "openstack_compute_instance_v2" "myapp" {
count = "${var.nb_instance}"
name = "${format("myappAS_%02d", count.index + 1)}"
flavor_name = "m1.medium"
image_name = "RED_HAT_6.2"
key_pair = "pierre"
security_groups = [
"op-default",
"myapp_as"
]
network {
name = "Internal_Internet_Protected"
}
network {
name = "Internal_Network_protected"
}
user_data = "${element(template_cloudinit_config.init_server.*.rendered, count.index)}"
}
Here's another repro case w/ EFS resources (i.e. the bug still exists in 0.7 ddc0f4cdb0c5b5fb848ac4856e9bcf32cc55ec0f):
https://gist.github.com/radeksimko/869c266bc8572c8f190059e65f12dee3
@radeksimko Thanks for the repro, I think I know where this might be.
Is using ignore_changes in the lifecycle block on the EBS volume and EBS attachment a reasonable solution? The example below seems to work (increasing and decreasing the nodes count). I just wanted a second opinion (pros/cons) before moving forward with it.
Example:
resource "aws_instance" "database" {
ami = "${var.amis}"
instance_type = "${var.instance_type}"
subnet_id = "${element(split(",", var.private_subnet_ids), count.index)}"
key_name = "${var.key_name}"
vpc_security_group_ids = ["${aws_security_group.database.id}"]
disable_api_termination = true
count = "${var.nodes}"
tags { Name = "${var.name}${format("%02d", count.index + 1)}" }
lifecycle { create_before_destroy = true }
}
resource "aws_ebs_volume" "database_mysql_vol" {
availability_zone = "${element(aws_instance.database.*.availability_zone, count.index)}"
iops = 1000
size = 500
type = "io1"
count = "${var.nodes}"
lifecycle {
ignore_changes = ["availability_zone"]
}
tags { Name = "${var.name}-mysql" }
}
resource "aws_ebs_volume" "database_binlog_vol" {
availability_zone = "${element(aws_instance.database.*.availability_zone, count.index)}"
size = 50
type = "gp2"
count = "${var.nodes}"
lifecycle {
ignore_changes = ["availability_zone"]
}
tags { Name = "${var.name}-binlog" }
}
resource "aws_volume_attachment" "mysql_vol_attachment" {
device_name = "/dev/sdf"
instance_id = "${element(aws_instance.database.*.id, count.index)}"
volume_id = "${element(aws_ebs_volume.database_mysql_vol.*.id, count.index)}"
count = "${var.nodes}"
lifecycle {
ignore_changes = ["instance_id", "volume_id"]
}
}
resource "aws_volume_attachment" "mysql_binlog_attachment" {
device_name = "/dev/sdg"
instance_id = "${element(aws_instance.database.*.id, count.index)}"
volume_id = "${element(aws_ebs_volume.database_binlog_vol.*.id, count.index)}"
count = "${var.nodes}"
lifecycle {
ignore_changes = ["instance_id", "volume_id"]
}
}
I ran into this issue when bumping count (using OpenStack), running Terraform v0.7.1-dev. How can I test if the new list interpolation syntax fixes this? How am I to replace element(foo.*.id, count.index) here?
@theanalyst the new syntax means using ${foo.*.id[count.index]} instead of element() in your case. The new syntax works, but the result is the same: the destroy operations still happen.
@serjs yeah I figured, I changed the stuff to new syntax and still ran into issues
+1 to changing to the new list interpolation syntax: the issue persists and still forces a new resource.
resource "aws_volume_attachment" "SLRS_Data" { count = 3 device_name = "/dev/sdf" volume_id = "${aws_ebs_volume.SLRSData..id[count.index]}" instance_id = "${awsinstance.SLRS..id[count.index]}" }
+1 I have this issue using OpenStack with both the element() and [] syntax.
+1 I have the same issue; tried ignore_changes, ${foo.*.id[count.index]}, and element(). They did not work.
Same issue - new syntax didn't work for me either.
I've submitted a fix and a test for the bug which causes a resource with an attribute containing ${list[idx]} to be re-created when list contains an uncomputed element, even if idx references a known element.

Note that, as of now, this does not fix the closely related ${element(list, idx)} bug.
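Concretely, with the fix applied (attribute and resource names hypothetical):

# fixed: direct indexing is no longer considered computed when
# only other elements of the list are unknown
volume_id = "${aws_ebs_volume.vol.*.id[count.index]}"

# still affected for now: element() over the same list
volume_id = "${element(aws_ebs_volume.vol.*.id, count.index)}"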
Any update on this issue? I think requiring the bracket notation/indexing format is a fair and logical fix, but it'd be great to know a timeline around this as the current bug can have some relatively harmful results.
Bracket notation/indexing format is not compatible with math operations, e.g. something.*.id[count.index % 3].
@matti for this behaviour at the moment you can use the element function - ${element(something.*.id, count.index)} - this will wrap.
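For instance, with three subnets and more than three instances, element() cycles through the list rather than failing on an out-of-range index (a sketch; names hypothetical):

resource "aws_instance" "web" {
  count         = "${var.instance_count}"
  ami           = "${var.ami}"
  instance_type = "t2.micro"
  # with 3 subnets, instance 3 wraps back to subnet 0, instance 4 to subnet 1, etc.
  subnet_id     = "${element(aws_subnet.private.*.id, count.index)}"
}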
@jen20 but that does not work because of the bug that's open in this thread...? Not sure what you are saying. I was referring to the coming (?) [] workaround not being compatible with other stuff.
@matti interesting! Was unaware of that limitation. I tried and got something like this (formatting mine):
* node doesn't support evaluation: *ast.Arithmetic{
    Op:2,
    Exprs: []ast.Node{
      *ast.VariableAccess{
        Name:"var.count",
        Posx:ast.Pos{Column:20, Line:1}
      },
      *ast.VariableAccess{
        Name:"count.index",
        Posx:ast.Pos{Column:31, Line:1}
      }
    },
    Posx:ast.Pos{Column:20, Line:1}
  } in: ${var.instance_ids[var.count - count.index]}-baz
@jen20 @matti do you know if there is an open issue anywhere for this particular problem? I searched for terms from the message above in both hashicorp/terraform and hashicorp/hil but couldn't find one.
I'm not sure that can be solved with the current templating engine. I think the current templating engine is the root of all evil and results in the clever workarounds summarized here: https://blog.gruntwork.io/terraform-tips-tricks-loops-if-statements-and-gotchas-f739bbae55f9#.ub1tfxsmg

As variables cannot contain interpolations (https://github.com/hashicorp/terraform/issues/4084) and there are no intermediate variables, I'm starting to feel very cornered here.
The point of having count in resources is currently very broken when it's not working with splats (deletes ALB attachments, EBS volumes, etc.).
Hey @matti, there certainly are a number of issues related to count, but I don't know that I agree with your view that the templating engine - do you mean HIL? - is the root in all cases. The issue here, for instance, is caused by Terraform making certain decisions about how changes in state should be treated, and, as such, the solution lies in Terraform-land, not HIL-land. The fix I've offered doesn't make any changes to the templating engine.

In any case, I think the issue you brought up - not being able to do math operations inside of array index operations - is a problem distinct from this issue, and deserves its own issue. Do you know if there's an open GH issue for that particular problem? Do you get an error similar to the one I posted above, or a different one?
@maxenglander, Sorry I might be wrong about it. I'm just really tired of trying to do stuff nicely and then finding out waaay later that you have cornered yourself.
On the error: I get the same error. I don't think there is an issue anywhere; I tried to look for one.
@matti I opened a separate ticket for the issue you brought up, and a fix has been submitted.
Cool. So does [count.index] addressing currently work? Because for my ALB attachments it's sometimes failing...
@matti I can't say without knowing the specifics of your setup. However, once the math fix makes it into Terraform, you'll be able to do myresource[var.count - count.index - 1].
Providing a minimal in-memory test case to work off:
resource "null_resource" "foo" {
count = 2
}
resource "null_resource" "bar" {
count = 2
triggers {
key = "${element(null_resource.foo.*.id, count.index)}"
}
}
output "bar_0" {
value = "${null_resource.bar.0.id}"
}
after the following:
$ terraform apply
$ terraform taint null_resource.foo.1
a plan shows null_resource.bar.0 being replaced, and an apply shows the output of bar_0 has changed.
I'm trying to set up a multi-node cluster with attached EBS volumes; an example is below.
If a change happens to a single node (for instance if a single ec2 instance is terminated) ALL of the aws_volume_attachments are recreated.
Clearly we would not want volume attachments to be removed in a production environment. Worse than that, in conjunction with #2957 you first must unmount these attachments before they can be recreated. This has the effect of making volume attachments only viable on brand new clusters.
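A minimal sketch of the pattern being described (names and sizes hypothetical):

resource "aws_instance" "node" {
  count         = "${var.num_nodes}"
  ami           = "${var.ami}"
  instance_type = "m4.large"
}

resource "aws_ebs_volume" "data" {
  count             = "${var.num_nodes}"
  availability_zone = "${element(aws_instance.node.*.availability_zone, count.index)}"
  size              = 100
}

resource "aws_volume_attachment" "data" {
  count       = "${var.num_nodes}"
  device_name = "/dev/sdf"
  instance_id = "${element(aws_instance.node.*.id, count.index)}"
  volume_id   = "${element(aws_ebs_volume.data.*.id, count.index)}"
}

Terminating any single instance makes the aws_instance.node.*.id splat computed, which forces every attachment above to be recreated.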