Open naphta opened 3 years ago
@naphta this is by design that changing metadata_startup_script
leads to trigger replacement. Below is what its doc indicates. Does this make sense?
metadata_startup_script - (Optional) An alternative to using the startup-script metadata key, except this one forces the instance to be recreated (thus re-running the script) if it is changed. This replaces the startup-script metadata key on the created instance and thus the two mechanisms are not allowed to be used simultaneously.
@edwardmedia Absolutely I understand that these two things can't coexist at the same time, however I'm switching from one to the other (as such, not simultaneously). It feels like that should be possible, I fully expect that if I were to change metadata_startup_script
to a non null value it should destroy/recreate, but in this case I'm unsetting it (or setting to null), and then simply moving that value to metadata.startup-script
.
@naphta it is fine with me switching between startup_script
and metadata_startup_script
by alternating commenting each line in below testing config. However I noticed that you had prevent_destroy = true
. Is this still the case?
resource "google_compute_instance" "default" {
...
metadata = {
foo = "bar"
startup_script = "echo hi > /test.txt"
}
//metadata_startup_script = "echo hi > /test.txt"
...
}
@edwardmedia I did try that but terraform was still wanting to do the destroy which is undesirable. I'm wanting to swap from one to the other without the instance being destroyed. Is that the case from your example too?
edit: Having just ran it with ignore_changes
removed and after prevent_destroy
removed it does indeed want to destroy my instances which is a problem.
example diff:
# module.elasticsearch-capture["oregon"].google_compute_instance.elasticsearch-master[2] must be replaced
+/- resource "google_compute_instance" "elasticsearch-master" {
- allow_stopping_for_update = true -> null
~ cpu_platform = "Intel Broadwell" -> (known after apply)
~ current_status = "RUNNING" -> (known after apply)
- enable_display = false -> null
~ guest_accelerator = [] -> (known after apply)
~ id = "....." -> (known after apply)
~ instance_id = "....." -> (known after apply)
~ label_fingerprint = "....." -> (known after apply)
~ metadata = {
+ "startup-script" = <<-EOT
.......
EOT
}
~ metadata_fingerprint = "....." -> (known after apply)
- metadata_startup_script = <<-EOT
.....
EOT -> null # forces replacement
+ min_cpu_platform = (known after apply)
name = "....."
~ project = "....." -> (known after apply)
- resource_policies = [] -> null
~ self_link = "....." -> (known after apply)
~ tags = [
+ "capture",
+ "elasticsearch",
+ "production",
# (1 unchanged element hidden)
]
~ tags_fingerprint = "....." -> (known after apply)
# (5 unchanged attributes hidden)
~ boot_disk {
~ device_name = "persistent-disk-0" -> (known after apply)
+ disk_encryption_key_sha256 = (known after apply)
+ kms_key_self_link = (known after apply)
~ source = "....." -> (known after apply)
# (2 unchanged attributes hidden)
# (1 unchanged block hidden)
}
+ confidential_instance_config {
+ enable_confidential_compute = (known after apply)
}
~ network_interface {
~ name = "nic0" -> (known after apply)
~ network_ip = "..... -> (known after apply)
~ subnetwork = "....." -> "platform-oregon"
~ subnetwork_project = "....." -> (known after apply)
# (1 unchanged attribute hidden)
}
~ scheduling {
~ automatic_restart = true -> (known after apply)
~ min_node_cpus = 0 -> (known after apply)
~ on_host_maintenance = "MIGRATE" -> (known after apply)
~ preemptible = false -> (known after apply)
+ node_affinities {
+ key = (known after apply)
+ operator = (known after apply)
+ values = (known after apply)
}
}
- shielded_instance_config {
- enable_integrity_monitoring = true -> null
- enable_secure_boot = false -> null
- enable_vtpm = true -> null
}
# (1 unchanged block hidden)
}
@naphta this is what I mentioned earlier. When you alter metadata_startup_script
, it triggers recreation. This is by design. You can't avoid it via manipulating the config without changing the design
. Does it make sense?
I did try that but terraform was still wanting to do the destroy which is undesirable. I'm wanting to swap from one to the other without the instance being destroyed.
So how do you suggest I get myself out of this hole that has been dug?
My expectation is that whenever I changed metadata_startup_script
to a non null value that would be the case, but this is a bit of a different scenario since I'm trying to essentially reverse, or back out from using it.
Also the code I took a look at seemed to suggest the behaviour I'm looking for was intentional as the error thrown when both are set is checking that the value is not empty.
I've tried modifying my state (which I'm not a huge fan of), but really it seems crazy to me that I'm resigned to destroying my instance and all of its data to achieve this.
The key part here for me is doing it gracefully.
@naphta i am not sure if there is a good way to achieve it gracefully
. @melinath what do you think?
I think we could achieve this with a customize diff func; I'll check with folks whether that's something we would want to do.
I think we could achieve this with a customize diff func; I'll check with folks whether that's something we would want to do.
I'm happy to run the provider locally with a bit of guidance of where to make that change if you're not willing to push it upstream.
I've talked with folks and we're not considering this a bug; the two fields are intentionally meant to be different ways to manage the startup script, and part of that is that switching from metadata_startup_script to metadata.startup_script triggers a recreation.
@naphta If you wanted to make this change, you might be able to do it with a CustomizeDiff function. I believe this would be on the google_compute_instance resource (in google/resource_google_compute_instance.go). There are other examples of CustomizeDiff funcs in the codebase.
But it's not guaranteed that we would accept this change. It would be useful to know more about your use case. Could you elaborate a bit on why recreating the instance would cause problems for you, and why continuing to use metadata_startup_script
is not an option?
@melinath It's really a bit of a lack of understanding/poor design our end in writing our original modules. The start-up scripts were doing a fair bit of provisioning which we're moving over to Ansible as for this use case that makes a lot more sense as we can't move the state away from the instances (using scratch disks for performance).
The issue is that the instances created with the start-up scripts exist and are in production with data on them, so there's not a huge desire to go through the process of recreating them (syncing up 5TB of data 3 times, plus moving around production traffic) when really the only reason is to shift some metadata around and put us into a situation in the future where this wouldn't be an issue going forward.
We could continue to use metadata_startup_script
, but in this scenario where accidental destruction would be a disaster we're thinking it's best not to use the option where that can happen.
I personally just went in with the expectation that you'd be able to swap between the two options (obviously if you went from metadata.startup-script
to metadata_startup_script
with changes I'd expect that to trigger a recreation) with low friction. Seeing the line of code where it only throws the error when they both exist but only when one isn't empty/null is what made me consider this a bug.
The only other option I've thought might get me to the state I'd like is by running two applies, one with ignore changes on metadata_startup_script
whilst setting it to a null/unset value and then a second apply where the script is placed back but into metadata object directly.
Thanks for the additional details!
The only other option I've thought might get me to the state I'd like is by running two applies, one with ignore changes on
metadata_startup_script
whilst setting it to a null/unset value and then a second apply where the script is placed back but into metadata object directly.
I was also wondering if that might work. I think it should?
I couldn't get it to work with a two stage apply. removing metadata_startup_script while it is still in ignore_changes never removes it from the state. removing it without ignore_changes always want to recreate.
My workaround to this issue is
1) update TF file from metadata_startup_script to metadata.startup-script
2) terraform state rm google_compute_instance.my_vm
3) via GCP console delete the startup-script from the instance
4) terraform import google_compute_instance.my_vm vm-name
5) terraform apply
to add the startup script back to the VM.
I couldn't get it to work with a two state apply. removing metadata_startup_script while it is still in ignore_changes never removes it from the state. removing it without ignore_changes always want to recreate.
My workaround to this issue is
- update TF file from metadata_startup_script to metadata.startup-script
terraform state rm google_compute_instance.my_vm
- via GCP console delete the startup-script from the instance
terraform import google_compute_instance.my_vm vm-name
terraform apply
to add the startup script back to the VM.
Thanks, worked for me. As a matter of fact step 2 and 4 weren't even needed :) Just had to:
terraform apply
to add the startup script back to the VM.Wanted to add some additional context to this ticket:
I ran into an issue today where we created an VM from an instance template, but later wanted to update the metadata to include additional values. It appears that the metadata_startup_script
value was set by the instance template despite the template using the metadata
block
` metadata = { startup-script = data.local_file.startup-script.content
`
Later when someone tried add additional keys to the instance created from template they encountered the error message in the original description.
resource "google_compute_instance_from_template" "vm" {
# ...
metadata = merge(
module.vm_template.template.metadata,
{
foo = "bar"
}
)
}
This change, which is totally unrelated to the startup script, causes the startup-script value of the metadata to be picked up as a change, even though it should have always been there.
An old thread, but I had an issue similar to this, following the discussion here and other part of the web, I ended up modifying the state and only then applying the new metadata.startup-script
, this is what I run on the command line:
terraform state pull \
| jq '
(.resources[]
| select (.module == "module.<module name>")
| .instances[0].attributes).metadata_startup_script = null
| .serial = .serial+1
' \
| terraform state push -
This modified the state, removing the metadata_start_script
attribute, increments the serial
, and pushes the new state.
I identify by the .module
type/name, but you can obviously use any other attribute you want (.name
is probably a good choice :monocle_face:)
Next I plan
, and apply
- and tarraform doesn't destroy my precious deployment.
For me it's also really unfortunate, because I've tested this before deploying to production, but now after moving it into production with prevent_destroy = true
, I'm not able to easily change it. I will probably end up modifying the state as suggested above, but it would be really appreciated if we could allow a smooth transition between those methods. Please close the ticket, if the decision that this will not be changed is final, so I know that I don't have to wait for an update and have to use a workaround.
Currently there are both options available as terraform registry explains:
Depending on the OS you choose for your instance, some metadata keys have special functionality. Most linux-based images will run the content of metadata.startup-script in a shell on every boot. At a minimum, Debian, CentOS, RHEL, SLES, Container-Optimized OS, and Ubuntu images support this key. Windows instances require other keys depending on the format of the script and the time you would like it to run - see [this table](https://cloud.google.com/compute/docs/startupscript#providing_a_startup_script_for_windows_instances). For the convenience of the users of metadata.startup-script, we provide a special attribute, metadata_startup_script, which is documented below.
Community Note
modular-magician
user, it is either in the process of being autogenerated, or is planned to be autogenerated soon. If an issue is assigned to a user, that user is claiming responsibility for the issue. If an issue is assigned tohashibot
, a community member has claimed the issue already.Terraform Version
Terraform v1.0.1 on linux_amd64
Affected Resource(s)
Terraform Configuration Files
I can provide configs here but for simplicities sake:
Debug Output
Panic Output
Expected Behavior
metadata_startup_script is unset, and metadata.startup-script is set without destroying the instance.
Actual Behavior
A fatal error is received as the provider is seeing that metadata.startup-script and metadata_startup_script are both populated.
Steps to Reproduce
terraform apply
Important Factoids
I've had a quick debug of the provider and can see that
resourceInstanceMetadata()
is getting both values back the same. I had a quick play with the underlying structResourceData
with theGetChange
method but that too was returning them as the same value. I'm presuming that this is because they end up being the same value GCP end, or the lifecycle settings interferring.It's very possible that this could be a strange behaviour with asking it to ignore changes to
metadata_startup_script
, however when that's unset terraform will want to destroy the existing resource rather than changemetadata_startup_script
tometadata.startup-script
.References