Closed bloopletech closed 7 years ago
I mostly agree. We are using Chef. I'd really love to use Chef to manage AWS SGs (for example) because TF is awful at that, but no progress at this time. EBS volumes, though, I would like to create during provisioning. I'm probably going to rewrite my code to have Chef do that instead of TF.
I'd also often like those EBS volumes to stay around during instance destruction. There are several reasons for this, one being that TF does enjoy resource destruction quite a lot. Of course, that comes about when I need to make an infrastructure change, so if TF is meant only for initial deployments, then what am I expected to use when I need to modify my infrastructure? Am I really expected to redeploy an entire set of large scientific-computing instances? I would not like that, but I would dislike that activity less if I am able to reuse the multi-TB EBS volumes that I store data on.
Immutable infrastructure with separate long-lived data disks is a great design pattern, and should definitely be encouraged. Kudos to Google Borg / Pivotal BOSH for making it more widespread.
My current workaround for this issue is to use Terraform to provision all the objects:
Instance, EBS volume (set to not allow destruction), IAM profile (to allow attachment of the two)...
... and then use a cloud-init bootstrap to have the instance securely associate its (restricted through IAM policy) EBS volume. Because Terraform has the overview of which instance is notionally tied to which EBS volume, it can set all the right metadata to make that relationship visible to cloud-init and userspace. Shutdown is a no-op, and instance destruction clears the Instance <> EBS relationship anyway.
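A minimal sketch of that arrangement, with hypothetical resource names, AMI, and sizes (the IAM profile and the cloud-init bootstrap itself are elided):

```hcl
# Sketch of the pattern described above: Terraform owns the instance and the
# long-lived volume, and exposes the pairing via tags so a cloud-init
# bootstrap on the instance can attach the volume itself.
resource "aws_ebs_volume" "data" {
  availability_zone = "us-east-1a"
  size              = 2048

  lifecycle {
    prevent_destroy = true # the volume outlives any one instance
  }
}

resource "aws_instance" "worker" {
  ami           = "ami-12345678" # placeholder
  instance_type = "m4.xlarge"

  # No aws_volume_attachment here: the instance reads this tag at boot and
  # calls AttachVolume itself, under an IAM policy restricted to this volume.
  tags {
    data_volume_id = "${aws_ebs_volume.data.id}"
  }
}
```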
It's really sad that the EBS association still has to be handled separately. The current `aws_volume_attachment` is pretty much useless for any infrastructure that needs to change regularly.
The ideal solution for me would be that `aws_volume_attachment` only activates when the instance is in the shutdown state (for both attach/detach), although I realise that this is non-trivial given the symmetric nature of Terraform's create/destroy process.
Perhaps the concept of `aws_volume_attachment` as a separate resource is the wrong way to think about it? Perhaps a tweak to the `aws_instance` resource's native EBS handling would be a better way to achieve this.
Adding our bit: we're planning to use volumes as a way to send data in through a "data transfer vm" instance and then move them to a computing cluster, thus detaching the volume from the first instance and re-attaching it to another VM. That's probably not the best "cloud-friendly" approach, but that's the easiest way to get the ball rolling. Up to now we were using the `taint` trick to solve this, but after 0.7 landed that doesn't work anymore (as @LeslieCarr said). We're now pretty much blocked, as we haven't found a workaround yet, apart from logging into the machine and unmounting the volume.
@maxenglander, any ETA on when your changes might land on a release? This is now critical to us!
I do agree with the group that says OS-layer actions are outside the scope of TF; this includes quiescing IO and unmounting filesystems. I've managed to avoid (so far) the need to move EBS volumes around, but it's a tactic that I can see value in. Along that line, I've gotten a Chef cookbook to 90% which will do all the EBS deployment work (including LVM and fs), and I expect I'll use that as a base if I need to move things around. I'm only using TF to spec the ephemerals at this point.
Now what I really need is a cookbook that manages security groups, but that's a separate TF issue.
Yes, I'm also fine with TF not messing with OS actions, but here we're talking about a bug/missing feature, I believe. It would just need to behave slightly differently when calling the AWS API, not do anything OS-level.
I fully concur with those (@charity, @Gary-Armstrong) who have voiced that TF is not well-equipped to perform volume detachments because detachments have extra dependencies (such as disk mounts, running processes) which TF doesn't know about. I agree that it's generally inadvisable for TF to perform OS-level actions like quiescing IO and unmounting fs.
However, I don't think it is bad (even if it's not ideal) to allow TF to manage volume attachments explicitly (via `aws_volume_attachment`), and to implicitly detach volumes by destroying the instances they are attached to. I think that this approach is compatible with the view that TF shouldn't perform volume detachments: by relying on instance destruction to detach volumes, TF effectively delegates volume detachment to AWS.
I also think that the TF model of failure is perfectly well equipped to handle problems that may crop up while using this approach. For example: if, during an instance destroy phase, a disk fails to unmount, then the instance may remain running, and the volume remains attached to it. TF sees that the instance failed to be destroyed, and stops execution so that the succeeding instance is never created, and volume re-attachment is not attempted. TF reports the failure to the user, who must retry by running `terraform plan` and `terraform apply`. There isn't any forced detachment, there's no disk corruption, and no un-synced state between TF and AWS.
While this approach may be less elegant and less robust than using a CM tool like Chef to handle everything EBS-related, it is, I believe, a simple, clean, and predictable solution. For users like myself who simply aren't ready to introduce a CM tool into their operations, it is also a practical solution.
@dvianello I have no idea if/when HashiCorp would incorporate my changes, unfortunately. I haven't created a PR yet, since it's not clear what HashiCorp's stance on this issue is.
I've been using my patch for a while now, which you're free to try out (at your own risk, of course) if you need a stop-gap while we wait for an official solution. I've created a release with binaries, in case that's helpful. To use it, you must first add `"skip_detach": true`, then run `plan` and `apply` on any `aws_volume_attachment` for which you want to enable the new behavior before trying to destroy and re-create instances.
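For illustration, the patched attribute sits on the attachment resource roughly like this (resource names are hypothetical, and `skip_detach` exists only in that patched build, not in upstream Terraform):

```hcl
resource "aws_volume_attachment" "data" {
  device_name = "/dev/xvdf"
  volume_id   = "${aws_ebs_volume.data.id}"
  instance_id = "${aws_instance.worker.id}"

  # Patch-only attribute: on destroy, skip the explicit DetachVolume call and
  # let AWS detach the volume as a side effect of instance termination.
  skip_detach = true
}
```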
Agree @maxenglander that TF could manage attachments as you say. Entire post is agreeable, in fact. I don't want to get off on a TF wish list, but it seems entirely reasonable to expect TF to detach and potentially preserve EBS when an instance is terminated.
I like 5c09bcc from @c4milo. I've tested it in our environment for some days now. It's the best solution for this issue. I suggest cherry-picking that one.
how is this still unsolved after a year?
Although the volume attachment resources above might not work, we have the whole thing working a slightly different way (although it's using AWS): we define an `aws_instance` and an `aws_ebs_volume`, with no attachment information; instead we tag the `aws_instance` with the `aws_ebs_volume` resource's id.
Then on instance boot-up, we read the tag and attach and mount the disk. On instance shutdown, the reverse (although you don't need to).
It all works fine: change the details of the instance and everything detaches and reattaches as intended, in the immutable-infra way.
Sure, it would be nice to have it in Terraform, but you don't need it to get the basics working.
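A hedged sketch of that tag-and-attach-at-boot scheme (resource names, tag key, device name, and mount point are all assumptions; the instance role would need at least ec2:DescribeTags and ec2:AttachVolume):

```hcl
resource "aws_instance" "worker" {
  ami           = "ami-12345678" # placeholder
  instance_type = "m4.large"

  # Terraform records which volume belongs to this instance as a tag.
  tags {
    data_volume_id = "${aws_ebs_volume.data.id}"
  }

  # Boot-time script: look up our own tag, attach the volume, mount it.
  user_data = <<EOF
#!/bin/sh
IID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
VID=$(aws ec2 describe-tags \
  --filters "Name=resource-id,Values=$IID" "Name=key,Values=data_volume_id" \
  --query 'Tags[0].Value' --output text)
aws ec2 attach-volume --volume-id "$VID" --instance-id "$IID" --device /dev/xvdf
aws ec2 wait volume-in-use --volume-ids "$VID"
mount /dev/xvdf /data
EOF
}
```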
We have also tested @c4milo's commit https://github.com/hashicorp/terraform/commit/5c09bcc1debafd895423e1e2df0c5da4930468bc on our setup and have had great results in resolving our problem. We're going to keep using this patch until this hopefully gets merged.
@c4milo thank you for adding this!
I'm also hitting this issue. @c4milo: have you sent a PR with https://github.com/hashicorp/terraform/commit/5c09bcc1debafd895423e1e2df0c5da4930468bc?
I did send https://github.com/hashicorp/terraform/pull/5364, but closed it since it isn't the ideal solution to this problem, as discussed in that thread.
This is pretty much the same as #2761, I'm sure there are other places this is being tracked too... going to close this one. (The reference here will link them, too)
@mitchellh, arguably this issue has a bigger "community" and should be considered the main point of contact to track all dependency problems which can't be expressed using the simplistic graph model TF is currently using.
I know this thread was closed in favor of #2761, but given that that issue is still open, I wanted to leave this here for anyone else still experiencing this particular issue.
I was able to set `skip_destroy` to `true` on the volume attachment to solve this issue. Details here: https://www.terraform.io/docs/providers/aws/r/volume_attachment.html#skip_destroy
Note: in order for it to work, I had to do the following:
1. Set `skip_destroy` to `true` on the volume attachment.
2. Run `terraform apply`.
3. Make the other changes to the instance that caused it to be terminated/recreated (changing the AMI in my case).
4. Run `terraform apply` again.
Leaving this here in case anyone else finds it useful.
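The attachment in step 1 amounts to roughly this (resource names are hypothetical; `skip_destroy` is a documented attribute of `aws_volume_attachment`):

```hcl
resource "aws_volume_attachment" "data" {
  device_name = "/dev/xvdf"
  volume_id   = "${aws_ebs_volume.data.id}"
  instance_id = "${aws_instance.worker.id}"

  # On destroy, don't call DetachVolume; just drop the attachment from state,
  # leaving the volume attached until the instance itself is terminated.
  skip_destroy = true
}
```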
I can't get the above workaround to do the trick using 0.10.6. Looks like whatever bug was being exploited to make this work got closed.
I'm still only provisioning ephemerals in TF.
In fact, I am specifying four of them for every instance, every time. I then have some ruby/chef that will determine how many are really there (0-4) and do the needful to partition, lvm stripe, then mount as a single ext4.
I still use Chef to config all EBS from creation to fs mount. Works great. EBS persist unless defined otherwise. Mentally assigning all volume management to the OS arena has gotten me where I want to be.
This is still an issue 26 months after the issue was first created.
@exolab, It is not. You need to use destroy-time provisioners in order to unmount the EBS volume.
Sorry if I am a bit daft. How so?
Is this what you are suggesting?
```hcl
provisioner "remote-exec" {
  inline = ["umount -A"]
  when   = "destroy"
}
```
The `skip_destroy` fix from @mpalmer is also not working for me with Terraform 0.10.6 😞
The `skip_destroy` fix does not work with Terraform 0.11.1 😢
+1
Still an issue (and a big issue for us) in v0.11.3
Still an issue in v0.11.4
Terraform v0.11.7 -- I have the same issue with the volume attachment when running destroy; `skip_destroy = true` in the volume attachment resource is not helping either, destroy keeps trying. I went ahead and force-detached from the console; after that, the destroy moved forward. Is there a default timeout for TF? The script kept running the destroy, trying to detach the EBS volume, until I Ctrl-C'd out of it.
On Terraform v0.11.7 I was able to get around this by creating the volume attachment with `force_detach = true`. If you created it without force detach set to true, it will still fail: I had to terminate the instance, allow the edit or recreation of the volume attachment to have force detach, and then all subsequent detaches worked for me.
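Roughly, with hypothetical resource names (`force_detach` is a documented attribute of `aws_volume_attachment`):

```hcl
resource "aws_volume_attachment" "data" {
  device_name  = "/dev/xvdf"
  volume_id    = "${aws_ebs_volume.data.id}"
  instance_id  = "${aws_instance.worker.id}"

  # Tell AWS to detach even if the guest OS hasn't unmounted the device.
  # Risky for in-use filesystems: writes in flight can be lost.
  force_detach = true
}
```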
Using `force_detach = true` worked for me as well (v0.11.7). I originally created the volume without `force_detach`, so I had to manually force-detach in the AWS console, then delete the volume (in Terraform) and re-create it (also in Terraform) before it worked.
Still an issue.
Is there any issue using `force_detach`? I'm assuming that processes could still be trying to use the volume. (?) Is there a way to stop the instance prior to detaching the volume and then terminate it?
I know this issue is closed, but just as an example workaround for people finding this, I'll post what I've done. I have a volume I want to persist between machine rebuilds (it gets rebuilt from a snapshot if deleted, but otherwise persists). What I did was grab the older instance id in TF, then use a local-exec (can't use remote-exec with how direct access to the machine is gated) with the AWS CLI to shut down the machine the volume is being detached from, before the destroy and rebuild of the machine and the volume attachment:
```hcl
// data source to get the previous instance id for the TF workaround below
data "aws_instance" "example_previous_instance" {
  filter {
    name   = "tag:Name"
    values = ["${var.example_instance_values}"]
  }
}

// volume attachment
resource "aws_volume_attachment" "example_volume_attachment" {
  device_name = "/dev/xvdf"
  volume_id   = "${aws_ebs_volume.example_volume.id}"
  instance_id = "${aws_instance.example_instance.id}"

  // Below is a workaround for TF not detaching volumes correctly on rebuilds.
  // The 10-second wait is too short for detachment and force_detach is
  // currently ineffective, so we use the AWS CLI to gracefully shut down the
  // previous instance before detachment and instance destruction.
  provisioner "local-exec" {
    when    = "destroy"
    command = "ENV=${var.env} aws ec2 stop-instances --instance-ids ${data.aws_instance.example_previous_instance.id}"
  }
}
```
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.
This is the specific error I get from terraform:
We are building out some infrastructure in EC2 using terraform (v0.6.0). I'm currently working out our persistent storage setup. The strategy I'm planning is to have the root volume of every instance be ephemeral, and to move all persistent data to a separate EBS volume (one persistent volume per instance). We want this to be as automated as possible of course.
Here is a relevant excerpt from our terraform config:
And mount.sh:
As you can see, this:
This works fine the first time it's run. But any time we:
Terraform then tries to detach the extant volume from the instance, and this task fails every time. I believe this is because you are meant to unmount the EBS volume from inside the instance before detaching it. The problem is, I can't work out how to get Terraform to unmount the volume inside the instance before trying to detach it.
It's almost like I need a provisioner to run before the resource is created, or a provisioner to run on destroy (obviously https://github.com/hashicorp/terraform/issues/386 comes to mind).
This feels like it would be a common problem for anyone working with persistent EBS volumes using terraform, but my googling hasn't really found anyone even having this problem.
Am I simply doing it wrong? I'm not worried about how I get there specifically, I just would like to be able to provision persistent EBS volumes, and then attach and detach that volume to my instances in an automated fashion.