hashicorp / terraform-provider-google

Terraform Provider for Google Cloud Platform
https://registry.terraform.io/providers/hashicorp/google/latest/docs
Mozilla Public License 2.0

FR: Please extend google_compute_attached_disk to allow manipulating boot disks #10690

Open kkm000 opened 2 years ago

kkm000 commented 2 years ago


FR Summary

Add arguments boot = {true|false} and auto_delete = {true|false} to the google_compute_attached_disk resource. Document it as mind-bogglingly advanced, but please, please add it!

I'm in dire straits due to an unannounced change in either the GCE API or the (then-alpha) handling of boot disks by the GCP Deployment Manager (GDM).

Problem Exposition

I must lay out my case very clearly for it to be convincing. Sorry, but it has to be a long and sad story.

It is quite possible that I do not know how to Terraform. It is also possible that Terraform is not the right tool for the task. But I have previously used Google DM, back when it was still in Alpha, with more or less success; now that GCP is chopping features off it one by one, I'm forced to search for a replacement. Terraform is very flexible, in the sense that you may work out a sequence of infra changes outside of Terraform, import them into the state, and then run a plan that is not going to kill and re-create unnecessary stuff. This may not sound like praise, but other IaC software is much more rigid. So it is praise: while TF is not the Holy Grail of IaC, it is as close to (if you're an optimist), or the least far from (otherwise), what you would ideally want. After looking at multiple options, I want to stick with it.

Most of the complexity—raise your hand if you truly believe in your heart that a module culminating in a “provisioner "local-exec" { command = "gcloud ...." }” should require 16 (sixteen, one six, do you copy, Houston?) intricately interdependent null_resources—probably comes from limitations imposed for a good reason. But this source looks like its author fought against TF and finally won the battle, despite solemnly burying a hundred of their most precious mates—the remaining hours of their only earthly life.

My code is heading in the same direction. I'm crying for mercy: unlike in a real battle, reinforcements will never come. Every heroically fallen hour is gone, and that is it. And what I'm implementing is exactly a variant of the google_compute_attached_disk resource with a --boot flag, via gcloud local-exec calls, roughly as sketched below.
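For concreteness, here is roughly the shape of that workaround (a sketch only, not my production code: the resource names mirror the example further down, and it assumes gcloud's attach-disk accepts the --boot flag, which the manual tests later in this issue rely on):

resource "null_resource" "reattach_boot" {
  # Re-run whenever the boot disk is replaced from a new image.
  triggers = {
    disk_id = google_compute_disk.control_boot.id
  }

  # Sketch: attach the freshly imaged disk back as the boot disk.
  provisioner "local-exec" {
    command = <<-EOT
      gcloud beta compute instances attach-disk ${google_compute_instance.control.name} \
        --zone=${google_compute_instance.control.zone} \
        --disk=${google_compute_disk.control_boot.name} \
        --device-name=boot --boot
    EOT
  }
}

And that is only the attach half; the detach and the ordering around it each want their own null_resource, which is exactly how the null_resource count balloons.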

New or Affected Resource(s)

I am maintaining a computational cluster. The cluster has 3 types of nodes: login nodes, which users log on to, and two other types that I'm lumping together for the purpose of this question/FR. A login node's boot disk is supposed to be maintained as if it were a hardware machine, i.e. with the distro's package manager. The other nodes' boot disks are never maintained; they are simply replaced from a new image, monthly or whenever changes are needed. In my architecture, boot disks contain only the system and the NVIDIA drivers, while all cluster management and problem-domain computation software lives on a different disk, one per cluster, mounted READ_ONLY on every node (including a formidable flock of preemptible nodes spawned by the cluster controller).

Thus, the OS software domain is separated from the problem software domain by design. A cluster may be created for different HPC work with a different problem-domain software disk shared by all nodes (akin to the modules software traditionally used in a hardware cluster), except that in the cloud, and with preemptible workers, it's more natural to deploy a separate cluster per job type: jobs do not have to wait for days for hardware resources to become available and then configure their nodes with the right software, since there is an unlimited supply of virtual hardware, which can be configured in the right VM shape for a job.
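(For contrast, the shared problem-domain disk is exactly what google_compute_attached_disk already handles well today; a sketch with hypothetical names, roughly what I use per node:)

resource "google_compute_attached_disk" "software" {
  # Hypothetical names: the per-cluster software disk, attached
  # read-only to every node; only the boot disk lacks this path.
  disk        = google_compute_disk.software.id
  instance    = google_compute_instance.control.id
  device_name = "software"
  mode        = "READ_ONLY"
}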

Previously, I simply detached and killed boot disks by running GDM with a variable like '--properties=with_boot_disks=false', then recreated and attached them with '--properties=with_boot_disks=true', like so (GDM ninja code snippet, abbreviated):

##=== BOOT DISK (will be torn off and shredded if !BOOT) =====================##
{%- if BOOT %}
- name: {{CLUS}}-boot-{{ROLE}}
 . . .
{%- endif %}
. . .
##=== VM INSTANCE ============================================================##
- name: {{CLUS}}-{{ROLE}}
    disks:
    ##= THE BOOT DISK ==
{%- if BOOT %}
    - deviceName: boot
      boot: true
      source: $(ref.{{CLUS}}-boot-{{ROLE}}.selfLink)
{%- endif %}

This is the part of GDM that broke and prompted me to search for a replacement. Note, however, that the two-phase process is idempotent in the face of failures: if the disk failed to detach, it would be detached on a subsequent successful run; if it had already been detached and killed, the first phase, with !BOOT, was a no-op.

The breakage is possibly related to a Compute API change. In the new approach, which I traced through audit logs, a single POST request is sent for the VM; with !BOOT, the request does not contain a boot disk, which results in a cryptic API error—possibly an API bug (or a fool-proofing restriction, to which I'd object that users would come up with a better fool)—to the effect that the resulting VM configuration is inconsistent with respect to the guestOsFeatures UEFI_COMPATIBLE and VIRTIO_SCSI_MULTIQUEUE. This does not make much sense, as these are properties of the boot disk only, in turn inherited from the image it's created from, and not of the instance. But pragmatically, it is what it is, and the ball to work around this newly introduced GCP bogosity is in my court. I am going to file a bug, but I cannot wait for its resolution—that's assuming it is a bug and not a five-ton feature dropped straight down on me without any warning.

To summarize. I want to replace a boot disk of an instance with a freshly imaged one without recreating the instance.

It is not possible with the terraform-provider-google. If I place an image= into the boot_disk block, the instance has to be recreated whenever I change its image source. If I create the boot disk as a separate resource and refer to it in the boot_disk block of the instance, the instance is still recreated. Recreating an instance may not be a big deal, as these are expendable, but here I run into a different issue. Before reimaging, I must be absolutely sure that changes made outside of TF are refreshed back into the state. For example, the cluster controller metadata contains 3 essential items: the image used to create preemptible nodes, the problem-software disk and, most importantly, the cluster controller configuration file, stored as a tarred-gzipped-base64-encoded item that must be preserved across the disk replacement. Users sometimes change these outside of TF. Besides, the cluster owner must be able to do the reimaging. TF is designed more as a tool for an IT person than for an IT-inane number cruncher, so I have to write super-careful scripts. In this case, analysing the proposed -refresh-only plan would be too complex. Yes, I can read it in as JSON, but I cannot write a reliable analyser that would say aye or nay to the refresh while handling all corner cases. Users mangle their setups, so there could be changes that must be remembered in the TF state, and others that must be reverted based on the state. These must be untangled, and I do not want to write a program that deals with this concoction of tight-balled spaghetti in a superglue sauce.
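(To be clear, getting the drift data is not the hard part; something like the following, assuming a Terraform recent enough to emit the resource_drift array in its JSON plan output, lists what drifted. Deciding which drifts to keep and which to revert is the part I refuse to automate:)

terraform plan -refresh-only -out=refresh.tfplan
terraform show -json refresh.tfplan | jq -r '.resource_drift[].address'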

Potential Terraform Configuration

What looks to naive me like a simple change would have almost solved my problem in almost one line: a new argument of the google_compute_attached_disk resource, boot = true (the "almost" part is request 2 below, for the auto_delete argument). Yes, it's an advanced usage, but the whole google_compute_attached_disk resource is documented as advanced, and its documentation is full of yellow warning boxes. Put up more warnings telling the user that this is an advanced property of an advanced resource, but pray, please, please, please add it! It's already in the GCE API.

Add two arguments to the google_compute_attached_disk resource:

  1. boot = {true|false}. Optional; the default is, naturally, false.
  2. auto_delete = {true|false}. Optional. Probably allowed even if boot is not provided (GCE allows auto_delete on non-boot disks). The default for the special case of boot = true is up to your judgment. GCE's own default is false in all cases, and that seems uncontroversial for non-boot disk attachment. One possibility is to deviate from GCE and default this to true if and only if boot = true, but that may run contrary to the provider's design philosophy, so this is only a weak suggestion.
    • The argument for a default of false is that this is the GCE API default in all cases.
    • The argument for a default of true in this special case is that whichever way an instance acquires its boot disk, be it via the boot_disk block or via the google_compute_attached_disk resource with boot = true, the attached disk's auto_delete property would consistently default to true.

A modified example from the Example Usage section of the documentation:

resource "google_compute_attached_disk" "control_boot_attach" {
  disk        = google_compute_disk.control_boot.id
  instance    = google_compute_instance.control.id
  device_name = "boot"
  boot        = true
  auto_delete = true
}

resource "google_compute_disk" "control_boot" {
  name   = "${var.CLUSTER}-control-boot"
  image  = var.OS_IMAGE
}

resource "google_google_compute_instance" "control" {
  . . .
  boot_disk {
    source      = google_compute_disk.control_boot.id
  }

  lifecycle {
    ignore_changes = [ boot_disk ]
  }
}

It seems to me that a plan with -replace=google_compute_disk.control_boot should make all the magic happen. Actually, I even tested that, and it is so tantalizingly close to working (as long as the device is not named "boot"—I had to rename it "sandal" for the test), except that the provider reattaches the newly reimaged replacement disk with boot = false, quite naturally; the debug log showed a POST request to the https://compute.googleapis.com/compute/.../attachDisk endpoint with a payload of just two arguments, { "deviceName":"sandal", "source":"zones/us-east1-c/disks/yes-control-boot" }. Detaching it and manually attaching it with gcloud as a --boot disk worked just fine, and TF was not even surprised (nothing but the minor discrepancies which have tracking issues here, and which I usually clean up by refreshing).
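With the requested boot argument in place, the whole reimaging flow would collapse to a single command (sketch; the resource address is from the example above):

# Recreate just the boot disk from the new image; the attached_disk
# resource then re-attaches it, and the instance itself is untouched.
terraform apply -replace=google_compute_disk.control_boot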

Additional Notes

The GCP API does support detaching and attaching the boot disk

...provided the instance is TERMINATED.
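(A hedged aside: the transcript below starts with the instance already stopped; if it were running, something along these lines, with the names from the transcript, would put it into the TERMINATED state first:)

gcloud compute instances stop yes-control --zone=us-east1-c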

The =>> mark indicates command output, for easier reading.

$  export project=[REDACTED]
$ gcloud beta compute instances detach-disk yes-control --disk=yes-control-boot
=>> [Updated .../instance/yes-control]
$ gcloud beta compute instances describe yes-control  --format='yaml(disks.filter("boot:true"))'
=>> disks: []
$ curl "https://compute.googleapis.com/compute/beta/projects/$project/zones/us-east1-c/instances/yes-control/attachDisk?\
access_token=$(gcloud auth print-access-token)" -HContent-Type:application/json\
  -d '{"autoDelete":true,"boot":true,"deviceName":"boot","source":"zones/us-east1-c/disks/yes-control-boot"}'
=>> {
  "id": "1560829451153999956",
  "name": "operation-1638741690604-5d26d4858dc80-2731d480-898b4fec",
 . . .
$ sleep 5
$ gcloud beta compute instances describe yes-control  --format='yaml(disks.filter("boot:true"))'
=>>    [some irrelevant fields omitted for brevity]
disks:
- autoDelete: true
  boot: true
  deviceName: boot
  guestOsFeatures:
  - type: UEFI_COMPATIBLE
  - type: VIRTIO_SCSI_MULTIQUEUE
  index: 0
  mode: READ_WRITE
  type: PERSISTENT

Note that the boot disk is always lifted to index 0, so detaching and reattaching a boot disk does not shuffle the disk order.

A bit of fool-proofing help from the GCE API: the device name boot is reserved

Apparently, device_name = "boot" is special, although I don't recall seeing that documented anywhere. If device_name is "boot" but boot is not true (false or omitted), the GCP API accepts the operation, but it completes with an error:

$ gcloud beta compute instances detach-disk yes-control --disk=yes-control-boot
=>> [Updated .../instance/yes-control]
$ curl "https://compute.googleapis.com/compute/beta/projects/$project/zones/us-east1-c/instances/yes-control/attachDisk?\
access_token=$(gcloud auth print-access-token)" -HContent-Type:application/json\
  -d '{"deviceName":"boot","source":"zones/us-east1-c/disks/yes-control-boot"}'
=>> {
  "id": "1371992864568935587",
  "name": "operation-1638742603889-5d26d7ec87d3e-2d2973d3-80910c45",
  "zone": ".../us-east1-c"
 . . . .
}
$ sleep 5  # Give it a few seconds if running interactively
$ gcloud beta compute instances describe yes-control  --format='yaml(disks.filter("boot:true"))'
=>> disks: []
$ gcloud beta compute operations describe --zone=us-east1-c operation-1638742603889-5d26d7ec87d3e-2d2973d3-80910c45
=>>
endTime: '2021-12-05T14:16:45.547-08:00'
error:
  errors:
  - code: INVALID_USAGE
    message: 'Invalid device name: boot'
httpErrorMessage: BAD REQUEST
httpErrorStatusCode: 400

References

Looking; will add.

b/322601800

kkm000 commented 2 years ago

@rileykarson, thank you so much for putting this into the dev timeline, that's super! I think I can write this change myself; it's quite a simple addition. I have a question about the right way to contribute, though. I tried to absorb the contribution guides in this repo and the terraform-provider-google-beta repo, and the one in Magic Modules. The principal distinction is the one between autogenerated and handwritten resources. The attached_disk resource is handwritten, and will probably stay that way for a while, since there is no CRUD counterpart in the GCE API. Also, both the GCE API and the change are identical for compute/v1 and compute/beta.

In my understanding, for such a case the correct sequence of steps is:

  1. I send the changes in a PR to this repo, for google/resource_compute_attached_disk.go, google/resource_compute_attached_disk_test.go and compute_attached_disk.html.markdown. It is tested in CI and reviewed, but not merged.
  2. When it's LGTM'd, the changes are transferred via a PR into https://github.com/GoogleCloudPlatform/magic-modules/, to mmv1/third_party/terraform{resources,tests,website} — but I'm a bit confused here: do I do that, or does some of your automation?
  3. From there, after basically a rubber-stamp review, the change is automagically merged into both this repo and the beta provider repo.

Did I get the process right?

rileykarson commented 2 years ago

Rather than making the changes here first, I'd recommend making them against https://github.com/GoogleCloudPlatform/magic-modules/ from the onset. We're able to upstream changes to that repo, but it's less work overall if they start there in the first place.

kkm000 commented 2 years ago

@rileykarson, I admit I don't grok how PRs go through the CI test pipelines here when they originate in the MM repo (they have to be committed there before getting here, where the tests are run), but I'll do as you prefer, naturally! Come to think of it, something has to fly over my head anyway; otherwise the Modules wouldn't be called Magic in the first place... :))

rileykarson commented 2 years ago

This repo only runs unit tests as a presubmit; there are ~100 unit tests and ~1800 acceptance tests in the repo. In the MM repo we run both the unit tests and the acceptance tests against the provider code that will be generated from your changes to the MM repo, using a mixture of a few different systems. The downside is that this has low visibility for contributors, unfortunately (it would be too easy to exfiltrate credentials otherwise), so you're a little more reliant on your reviewer to share results than would be ideal.

ScottSuarez commented 9 months ago

To the service team:

To put this ticket succinctly, as explained in its summary: the request is to add the fields boot = {true|false} and auto_delete = {true|false} to google_compute_attached_disk.