hashicorp / terraform-provider-azurerm

Terraform provider for Azure Resource Manager
https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs
Mozilla Public License 2.0

Support for disk attachment to VMs at creation time #6117

Open mal opened 4 years ago

mal commented 4 years ago

Community Note

Description

Azure allows VMs to be booted with managed data disks pre-attached/attached-on-boot. This enables use cases where cloud-init and/or other "on-launch" configuration management tooling is able to prepare them for use as part of the initialisation process.

This provider currently only supports this case for individual VMs with the older, deprecated azurerm_virtual_machine resource. The new azurerm_linux_virtual_machine and azurerm_windows_virtual_machine resources instead opt to push users towards the separate azurerm_virtual_machine_data_disk_attachment which only attaches data disks to an existing VM post-boot, which fails to service the use case laid out above.
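For reference, the post-boot pattern those resources currently push users towards looks roughly like this (a minimal sketch; the resource names and surrounding resource group/VM are illustrative):

resource "azurerm_managed_disk" "example" {
  name                 = "example-data"
  location             = azurerm_resource_group.example.location
  resource_group_name  = azurerm_resource_group.example.name
  storage_account_type = "StandardSSD_LRS"
  create_option        = "Empty"
  disk_size_gb         = 4096
}

# The disk is only attached once the VM already exists, i.e. after first boot
resource "azurerm_virtual_machine_data_disk_attachment" "example" {
  managed_disk_id    = azurerm_managed_disk.example.id
  virtual_machine_id = azurerm_linux_virtual_machine.example.id
  lun                = 0
  caching            = "ReadWrite"
}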

This is in contrast to the respective *_scale_set providers which (albeit out of necessity) support this behaviour.
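For comparison, the scale set resources already expose this as a repeatable data_disk block, roughly as follows (a sketch showing only the block in question):

resource "azurerm_linux_virtual_machine_scale_set" "example" {
  [...]

  data_disk {
    caching              = "ReadWrite"
    create_option        = "Empty"
    disk_size_gb         = 4096
    lun                  = 0
    storage_account_type = "StandardSSD_LRS"
  }

  [...]
}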

Please could a repeatable data_disk block be added to the new VM resources (analogous to the same block in their scale_set counterparts) in order to allow VMs to be started with managed data disks pre-attached.

Thanks! 😁

New or Affected Resource(s)

azurerm_linux_virtual_machine
azurerm_windows_virtual_machine

Potential Terraform Configuration

resource "azurerm_linux_virtual_machine" "example" {
  [...]

  os_disk {
    name                 = "example-os"
    caching              = "ReadWrite"
    storage_account_type = "StandardSSD_LRS"
  }

  data_disk {
    name                 = "example-data"
    caching              = "ReadWrite"
    disk_size_gb         = 4096
    lun                  = 0
    storage_account_type = "StandardSSD_LRS"
  }

  [...]
}

References

rgl commented 4 years ago

azurerm_virtual_machine_data_disk_attachment which only attaches data disks to an existing VM post-boot

Oh, that is really unfortunate... I wish I could try this but I'm not even able to create a managed disk due to https://github.com/terraform-providers/terraform-provider-azurerm/issues/6029

lanrongwen commented 4 years ago

If I'm following this thread correctly (as we are still using the legacy disk system and were looking to move over), can you not deploy VMs with disks already attached? Is it truly rebooting VMs for each disk (thread in #6314 above)? This feels like a HUGE step backwards, especially if the legacy mode we are using is being deprecated.

lightdrive commented 4 years ago

Also, how do you deploy and configure a data disk that is in the source reference image if the data disk block is no longer valid?

rgl commented 4 years ago

@lightdrive, I've worked around it by using ansible at https://github.com/rgl/terraform-ansible-azure-vagrant

scott1138 commented 4 years ago

This is something I just ran across as well; I'd like to be able to use cloud-init to configure the disks. Any news on a resolution?

jackofallops commented 4 years ago

This item is next on my list, no ETA yet though, sorry. I'll link it to a milestone when I've had a chance to size and scope it.

ilons commented 3 years ago

It seems that the work done by @jackofallops has been closed with a note that it needs to be implemented in a different way.

Does anyone have a possible work-around for this?

My use case is like the ones others have pointed out.

Writing my own scripts to do this instead of using cloud-init seems like a waste. Using the workaround mentioned in https://github.com/terraform-providers/terraform-provider-azurerm/issues/6074#issuecomment-626523919 might be possible, but seems too hacky, and would require some large changes to how resources are created.

mal commented 3 years ago

Alas, I was really looking forward to an official fix for this. πŸ™

In lieu of that, however, here's what I came up with about six months ago, having had no option but to make this work at minimum for newly booted VMs (note: this has not been tested with changes to, or replacements of, the disks - literally just booting new VMs). I'm also not really a Go person, so this is definitely a hack and nothing approaching a "good" solution, much less sane contents for a PR. Be warned that whatever state it generates is almost certainly incompatible with whatever shape the official implementation takes, should it ever land. On the off chance it proves useful in some capacity, or simply sparks someone else's imagination, here's the horrible change I made to allow booting VMs with disks attached so that cloud-init could run correctly: https://github.com/terraform-providers/terraform-provider-azurerm/commit/6e19897658bb5b79418231ca1c004fde83698b40.

Usage:

```terraform
resource "azurerm_linux_virtual_machine" "example" {
  [...]

  data_disk {
    name                 = "example-data"
    caching              = "ReadWrite"
    disk_size_gb         = 320
    lun                  = 0
    storage_account_type = "StandardSSD_LRS"
  }

  [...]
}
```

tombuildsstuff commented 3 years ago

@mal FWIW this is being worked on, however the edge-cases make this more complicated than it appears - in particular we're trying to avoid several limitations from the older VM resources, which is why this isn't being lifted over 1:1 and is taking longer here.

mal commented 3 years ago

Thanks for the insight @tombuildsstuff, great to know it's still being actively worked on. I put that commit out there in response to the request for possible workarounds, in case it's useful to someone who finds themself in the position I was in previously, where waiting for something that covers all the cases wasn't an option. Please don't take it as any kind of slight or indictment of the ongoing efforts - I fully support an official solution covering all the cases; in my case it just wasn't possible to wait for it, but I'll be first in line to move my definitions over when it lands. 😁

alec-pinson commented 3 years ago

In case this helps anyone else... the main part to note is the first line, which waits for 3 disks to be available before trying to format them, etc.

write_files:
  - content: |
      # Wait for x disks to be available
      while [ `ls -l /dev/disk/azure/scsi1 | grep lun | wc -l` -lt 3 ]; do echo waiting on disks...; sleep 5; done

      DISK=$1
      DISK_PARTITION=$DISK"-part1"
      VG=$2
      VOL=$3
      MOUNTPOINT=$4
      # Partition disk
      sed -e 's/\s*\([\+0-9a-zA-Z]*\).*/\1/' << EOF | fdisk $DISK
        n # new partition
        p # primary partition
        1 # partition number 1
          # default - start at beginning of disk
          # default - end of the disk
        w # write the partition table
        q # and we're done
      EOF

      # Create physical volume
      pvcreate $DISK_PARTITION

      # Create volume group
      if [[ -z `vgs | grep $VG` ]]; then
        vgcreate $VG $DISK_PARTITION
      else
        vgextend $VG $DISK_PARTITION
      fi

      # Create logical volume
      if [[ -z $SIZE ]]; then
        SIZE="100%FREE"
      fi

      lvcreate -l $SIZE -n $VOL $VG

      # Create filesystem
      mkfs.ext3 -m 0 /dev/$VG/$VOL

      # Add to fstab
      echo "/dev/$VG/$VOL   $MOUNTPOINT     ext3    defaults        0       2" >> /etc/fstab

      # Create mount point
      mkdir -p $MOUNTPOINT

      # Mount
      mount $MOUNTPOINT
    path: /run/create_fs.sh
    permissions: '0700'

runcmd:
  - /run/create_fs.sh /dev/disk/azure/scsi1/lun1 vg00 vol1 /oracle
  - /run/create_fs.sh /dev/disk/azure/scsi1/lun2 vg00 vol2 /oracle/diag

ruandersMSFT commented 3 years ago

Simple use case that needs to work: an Azure Image has an OS disk and a data disk, and a VM (Linux or Windows) now needs to be provisioned from that Azure Image. @tombuildsstuff Let's not mark this as off-topic again. Data disk properties need to be configurable at creation. This is blocking too many use cases from being implemented.

tombuildsstuff commented 3 years ago

@ruandersMSFT that's what this issue is tracking - you can find the latest update here

As per the community note above: Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request - which is why comments are marked as off-topic - we ask instead that users add a πŸ‘ to the issue.

carlosdoliveira commented 3 years ago

In this case, should we instead use the deprecated azurerm_virtual_machine resource?

danielrichdc commented 3 years ago

Is there an ETA for this? The previous SSH AAD login extension for Linux has been deprecated, and the new one requires assigning a system managed identity, which requires use of the identity block that azurerm_virtual_machine doesn't support. (And since azurerm_linux_virtual_machine doesn't support adding/attaching data/storage disks, we are stuck.)

Is there another way via terraform to add a system managed identity that doesn't involve using a local-exec provisioner to call az cli using azurerm_virtual_machine?

redeux commented 3 years ago

Depends on https://github.com/hashicorp/terraform-plugin-sdk/issues/220

agehrig commented 2 years ago

As a workaround to use azurerm_linux_virtual_machine, the following cloud-init snippet waits at the bootcmd stage until the data disk is available:

bootcmd:
  - until [ -e /dev/disk/azure/scsi1/lun0 ]; do sleep 1; done
disk_setup:
  /dev/disk/azure/scsi1/lun0:
    table_type: gpt
    layout: True
    overwrite: False
fs_setup:
  - device: /dev/disk/azure/scsi1/lun0
    partition: 1
    filesystem: ext4
    overwrite: False
mounts:
  - [/dev/disk/azure/scsi1/lun0-part1, /data]
growpart:
  mode: auto
  devices:
    - /
    - /dev/disk/azure/scsi1/lun0-part1
  ignore_growroot_disabled: true
write_files:
  - content: |
      #!/bin/sh
      resize2fs -f /dev/disk/azure/scsi1/lun0-part1
    path: /var/lib/cloud/scripts/per-boot/resize2fs.sh
    permissions: 0755

The growpart and write_files sections are optional; they resize the partition and filesystem on every boot.
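If it helps, here is a minimal sketch of wiring a snippet like this into the new resource, assuming the YAML above is saved as cloud-init.yaml next to the configuration (the file name is illustrative); cloud-init is delivered through custom_data, which must be base64-encoded:

resource "azurerm_linux_virtual_machine" "example" {
  [...]

  # cloud-init is picked up from custom_data; base64encode() produces the
  # encoding the API expects
  custom_data = base64encode(file("${path.module}/cloud-init.yaml"))

  [...]
}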

darrens280 commented 2 years ago

NOTE --> The Terraform resource azurerm_windows_virtual_machine_scale_set supports deploying from a gallery image version (which includes a data disk). I have this working successfully; however, deploying a standard VM with the azurerm_windows_virtual_machine resource using the same gallery image version as the source weirdly does not support the data disks...

jeremybusk commented 2 years ago

Any update on this feature?

Hanse00 commented 2 years ago

@redeux You indicated this is blocked by https://github.com/hashicorp/terraform-plugin-sdk/issues/220, but per the latest comments on that issue, it is being closed with a decision to not implement the proposed changes.

Given that, can you provide any additional information on what the future of this issue is? Being blocked by something that isn't going to happen seems like a dead end.

ravensorb commented 2 years ago

Is there an update on this? It is a blocker for using a number of custom images that depend on one or more data disks.

andyliddle commented 1 year ago

Any progress?

I'm looking to deploy a Sophos XG VM in Azure through Terraform, and it creates a data disk from the image.

michaelbaptist commented 1 year ago

Any traction on this? Do we need to add development resources? This has been asked for, for years now.

Better yet, can we just backport the features to azurerm_virtual_machine and not deprecate it? I'd like to use the user_data field with the old model - that should be a trivial feature addition. Because that field isn't supported in the old model, I have to use this new model, which doesn't allow for drive attachment before VM creation/boot.

Booting a VM and hot swapping drives via drive attach is a major regression.

TychonautVII commented 1 year ago

I think a hacky workaround is that Azure deployment templates are able to deploy a VM and attach a disk at creation. So you can:

(1) Make an Azure deployment template for the VM you need. (It's easy to do this in the Azure console by manually configuring the VM and clicking the "Download template for automation" button.)
(2) Deploy that template using "azurerm_resource_group_template_deployment" (or outside of Terraform).

I'd much rather have the terraform resource support this, but I think something like this might be a stopgap. I'm trying to get this integrated into our process now and it's working so far.
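A minimal sketch of that stopgap, assuming the exported template has been saved as vm-with-data-disk.json and exposes an adminUsername parameter (both names are illustrative):

resource "azurerm_resource_group_template_deployment" "vm" {
  name                = "example-vm-with-data-disk"
  resource_group_name = azurerm_resource_group.example.name
  deployment_mode     = "Incremental"

  # ARM template exported from the portal ("Download template for automation")
  template_content = file("${path.module}/vm-with-data-disk.json")

  parameters_content = jsonencode({
    adminUsername = { value = "azureuser" }
  })
}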

jeffwmiles commented 1 year ago

I expect this will be marked off-topic, but after nearly 3 years since it was opened, this issue needs more attention.

The AzureRM provider has put its users in a bad place here. There are critical features of Azure that are now inaccessible as mentioned by others in this thread. Because my shared OS image has data disks, I cannot use dedicated hosts, cloud-init, or proper identity support for my virtual machine, and this list will only continue to grow because the cloud never stops moving.

How can we as a community help here? There is clearly a lot of development effort going into this provider, judging by the changelog and rate of pull requests; can we raise the priority of this issue?

There is certainly an opportunity for more transparency on why this hasn't moved while other items are getting development attention.

michaelbaptist commented 1 year ago

If there is a clean way to migrate from virtual_machine to deployment template, I can live with that, but current terraform will try to do unexpected things due to how they've implemented deployment templates as well.

GraemeMeyerGT commented 1 year ago

@jackofallops it's hard to tell in this thread, but it looks like you may have added this to the "blocked" milestone. It's no longer clear in the thread what is blocking this issue. Can you clarify? We are seeing a lot of activity in this thread, and it's the issue with the third-most πŸ‘ reactions.

TheBlackMini commented 1 year ago

What is the state of this issue? Is it blocked?

It's currently 3 years old and we still can't build a VM from a template which has data disks?

kvietmeier commented 1 year ago

The azurerm_linux_virtual_machine docs include "storage_data_disk" as a valid block, but terraform plan errors out claiming it is unsupported. I tried a dynamic block and a standard block - with a precreated disk to "attach" and "empty" with no disk created - and all failed.

When I've seen this error before it was either a syntax error or a no longer supported block type.

Is this a documentation bug?

Versions:

KV C:\Users\ksvietme\repos\Terraform\azure\VMs\linuxvm_2> terraform version
Terraform v1.4.6
on windows_amd64
+ provider registry.terraform.io/hashicorp/azurerm v3.57.0
+ provider registry.terraform.io/hashicorp/random v3.5.1
+ provider registry.terraform.io/hashicorp/template v2.2.0

Error:

β•·
β”‚ Error: Unsupported block type
β”‚
β”‚   on linuxvm_2.main.tf line 162, in resource "azurerm_linux_virtual_machine" "linuxvm01":
β”‚  162:  dynamic "storage_data_disk" {
β”‚
β”‚ Blocks of type "storage_data_disk" are not expected here.
β•΅

Disk creation (works):

resource "azurerm_managed_disk" "lun1" {
  name                 = "lun17865"
  location             = azurerm_resource_group.linuxvm_rg.location
  resource_group_name  = azurerm_resource_group.linuxvm_rg.name
  storage_account_type = "Standard_LRS"
  create_option        = "Empty"
  disk_size_gb         = "100"

  tags = {
    environment = "staging"
  }
}

Call to storage_data_disk:

resource "azurerm_linux_virtual_machine" "linuxvm01" {
  location                        = azurerm_resource_group.linuxvm_rg.location
  resource_group_name             = azurerm_resource_group.linuxvm_rg.name
  size                            = var.vm_size

  # Make sure hostname matches public IP DNS name
  name          = var.vm_name
  computer_name = var.vm_name

  # Attach NICs (created in linuxvm_2.network)
  network_interface_ids = [
    azurerm_network_interface.primary.id,
  ]

  # Reference the cloud-init file rendered earlier
  # for post bringup configuration
  custom_data = data.template_cloudinit_config.config.rendered

  ###--- Admin user
  admin_username = var.username
  admin_password = var.password
  disable_password_authentication = false

  admin_ssh_key {
    username   = var.username
    public_key = file(var.ssh_key)
  }

 ###--- End Admin User
 dynamic "storage_data_disk" {
    content {
    name = azurerm_managed_disk.lun1.name
    managed_disk_id   = azurerm_managed_disk.lun1.id
    disk_size_gb = azurerm_managed_disk.lun1.disk_size_gb
    caching = "ReadWrite"
    create_option = "Attach"
    lun = 1
    }
  }

  ### Image and OS configuration
  source_image_reference {
    publisher = var.publisher
    offer     = var.offer
    sku       = var.sku
    version   = var.ver
  }

  os_disk {
    name                 = var.vm_name
    caching              = var.caching
    storage_account_type = var.sa_type
  }

  # For serial console and monitoring
  boot_diagnostics {
    storage_account_uri = azurerm_storage_account.diagstorageaccount.primary_blob_endpoint
  }

  tags = {
    # Enable/Disable hyperthreading (requires support ticket to enable feature)
    "platformsettings.host_environment.disablehyperthreading" = "false"
  }

}
###--- End VM Creation

Thanks. I'm sure I'm missing something here.
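As later comments in this thread note, storage_data_disk is a block of the legacy azurerm_virtual_machine resource rather than azurerm_linux_virtual_machine, which is why the plan rejects it; with the new resource the interim route is the separate attachment resource. A sketch reusing the resources above (note this attaches post-boot, not at creation):

resource "azurerm_virtual_machine_data_disk_attachment" "lun1" {
  managed_disk_id    = azurerm_managed_disk.lun1.id
  virtual_machine_id = azurerm_linux_virtual_machine.linuxvm01.id
  lun                = 1
  caching            = "ReadWrite"
}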

matteus8 commented 12 months ago

So is this not possible? And if not now, will it be possible in the future as azurerm_virtual_machine becomes deprecated?

`resource "azurerm_linux_virtual_machine" "example_name" { name = "${var.lin_machine_name}"

...

source_image_id = "/subscriptions/XXXXXXXXX/resourceGroups/example_RG/Microsoft.Compute/galleries/example_gallary/images/example_image/versions/0.0.x"

os_disk { name = "lin_name" caching = "ReadWrite" storage_account_type = "StandardSSD_LRS" }

depends_on = [

...

] }`

Essentially my source_image_id points to a snapshot of an image with 2 data disks attached. However, when doing a terraform apply I get the following error: "Original Error: Code="InvalidParameter" Message="StorageProfile.dataDisks.lun does not have required value(s) for image specified in storage profile." Target="storageProfile""

I have tried using the "data_disk" option, but this is not supported as stated above.

data_disks {
  lun                  = 0
  create_option        = "FromImage"
  disk_size_gb         = 1024
  caching              = "None"
  storage_account_type = "Premium_LRS"
}

data_disks {
  lun                  = 1
  create_option        = "FromImage"
  disk_size_gb         = 512
  caching              = "None"
  storage_account_type = "Premium_LRS"
}

Are there any other suggestions, or will this be included in terraform in the near future?

shaneholder commented 11 months ago

I feel I must be missing something here, as my scenario seems so common that this issue would surely have been addressed much sooner.

I am trying to use Packer to build CIS/STIG-compliant VMs for golden images. Part of the spec requires several folders to go onto non-root partitions. To achieve this I added a drive, added the partitions, and moved data around. We also use LVM in order to meet availability requirements if a partition gets full. I used the az CLI to boot the VM and was also able to add an additional data drive using the --data-disk-sizes-gb option, so I know the control plane will handle it.

When I try to use the VM with Terraform I get the storageAccount error mentioned above. Is there really no viable workaround for building golden images with multiple disks and using TF to create the VMs?

djryanj commented 11 months ago

@shaneholder for now, the generally accepted workaround (which I have used successfully) is to use a secondary azurerm_virtual_machine_data_disk_attachment resource to attach the disk, and the cloud-init script recommended by @agehrig in this comment.

It would be great to hear from the developers as to exactly why this is still blocked, since it's unclear to everyone here especially given the popularity of the request.

shaneholder commented 11 months ago

@djryanj thanks for the reply. I'm trying to understand it in the context of my problem, though. The image in the gallery already has 2 disks, 1 OS and 1 data, and right now I'm not trying to add another disk - but that would be the next logical step. The issue I'm having is that I can't even get to the point where the VM has been created.

I ran TF with a trace and found the PUT request that creates the VM. What I believe is happening is that TF is incorrectly adding an empty "dataDisks": [] element to the JSON sent in the PUT request. If I take the JSON body of the PUT, remove that element, and run the PUT manually, the VM is created with 2 disks as expected.

djryanj commented 11 months ago

@shaneholder ah, I understand. If the gallery image has 2 disks and is not deployable via Terraform using the azurerm_linux_virtual_machine resource because of that, I don't think it's solvable using the workaround I suggested, and I'm afraid I don't know what else to suggest other than moving back to an azurerm_virtual_machine resource, or getting a working ARM template for the deployment and using something like an azurerm_resource_group_template_deployment resource to deploy it - which is awful, but would work.

@tombuildsstuff - I'm sure you can see the activity here. Any input?

shaneholder commented 11 months ago

A little more information: I just ran the same TF but used a VM image that does not have a data disk built in. That PUT request also has the "dataDisks": [] element in the JSON, but instead of failing it succeeds and builds the VM. So it seems that if a VM image has an existing data disk and the dataDisks element is passed in the JSON, the VM build will fail; however, if the VM image does not have a data disk, the dataDisks element can be sent and the VM will build.

shaneholder commented 11 months ago

Another piece to the puzzle: I set the logging option for the az CLI and noticed that it adds the following dataDisks element when I specify additional disks. The lun 0 object is the disk that is built into the image. If I run similar code in TF, the dataDisks property is an empty array rather than an array that includes the dataDiskImages from the VM Image Version combined with the additional disks I asked to be attached.

"dataDisks": [
                {
                  "lun": 0,
                  "managedDisk": {
                    "storageAccountType": null
                  },
                  "createOption": "fromImage"
                },
                {
                  "lun": 1,
                  "managedDisk": {
                    "storageAccountType": null
                  },
                  "createOption": "empty",
                  "diskSizeGB": 30
                },
                {
                  "lun": 2,
                  "managedDisk": {
                    "storageAccountType": null
                  },
                  "createOption": "empty",
                  "diskSizeGB": 35
                }
              ]
shaneholder commented 11 months ago

Alright, so I cloned the repo and fiddled around a bit. I hacked the linux_virtual_machine_resource.go file around line 512. I changed:

                DataDisks: &[]compute.DataDisk{},

to:

                DataDisks: &[]compute.DataDisk{
                    {
                        Lun:          utils.Int32(0),
                        CreateOption: compute.DiskCreateOptionTypesFromImage,
                        ManagedDisk:  &compute.ManagedDiskParameters{},
                    },
                },

And I was able to build my VM with the two drives that are declared in the image in our gallery. Additionally, I was able to add a third disk using azurerm_managed_disk/azurerm_virtual_machine_data_disk_attachment.

I was trying to determine how to find the dataDiskImages from the image in the gallery, but I've not been able to suss that out yet. It seems that the code should pull the dataDiskImages property and do a conversion similar to the one it does for the osDisk.

Hoping that @tombuildsstuff can help me out - then maybe I can PR a change?

shaneholder commented 11 months ago

Ok, so on a hunch I completely commented out the DataDisks property and ran it again, and it worked: I created a VM with both the included image data drive AND an attached drive.

tombuildsstuff commented 10 months ago

πŸ‘‹ hey folks

To give an update on this one, unfortunately this issue is still blocked due to a combination of the behaviour of the Azure API (specifically the CreateOption field) and limitations of the Terraform Plugin SDK.

We've spent a considerable amount of time trying to solve this; however given the number of use-cases for disks, every technical solution possible using the Terraform Plugin SDK has hit a wall for some subset of users which means that Terraform Plugin Framework is required to solve this. Unfortunately this requires bumping the version of the Terraform Protocol being used - which is going to bump the minimum required version of Terraform.

Although bumping the minimum version of Terraform is something that we've had scheduled for 4.0 for a long time - unfortunately that migration in a codebase this size is non-trivial, due to the design of Terraform Plugin Framework being substantially different to the Terraform Plugin SDK, which (amongst other things) requires breaking configuration changes.

Whilst porting over the existing data_disks implementation seems a reasonable solution, unfortunately the existing implementation is problematic enough that we'd need to introduce further breaking changes to fix this properly once we go to Terraform Plugin Framework. In the interim the way to attach Data Disks to a Virtual Machine is by using the azurerm_virtual_machine_data_disk_attachment resource.

Moving forward we plan to open a Meta Issue tracking Terraform Plugin Framework in the not-too-distant future, however there's a number of items that we need to resolve before doing so.

We understand that's disheartening to hear. We're trying to unblock this (and several other) larger issues - but equally we don't want to give folks false hope that this is a quick win when doing so would cause larger issues.

Given the amount of activity on this thread - I'm going to temporarily lock this issue for the moment to avoid setting incorrect expectations - but we'll post an update as soon as we can.


To reiterate/TL;DR: adding support for Terraform Plugin Framework is a high priority for us and will unblock work on this feature request. We plan to open a Meta Issue for that in the not-too-distant future - which we'll post an update about here when that becomes available.

Thank you all for your input, please bear with us - and we'll post an update as soon as we can.