vmware / terraform-provider-vcd

Terraform VMware Cloud Director provider
https://www.terraform.io/docs/providers/vcd/
Mozilla Public License 2.0

Enhance Terraform vcd provider API support to allow vApp creation using vApp templates #683

Open alexperezau opened 3 years ago

alexperezau commented 3 years ago

Background

We'd like to spin up a large number of VMs using Terraform as our IaC tool. Using the official VMware vcd provider (https://registry.terraform.io/providers/vmware/vcd/latest/docs), we've discovered that VMs in a vApp are spun up sequentially (see issue #579), so the time to create the VMs in a vApp scales linearly with their number. This is apparently the designed vCD behaviour. As a result, vApp provisioning through Terraform takes too long once a vApp contains more than a few VMs. The fastest method we found in the UI was to create a vApp from a vApp template, but the vcd provider currently offers no way to use this functionality.

Impact on deployment timing

Creating vApps that contain a sizeable number of VMs takes far too long with Terraform.

Timings we saw locally via Terraform (the third case is the worrying one):

    1 vApp with 1 VM - 1m26s
    10 vApps with 1 VM per vApp - 3m3s
    1 vApp with 8 VMs per vApp - 14m31s

Workaround solution

A workaround was tested (loosely; it still needs terraform import statements to model the infra) using the PowerShell VMware vCD cmdlet "New-CIVApp", called from Terraform via local-exec:

    100 vApps with 4 VMs per vApp (400 VMs) - 19m11s

See Example 2: https://developer.vmware.com/docs/powercli/latest/vmware.vimautomation.cloud/commands/new-civapp/#CreateEmptyVapp

...
variable "vAppName_vAppTemplate" {
  type = map(string)
  default = {
    student1 = "Lab-Template",
    student2 = "Lab-Template",
...
resource "null_resource" "powershell_vapp" {
  for_each = var.vAppName_vAppTemplate
  provisioner "local-exec" {
    command = "connect-ciserver example.server -Org '${var.org}' -User '${var.user}' -Password '${var.password}' ; New-CIVApp -Name ${each.key} -OrgVdc '${var.vdc}' -VAppTemplate ${each.value} ; Start-CIVApp ${each.key} | out-null"
    interpreter = ["/usr/local/microsoft/powershell/7/pwsh", "-Command"]
  }
}
...

Feature Request

Terraform is working as expected (10 parallel calls by default), but vCD builds the VMs within a vApp sequentially. Other VMware APIs already allow creation of vApps from vApp templates (e.g. "New-CIVApp"), and we believe the vcd provider in Terraform should expose this capability as well to alleviate the issue.

Terraform Version

Terraform v0.15.5 on darwin_amd64

Affected Resource(s)

    vcd_vapp
    vcd_vapp_vm

Terraform Configuration Files (1 vApp, 8 VMs approx 14m31s to create)

...
resource "vcd_vapp" "ops" {
    name = "ops"
}

resource "vcd_vapp_vm" "machine" {
    count = 8
    vapp_name = vcd_vapp.ops.name
    name = "machine-${count.index + 1}"
...

Debug Output

N/A

Expected Behavior

The Terraform VMware Cloud Director (vcd) provider should expose the API that allows vApp creation from vApp templates.

Actual Behavior

The Terraform VMware Cloud Director (vcd) provider offers only standard vApp creation, in which the VMs of each vApp are created sequentially.

Steps to Reproduce

  1. Terraform configuration that creates >1 VM in a vApp.
  2. Scale this to hundreds of vApps with multiple VMs.
  3. Deployment duration depends on the number of VMs per vApp (sequential creation slows the process).

Important Factoids

N/A

dataclouder commented 3 years ago

Hi, Thanks for reporting this issue. We have considered this problem several times. A few months ago we developed a partial solution: standalone VMs which allow parallel creation of VMs. I know you mention vApps, but before we start discussing a new feature I need to ask whether the standalone VMs can be a solution.

alexperezau commented 3 years ago

Standalone VMs create a "hidden" vApp, as per the linked documentation in the vcd provider. In vCloud Director release 9.5, that would limit the number of these Terraform-created VMs to 5000 per organisation (the maximum number of vApps). Each vApp should handle up to 128 VMs, so your proposed answer does not scale (i.e. the customer would be VM-limited).

How much effort would you estimate it would take to add the CRUD functionality needed to support this in the provider?

alexperezau commented 3 years ago

Note, we have vCloud Director 10.2, and the vApp limits are the same.

dataclouder commented 3 years ago

Let's set aside the theoretical limits, because we are not going to reach them within Terraform constraints. We need to strike a compromise between efficiency and maintainability. That's why I asked if standalone VMs can be considered. The practical questions are:

  1. Do you need vApps for this large number of VMs?
  2. How many vApps/VMs do you realistically need?

Regarding effort, there are two approaches that would allow us to deploy VMs in parallel within vApps:

Given the combined constraints of the VCD API and Terraform infrastructure, it is important to look for solutions that are practically attainable. Hence, I need to know what numbers are realistic in this case. We may not be able to reach the theoretical limits, but we might find intermediate solutions that can alleviate the problem.

alexperezau commented 3 years ago

Appreciate the discussion here Giuseppe. I don't want to waste time or be too political.

I'm basing this request off a very recent customer ask to be able to spin up to 175 vApps with 4-8 VMs per vApp. This would satisfy a remote learning class scenario with lots of students needing multiple machines, so timing is critical. They have quoted increases (assume year on year) as their customer base grows. Sales have discussed 1200+ vApps being needed in the future.
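
As a rough sizing aid, the following Go sketch multiplies out the figures quoted in this thread (175 or 1200 vApps, 4 to 8 VMs each) and compares the totals with the 5000 vApps-per-organisation limit mentioned earlier. It assumes, as stated earlier in the thread, that every standalone VM consumes one hidden vApp, and it ignores any vApps that already exist in the organisation. The function names are hypothetical.

```go
package main

import "fmt"

// maxVAppsPerOrg is the per-organisation vApp limit quoted earlier in this thread.
const maxVAppsPerOrg = 5000

// totalVMs returns the number of VMs for a given number of vApps and VMs per vApp.
func totalVMs(vapps, vmsPerVApp int) int {
	return vapps * vmsPerVApp
}

// fitsAsStandalone reports whether the VMs would fit if each one were a
// standalone VM consuming its own hidden vApp (assumption taken from this thread).
func fitsAsStandalone(vms int) bool {
	return vms <= maxVAppsPerOrg
}

func main() {
	for _, vapps := range []int{175, 1200} {
		for _, perVApp := range []int{4, 8} {
			vms := totalVMs(vapps, perVApp)
			fmt.Printf("%4d vApps x %d VMs = %4d VMs, fits as standalone VMs: %t\n",
				vapps, perVApp, vms, fitsAsStandalone(vms))
		}
	}
}
```

Under these assumptions the current request (700 to 1400 VMs) would fit either way, while the projected 1200 vApps at 8 VMs each (9600 VMs) would not fit as standalone VMs.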

The point here is to use Terraform to fully manage the deployment. The single-vApp scenario you mention, with hundreds or thousands of VMs, is not what we want: we need granularity for management, resources and networking. The other "do-it-yourself" approach, which I saw quoted as a "workaround" in another vcd issue, expects customers to launch a single vApp+VM, run some UI actions, then change the Terraform config to adjust the vApp that the VM points to (so that Terraform can stay in use). This is neither timely nor workable, and it puts more overhead on the customer, especially as they know the vApp template functionality already exists in VCD.

To summarise: vApp creation from vApp templates is existing functionality in vCloud Director and is already exposed through an API. We regard this as a highly desirable use case (large VM deployments across many vApps with industry-standard IaC tooling). Let's continue this discussion, because we need to be able to position VCD as the platform of choice going forward.

I note the resourcing constraints re VCD plugin and Terraform. We lack bandwidth here to make any provider changes ourselves, but are the maintainers of the plugin willing to work with remote development engineers if the opportunity arose?

dataclouder commented 3 years ago

I have experimented with several ways to parallelize VM creation, and I have reached some conclusions.

  1. Creating the VMs separately and then moving them under the wanted vApp is doable, but not satisfactory: moving a VM, even within the same storage profile, takes almost as long as the creation itself. Sometimes it is faster than that, but overall the time saved is not more than 40% in the best of cases.
  2. We can create VMs in parallel by adding all of them to the vApp recomposition structure. The trouble in this case is that in Terraform we define the structures separately, and although the VMs are created in parallel, their information is not centralized.

I have found a way of implementing solution n.2 above: all VMs send their information to a scheduler, which in turn starts the vApp recomposition once all the information has been collected. It works in the lab. To make it work in the wild, i.e. in terraform-provider-vcd, we need to pay a small price: every VM must pass along the number of VMs that will be created simultaneously. It will be something along the lines of

resource "vcd_vapp_vm_v2" "test_vm" {
    name         = "test_vm"
    vapp_id      = data.vcd_vapp_v2.my-vapp.id
    template_id  = data.vcd_catalogitem.my_template.id
    parallel_vms = 3
}

There are still some details to figure out, but this seems the most promising way.

It would help me if I had some information about the time spent building most of the vApps. What is the average time used to build a VM? And the longest?

alexperezau commented 3 years ago

Apologies for the delay in responding. I need to check whether I still have access to vCloud Director here (the DevOps team works mainly with vSphere). What exactly would you like me to try in order to get these metrics back to you? I did put some VM build times in the original post.

Perhaps you could share some Terraform code that we can modify for our infra? Terraform already reports the timings, unless you want some other debug output or measurements.

dataclouder commented 3 years ago

(Back from vacation) Sorry, I forgot that the original post had some reference timings. Still, I would like to know – from anyone interested in this feature – what is the longest time spent building a single VM. That would allow me to test accordingly.

Prasaddiwalkar commented 2 years ago

I am looking for this same feature.