Add support for data sources

Community Note

Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request. Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request. If you are interested in working on this issue or have submitted a pull request, please leave a comment.

Description

Given the minimal information needed to access vCenter, that is, the vCenter host and some credentials, expose data sources that allow Packer to discover the resources required by builders. Examples include:

hosts
clusters
datacenter
datastore
datastore items
network
content library
content library items
templates

Many vCenter installations have resources dedicated to automation and use one of the following ways to convey that:

Tag the resource (e.g. Packer)
Follow a naming convention (e.g. packer-*)
Where supported, use custom attributes
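
To make the three conventions concrete, here is a minimal Python sketch of how a discovery filter might test them. It runs against an in-memory list rather than vCenter, and the `Resource` class, the `is_for_automation` helper, and the `purpose` custom attribute are all hypothetical names chosen for illustration, not part of Packer or the vSphere API.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase


@dataclass
class Resource:
    """Hypothetical stand-in for a vCenter inventory object."""
    name: str
    tags: frozenset = frozenset()
    attributes: dict = field(default_factory=dict)


def is_for_automation(resource, tag="Packer", pattern="packer-*",
                      attribute=("purpose", "packer")):
    """Return True if any of the three conventions marks the resource."""
    key, value = attribute
    return (
        tag in resource.tags                        # tagged
        or fnmatchcase(resource.name, pattern)      # naming convention
        or resource.attributes.get(key) == value    # custom attribute
    )


# Made-up inventory; a real data source would query vCenter instead.
inventory = [
    Resource("esxi-01", tags=frozenset({"Packer"})),
    Resource("packer-net"),
    Resource("datastore1", attributes={"purpose": "packer"}),
    Resource("prod-db-01", tags=frozenset({"Production"})),
]
dedicated = [r.name for r in inventory if is_for_automation(r)]
print(dedicated)
```

In this sketch the three checks are combined with "or"; an installation that wants all conventions to hold at once would combine them with "and" instead.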

By using discovery, we ease maintenance by:

Reducing hardcoded values, whether in the Packer template or in the scripts calling Packer. In some organizations information disclosure is a concern, and discovery may ease that concern.
Reducing the number of variables required
Reducing the churn involved when things change. For example, I have 40+ Packer templates, and when I need to move them to another vCenter, I have to make 200+ changes, whether in Packer variable files, the CI/CD system calling Packer, or the environment.
Improving reliability

Use Case(s)

Several properties of the vsphere-iso builder could be populated from data sources:

host
folder
iso_paths
network
datastore
content_library_destination

Each of these resources in vCenter can be tagged, follow a naming convention, and even have custom properties. In many cases, it doesn't actually matter which resources are being used, since Packer will clean up after itself. Instead of hardcoding the values, we can discover the appropriate resources.

In my case, I have PowerShell code to do this and pass it to Packer:

```
$vCenterHostname    = "$env:PHAKA_VSPHERE_HOSTNAME"
$vCenterUserName    = "$env:PHAKA_VSPHERE_USERNAME"
$vCenterPassword    = "$env:PHAKA_VSPHERE_PASSWORD"
$PackerUserName     = "$env:PHAKA_PACKER_USERNAME"
$PackerPassword     = "$env:PHAKA_PACKER_PASSWORD"
$SshPassword        = New-Password -Length 32
$SshUsername        = "root"

$Server = Connect-VIServer -Server $vCenterHostname -User $vCenterUserName -Password $vCenterPassword

$VMHost = Get-VMHost -Server $Server -Tag "Packer"
$VMHostName = $VMHost.Name

# Doesn't work when using PowerCLI on macOS
$VMNetwork = Get-VirtualPortGroup -VMHost $VMHost -Tag "Packer"
$VMNetworkName = $VMNetwork.Name

$Datastore = Get-Datastore -Server $Server -Tag "Packer"
$DatastoreName = $Datastore.Name

$ContentLibraryName = "templates"
$FullPath = "Phaka"

# Call packer to create virtual machine
packer build -var "vcenter_folder=$FullPath" `
             -var "vsphere_network=$VMNetworkName" `
             -var "vsphere_host=$VMHostName" `
             -var "vcenter_username=$vCenterUserName" `
             -var "vcenter_password=$vCenterPassword" `
             -var "vcenter_server=$vCenterHostname" `
             -var "vcenter_datastore=$DatastoreName" `
             -var "vcenter_content_library=$ContentLibraryName" `
             -var "create_snapshot=false" `
             -var "ssh_password=$SshPassword" `
             -var "ssh_username=$SshUsername" `
             -var "disk_size=32768" `
             -var "packer_password=$PackerPassword" `
             -var "packer_username=$PackerUserName" `
             "."
```

The ability to use data sources within the packer template to replace the PowerShell script reduces the maintenance of the template and makes it a whole lot more portable between vCenter servers.

I also have an installation with several hosts that are managed by vCenter but do not form a cluster. In that case, I have a much more complex PowerShell script that gets all the hosts, sorts them by available capacity, and uses the one with the most; the limiting resource is almost always memory. While most of the hosts are identical in configuration, some have SSD storage, which may be preferable when it shortens the time to build a template.
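
The host selection described above can be sketched in a few lines of Python. This is only an illustration of the ranking, not my actual PowerShell script: the `Host` class, its fields, and the `prefer_ssd` switch are hypothetical, and the sample numbers are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    """Hypothetical summary of a standalone ESXi host."""
    name: str
    free_memory_mb: int
    has_ssd: bool = False


def pick_host(hosts, prefer_ssd=False):
    """Pick the host with the most free memory, optionally favouring SSD hosts."""
    if not hosts:
        raise LookupError("no candidate hosts")
    # Tuples compare element-wise: the SSD flag (only when requested) ranks
    # first, then free memory breaks the tie.
    return max(hosts, key=lambda h: (prefer_ssd and h.has_ssd, h.free_memory_mb))


hosts = [
    Host("esxi-01", free_memory_mb=65536),
    Host("esxi-02", free_memory_mb=131072),
    Host("esxi-03", free_memory_mb=32768, has_ssd=True),
]
print(pick_host(hosts).name)
print(pick_host(hosts, prefer_ssd=True).name)
```

A data source with an option along these lines would let the template state the preference once instead of every wrapper script re-implementing the sort.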

Consider that I have about 45 Packer templates, creating a base template for every supported version of Debian, Ubuntu, Red Hat, CentOS, FreeBSD, OpenBSD, Windows and Solaris, for the 32-bit and 64-bit architectures where supported. That's a lot of templates to maintain.

Then I proceed to "specialize" those templates into a test host, a build agent, and so on. In this case, I find all the templates that meet certain criteria, for example names that start with ubuntu-20.04-amd64-*, sort them in descending order of creation date, and use the latest one. A data source that does this would allow the vsphere-clone builder to specialize the latest base template (created earlier).
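
As a rough illustration of that lookup, the following Python sketch filters templates by a name pattern and returns the most recently created one. The `Template` class, the `latest_template` helper, the template names, and the dates are all hypothetical; a real data source would read the creation time from vCenter.

```python
from dataclasses import dataclass
from datetime import datetime
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Template:
    """Hypothetical record for a VM template in the inventory."""
    name: str
    created: datetime


def latest_template(templates, pattern):
    """Return the newest template whose name matches the glob pattern."""
    matches = [t for t in templates if fnmatchcase(t.name, pattern)]
    if not matches:
        raise LookupError(f"no template matches {pattern!r}")
    return max(matches, key=lambda t: t.created)


templates = [
    Template("ubuntu-20.04-amd64-001", datetime(2024, 1, 5)),
    Template("ubuntu-20.04-amd64-002", datetime(2024, 3, 12)),
    Template("ubuntu-22.04-amd64-001", datetime(2024, 4, 1)),
]
base = latest_template(templates, "ubuntu-20.04-amd64-*")
print(base.name)
```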

I'm finding that I'm passing around a lot of the same information, and that my templates aren't really that portable between vCenter installations.

Potential configuration

Since there are several resources involved, I'll focus on finding the datastore:

Find a datastore that's tagged with TagA. If there is more than one, fail.
```
data "vsphere-datastore" "basic-datastore" {
    filters = {
        tags = ["TagA"]
    }
}
```
One could also support a filter of the form tag = "TagA", which is essentially the same.
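
The "fail if there is more than one" rule could behave like the Python sketch below. It is a model of the proposed semantics only: the `Datastore` class and `find_single` helper are hypothetical, and treating a list of several tags as "all must be present" is my assumption, since the proposal does not say.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Datastore:
    """Hypothetical record for a datastore and its tags."""
    name: str
    tags: frozenset


def find_single(datastores, tags):
    """Return the only datastore carrying every requested tag, else fail."""
    required = frozenset(tags)
    # Assumption: a datastore matches only if it has ALL requested tags.
    matches = [d for d in datastores if required <= d.tags]
    if not matches:
        raise LookupError(f"no datastore tagged {sorted(required)}")
    if len(matches) > 1:
        names = ", ".join(d.name for d in matches)
        raise LookupError(f"expected one datastore, found {len(matches)}: {names}")
    return matches[0]


inventory = [
    Datastore("datastore1", frozenset({"TagA"})),
    Datastore("datastore2", frozenset({"TagB"})),
    Datastore("datastore3", frozenset({"TagB"})),
]
print(find_single(inventory, ["TagA"]).name)
```

Failing loudly on an ambiguous match keeps a build from silently landing on the wrong datastore when someone tags a second one.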
Find the datastore with tag TagA, and if there are several tagged, return the one with the most capacity (available storage).
```
data "vsphere-datastore" "basic-datastore" {
    filters = {
        tags = ["TagA"]
    }
    most_capacity = true
}
```

Find a datastore named "datastore1"

```
data "vsphere-datastore" "basic-datastore" {
    filters = {
        name = "datastore1"
    }
}
```

Find a datastore with the name that starts with "datastore", and if many exist, return the one with the most capacity. When you have multiple standalone hosts in vCenter, the datastores of each host are renamed so there's no duplicates. You may end up with datastore1, datastore1 (1) and datastore1 (2).
```
data "vsphere-datastore" "basic-datastore" {
    filters = {
        name = "datastore*"
    }
    most_capacity = true
}
```
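
The interaction between the wildcard and the renamed datastores can be checked with a short Python sketch. It assumes shell-style glob matching (Python's `fnmatch`) and uses made-up free-space figures; the `Datastore` class and `most_capacity` helper are hypothetical.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Datastore:
    """Hypothetical record for a datastore and its free space."""
    name: str
    free_gb: int


def most_capacity(datastores, name_pattern):
    """Return the matching datastore with the most free space."""
    matches = [d for d in datastores if fnmatchcase(d.name, name_pattern)]
    if not matches:
        raise LookupError(f"no datastore matches {name_pattern!r}")
    return max(matches, key=lambda d: d.free_gb)


# vCenter-style renamed local datastores plus one unrelated datastore.
inventory = [
    Datastore("datastore1", 120),
    Datastore("datastore1 (1)", 900),
    Datastore("datastore1 (2)", 450),
    Datastore("nfs-archive", 5000),
]
chosen = most_capacity(inventory, "datastore*")
print(chosen.name)
```

Note that the unrelated datastore has the most free space overall but is excluded by the name filter, which is the point of combining the two options.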

Find the datastore with the most capacity in datacenter X

```
data "vsphere-datastore" "basic-datastore" {
    datacenter = "X"
    most_capacity = true
}
```

Find the datastore with the most capacity on host "X"

```
data "vsphere-datastore" "basic-datastore" {
    host = "X"
    most_capacity = true
}
```
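
Scoping by datacenter or by host could narrow the candidates before the capacity comparison, roughly as in this Python sketch. The `Datastore` fields, the `most_capacity` helper, and the sample inventory are hypothetical; in particular, modelling a datastore as mounted on a set of hosts is my assumption about how a shared datastore would be represented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Datastore:
    """Hypothetical record for a datastore and where it lives."""
    name: str
    datacenter: str
    hosts: frozenset  # names of the hosts that mount this datastore
    free_gb: int


def most_capacity(datastores, datacenter=None, host=None):
    """Return the datastore with the most free space within the given scope."""
    candidates = [
        d for d in datastores
        if (datacenter is None or d.datacenter == datacenter)
        and (host is None or host in d.hosts)
    ]
    if not candidates:
        raise LookupError("no datastore in the requested scope")
    return max(candidates, key=lambda d: d.free_gb)


inventory = [
    Datastore("local-a", "X", frozenset({"esxi-01"}), 200),
    Datastore("shared-1", "X", frozenset({"esxi-01", "esxi-02"}), 800),
    Datastore("local-b", "X", frozenset({"esxi-02"}), 950),
    Datastore("big-y", "Y", frozenset({"esxi-09"}), 4000),
]
print(most_capacity(inventory, datacenter="X").name)
print(most_capacity(inventory, host="esxi-01").name)
```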

Now that we have a datastore, we can complete the template as follows:

```
source "vsphere-iso" "main" {
    # Omitted
    host      = data.vsphere-datastore.basic-datastore.host.id
    datastore = data.vsphere-datastore.basic-datastore.id
    # Omitted
}
```

Here's another example. Suppose I want to find the path of the latest Ubuntu 22.04 ISO stored in a content library tagged with TagA, and that a separate process is responsible for detecting and uploading new versions of the Ubuntu 22.04 ISO to that content library. Today there may be a content item for ubuntu-22.04-amd64; next quarter that process will upload ubuntu-22.04.1-amd64, and in six months ubuntu-22.04.2-amd64. What we really want is to create an Ubuntu 22.04 base image as quickly as we can, so it's beneficial to just get the latest ISO that was uploaded. In Packer it could look something like this:

```
data "vsphere-contentlibrary" "main" {
    filter {
        tags = ["TagA"]
    }
}

data "vsphere-contentlibrary-item" "main" {
    name = "ubuntu-22.04*-amd64"
    content_library = data.vsphere-contentlibrary.main.id
    type = "iso"
    most_recent = true
}

source "vsphere-iso" "main" {
    # Omitted
    iso_paths = [ data.vsphere-contentlibrary-item.main.path ]
    # Omitted
}
```
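
To show how the two lookups would chain, here is a small Python model: find the single library carrying the tag, then pick the newest item of the requested type whose name matches the pattern. Everything in it is hypothetical (the `Library` and `Item` classes, the `latest_item` helper, the library names, and the upload dates), and it assumes shell-style glob matching for the name.

```python
from dataclasses import dataclass
from datetime import date
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Item:
    """Hypothetical content library item."""
    name: str
    type: str
    uploaded: date


@dataclass(frozen=True)
class Library:
    """Hypothetical content library with its tags and items."""
    name: str
    tags: frozenset
    items: tuple


def latest_item(libraries, tag, name_pattern, item_type):
    """Resolve the tagged library, then return its newest matching item."""
    tagged = [lib for lib in libraries if tag in lib.tags]
    if len(tagged) != 1:
        raise LookupError(f"expected one library tagged {tag!r}, found {len(tagged)}")
    matches = [
        i for i in tagged[0].items
        if i.type == item_type and fnmatchcase(i.name, name_pattern)
    ]
    if not matches:
        raise LookupError(f"no {item_type} item matches {name_pattern!r}")
    return max(matches, key=lambda i: i.uploaded)


libraries = [
    Library("isos", frozenset({"TagA"}), (
        Item("ubuntu-22.04-amd64", "iso", date(2022, 4, 21)),
        Item("ubuntu-22.04.1-amd64", "iso", date(2022, 8, 11)),
        Item("ubuntu-22.04.2-amd64", "iso", date(2023, 2, 23)),
        Item("debian-12-amd64", "iso", date(2023, 6, 10)),
    )),
    Library("templates", frozenset({"TagB"}), ()),
]
iso = latest_item(libraries, "TagA", "ubuntu-22.04*-amd64", "iso")
print(iso.name)
```

One consequence of ordering by upload time rather than by version number is that re-uploading an older point release would make it the "latest"; whether that is acceptable is a design decision for the data source.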

Potential References

Terraform vSphere data sources, for examples. Perhaps Packer could reuse much of that code.
Packer Amazon EC2 AMI Data Source
