hashicorp / terraform-provider-vsphere

Terraform Provider for VMware vSphere
https://registry.terraform.io/providers/hashicorp/vsphere/
Mozilla Public License 2.0

SCSI IDs changing on machines built with 2.6.0+. #2089

Closed gavinwill closed 9 months ago

gavinwill commented 10 months ago

### Terraform

Terraform v1.3.0

### Terraform Provider

v2.6.0

### VMware vSphere

7.0.3.01700

### Description

Hi

When building a VM from an Ubuntu OVF template, we are seeing the SCSI order change and therefore the interface name change. This has an impact because we use cloud-init and specify the NIC by name to configure it.

On a machine deployed with provider 2.5.1 we see the ordering we expect:


03:00.0 Serial Attached SCSI controller: VMware PVSCSI SCSI Controller (rev 02)
04:00.0 Serial Attached SCSI controller: VMware PVSCSI SCSI Controller (rev 02)
0b:00.0 Ethernet controller: VMware VMXNET3 Ethernet Controller (rev 01)
13:00.0 Serial Attached SCSI controller: VMware PVSCSI SCSI Controller (rev 02)
1b:00.0 Serial Attached SCSI controller: VMware PVSCSI SCSI Controller (rev 02)

This gives us ens192 on the Ubuntu machine:

2: ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000

When we deploy a brand new machine with provider 2.6.0+ we see the SCSI order change:

### lspci from VM with provider 2.6.0 = bad

03:00.0 Serial Attached SCSI controller: VMware PVSCSI SCSI Controller (rev 02)
04:00.0 Ethernet controller: VMware VMXNET3 Ethernet Controller (rev 01)
0b:00.0 Serial Attached SCSI controller: VMware PVSCSI SCSI Controller (rev 02)
0c:00.0 Ethernet controller: VMware VMXNET3 Ethernet Controller (rev 01)
13:00.0 Serial Attached SCSI controller: VMware PVSCSI SCSI Controller (rev 02)
1b:00.0 Serial Attached SCSI controller: VMware PVSCSI SCSI Controller (rev 02)

This change in order means that our NIC interface name has changed, since the name is derived from the device's PCI bus and slot.

For example, interface names are generated as:

# en --> ethernet
# p0 --> bus number
# s31 --> slot number

ip link on this machine shows:

2: ens161: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000

Note: I need to get access via the console in VMware, since I can't SSH to the host because of the network config mismatch.

If I have a machine built on provider 2.5.1, then upgrade the provider to 2.6.0+ and run a plan, Terraform reports that the infrastructure is up to date and no changes are planned. It seems the order is only incorrect when a new VM is created with the later provider.


Upgrading modules...

Initializing provider plugins...
- Finding hashicorp/vsphere versions matching "2.6.0"...
- Installing hashicorp/vsphere v2.6.0...
- Installed hashicorp/vsphere v2.6.0 (signed by HashiCorp)

Plan
No changes. Your infrastructure matches the configuration.

### Affected Resources or Data Sources

vsphere_network.network

### Terraform Configuration

Will provide details in update

### Debug Output

Will provide details in update

### Panic Output

_No response_

### Expected Behavior

We would expect no change in the SCSI ordering when using the new provider.

### Actual Behavior

SCSI ordering is incorrect, causing issues with disks and NICs.

### Steps to Reproduce

Upgrade the provider to 2.6.0+ (from verified good 2.5.1) and create a new machine.

### Environment Details

_No response_

### Screenshots

_No response_

### References

_No response_

tenthirtyam commented 10 months ago

@vasilsatanasov can you investigate to see if this is related to the SR-IOV introduction?

gavinwill commented 10 months ago

I did think it may be SR-IOV related from quickly looking at the diff between 2.5.1 and 2.6.0.

I may also be available to submit a PR to fix it.

vasilsatanasov commented 10 months ago

Looking at it. @gavinwill, could you please provide an example HCL to reproduce the issue, plus the Ubuntu version you are using?

gavinwill commented 10 months ago

Hi @vasilsatanasov, apologies. It's an Ubuntu 20.04 OVF template.

I have just tested this by building the provider against different commits (go build -o terraform-provider-vsphere) and using dev_overrides in the provider installation config. I can confirm that the last commit that works for me is https://github.com/hashicorp/terraform-provider-vsphere/commit/6211c3bd2fbb564fa500d7ac5a2cbae8a828658c

If I taint the machine and rebuild it with the provider built against https://github.com/hashicorp/terraform-provider-vsphere/commit/9c255306332427f46e2a32e844e0e1b2a53de3f6, it fails: the SCSI device order is wrong, and hence the NIC is ens161.

We use a slightly customised module. I am just paring that down to minimal standalone code so that you can repro.
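
For anyone wanting to repeat the commit-by-commit test described above, a minimal sketch of a Terraform CLI configuration file (typically ~/.terraformrc) using dev_overrides might look like the following. The directory path is a hypothetical placeholder, not taken from this thread; it should point at the folder containing the locally built terraform-provider-vsphere binary.

```hcl
# ~/.terraformrc (CLI configuration, not part of the module)
provider_installation {
  # Use the locally built provider instead of the registry release.
  # The path below is an assumed example location.
  dev_overrides {
    "hashicorp/vsphere" = "/home/user/src/terraform-provider-vsphere"
  }

  # All other providers are installed from their normal sources.
  direct {}
}
```

With an override in place, Terraform prints a warning that development overrides are in effect, which is a quick way to confirm the local build is the one being used.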

vasilsatanasov commented 10 months ago

Thank you @gavinwill , waiting for the code for reproduction!

gavinwill commented 10 months ago

Hi

I have "converted" our module to a simple tf file with hard-coded values, and I can repro the issue with the config below.

If I pin the provider to 2.5.1 and apply (after cleaning out the .terraform folder and re-running terraform init to be sure), the machine boots up fine with the expected SCSI order and the NIC is ens192.

If I clean out the .terraform folder and update the provider to 2.6.0+, the machine boots up but the NIC is ens161 and the SCSI ordering is changed. The only change to the Terraform config is the provider version.

terraform {
  required_providers {
    vsphere = {
      source  = "hashicorp/vsphere"
      version = "2.6.1"
    }
  }
  required_version = ">= 1.3.0"
}

provider "vsphere" {
  vsphere_server = "vsphereserver.com"
  user           = "deploy@vsphereserver.com"
  password       = "hunter2"
}

resource "vsphere_virtual_machine" "vm" {
  name                    = "gt-gavintest-01"
  resource_pool_id        = "resgroup-1234"
  folder                  = "test"
  extra_config = {
    "guestinfo.metadata"          = "our metadata for cloud init including netplan for ens192"
    "guestinfo.userdata"          = "our base64 userdata"
    "guestinfo.userdata.encoding" = "base64"
  }

  extra_config_reboot_required  = false
  firmware                      = "bios"
  efi_secure_boot_enabled       = false
  enable_disk_uuid              = false
  datastore_id                  =  "datastore-1234"

  num_cpus               = 4
  num_cores_per_socket   = 2
  cpu_hot_add_enabled    = true
  cpu_hot_remove_enabled = true
  memory                 = 8192
  guest_id               = "ubuntu64Guest"
  scsi_bus_sharing       = "noSharing"
  scsi_type              = "pvscsi"
  scsi_controller_count  = 4
  wait_for_guest_net_routable = false
  wait_for_guest_ip_timeout   = 0 
  wait_for_guest_net_timeout  = 5

  dynamic "network_interface" {
    for_each = local.networks
    content {
      network_id   = "dvportgroup-1234"
      adapter_type = "vmxnet3"
      ovf_mapping  = "nic${network_interface.key}"
    }
  }

  disk {
    label            = "disk0"
    size             = 72
    unit_number      = 0
    thin_provisioned = true
    eagerly_scrub    = false
    datastore_id     = "datastore-1234"
    io_reservation   = 0
    io_share_level   = "normal"
    io_share_count   = 1000
  }

  clone {
    template_uuid = "12345-78910-121213-1415-16171819e"
    linked_clone  = false
    timeout       = 30
  }

  hv_mode                          = "hvAuto"
  ept_rvi_mode                     = "automatic"
  nested_hv_enabled                = false
  enable_logging                   = false
  cpu_performance_counters_enabled = false
  swap_placement_policy            = "inherit"
  latency_sensitivity              = "normal"
  shutdown_wait_timeout = 3
  force_power_off       = false
}

Our locals contain:

locals {    
    networks = [
      { "addresses" : ["10.12.13.14/24"], },
      { "addresses" : [], },
    ]
}

We use the addresses to populate our cloud-init config, and the dynamic block in the tf above does a for_each over these entries, using the key.

Hope this helps

adamhorden commented 10 months ago

I have faced this same issue today. I could not work out why the order was incorrect on new VM builds before finding this issue. VMs would come up, but the network would not, so they needed manual intervention via the console.

v2.5.1:

03:00.0 Ethernet controller: VMware VMXNET3 Ethernet Controller (rev 01)

v2.6.1:

04:00.0 Ethernet controller: VMware VMXNET3 Ethernet Controller (rev 01)

This causes the network to not come up as ens160 now becomes ens224. Terraform Plans on v2.6.1 are clean but any new VMs on v2.6.1 have the incorrect order. For the moment pinning to v2.5.1 works as expected.

Adam Horden
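
The pinning workaround mentioned above can be sketched as follows. This is a minimal example that assumes the rest of the configuration is unchanged; only the version constraint differs from the reproduction config earlier in the thread.

```hcl
terraform {
  required_providers {
    vsphere = {
      source = "hashicorp/vsphere"
      # Exact pin to the last release reported as good in this thread.
      version = "2.5.1"
    }
  }
}
```

If a dependency lock file already records a 2.6.x release, terraform init will likely need the -upgrade flag before the pinned version is selected.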

tenthirtyam commented 10 months ago

@vasilsatanasov - this might be related to the SR-IOV enhancement?

vasilsatanasov commented 10 months ago

> @vasilsatanasov - this might be related to the SR-IOV enhancement?

Looks like it is, as per @gavinwill's report.
