hashicorp / packer

Packer is a tool for creating identical machine images for multiple platforms from a single source configuration.
http://www.packer.io

vmware-iso builder should crash on vmotion event #5091

Closed timurb closed 5 years ago

timurb commented 7 years ago

I'm using the vmware-iso builder with the esx5 remote type to produce base images for my system. Everything worked until recently, when I started getting errors for the exact same template that had worked before.

The Packer log showed "Error getting SSH address: EOF" in place of the usual "Error getting SSH address: No interface on the VM has an IP address ready" (see the logs at the bottom of the issue).

After some unsuccessful googling and some log inspection, I found that the problem was a vMotion event moving my VM away from the host on which I was running the build.

I would expect Packer to catch such events and either crash or produce some clue about what happened.

Some more info below.

ESXCLI output while Packer is waiting normally for the VM's IP:

~ # esxcli network vm list
World ID  Name               Num Ports  Networks
--------  -----------------  ---------  --------------
37358809  XXXX-1254                  1  dvportgroup-29
37557285  XXX-315                    1  dvportgroup-29
37571048  XXXX-XXXXXX                1  dvportgroup-29
37655795  XXXXX-211                  1  dvportgroup-29
37655800  XXXXX-266                  1  dvportgroup-29
47756103  XXX-997                    1  dvportgroup-29
48321518  develop-XXXX               1  dvportgroup-29
48323541  XXXX-1280                  1  dvportgroup-29
48325368  XXXX-7                     1  dvportgroup-29
49819173  XXX-378                    1  dvportgroup-29
50405354  XXXXX-358                  1  dvportgroup-29
50414791  XXXXX-356                  1  dvportgroup-29
50578942  XXXX-350                   1  dvportgroup-29
51080563  XXX-1090                   1  dvportgroup-29
51164499  packer-trusty              1  VM Network
51228803  XXXX-467                   1  dvportgroup-29
51275630  XXXX-319-rollback          1  dvportgroup-29
51337073  XXX-893                    1  dvportgroup-29
51343903  XXX-439                    1  dvportgroup-29
51453345  XXXX-564                   1  dvportgroup-29
51546103  XXX-313                    1  dvportgroup-29

ESXCLI output after the vMotion happened:

~ # esxcli network vm list
World ID  Name               Num Ports  Networks
--------  -----------------  ---------  --------------
37358809  XXXX-1254                  1  dvportgroup-29
37557285  XXX-315                    1  dvportgroup-29
37571048  XXXX-XXXXXX                1  dvportgroup-29
37655795  XXXXX-211                  1  dvportgroup-29
37655800  XXXXX-266                  1  dvportgroup-29
47756103  XXX-997                    1  dvportgroup-29
48321518  develop-XXXX               1  dvportgroup-29
48323541  XXXX-1280                  1  dvportgroup-29
48325368  XXXX-7                     1  dvportgroup-29
49819173  XXX-378                    1  dvportgroup-29
50405354  XXXXX-358                  1  dvportgroup-29
50414791  XXXXX-356                  1  dvportgroup-29
50578942  XXXX-350                   1  dvportgroup-29
51080563  XXX-1090                   1  dvportgroup-29
51228803  XXXX-467                   1  dvportgroup-29
51275630  XXXX-319-rollback          1  dvportgroup-29
51337073  XXX-893                    1  dvportgroup-29
51343903  XXX-439                    1  dvportgroup-29
51453345  XXXX-564                   1  dvportgroup-29
51546103  XXX-313                    1  dvportgroup-29

That is, my VM (packer-trusty, world ID 51164499) is no longer in the list.
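
A minimal sketch of making the same check programmatically, assuming the plain-text table format shown above with the world ID in the first column. The function name vm_is_listed is hypothetical and is not part of Packer; the sample listings are shortened copies of the output above.

```python
def vm_is_listed(listing: str, world_id: int) -> bool:
    """Return True if world_id appears in the first column of the
    plain-text `esxcli network vm list` output."""
    for line in listing.splitlines():
        fields = line.split()
        # Skip blank lines, the header row, and the dashed separator.
        if not fields or not fields[0].isdigit():
            continue
        if int(fields[0]) == world_id:
            return True
    return False


before = """\
World ID  Name               Num Ports  Networks
--------  -----------------  ---------  --------------
51080563  XXX-1090                   1  dvportgroup-29
51164499  packer-trusty              1  VM Network
51228803  XXXX-467                   1  dvportgroup-29
"""

after = """\
World ID  Name               Num Ports  Networks
--------  -----------------  ---------  --------------
51080563  XXX-1090                   1  dvportgroup-29
51228803  XXXX-467                   1  dvportgroup-29
"""

print(vm_is_listed(before, 51164499))  # True
print(vm_is_listed(after, 51164499))   # False
```

Only the first column is parsed because the Networks column (and possibly VM names) can contain spaces.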

This is probably a good signal for producing an error. As far as I understand, the same thing could also happen when you manually destroy the VM that Packer is processing.
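
To illustrate the kind of behaviour being asked for, here is a hedged sketch of a polling loop that fails fast once the VM has been missing for several consecutive polls, instead of retrying until the SSH timeout. It is not Packer's code: wait_for_ip, VMDisappearedError, and the is_listed and get_ip callbacks are all hypothetical names, and the max_missing threshold is an assumed, configurable knob.

```python
import time
from typing import Callable, Optional


class VMDisappearedError(RuntimeError):
    """Raised when the VM is no longer registered on the build host."""


def wait_for_ip(
    is_listed: Callable[[], bool],
    get_ip: Callable[[], Optional[str]],
    max_missing: int = 3,
    attempts: int = 100,
    interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the VM reports an IP address.

    Gives up early if the VM is absent from the host's VM list for
    max_missing consecutive polls (for example after a vMotion or a
    manual destroy).
    """
    missing = 0
    for _ in range(attempts):
        if not is_listed():
            missing += 1
            if missing >= max_missing:
                raise VMDisappearedError(
                    f"VM missing from host for {missing} consecutive polls; "
                    "it may have been migrated or destroyed"
                )
        else:
            missing = 0
            ip = get_ip()
            if ip:
                return ip
        sleep(interval)
    raise TimeoutError("VM never reported an IP address")


# Simulated run: the VM is listed twice, then vanishes from the host.
listed = iter([True, True, False, False, False])
try:
    wait_for_ip(lambda: next(listed), lambda: None, sleep=lambda s: None)
except VMDisappearedError as err:
    print(err)
```

Requiring several consecutive misses, rather than one, is a design choice to tolerate a single failed remote command.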

And here are the logs from Packer:

2017/07/04 09:57:46 packer: 2017/07/04 09:57:46 starting remote command: esxcli --formatter csv network vm port list -w 51164499
2017/07/04 09:57:46 packer: 2017/07/04 09:57:46 [DEBUG] Error getting SSH address: No interface on the VM has an IP address ready
2017/07/04 09:57:51 packer: 2017/07/04 09:57:51 opening new ssh session
2017/07/04 09:57:51 packer: 2017/07/04 09:57:51 starting remote command: esxcli --formatter csv network vm list
2017/07/04 09:57:52 packer: 2017/07/04 09:57:52 opening new ssh session
2017/07/04 09:57:52 packer: 2017/07/04 09:57:52 starting remote command: esxcli --formatter csv network vm port list -w 51164499
2017/07/04 09:57:53 packer: 2017/07/04 09:57:53 [DEBUG] Error getting SSH address: No interface on the VM has an IP address ready
2017/07/04 09:57:58 packer: 2017/07/04 09:57:58 opening new ssh session
2017/07/04 09:57:58 packer: 2017/07/04 09:57:58 starting remote command: esxcli --formatter csv network vm list
2017/07/04 09:57:58 packer: 2017/07/04 09:57:58 [DEBUG] Error getting SSH address: EOF
2017/07/04 09:58:03 packer: 2017/07/04 09:58:03 opening new ssh session
2017/07/04 09:58:03 packer: 2017/07/04 09:58:03 starting remote command: esxcli --formatter csv network vm list
2017/07/04 09:58:04 packer: 2017/07/04 09:58:04 [DEBUG] Error getting SSH address: EOF
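
For anyone debugging a similar failure, a small sketch that finds the moment the error reason changes in a log like the one above. It assumes the log line layout shown here (timestamp first, then "Error getting SSH address: <reason>"); first_reason_change is a hypothetical helper name.

```python
import re
from typing import Optional, Tuple

SSH_ERR = re.compile(r"Error getting SSH address: (?P<reason>.+)$")


def first_reason_change(log: str) -> Optional[Tuple[str, str, str]]:
    """Return (timestamp, old_reason, new_reason) for the first line whose
    SSH-address error differs from the previous one, or None."""
    previous = None
    for line in log.splitlines():
        match = SSH_ERR.search(line)
        if not match:
            continue
        reason = match.group("reason").strip()
        if previous is not None and reason != previous:
            # The first two whitespace-separated fields are date and time.
            timestamp = " ".join(line.split()[:2])
            return timestamp, previous, reason
        previous = reason
    return None


sample_log = """\
2017/07/04 09:57:53 packer: 2017/07/04 09:57:53 [DEBUG] Error getting SSH address: No interface on the VM has an IP address ready
2017/07/04 09:57:58 packer: 2017/07/04 09:57:58 opening new ssh session
2017/07/04 09:57:58 packer: 2017/07/04 09:57:58 [DEBUG] Error getting SSH address: EOF
2017/07/04 09:58:04 packer: 2017/07/04 09:58:04 [DEBUG] Error getting SSH address: EOF
"""

print(first_reason_change(sample_log))
```

On the excerpt above this points at 09:57:58, the first poll after the VM left the host.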

I'm using Packer 0.12.3 running on Ubuntu 12.04. The relevant builders section of the template:

[
  {
    "remote_type": "esx5",
    "remote_host": "{{user `remote_host`}}",
    "remote_port": "22",
    "remote_datastore": "{{user `remote_datastore`}}",
    "remote_username": "{{user `remote_username`}}",
    "remote_password": "{{user `remote_password`}}",
    "name": "{{user `name`}}",
    "type": "vmware-iso",
    "guest_os_type": "ubuntu-64",
    "format": "ova",
    "disk_size": "{{user `disk_size`}}",
    "keep_registered": "false",
    "iso_urls": [
      "{{user `iso_url`}}"
    ],
    "iso_checksum": "{{user `iso_checksum`}}",
    "iso_checksum_type": "md5",
    "ssh_username": "packer",
    "ssh_password": "packer",
    "http_directory": "http/",
    "http_port_min": 5800,
    "http_port_max": 5810,
    "headless": true,
    "vnc_port_min": 5900,
    "vnc_port_max": 5910,
    "boot_wait": "15s",
    "boot_command": [
      "<esc><esc><enter><wait>",
      "/install/vmlinuz ",
      "preseed/url=http://{{user `real_ip`}}:{{.HTTPPort}}/ubuntu/preseed.cfg ",
      "debian-installer=en_US auto locale=en_US kbd-chooser/method=us ",
      "fb=false debconf/frontend=noninteractive ",
      "hostname=packer-base ",
      "keyboard-configuration/modelcode=SKIP keyboard-configuration/layout=USA ",
      "keyboard-configuration/variant=USA console-setup/ask_detect=false ",
      "initrd=/install/initrd.gz -- <enter>"
    ],
    "shutdown_command": "echo 'packer' | sudo -S -E shutdown -P now",
    "ssh_wait_timeout": "45m",
    "tools_upload_flavor": "linux",
    "vmdk_name": "disk",
    "disk_type_id": "thin",
    "vmx_data": {
      "ethernet0.virtualDev": "{{user `eth_dev`}}",
      "ethernet0.networkName": "{{user `eth_network`}}",
      "ethernet0.present": "true",
      "ethernet0.startConnected": "true",
      "ethernet0.connectionType": "nat",
      "MemTrimRate": "0",
      "sched.mem.pshare.enable": "FALSE",
      "mainMem.useNamedFile": "FALSE",
      "prefvmx.minVmMemPct": "100"
    }
  }
]

rickard-von-essen commented 7 years ago

Is there a way to check whether vMotion is enabled with esxcli? IMHO we should refuse to build on an ESXi host with it enabled.

timurb commented 7 years ago

No idea, I'm just starting to learn VMware and its internals. Probably not, as it's configured for the whole vSphere cluster, not for a specific node. But if you decide to abort the build immediately, please make that configurable: for us, running a dedicated ESX host just for building images is not an option at the moment. We are willing to accept the risk of a failed build (which can be lowered or even eliminated by reducing DRS aggressiveness); just make the failure more explicit.

rickard-von-essen commented 7 years ago

You can always run a nested ESXi on vSphere.

Does vicfg-advcfg --list list anything with motion?

What is the output of vim-cmd hostsvc/vmotion/netconfig_get? It should probably tell whether vMotion is enabled. (See "Configuring a VMkernel port and enable vMotion via command line".)

timurb commented 7 years ago

I don't understand the output of the second command; I just see that there are some references to VMotion in it. The first command didn't work for me. Here is the output.

~ # vmware -vl
VMware ESXi 5.5.0 build-4345813
VMware ESXi 5.5.0 Update 3
~ # vicfg-advcfg --list
-sh: vicfg-advcfg: not found
~ # vim-cmd hostsvc/vmotion/netconfig_get
(vim.host.VMotionSystem.NetConfig) {
   dynamicType = <unset>,
   candidateVnic = (vim.host.VirtualNic) [
      (vim.host.VirtualNic) {
         dynamicType = <unset>,
         device = "vmk0",
         key = "VMotionConfig.vmotion.key-vim.host.VirtualNic-vmk0",
         portgroup = "Management Network",
         spec = (vim.host.VirtualNic.Specification) {
            dynamicType = <unset>,
            ip = (vim.host.IpConfig) {
               dynamicType = <unset>,
               dhcp = false,
               ipAddress = "10.XX.XX.XX",
               subnetMask = "255.255.252.0",
               ipV6Config = (vim.host.IpConfig.IpV6AddressConfiguration) null,
            },
            mac = "XX:XX:XX:XX:XX:XX",
            distributedVirtualPort = (vim.dvs.PortConnection) null,
            portgroup = "Management Network",
            mtu = 1500,
            tsoEnabled = true,
            netStackInstanceKey = "defaultTcpipStack",
         },
      },
      (vim.host.VirtualNic) {
         dynamicType = <unset>,
         device = "vmk1",
         key = "VMotionConfig.vmotion.key-vim.host.VirtualNic-vmk1",
         portgroup = "",
         spec = (vim.host.VirtualNic.Specification) {
            dynamicType = <unset>,
            ip = (vim.host.IpConfig) {
               dynamicType = <unset>,
               dhcp = false,
               ipAddress = "10.XX.XX.YY",
               subnetMask = "255.255.255.0",
               ipV6Config = (vim.host.IpConfig.IpV6AddressConfiguration) null,
            },
            mac = "YY:YY:YY:YY:YY:YY",
            distributedVirtualPort = (vim.dvs.PortConnection) {
               dynamicType = <unset>,
               switchUuid = "XX XX XX XX XX XX XX XX-XX XX XX XX XX XX XX XX",
               portgroupKey = "dvportgroup-27",
               portKey = "130",
               connectionCookie = XXXXXXXXXXXX,
            },
            portgroup = <unset>,
            mtu = 9000,
            tsoEnabled = true,
            netStackInstanceKey = "defaultTcpipStack",
         },
      }
   ],
   selectedVnic = <vim.host.VirtualNic:VMotionConfig.vmotion.key-vim.host.VirtualNic-vmk1>,
}
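
One way to read this output is to look at the selectedVnic field. The sketch below extracts the vmk device name from it, assuming the field format shown above; selected_vmotion_vnic is a hypothetical helper, and treating any value that does not match (such as an unset field) as "no vnic selected" is an assumption. A selected vnic suggests the host is set up for vMotion, but it does not say whether DRS will actually migrate a given VM.

```python
import re
from typing import Optional


def selected_vmotion_vnic(netconfig: str) -> Optional[str]:
    """Return the vmk device named in the selectedVnic field of
    `vim-cmd hostsvc/vmotion/netconfig_get` output, or None."""
    match = re.search(
        r"selectedVnic\s*=\s*<vim\.host\.VirtualNic:[^>]*-(vmk\d+)>",
        netconfig,
    )
    return match.group(1) if match else None


sample = """\
(vim.host.VMotionSystem.NetConfig) {
   dynamicType = <unset>,
   selectedVnic = <vim.host.VirtualNic:VMotionConfig.vmotion.key-vim.host.VirtualNic-vmk1>,
}
"""

print(selected_vmotion_vnic(sample))  # vmk1
```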

rickard-von-essen commented 7 years ago

I don't really remember, but maybe you have to run esxcli first to enter the CLI and then run vicfg-advcfg --list.

timurb commented 7 years ago

I think I posted a comment here yesterday, but it seems to have disappeared. vicfg-advcfg is a command from ESX 3.5/4.0; I'm running ESX 5.5, which has a different command for that. I'll post its output next week.

rickard-von-essen commented 7 years ago

Also see https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1038578

timurb commented 7 years ago

Here is the output of the equivalent command on 5.5:

~ # esxcli system settings advanced list | egrep -i -A9 '^\s*Path:.*motion'
   Path: /Migrate/VMotionStreamHelpers
   Type: integer
   Int Value: 2
   Default Int Value: 2
   Min Value: 1
   Max Value: 32
   String Value:
   Default String Value:
   Valid Characters:
   Description: Number of helpers to allocate for VMotion streams
--
   Path: /Migrate/VMotionStreamDisable
   Type: integer
   Int Value: 0
   Default Int Value: 0
   Min Value: 0
   Max Value: 1
   String Value:
   Default String Value:
   Valid Characters:
   Description: Pretend to not support streams
--
   Path: /Migrate/VMotionLatencySensitivity
   Type: integer
   Int Value: 1
   Default Int Value: 1
   Min Value: 0
   Max Value: 1
   String Value:
   Default String Value:
   Valid Characters:
   Description: Make vMotion helper worlds latency sensitive, avoid transmit delays.
--
   Path: /Migrate/VMotionResolveSwapType
   Type: integer
   Int Value: 1
   Default Int Value: 1
   Min Value: 0
   Max Value: 1
   String Value:
   Default String Value:
   Valid Characters:
   Description: Attempt to resolve swap type during VMotion initialization
--
   Path: /SvMotion/SvMotionAvgDisksPerVM
   Type: integer
   Int Value: 8
   Default Int Value: 8
   Min Value: 4
   Max Value: 1024
   String Value:
   Default String Value:
   Valid Characters:
   Description: Initial Storage vMotion Heap Size is proportional to this setting
--
   Path: /XvMotion/VMFSOptimizations
   Type: integer
   Int Value: 1
   Default Int Value: 1
   Min Value: 0
   Max Value: 1
   String Value:
   Default String Value:
   Valid Characters:
   Description: Enable VMFS-specific IO optimizations
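
A small sketch for turning this kind of output into a lookup table, assuming the "Key: value" layout shown above with "--" between entries. parse_advanced_settings is a hypothetical helper. None of the descriptions listed above reads as a host-wide vMotion on/off switch, so this only makes the settings easier to inspect; it does not by itself answer whether a VM can be migrated.

```python
def parse_advanced_settings(output: str) -> dict:
    """Parse `esxcli system settings advanced list` text into
    {path: {field: value}}; all values are kept as strings."""
    settings = {}
    current = None
    for line in output.splitlines():
        key, sep, value = line.strip().partition(":")
        if not sep:
            continue  # blank lines and "--" separators
        value = value.strip()
        if key == "Path":
            current = settings.setdefault(value, {})
        elif current is not None:
            current[key] = value
    return settings


sample = """\
   Path: /Migrate/VMotionStreamDisable
   Type: integer
   Int Value: 0
   Default Int Value: 0
   Description: Pretend to not support streams
--
   Path: /XvMotion/VMFSOptimizations
   Type: integer
   Int Value: 1
   Default Int Value: 1
   Description: Enable VMFS-specific IO optimizations
"""

settings = parse_advanced_settings(sample)
for path, fields in settings.items():
    print(path, fields["Int Value"])
```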

e-mow commented 5 years ago

My workaround for this was to provision a datastore that is available only on the ESXi host I wanted to run Packer against. This way DRS would never move the VM while it was building.

We want fully automated DRS enabled for normal cluster operations; for Packer, however, we only want to talk to one ESXi node.

SwampDragons commented 5 years ago

I have no idea how we'd go about checking for this from within Packer, and it sounds like there's a reasonable architectural workaround. I think the best we're going to be able to do here is add a warning in the docs.
