hashicorp / packer

Packer is a tool for creating identical machine images for multiple platforms from a single source configuration.
http://www.packer.io

Unable to establish a SSH connection following reboot due to change of IP #8528

Closed DanHam closed 4 years ago

DanHam commented 4 years ago

Overview of the Issue

Packer is unable to establish a SSH connection after the installer has completed and the instance has been rebooted. This occurs because the IP address assigned to the instance by the DHCP server changes following the reboot. Packer determines the IP address it should connect to at the start of the build and never picks up on the fact the instance has been assigned a new address.

The issue only seems to manifest itself when building Debian 10 and using the VMware ISO builder - perhaps because of the newly introduced 'default-duid' used by the DHCP client (see below)? However, I'm fairly confident that DHCP servers do not have to provide an instance with the same IP address as leased previously. The current logic built in to Packer does not allow for this possibility.

I am able to connect to the instance manually. The same template works fine with Virtualbox 6.0.14. Note that Debian 9 works with both VMware and Virtualbox.

Reproduction Steps

Run the template referenced below.

The Debian ISO used in the build can be downloaded HERE

Packer version

Packer v1.5.1-dev (66445ecd2)

Simplified Packer Buildfile

See the debian-10.json template HERE

Operating system and Environment details

$ sw_vers
ProductName:    Mac OS X
ProductVersion: 10.14.6
BuildVersion:   18G2022

VMware Fusion Professional Version 8.5.10 (7527438)

Log Fragments and crash.log files

While the installer is running the contents of the VMware DHCPD leases file (/var/db/vmware/vmnet-dhcpd-vmnet8.leases) is as follows:

# All times in this file are in UTC (GMT), not your local timezone.   This is
# not a bug, so please don't ask about it.   There is no portable way to
# store leases in the local timezone, so please don't request this as a
# feature.   If this is inconvenient or confusing to you, we sincerely
# apologize.   Seriously, though - don't ask.
# The format of this file is documented in the dhcpd.leases(5) manual page.

lease 172.16.83.128 {
        starts 5 2019/12/20 18:18:22;
        ends 5 2019/12/20 18:48:22;
        hardware ethernet 00:0c:29:b6:82:7a;
        uid 01:00:0c:29:b6:82:7a;
}

Once the installer has completed and the instance has rebooted, the DHCPD leases file has a new entry with the same MAC address. Note that the uid field in the second entry is different. Clearly, the VMware DHCPD implementation has decided that this is a 'new' machine that needs a new address:

# All times in this file are in UTC (GMT), not your local timezone.   This is
# not a bug, so please don't ask about it.   There is no portable way to
# store leases in the local timezone, so please don't request this as a
# feature.   If this is inconvenient or confusing to you, we sincerely
# apologize.   Seriously, though - don't ask.
# The format of this file is documented in the dhcpd.leases(5) manual page.

lease 172.16.83.128 {
        starts 5 2019/12/20 18:18:22;
        ends 5 2019/12/20 18:48:22;
        hardware ethernet 00:0c:29:b6:82:7a;
        uid 01:00:0c:29:b6:82:7a;
}
lease 172.16.83.129 {
        starts 5 2019/12/20 18:22:19;
        ends 5 2019/12/20 18:52:19;
        hardware ethernet 00:0c:29:b6:82:7a;
        uid ff:29:b6:82:7a:00:01:00:01:25:8f:cd:da:00:0c:29:b6:82:7a;
        client-hostname "localhost";
}

Meanwhile, Packer continues attempting to connect to the address parsed from the leases file when the initial IP address was requested by the installer (172.16.83.128). The following snippet from the logs shows the change in message before and after the reboot of the instance.

2019/12/20 18:22:04 packer-builder-vmware-iso plugin: [DEBUG] TCP connection to SSH ip/port failed: dial tcp 172.16.83.128:22: connect: connection refused
2019/12/20 18:22:24 packer-builder-vmware-iso plugin: [DEBUG] TCP connection to SSH ip/port failed: dial tcp 172.16.83.128:22: i/o timeout

The first message is displayed when the installer is running - the correct address has been recorded but the SSH daemon is not running so we get a connection refused error.

The second message is displayed post reboot when the instance has obtained a different IP address - this time 172.16.83.129. However, Packer continues to attempt to connect on the first address as it assumes the address assigned to an instance will remain the same across reboots - in other words it only parses the leases file once at the beginning of the run. Clearly, Packer will never be able to connect as the IP address is now different.
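To make the situation concrete, here is a minimal sketch of reading the leases file and collecting every address recorded for a given MAC, rather than only the first match. It is not Packer's parser; the function names `parse_leases` and `ips_for_mac` are made up for illustration, and the sample text is a trimmed copy of the leases shown above.

```python
import re

# One "lease <ip> { ... }" block; bodies in this file contain no nested braces.
LEASE_RE = re.compile(r"lease\s+(\d+\.\d+\.\d+\.\d+)\s*\{(.*?)\}", re.DOTALL)


def parse_leases(text):
    """Return one dict per lease block, in file order (hypothetical helper)."""
    leases = []
    for ip, body in LEASE_RE.findall(text):
        entry = {"ip": ip}
        for line in body.splitlines():
            line = line.strip().rstrip(";")
            if line.startswith("hardware ethernet "):
                entry["mac"] = line.split()[2].lower()
            elif line.startswith("uid "):
                entry["uid"] = line.split()[1].lower()
            elif line.startswith("starts "):
                # "starts 5 2019/12/20 18:18:22" -> "2019/12/20 18:18:22"
                entry["starts"] = line.split(None, 2)[2]
        leases.append(entry)
    return leases


def ips_for_mac(text, mac):
    """All leased addresses whose hardware address equals mac, oldest first."""
    return [l["ip"] for l in parse_leases(text) if l.get("mac") == mac.lower()]


SAMPLE = """
lease 172.16.83.128 {
        starts 5 2019/12/20 18:18:22;
        ends 5 2019/12/20 18:48:22;
        hardware ethernet 00:0c:29:b6:82:7a;
        uid 01:00:0c:29:b6:82:7a;
}
lease 172.16.83.129 {
        starts 5 2019/12/20 18:22:19;
        ends 5 2019/12/20 18:52:19;
        hardware ethernet 00:0c:29:b6:82:7a;
        uid ff:29:b6:82:7a:00:01:00:01:25:8f:cd:da:00:0c:29:b6:82:7a;
        client-hostname "localhost";
}
"""

print(ips_for_mac(SAMPLE, "00:0C:29:B6:82:7A"))
```

With the two leases above, both addresses come back for the same MAC, which is exactly the ambiguity a single lookup at the start of the build cannot resolve.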

If I manually SSH into the instance I am able to view the instance's dhclient leases file:

$ cat /var/lib/dhcp/dhclient.eth0.leases
default-duid "\000\001\000\001%\217\315\332\000\014)\266\202z";
lease {
  interface "eth0";
  fixed-address 172.16.83.129;
  option subnet-mask 255.255.255.0;
  option routers 172.16.83.2;
  option dhcp-lease-time 1800;
  option dhcp-message-type 5;
  option domain-name-servers 172.16.83.2;
  option dhcp-server-identifier 172.16.83.254;
  option broadcast-address 172.16.83.255;
  option netbios-name-servers 172.16.83.2;
  option domain-name "localdomain";
  renew 5 2019/12/20 18:35:49;
  rebind 5 2019/12/20 18:48:35;
  expire 5 2019/12/20 18:52:20;
}
DanHam commented 4 years ago

Issue #5642 looks to be reporting the same problem. However, in that report the issue occurs building CoreOS with Hyper-V. As stated above, I do not think this issue will be specific to any one builder or OS.

The core issue appears to be that Packer assumes an instance will always receive the same IP address/lease across reboots.

Currently, Packer first enters a loop to determine the IP address it should try to connect to; clearly, the logic used is dependent on the builder. Once Packer has determined the IP it enters another loop that attempts to establish a connection. If the IP address changes while Packer is in this second loop (as is possible when the machine reboots) then the connection attempt will eventually time out and fail.

To fix this the two loops need to be merged, i.e. Packer should continually try to determine the IP address that has been assigned to the instance and then attempt to connect in the same loop.
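The merged loop described here could look roughly like the sketch below. It is an illustration only, not Packer code: `wait_for_connection`, `resolve_ip` and `try_connect` are hypothetical names, and the two callbacks stand in for whatever builder-specific lookup and communicator logic would really be used.

```python
import time


def wait_for_connection(resolve_ip, try_connect, timeout=300.0, interval=2.0,
                        clock=time.monotonic, sleep=time.sleep):
    """Re-resolve the guest address before every connection attempt.

    resolve_ip()    -> current best-guess IP, or None if not known yet (assumed hook)
    try_connect(ip) -> True once a connection succeeds (assumed hook)
    """
    deadline = clock() + timeout
    while clock() < deadline:
        ip = resolve_ip()              # discovery happens inside the loop...
        if ip is not None and try_connect(ip):
            return ip                  # ...so a changed address is picked up
        sleep(interval)
    raise TimeoutError("guest never became reachable")


# Simulated run: the guest answers on .128 during install, then moves to .129.
answers = iter(["172.16.83.128", "172.16.83.128", "172.16.83.129"])
result = wait_for_connection(
    resolve_ip=lambda: next(answers),
    try_connect=lambda ip: ip.endswith(".129"),  # only the new address accepts SSH
    sleep=lambda seconds: None,                  # skip real waiting in the demo
)
print(result)
```

The point of the design is only that address discovery is repeated on every iteration, so a stale address costs one failed attempt instead of the whole build.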

The way the code is structured at present makes this difficult to fix as the connection logic/loop has been broken out in to a generic helper that all builders use.

akutz commented 4 years ago

FWIW, I'm experiencing the same issue with packer 1.5.1 and Photon OS 3 Rev 2. It seems the VM grabs a lease in order to boot with kickstart, but that IP changes by the time the guest is rebooted and ready for Packer to take over. Except Packer is waiting for the IP that it must have seen first -- the one used to access the kickstart config over HTTP.

akutz commented 4 years ago

I just got Photon OS 3 working by adding the open-vm-tools package to the list of packages in the kickstart file:

{
  "hostname": "haproxy-lb",
  "password": {
    "crypted": false,
    "text": "photon"
  },
  "disk": "/dev/sda",
  "install_linux_esx": true,
  "packages": [
    "minimal",
    "linux",
    "initramfs"
  ],
  "additional_packages": [
    "ca-certificates",
    "curl",
    "gzip",
    "haproxy",
    "jq",
    "lsof",
    "lvm2",
    "ntp",
    "openssh-server",
    "open-vm-tools",
    "sed",
    "shadow",
    "sudo",
    "tar",
    "vim"
  ],
  "postinstall": [
    "#!/bin/sh",
    "useradd -U --groups wheel photon && echo 'photon:photon' | chpasswd",
    "useradd --system --home-dir=/var/lib/haproxy --user-group haproxy",
    "mkdir -p /home/photon",
    "chown -R photon:photon /home/photon",
    "mkdir -p /var/lib/haproxy",
    "chown -R haproxy:haproxy /var/lib/haproxy",
    "systemctl enable sshd",
    "systemctl disable haproxy",
    "echo 'photon ALL=(ALL) NOPASSWD: ALL' >/etc/sudoers.d/photon",
    "chmod 440 /etc/sudoers.d/photon",
    "tdnf clean all"
  ]
}
DanHam commented 4 years ago

EDIT: Actually, I'm still encountering this issue - even after adding open-vm-tools to the preseed.

@akutz Thanks for that! Adding open-vm-tools to the Debian preseed file (Debian's kickstart equivalent) worked for me too.

d-i pkgsel/include string open-vm-tools [...space separated list of additional packages to install into the target system]

I haven't looked too deeply, but clearly, this enables some interaction/magic to occur that ensures the instance keeps the same IP address across reboots - perhaps something in /etc/vmware-tools/scripts/vmware/network??

While this is a viable workaround for the given OS/platform combinations documented here, I expect this is a bug that will continue to resurface for Packer users due to the way the IP discovery and connection logic is currently structured.

Packer needs to be able to handle the situation where the IP address changes across a reboot/dhcp address renewal. Workarounds of the kind documented here may not always be available.

DanHam commented 4 years ago

@akutz Unfortunately, I've now found that only worked for me once! Has adding open-vm-tools to your kickstart solved the problem for you consistently?

akutz commented 4 years ago

> @akutz Unfortunately, I've now found that only worked for me once! Has adding open-vm-tools to your kickstart solved the problem for you consistently?

Hi @DanHam,

I've built the image now several times sans any issues.

SwampDragons commented 4 years ago

This will probably not be an easy change to make based on Packer's architecture and the way our communicators currently work, but I agree that ideally Packer would support situations where the IP address changes.

llxp commented 4 years ago

I wonder why this is an architectural issue, since the vmware-iso builder works fine when building on ESXi. There I am using dnsmasq as my DHCP server, so there is no leases file to look in; instead Packer has to get the address through open-vm-tools or the ESXi API (I guess). So why not constantly watch the leases file for changes and take the last entry as the valid IP?

I tried to build debian 10 with open-vm-tools in the preseed, but it still doesn't work. I tried using the version 1.5.1 (official build) and vmware workstation 14.1.7 build-12989993 and tried as well with version 15.5.1 build-15018445

llxp commented 4 years ago

I found a temporary workaround, or rather a hack. I created a new interface with a subnet mask of 255.255.255.248 and chose a DHCP range of 2 IPs. Now the DHCP server is only able to give 2 IPs to the VM. The first IP will be given during the installation. The second IP will be given after the first reboot.
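For anyone checking the arithmetic behind this hack, the following sketch uses Python's standard `ipaddress` module to show how small a 255.255.255.248 network is. The subnet address is the one from the vmnet10 configuration in this comment; nothing else is assumed.

```python
import ipaddress

# 172.16.230.0 with netmask 255.255.255.248, as configured for vmnet10.
net = ipaddress.ip_network("172.16.230.0/255.255.255.248")
hosts = list(net.hosts())

print(net.prefixlen)          # 29
print(net.num_addresses)      # 8
print(hosts[0], hosts[-1])    # 172.16.230.1 172.16.230.6
print(net.broadcast_address)  # 172.16.230.7
```

Only six host addresses exist in total, so once the addresses used by the host adapter and NAT gateway are taken out, the DHCP pool is small enough that Packer has very few candidates to be wrong about.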

The config for the networking file in /etc/vmware/networking is as follows:

VERSION=1,0
answer VNET_10_DHCP yes
answer VNET_10_DHCP_CFG_HASH 8D292DB10AA5381B846E260EADE516BB459E6D65
answer VNET_10_HOSTONLY_NETMASK 255.255.255.248
answer VNET_10_HOSTONLY_SUBNET 172.16.230.0
answer VNET_10_NAT yes
answer VNET_10_NAT_PARAM_UDP_TIMEOUT 30
answer VNET_10_VIRTUAL_ADAPTER yes
answer VNET_1_DHCP yes
answer VNET_1_DHCP_CFG_HASH B70C98E2E155E3E7349FFCA26CE5694851E233FB
answer VNET_1_HOSTONLY_NETMASK 255.255.255.0
answer VNET_1_HOSTONLY_SUBNET 172.16.65.0
answer VNET_1_VIRTUAL_ADAPTER yes
answer VNET_8_DHCP yes
answer VNET_8_DHCP_CFG_HASH 39B4FEBF27D7259C57192A984AA39AD7DDA1FAC4
answer VNET_8_HOSTONLY_NETMASK 255.255.255.0
answer VNET_8_HOSTONLY_SUBNET 172.16.229.0
answer VNET_8_NAT yes
answer VNET_8_VIRTUAL_ADAPTER yes
answer VNL_DEFAULT_BRIDGE_VNET -1
add_bridge_mapping ens192 -1
add_bridge_mapping br1 0

The config file in /etc/vmware/vmnet10/dhcpd/dhcpd.conf is as follows:

# Configuration file for ISC 2.0 vmnet-dhcpd operating on vmnet10.
#
# This file was automatically generated by the VMware configuration program.
# See Instructions below if you want to modify it.
#
# We set domain-name-servers to make some DHCP clients happy
# (dhclient as configured in SuSE, TurboLinux, etc.).
# We also supply a domain name to make pump (Red Hat 6.x) happy.
#

###### VMNET DHCP Configuration. Start of "DO NOT MODIFY SECTION" #####
# Modification Instructions: This section of the configuration file contains
# information generated by the configuration program. Do not modify this
# section.
# You are free to modify everything else. Also, this section must start
# on a new line
# This file will get backed up with a different name in the same directory
# if this section is edited and you try to configure DHCP again.

# Written at: 01/10/2020 11:01:36
allow unknown-clients;
default-lease-time 1800;                # default is 30 minutes
max-lease-time 7200;                    # default is 2 hours

subnet 172.16.230.0 netmask 255.255.255.248 {
        range 172.16.230.4 172.16.230.6;
        option broadcast-address 172.16.230.7;
        option domain-name-servers 172.16.230.2;
        option domain-name localdomain;
        default-lease-time 1800;                # default is 30 minutes
        max-lease-time 7200;                    # default is 2 hours
        option netbios-name-servers 172.16.230.2;
        option routers 172.16.230.2;
}
host vmnet10 {
        hardware ethernet 00:50:56:C0:00:0A;
        fixed-address 172.16.230.1;
        option domain-name-servers 0.0.0.0;
        option domain-name "";
        option routers 0.0.0.0;
}
####### VMNET DHCP Configuration. End of "DO NOT MODIFY SECTION" #######

The config file in /etc/vmware/vmnet10/nat/nat.conf is as follows:

# VMware NAT configuration file
# Manual editing of this file is not recommended. Using UI is preferred.

[host]

# NAT gateway address
ip = 172.16.230.2
netmask = 255.255.255.248

# VMnet device if not specified on command line
device = /dev/vmnet10

# Allow PORT/EPRT FTP commands (they need incoming TCP stream ...)
activeFTP = 1

# Allows the source to have any OUI.  Turn this on if you change the OUI
# in the MAC address of your virtual machines.
allowAnyOUI = 1

# Controls if (TCP) connections should be reset when the adapter they are
# bound to goes down
resetConnectionOnLinkDown = 1

# Controls if (TCP) connection should be reset when guest packet's destination
# is NAT's IP address
resetConnectionOnDestLocalHost = 1

# Controls if enable nat ipv6
natIp6Enable = 0

# Controls if enable nat ipv6
natIp6Prefix = fd15:4ba5:5a2b:100a::/64

[tcp]

# Value of timeout in TCP TIME_WAIT state, in seconds
timeWaitTimeout = 30

[udp]

# Timeout in seconds. Dynamically-created UDP mappings will purged if
# idle for this duration of time 0 = no timeout, default = 60; real
# value might be up to 100% longer
timeout = 30

[netbios]
# Timeout for NBNS queries.
nbnsTimeout = 2

# Number of retries for each NBNS query.
nbnsRetries = 3

# Timeout for NBDS queries.
nbdsTimeout = 3

[incomingtcp]

# Use these with care - anyone can enter into your VM through these...
# The format and example are as follows:
#<external port number> = <VM's IP address>:<VM's port number>
#8080 = 172.16.3.128:80

[incomingudp]

# UDP port forwarding example
#6000 = 172.16.3.0:6001
akutz commented 4 years ago

> I wonder why this is an architectural issue, since the vmware-iso builder works fine when building on ESXi. There I am using dnsmasq as my DHCP server, so there is no leases file to look in; instead Packer has to get the address through open-vm-tools or the ESXi API (I guess). So why not constantly watch the leases file for changes and take the last entry as the valid IP?
>
> I tried to build debian 10 with open-vm-tools in the preseed, but it still doesn't work. I tried using the version 1.5.1 (official build) and vmware workstation 14.1.7 build-12989993 and tried as well with version 15.5.1 build-15018445

I also found that on Photon it kept failing for a while after the first attempt following a reboot. I finally got it working by killing this process in between attempts:

$ sudo ps alx | grep vagrant
    0  2316     1   0  20  0 558440636    384 -      Ss     ??    0:00.11 /opt/vagrant-vmware-desktop/bin/vagrant-vmware-utility api -port=9922

Keep in mind, I'm not running Vagrant. But I bet Packer is utilizing something from Vagrant.

akutz commented 4 years ago

Based on this changelog, https://github.com/hashicorp/vagrant-plugin-changelog/blob/master/vagrant-vmware-utility-changelog.md, it does appear the vagrant-vmware-utility is used for DHCP in some capacity.

akutz commented 4 years ago

Perhaps related to https://github.com/hashicorp/vagrant/issues/9915?

akutz commented 4 years ago

Hi @DanHam,

I just noticed my vagrant-vmware-utility is 1.0.5 and 1.0.7 (download) is the most recent version. I'm going to upgrade this and see if it helps.

llxp commented 4 years ago

It seems that with the WinRM communicator, Packer constantly queries for a new IP. I have now found a "proper" workaround that is better than the one I mentioned earlier (the other hacky workaround does not work reliably). I am now faking a DHCP server on a bridged network interface by creating a dhcpd.conf and dhcpd.leases file in the /etc/vmware/<vmnet>/dhcpd/ directory. I fill the leases file by parsing the Packer output and asking the DHCP server on my network (dnsmasq, using a custom REST API + curl) which IP belongs to the hardware address. In addition, I configured the DHCP server to assign only one IP per hardware address and to ignore the client ID. I also implemented a script on the DHCP server that is run every time a new lease is created. The script then creates an entry in the /etc/ethers file to create a static assignment.

VMware Workstation 15 uses ISC DHCP version 2, so the option to ignore the client ID is not implemented. Many standard DHCP servers default to preferring the client ID over the hardware address when one is present. Even the very new Kea DHCP server from ISC uses that as a default. That's why I came up with this "workaround". It seems that the option to ignore the client identifier, in combination with the script, is a proper workaround.

I created a gist for people having the same problem: https://gist.github.com/llxp/006ad6c7aa5d81e7283631e76fd1ed71

Akvinikym commented 4 years ago

Hello.

We had the same issue, and the following approach worked for us: build the virtual machine for VirtualBox, as it can later be imported into VMware with ease. I have not found any restrictions of this approach so far, and it works with Debian 10 at least.

jan-z commented 4 years ago

For me adding open-vm-tools to the preseed file did not work.

I found a workaround to add to the preseed file:

d-i preseed/late_command string \
    sed -i 's/^#*\(send dhcp-client-identifier\).*$/\1 = hardware;/' /target/etc/dhcp/dhclient.conf

That sets the dhcp-client-identifier option so that the MAC address is used. This was the default on versions prior to Debian 10 (see https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=906894).
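To see what the sed expression does to dhclient.conf, here is a small Python sketch that applies the equivalent regular expression to a sample. The sample lines are an assumption about what a stock dhclient.conf contains (a commented-out `send dhcp-client-identifier` line); the helper name `force_mac_client_id` is made up for this illustration.

```python
import re

# Python equivalent of the sed pattern: optional leading '#', the keyword,
# then the rest of the line, replaced by "<keyword> = hardware;".
PATTERN = re.compile(r"^#*(send dhcp-client-identifier).*$", re.MULTILINE)


def force_mac_client_id(conf_text):
    """Rewrite any (commented or active) client-identifier line."""
    return PATTERN.sub(r"\1 = hardware;", conf_text)


# Assumed sample content, not copied from a real system.
sample = (
    "#send dhcp-client-identifier 1:0:a0:24:ab:fb:9c;\n"
    "send host-name = gethostname();\n"
)

print(force_mac_client_id(sample))
```

The commented-out line is uncommented and rewritten, while unrelated lines such as the host-name setting are left alone.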

DanHam commented 4 years ago

@jan-z Nice workaround! This works for me too.

Can I ask where you found the documentation for setting the client identifier to = hardware? Maybe I'm not looking hard enough but I can't seem to find that anywhere in the dhclient docs.

The calls made by ifupdown to dhclient are hardcoded into the ifup binary. The fix for Debian bug 906894 added the '-i' flag. As you know this flag results in the DUID being sent to the DHCP server and this has been the root cause of the issues we have been seeing.
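The two uid values in the leases file at the top of this issue show the difference directly. As a rough sketch (assuming the standard layouts: a leading 01 means hardware type Ethernet followed by the MAC, and a leading ff means a 4-byte IAID followed by a DUID as described in RFC 4361), a client identifier can be decoded like this. The function name `describe_client_id` is hypothetical.

```python
def describe_client_id(uid):
    """Decode a colon-separated DHCPv4 client identifier (illustrative only)."""
    raw = bytes(int(part, 16) for part in uid.split(":"))

    def fmt(data):
        return ":".join(f"{byte:02x}" for byte in data)

    if raw[0] == 0x01 and len(raw) == 7:
        # Type 1 (Ethernet) followed by the 6-byte MAC address.
        return {"kind": "hardware", "mac": fmt(raw[1:])}
    if raw[0] == 0xFF and len(raw) > 7:
        # Type 255: 4-byte IAID, then the DUID.
        duid = raw[5:]
        info = {
            "kind": "iaid+duid",
            "iaid": fmt(raw[1:5]),
            "duid_type": int.from_bytes(duid[:2], "big"),
        }
        if info["duid_type"] == 1:
            # DUID-LLT: 2-byte type, 2-byte hw type, 4-byte time, link-layer address.
            info["mac"] = fmt(duid[8:])
        return info
    return {"kind": "unknown"}


installer = describe_client_id("01:00:0c:29:b6:82:7a")
rebooted = describe_client_id(
    "ff:29:b6:82:7a:00:01:00:01:25:8f:cd:da:00:0c:29:b6:82:7a"
)
print(installer)
print(rebooted)
```

Both identifiers embed the same MAC, but a DHCP server that keys leases on the client identifier sees two different byte strings and so treats them as two different clients.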

The next version of ifupdown should allow the user to configure whether the calls to dhclient include the '-i' flag/send the DUID. See Debian bug 923640 and the fix HERE

jan-z commented 4 years ago

@DanHam I can't find any documentation for that setting. I got it from here: https://www.reddit.com/r/debian/comments/ca5vjb/dhcp_identifiers_changed_on_upgrade_to_buster/

DanHam commented 4 years ago

@jan-z Ah OK. I had seen that...

I've taken a more in-depth look. It's a bit difficult to find but this is documented in the SETTING OPTION VALUES USING EXPRESSIONS section of the dhcp-options(5) man page. The syntax for the expressions is documented in the dhcp-eval(5) man page.

@SwampDragons @azr The workaround put forward by @jan-z is the right way to fix this - at least until THIS FIX works its way into the next Debian ifupdown package.

Do you want to close this or leave it open for a while for others coming across the same issue?

azr commented 4 years ago

🤔 hm, IMHO, a doc page would be nice, like a short title/description that matches the issue so it's easy to Google, stating all possible options. So users can understand the issue and pick the path they want: update/fix/else.

Edit: Super good findings !

melck commented 4 years ago

I'm experiencing the same issue with Workstation 15.5 (ubuntu), packer 1.5.4 and Photon OS 3 Rev 2.

The workaround with the open-vm-tools package isn't working. Do you know how I can work around it without changing the VMware configuration?

matteofilippetto commented 4 years ago

@jan-z's workaround works for me too with Packer version 1.5.5 and VMware Fusion 11.5.3, installing debian-10.3.0-amd64-netinst.

sulaweyo commented 4 years ago

Same for me on Archlinux builds. Worked like a charm before but not anymore in 1.5. To be more precise, the KVM build still works, just VMware does not. After the reboot the VM has a different IP and Packer can't connect.

kclinden commented 4 years ago

Same issue here with Packer 1.5.6, Photon OS 3 Rev2, and Fusion 11.5.3.

kclinden commented 4 years ago

For Photon OS I tried adding a line in my kickstart postinstall section to change how the DHCP client was sending the uid info. The leases file now shows two leases for the same MAC, each with a different uid and client hostname.

kickstart.json

{
    "hostname": "photon",
    "password":
        {
            "crypted": false,
            "text": "VMware123!"
        },
    "disk": "/dev/sda",
    "packagelist_file": "packages_appliance.json",
    "additional_packages": [
        "open-vm-tools"
    ],
    "postinstall": [
                    "#!/bin/sh",
                    "sed -i 's/PermitRootLogin no/PermitRootLogin yes/g' /etc/ssh/sshd_config",
                    "echo 'ClientIdentifier=mac' >> /etc/systemd/network/99-dhcp-en.network",
                    "systemctl restart sshd.service",
                    "tdnf clean all"
                   ]
}

/var/db/vmware/vmnet-dhcpd-vmnet8.leases

# All times in this file are in UTC (GMT), not your local timezone.   This is
# not a bug, so please don't ask about it.   There is no portable way to
# store leases in the local timezone, so please don't request this as a
# feature.   If this is inconvenient or confusing to you, we sincerely
# apologize.   Seriously, though - don't ask.
# The format of this file is documented in the dhcpd.leases(5) manual page.

lease 172.16.54.128 {
        starts 3 2020/05/06 15:11:39;
        ends 3 2020/05/06 15:41:39;
        hardware ethernet 00:0c:29:08:05:f4;
        uid ff:b6:22:0f:eb:00:02:00:00:ab:11:c4:bd:ed:7a:ca:fd:92:30;
        client-hostname "photon-installer";
}
lease 172.16.54.129 {
        starts 3 2020/05/06 15:12:23;
        ends 3 2020/05/06 15:42:23;
        hardware ethernet 00:0c:29:08:05:f4;
        uid ff:b6:22:0f:eb:00:02:00:00:ab:11:44:a7:c6:60:a7:3d:45:17;
        client-hostname "photon";
}
pierrevillard commented 4 years ago

Hi, this problem is due to systemd-networkd, not Packer. Adding the following lines to your /etc/systemd/network/99-dhcp-en.network file will force the DHCP client to use the MAC and not the DUID.

[DHCPv4]
ClientIdentifier=mac

cf: https://www.freedesktop.org/software/systemd/man/systemd.network.html

kclinden commented 4 years ago

@pierreilki - I tried that as well :( It still wanted to pick up a new DHCP record causing packer to get the wrong ip. I also tried setting the hostname of the system to match that of the installer. In my case, this was photon-installer.

arizvisa commented 4 years ago

Hey y'all. So I wrote some unit tests for the majority of the vmware builder parsers (#9303), and did some refactoring of the dhcpd lease parsers (#9319). The reason is that it looks like this issue can be "kind of" solved in builder/vmware/common/ssh.go.

This CommHost function asks the driver.GuestIP function for what address to use. Before #9319, the driver.GuestIP function was just using regexes to grab any lease that matched the hw address. The issue that we're encountering (or at least that I am) is that there's more than one lease with the same hw address. The only thing that's different is the "uid" field which is what the dhcpcd in our guest is using to fetch the address. This should be what we're actually "key"-ing on, but there's no good way to export the "uid" from a guest. So, why not grab "everything" that matches, and try that?

I had to rewrite the dhcpd lease parsers so that they would first of all be easier to test, but also so that it would be wayyy easier to extract more than one match, and on any particular field (uid in our case). This way in CommHost, we can ask driver.GuestIP which leases match, and check each one individually to see what works.

There may be some issues with doing this that I don't see yet, but I broke up my intentions into separate stages so that it can be easier to review their individual modifications. PR #9319 should be completely backwards compatible with the way the dhcpd lease parser is currently working, and the next PR (which I'm going to start working on in a minute) will end up working in a non-backwards-compatible way due to changing the way that CommHost and driver.GuestIP interact.

arizvisa commented 4 years ago

Okay...I'm actually surprised it works, but PR #9322 reworks driver.GuestIP so that it returns a list of addresses using the new dhcpd.leases parser from PR #9319. Then in CommHost, it takes the list of addresses and tries each one until one of them is valid. That address is then used to ssh to the guest.
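As a language-neutral illustration of that idea (this is not the Go code from PR #9322), the sketch below takes a list of candidate addresses and returns the first one that accepts a TCP connection on the SSH port. The name `first_reachable` and the injectable `probe` parameter are assumptions made so the example can be run without a real guest.

```python
import socket


def first_reachable(candidates, port=22, timeout=1.0, probe=None):
    """Return the first address that accepts a connection, or None."""

    def tcp_probe(ip):
        # A plain TCP connect; a refused or timed-out attempt raises OSError.
        try:
            with socket.create_connection((ip, port), timeout=timeout):
                return True
        except OSError:
            return False

    check = probe or tcp_probe
    for ip in candidates:
        if check(ip):
            return ip
    return None


# Simulated guest: only the post-reboot address answers.
chosen = first_reachable(
    ["172.16.83.128", "172.16.83.129"],
    probe=lambda ip: ip == "172.16.83.129",
)
print(chosen)
```

Because every stale address has to fail before the next one is tried, the per-attempt timeout bounds how long each dead lease can delay the build.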

Now it'll take like a second or two for it to recognize that ssh is up, but packer seems to recognize the new address and continue to the next multistep like it's supposed to.

So, PR #9322 should fix this properly, and without hacking up your guest or VMware configuration.

akutz commented 4 years ago

I wanted to take a moment to thank you for all your hard work on this @arizvisa!

arizvisa commented 4 years ago

Thx. Anything for another austinite. ;)

praseodym commented 4 years ago

I think this was accidentally closed due to a “close” keyword in #9303, while that PR doesn’t actually fix this issue.

SwampDragons commented 4 years ago

Good catch, thanks.

SwampDragons commented 4 years ago

I think this was closed "for real" by PR 9322. We'll be releasing v1.6.0 early next week.

nywilken commented 4 years ago

Fix confirmed using the configuration files linked above by @DanHam. Note I fixed the deprecation issue locally for the iso_checksum_type configuration attribute before running against v1.6.0-dev.

vmware-iso: output will be in this color.

==> vmware-iso: Retrieving ISO
==> vmware-iso: Trying https://cdimage.debian.org/cdimage/archive/10.2.0/amd64/iso-cd/debian-10.2.0-amd64-netinst.iso                             
==> vmware-iso: Trying https://cdimage.debian.org/cdimage/archive/10.2.0/amd64/iso-cd/debian-10.2.0-amd64-netinst.iso?checksum=sha512%3A5495c8378b829df7386b9bac5bc701f7ad8b2843d088e8636c89549519cf176100eacb90121af3934a8c5229cbe7d2fd23342eda330d56fb45fb2d91f2117fb4                             
==> vmware-iso: https://cdimage.debian.org/cdimage/archive/10.2.0/amd64/iso-cd/debian-10.2.0-amd64-netinst.iso?checksum=sha512%3A5495c8378b829df7386b9bac5bc701f7ad8b2843d088e8636c89549519cf176100eacb90121af3934a8c5229cbe7d2fd23342eda330d56fb45fb2d91f2117fb4 => /home/wilken/pkg/packer-testing-master/vmware-dhcp/packer_cache/aa283600cd4c412a3090a9399e251328ffc7ccfa.iso                                                                       
==> vmware-iso: Creating required virtual machine disks
==> vmware-iso: Building and writing VMX file
==> vmware-iso: Starting HTTP server on port 8586
==> vmware-iso: Starting virtual machine...
==> vmware-iso: Waiting 5s for boot...
==> vmware-iso: Connecting to VM via VNC (127.0.0.1:5911)
==> vmware-iso: Typing the boot command over VNC...
==> vmware-iso: Waiting for SSH to become available...
==> vmware-iso: Connected to SSH!
==> vmware-iso: Provisioning with shell script: /tmp/packer-shell527574289                                                                        
==> vmware-iso: Running local shell script: /tmp/packer-shell536900249
    vmware-iso: 4bb8ee9c-5bfc-1977-6797-ed334dcbd96c
==> vmware-iso: Gracefully halting virtual machine...
    vmware-iso: Waiting for VMware to clean up after itself...
==> vmware-iso: Deleting unnecessary VMware files...
    vmware-iso: Deleting: output-debian-10-vmware-iso/vmware.log
==> vmware-iso: Compacting all attached virtual disks...
    vmware-iso: Compacting virtual disk 1
==> vmware-iso: Cleaning VMX prior to finishing up...
    vmware-iso: Detaching ISO from CD-ROM device ide0:0...
    vmware-iso: Disabling VNC server...
==> vmware-iso: Skipping export of virtual machine (export is allowed only for ESXi)...                                                           
Build 'vmware-iso' finished.
arizvisa commented 4 years ago

Since a list of hosts is being checked linearly depending on how many leases match (as opposed to before)... Is there a noticeable difference for y'all in the time it takes to detect the new address from the previous method? It likely doesn't really matter, but is it as significant for you guys as it is for me?

Also, is the vmware builder the only one which uses the method of parsing the dhcp leases in order to determine the address of the guest?

DanHam commented 4 years ago

@arizvisa Just like to second the thanks above for the fix! Really appreciated!

I have to say, I didn't really time my build or watch it too closely, but I didn't notice any significant delays. Things worked pretty much as they did before for me.

DanHam commented 4 years ago

@arizvisa A quick update on my comment above.

I've now built a few boxes with the latest Packer build. I've had mixed results with respect to the time it takes for the new address to be picked up by Packer.

Sometimes the address is picked up quickly. Other times there is a very noticeable delay - in the region of minutes - while the box is sitting there post reboot waiting for Packer to connect.

arizvisa commented 4 years ago

Hmm.. I wonder if performance can be improved slightly by sorting the list of leases that we parse in descending order (at the end of the PotentialGuestIP function) so that the newer leases are attempted to be connected to first... Another thing that might be worth trying is to make the connections in CommHost in parallel, as essentially the logic is the same as a portscanner.
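Both suggestions can be sketched in a few lines. This is only an illustration of the two ideas, not a proposed patch: `newest_first` and `probe_parallel` are hypothetical names, the lease dicts mimic what a leases parser might return, and the probe is a stand-in for a real connection attempt.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime


def newest_first(leases):
    """Sort lease dicts by their 'starts' timestamp, most recent first."""
    return sorted(
        leases,
        key=lambda lease: datetime.strptime(lease["starts"], "%Y/%m/%d %H:%M:%S"),
        reverse=True,
    )


def probe_parallel(ips, probe, workers=8):
    """Probe all addresses concurrently; return the reachable ones in input order."""
    if not ips:
        return []
    with ThreadPoolExecutor(max_workers=min(workers, len(ips))) as pool:
        results = list(pool.map(probe, ips))
    return [ip for ip, ok in zip(ips, results) if ok]


leases = [
    {"ip": "172.16.83.128", "starts": "2019/12/20 18:18:22"},
    {"ip": "172.16.83.129", "starts": "2019/12/20 18:22:19"},
]
ordered = [lease["ip"] for lease in newest_first(leases)]
print(ordered)

# Stand-in probe: pretend only the newest address accepts connections.
reachable = probe_parallel(ordered, probe=lambda ip: ip.endswith(".129"))
print(reachable)
```

Sorting alone helps when the newest lease is the live one; probing in parallel caps the total wait at roughly one timeout regardless of how many stale leases share the MAC.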

Anyways, just some potential solutions to consider. I'll leave this experimentation up to the maintainers or another contributor for the moment, perhaps until I can find more time.

SwampDragons commented 4 years ago

Makes sense, thanks @arizvisa for getting it this far :)

arizvisa commented 4 years ago

course. i got u.
