hashicorp / packer-plugin-vsphere

Packer plugin for VMware vSphere Builder
https://www.packer.io/docs/builders/vsphere
Mozilla Public License 2.0

Having the same issue as closed #127 #307

Closed cgriff80031 closed 5 months ago

cgriff80031 commented 11 months ago

I have tried every published workaround but have the same issue as https://github.com/hashicorp/packer-plugin-vsphere/issues/127

Nothing I have tried will keep the VM from pulling a new IP on reboot.

I am on the latest releases of everything.

Is the bug that was filed after issue 127 was closed the one addressing the upstream issue?

https://github.com/kubernetes-sigs/image-builder/issues/1236

Thanks in advance

tenthirtyam commented 11 months ago

What's in your user-data config?

cgriff80031 commented 11 months ago

thanks for taking a look. I am using the default from here:

https://github.com/vmware-samples/packer-examples-for-vsphere/blob/develop/builds/linux/ubuntu/22-04-lts/data/user-data.pkrtpl.hcl

The only difference is that I added a couple of packages:

...

packages:

tenthirtyam commented 11 months ago

Interesting, because I can 100% say those work because I wrote those samples. 😆

tenthirtyam commented 11 months ago

Q: Is the packer host accessible from the machine being built?

cgriff80031 commented 11 months ago

Yes, I can log in and the user/pass that I set all works. The problem is that the IP address changes after reboot. For example, the IP was initially xx.xx.xx.21 and after the initial reboot it was xx.xx.xx.112. It will just sit there and time out waiting for SSH to become available. I did go in and manually set the IP address back to the original and got the script to complete.

We use Palo Alto firewalls as the DHCP server for this segment. It is a default out-of-the-box config, default scope etc.

Thanks


tenthirtyam commented 11 months ago

Try adding ip_settle_timeout.

https://developer.hashicorp.com/packer/integrations/hashicorp/vsphere/latest/components/builder/vsphere-iso

cgriff80031 commented 11 months ago

I added a 5 min ip_settle_timeout to packer-examples-for-vsphere/builds/linux/ubuntu/22-04-lts/linux-ubuntu.auto.pkrvars.hcl

...
// Boot Settings
vm_boot_order = "disk,cdrom"
vm_boot_wait = "5s"
ip_settle_timeout = "5m"
...

same behavior.

JollyJokr commented 9 months ago

Any updates on this matter? I have the same error/problem.

pv2b commented 8 months ago

I have the same issue as well.

In my case we're running a DHCP server that's integrated in a Palo Alto Networks NGFW. Reviewing the DHCP server logs shows that during the install process the IP address is leased and released several times. Every time a release / lease happens, the Palo Alto Networks DHCP server does not re-use the previous IP (because it was released) and instead just uses the next IP in its rotation.

During my deployment, a total of 4 different IP addresses were assigned to the VM. The first 3 were assigned within the first few seconds. The third is kept for about 6 minutes and 2 seconds during the install, and then the 4th and final IP address is assigned after reboot.
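
To make the lease behaviour above concrete, here is a minimal toy model of it. It is only a sketch under assumptions: the class name `RotatingDhcpPool`, the address range, and the MAC address are hypothetical, and this is not how the Palo Alto Networks DHCP server is actually implemented. It only mimics the observed pattern, where a released address is not handed back and the next address in rotation is used instead.

```python
from ipaddress import IPv4Address


class RotatingDhcpPool:
    """Toy model: a DHCP server that always hands out the next address in rotation."""

    def __init__(self, first: str, size: int) -> None:
        start = IPv4Address(first)
        self._addresses = [start + i for i in range(size)]
        self._cursor = 0   # index of the next address to try
        self._leases = {}  # client MAC -> leased address

    def request(self, mac: str) -> IPv4Address:
        # A client that still holds a lease keeps its address.
        if mac in self._leases:
            return self._leases[mac]
        in_use = set(self._leases.values())
        for _ in range(len(self._addresses)):
            candidate = self._addresses[self._cursor]
            self._cursor = (self._cursor + 1) % len(self._addresses)
            if candidate not in in_use:
                self._leases[mac] = candidate
                return candidate
        raise RuntimeError("address pool exhausted")

    def release(self, mac: str) -> None:
        # After a release the server forgets the binding entirely.
        self._leases.pop(mac, None)


pool = RotatingDhcpPool("192.0.2.21", size=100)  # hypothetical scope
mac = "00:50:56:00:00:01"                        # hypothetical VM MAC address

# One initial lease followed by three release/lease cycles.
seen = [pool.request(mac)]
for _ in range(3):
    pool.release(mac)
    seen.append(pool.request(mac))

print([str(ip) for ip in seen])
```

In this model every release moves the client to a new address, while a client that asks again without having released simply gets its current address back.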

In my case, an ip_settle_timeout of 5m would therefore not be sufficient. I would suggest anyone who runs into this problem to review their DHCP logs and figure out how long it actually takes for the initial install to finish and the IP to settle once the installed OS has actually booted, and add a healthy margin. In my case, I set it to 10 minutes.

However, that's still a very fragile workaround and not really a solution to the root problem. It's possible that high load on the virtualization environment, or changes to the image build configuration, will change the build time over time. I could set the timeout even higher, but then I'm adding useless wait time to the build process.
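
As a rough illustration of the sizing advice above, the following sketch derives a timeout from lease timestamps read out of a DHCP log. It assumes that the timeout has to exceed the longest time any single address was held before the next lease; the function name `suggest_settle_timeout`, the 2 minute margin, and the timestamps are all hypothetical and only loosely follow the numbers mentioned in this comment.

```python
from datetime import datetime, timedelta


def suggest_settle_timeout(lease_times, margin=timedelta(minutes=2)):
    """Return the longest gap between consecutive leases plus a safety margin."""
    if len(lease_times) < 2:
        raise ValueError("need at least two lease events")
    ordered = sorted(lease_times)
    # How long each address was held before the next lease replaced it.
    longest_hold = max(later - earlier for earlier, later in zip(ordered, ordered[1:]))
    return longest_hold + margin


# Hypothetical log: three leases within seconds, a fourth after 6m02s.
start = datetime(2024, 1, 1, 12, 0, 0)
events = [
    start,
    start + timedelta(seconds=2),
    start + timedelta(seconds=4),
    start + timedelta(seconds=4) + timedelta(minutes=6, seconds=2),
]

suggested = suggest_settle_timeout(events)
print(suggested)
```

The margin is a guess, which is exactly why this approach stays fragile: a slower build lengthens the longest hold and the computed value is out of date again.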

pv2b commented 8 months ago

OK, I figured out a really stupid workaround for this.

This is for building Ubuntu 22.04 LTS Server.

The problem is that the IP address changes after the installer finishes and the server reboots into its main OS. It changes because a DHCP Release packet is sent from the Linux VM within the installer just before the reboot, which forces the assignment of a new IP address after reboot.

My workaround for this is to prevent the installer from being able to perform a DHCP Release when rebooting. That way, once the installer has rebooted and the installed system is up, and the next step is to run the SSH provisioner, the IP address will not have changed.

The way I accomplish this is to add the following to the user-data inside the late-commands section:

autoinstall:
    # ...
    late-commands:
        # Prevent DHCP release message from being sent on reboot
        - iptables -A OUTPUT -p udp --dport 67 -j DROP

This installs a rule in Linux's built-in iptables firewall that drops any outgoing UDP packet with a destination port of 67, the port a DHCP server listens on. Essentially, we prevent the system from telling the DHCP server anything on reboot, which means the IP address is never successfully released, and therefore the same IP will be available when the VM boots back up again after install.

I'm doing nothing to persist this iptables rule which means it's gone after reboot.

I maintain that the correct solution should be for packer-plugin-vsphere to poll the Guest IP from vSphere on every SSH connection attempt, or some other mechanism be used to signal that the OS has successfully been installed. This is just a workaround.
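
A minimal sketch of the polling idea, for illustration only. This is not the plugin's code (the plugin is written in Go), and every name here is hypothetical: `get_guest_ip` stands in for whatever reads the guest IP from vSphere and `try_connect` for a single SSH connection attempt. The point is simply that the address is re-read before every attempt instead of being cached once.

```python
import time
from typing import Callable, Optional


def wait_for_ssh(
    get_guest_ip: Callable[[], Optional[str]],
    try_connect: Callable[[str], bool],
    attempts: int = 60,
    delay_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Retry SSH, re-reading the guest IP before every attempt."""
    for _ in range(attempts):
        ip = get_guest_ip()  # looked up fresh on each attempt, never cached
        if ip is not None and try_connect(ip):
            return ip
        sleep(delay_seconds)
    raise TimeoutError("SSH did not become available")


# Fake environment: the guest reports .21 during install, nothing during the
# reboot, and .112 once the installed OS is up. Only .112 accepts SSH.
reported = iter(["192.0.2.21", "192.0.2.21", None, "192.0.2.112"])
tried = []


def fake_connect(ip: str) -> bool:
    tried.append(ip)
    return ip == "192.0.2.112"


result = wait_for_ssh(lambda: next(reported), fake_connect, sleep=lambda s: None)
print(result, tried)
```

With this shape the loop follows the VM to its new address after the reboot, so no settle timeout has to be guessed.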

This way I don't need to set an ip_settle_timeout that's higher than normal. In my experience it'll still assign and release 2 useless IPs at the start, but that happens early enough that it doesn't matter.

tenthirtyam commented 8 months ago

Is the network device set as ens192 and is the nic set to vmxnet3?

pv2b commented 8 months ago

Is the network device set as ens192 and is the nic set to vmxnet3?

Yes, I'm running vmxnet3 and the NIC shows up as ens192 for me. I'm not doing any network configuration by packer, I'm just relying on the default DHCP behaviour.

tenthirtyam commented 8 months ago

What's the network config set in user-data?

pv2b commented 8 months ago

What's the network config set in user-data?

As I said, we're not doing any network configuration, we're relying on the default DHCP configuration.

Therefore the network section is not even present.

Here's my entire user-data file (with some secrets redacted):

#cloud-config
autoinstall:
    version: 1
    early-commands:
        # workaround to stop ssh for packer as it thinks it timed out
        - sudo systemctl stop ssh
    locale: en_US
    keyboard:
        layout: se
    packages: [open-vm-tools, openssh-server, net-tools, perl, open-iscsi, ntp, curl, vim, ifupdown, zip, unzip, gnupg2, software-properties-common, apt-transport-https, ca-certificates, lsb-release, python3-pip, jq, cloud-init]
    identity:
        hostname: ubuntu-server
        username: ubuntu
        password: "<redacted>"
    ssh:
        install-server: yes
        allow-pw: yes
        authorized-keys:
            - <redacted>
    storage:
        layout:
            name: direct
    user-data:
        disable_root: false
    late-commands:
        - echo 'ubuntu ALL=(ALL) NOPASSWD:ALL' > /target/etc/sudoers.d/ubuntu
        - curtin in-target --target=/target -- chmod 440 /etc/sudoers.d/ubuntu
        # Prevent DHCP release message from being sent on reboot
        - iptables -A OUTPUT -p udp --dport 67 -j DROP

tenthirtyam commented 8 months ago

That should work nicely as you have it. I use similar in the examples I publish. 🤔 💭

pv2b commented 8 months ago

That should work nicely as you have it. I use similar in the examples I publish. 🤔 💭

As I said in my earlier post, the difference is the DHCP server used and its behaviour.

Ubuntu will send a DHCPRELEASE to the server every time it reboots. This means that the IP address may change every time the server reboots. The Palo Alto firewall's DHCP server we use happens to implement things in a way that the same IP is never reused, while another DHCP server might instead sometimes or often re-assign the same IP address, thereby masking this bug.

This behaviour has nothing to do with packer. It's not really incorrect behaviour on the part of systemd-networkd's DHCP implementation, nor is the Palo Alto DHCP server implementation incorrect. The problem is that packer assumes the IP address won't change between the installation and the reboot into the installed OS. This assumption is false, and that's what's causing this problem.