hashicorp / packer

Packer is a tool for creating identical machine images for multiple platforms from a single source configuration.
http://www.packer.io
Other
15.12k stars 3.33k forks source link

Debian changes IP address after reboot #3585

Closed balous closed 7 years ago

balous commented 8 years ago

I've got interesting problem with installed Debian obtaining a different IP address than it's installer.

I've got following scenario:

Problem is that the installer sends different DHCP request than the OS after installation. The difference is that the installer includes MAC address as client id but the OS does not. The server doesn't match both requests to be from the same client and issues another address. I was not able to convince the server to reissue the same address.

The result is that packer is never able to establish SSH connection to built VM.

I've got some more notes:

My idea is to change packer to detect the VM's IP address periodically during SSH connection polling. I know that the root cause is not in packer but rather in Debian's dhclient but changing packer this way would improve its interoperability with certain scenarios.

What do you think?

arizvisa commented 8 years ago

PR #3417, does proper detection of the IP and MAC address from VMware's dhcp leases and configuration for the VMware builder. In common/ssh.go, where it fetches the IP address via .GuestIP(), it checks the dhcp leases and pulls the latest address that was offered by the dhcpd. I suspect this might fix the issue. If it doesn't, let me know and I can update that PR.

balous commented 8 years ago

Well, Arizvisa, whank you for my first experience with go and its infrastructure :-).

First, I've followed packer's build instructions and built master. Than I've switched to your vmware-iso-extraconfig branch and build failed. So I've cherry-picked first four commits from that branch (the fifth failed to apply) and built a binary.

Than I've tried to build the VM - unsuccessfully. The VM failed to power on. The VM even can't be started from VMware console, the error message is:

An error was received from the ESX host while powering on VM debian-jessie-2016-06-01-08-32-15.
Remote connection failure
Failed to establish transport connection (9): There is no VMware process running for config file /vmfs/volumes/52456766-75d029d0-2554-6805ca1a2f69/debian-jessie-2016-06-01-08-32-15/debian-jessie-2016-06-01-08-32-15.vmx.
Disconnected from virtual machine.

So the short answer seems to be: no, it doesn't work.

I've used original packer (0.8.6) to do the same and made it halt after VM registration, than created a diff on VMX files:

--- debian-jessie-2016-06-01-08-32-15.vmx
+++ ../debian-jessie-2016-06-01-08-42-20/debian-jessie-2016-06-01-08-42-20.vmx
@@ -3,19 +3,19 @@
 checkpoint.vmstate = ""
 cleanshutdown = "TRUE"
 config.version = "8"
-displayname = "debian-jessie-2016-06-01-08-32-15"
+displayname = "debian-jessie-2016-06-01-08-42-20"
 ehci.pcislotnumber = "34"
-ehci.present = "FALSE"
+ehci.present = "TRUE"
 ethernet0.addresstype = "vpx"
 ethernet0.bsdname = "en0"
-ethernet0.connectiontype = "bridged"
+ethernet0.connectiontype = "nat"
 ethernet0.displayname = "Ethernet"
 ethernet0.linkstatepropagation.enable = "FALSE"
 ethernet0.networkname = "VLAN 353 - test PLZ"
 ethernet0.pcislotnumber = "33"
 ethernet0.present = "TRUE"
 ethernet0.virtualdev = "e1000"
-extendedconfigfile = "debian-jessie-2016-06-01-08-32-15.vmxf"
+extendedconfigfile = "debian-jessie-2016-06-01-08-42-20.vmxf"
 floppy0.present = "FALSE"
 guestos = "debian6-64"
 gui.fullscreenatpoweron = "FALSE"
@@ -23,13 +23,13 @@
 hgfs.linkrootshare = "TRUE"
 hgfs.maprootshare = "TRUE"
 ide1:0.devicetype = "cdrom-image"
-ide1:0.filename = "/vmfs/volumes/plz-esxi3_datastore1/packer_cache/17edc137b5abce9934d9dd6fbf02bbef4db867d32618650aee9d797bd8de98ed.iso"
+ide1:0.filename = "/vmfs/volumes/plz-esxi3_datastore1/packer_cache/4534b1c453e5c6b83eebcb3e835623b2ce2b171b310d2de897eff938889d2ac1.iso"
 ide1:0.present = "TRUE"
 isolation.tools.hgfs.disable = "FALSE"
 memsize = "1024"
 msg.autoanswer = "true"
 numvcpus = "2"
-nvram = "debian-jessie-2016-06-01-08-32-15.nvram"
+nvram = "debian-jessie-2016-06-01-08-42-20.nvram"
 pcibridge0.pcislotnumber = "17"
 pcibridge0.present = "TRUE"
 pcibridge4.functions = "8"
@@ -54,8 +54,7 @@
 powertype.suspend = "soft"
 proxyapps.publishtohost = "FALSE"
 remotedisplay.vnc.enabled = "TRUE"
-remotedisplay.vnc.ip = "plz-esxi3.samepage.in"
-remotedisplay.vnc.port = "5901"
+remotedisplay.vnc.port = "5900"
 replay.filename = ""
 replay.supported = "FALSE"
 scsi0.pcislotnumber = "16"
@@ -76,10 +75,6 @@
 vmci0.pcislotnumber = "35"
 vmci0.present = "TRUE"
 vmotion.checkpointfbsize = "65536000"
-scsi0:0.deviceType = "scsi-hardDisk"
-sched.swap.derivedName = "/vmfs/volumes/52456766-75d029d0-2554-6805ca1a2f69/debian-jessie-2016-06-01-08-32-15/debian-jessie-2016-06-01-08-32-15-06696ffb.vswp"
-uuid.bios = "42 32 d4 bd 10 1a 32 2e-b7 1a 4a fa 6e c7 c2 b8"
-uuid.location = "56 4d a3 d5 d1 b9 99 3c-a0 33 74 69 b9 8e bf 1a"
-migrate.hostlog = "./debian-jessie-2016-06-01-08-32-15-06696ffb.hlog"
-ethernet0.generatedAddress = "00:50:56:b2:cf:e2"
-vc.uuid = "50 32 fc 39 47 12 59 0a-8b 5b f6 ad 16 1d 7c bc"
+ethernet0.generatedAddress = "00:50:56:b2:68:4a"
+uuid.bios = "42 32 03 5b ca a2 45 49-4a ac 2c f2 be cc ed 91"
+vc.uuid = "50 32 14 db ba 19 cb 6f-1e 66 2d ab b4 1d 1f 45"

I've also looked into vmware.log file, it ends with following:

2016-06-01T06:38:29.691Z| mks| I120: VTHREAD start thread 2 "mks" pid 429410
2016-06-01T06:38:29.691Z| mks| I120: MKS thread is alive
2016-06-01T06:38:29.691Z| mks| I120: MKS-RenderMain: RenderMain: PowerOn allowed 0 0 0 1 0
2016-06-01T06:38:29.691Z| mks| I120: MKS-RenderMain: Starting MKSBasicOps
2016-06-01T06:38:29.691Z| mks| I120: KHBKL: Unable to parse keystring at: ''
2016-06-01T06:38:29.691Z| mks| I120: MKS-RemoteMgr: Set default display name: debian-jessie-2016-06-01-08-32-15
2016-06-01T06:38:29.691Z| mks| I120: MKS-RemoteMgr: Loading VNC Configuration from VM config file
2016-06-01T06:38:29.691Z| mks| I120: bora/lib/string/str.c:286 Buffer too small
2016-06-01T06:38:32.240Z| mks| W110: A core file is available in "/var/core/vmx-zdump.001"
2016-06-01T06:38:32.240Z| mks| I120: Panic: monitor not available, skipping monitor coredump.
2016-06-01T06:38:32.240Z| mks| I120: Backtrace:
2016-06-01T06:38:32.240Z| mks| I120: Backtrace[0] 000003ffe83a9370 rip=0000000000b4d04e rbx=0000000000b4cb30 rbp=000003ffe83a9390 r12=0000000000000000 r13=0000000000000010 r14=0000000000000000 r15=0000000000000000
2016-06-01T06:38:32.240Z| mks| I120: Backtrace[1] 000003ffe83a93a0 rip=00000000005e76ea rbx=000000000181b3e8 rbp=000003ffe83a9890 r12=0000000000000000 r13=0000000000000010 r14=0000000000000000 r15=0000000000000000
2016-06-01T06:38:32.240Z| mks| I120: Backtrace[2] 000003ffe83a98a0 rip=0000000000b19820 rbx=000003ffe83aa994 rbp=000003ffe83a98c0 r12=000000003259e700 r13=0000000000000010 r14=0000000000000000 r15=0000000000000000
2016-06-01T06:38:32.240Z| mks| I120: Backtrace[3] 000003ffe83a98d0 rip=00000000008755ea rbx=000003ffe83aa990 rbp=000003ffe83aba70 r12=000003ffe83a98e0 r13=000000003259e700 r14=0000000000000000 r15=0000000000000000
2016-06-01T06:38:32.240Z| mks| I120: Backtrace[4] 000003ffe83aba80 rip=0000000000875add rbx=00000000018e5cc0 rbp=000003ffe83abb50 r12=00000000018e59a0 r13=0000000000000000 r14=0000000000000000 r15=0000000000000000
2016-06-01T06:38:32.240Z| mks| I120: Backtrace[5] 000003ffe83abb60 rip=00000000008654c5 rbx=00000000018e5cc0 rbp=000003ffe83abc40 r12=00000000018e59a0 r13=0000000000000000 r14=0000000000000000 r15=0000000000000000
2016-06-01T06:38:32.240Z| mks| I120: Backtrace[6] 000003ffe83abc50 rip=00000000008794d3 rbx=00000000018e59a0 rbp=000003ffe83abc60 r12=0000000032561660 r13=0000000000000000 r14=0000000000000000 r15=0000000000000000
2016-06-01T06:38:32.240Z| mks| I120: Backtrace[7] 000003ffe83abc70 rip=0000000000815e78 rbx=00000000018c5510 rbp=000003ffe83abda0 r12=0000000032561660 r13=0000000000000000 r14=0000000000000000 r15=0000000000000000
2016-06-01T06:38:32.240Z| mks| I120: Backtrace[8] 000003ffe83abdb0 rip=0000000001dc3ddc rbx=0000000000000000 rbp=0000000000000000 r12=0000000000000000 r13=0000000000000000 r14=0000000000000000 r15=0000000000000000
2016-06-01T06:38:32.240Z| mks| I120: Backtrace[9] 000003ffe83abec0 rip=0000000002c0a12d rbx=0000000000000000 rbp=0000000000000000 r12=0000000000000000 r13=0000000000000000 r14=0000000000000000 r15=0000000000000000
2016-06-01T06:38:32.240Z| mks| I120: Backtrace[10] 000003ffe83abec8 rip=0000000000000000 rbx=0000000000000000 rbp=0000000000000000 r12=0000000000000000 r13=0000000000000000 r14=0000000000000000 r15=0000000000000000
2016-06-01T06:38:32.240Z| mks| I120: SymBacktrace[0] 000003ffe83a9370 rip=0000000000b4d04e in function (null) in object /bin/vmx loaded at 0000000000448000
2016-06-01T06:38:32.240Z| mks| I120: SymBacktrace[1] 000003ffe83a93a0 rip=00000000005e76ea in function (null) in object /bin/vmx loaded at 0000000000448000
2016-06-01T06:38:32.240Z| mks| I120: SymBacktrace[2] 000003ffe83a98a0 rip=0000000000b19820 in function (null) in object /bin/vmx loaded at 0000000000448000
2016-06-01T06:38:32.240Z| mks| I120: SymBacktrace[3] 000003ffe83a98d0 rip=00000000008755ea in function (null) in object /bin/vmx loaded at 0000000000448000
2016-06-01T06:38:32.240Z| mks| I120: SymBacktrace[4] 000003ffe83aba80 rip=0000000000875add in function (null) in object /bin/vmx loaded at 0000000000448000
2016-06-01T06:38:32.240Z| mks| I120: SymBacktrace[5] 000003ffe83abb60 rip=00000000008654c5 in function (null) in object /bin/vmx loaded at 0000000000448000
2016-06-01T06:38:32.240Z| mks| I120: SymBacktrace[6] 000003ffe83abc50 rip=00000000008794d3 in function (null) in object /bin/vmx loaded at 0000000000448000
2016-06-01T06:38:32.240Z| mks| I120: SymBacktrace[7] 000003ffe83abc70 rip=0000000000815e78 in function (null) in object /bin/vmx loaded at 0000000000448000
2016-06-01T06:38:32.240Z| mks| I120: SymBacktrace[8] 000003ffe83abdb0 rip=0000000001dc3ddc in function (null) in object /lib64/libpthread.so.0 loaded at 0000000001dbc000
2016-06-01T06:38:32.240Z| mks| I120: SymBacktrace[9] 000003ffe83abec0 rip=0000000002c0a12d in function clone in object /lib64/libc.so.6 loaded at 0000000002b37000
2016-06-01T06:38:32.240Z| mks| I120: SymBacktrace[10] 000003ffe83abec8 rip=0000000000000000 
2016-06-01T06:38:32.240Z| mks| I120: Exiting

After removing 'remotedisplay.vnc.ip', the VM powered successfully.

arizvisa commented 8 years ago

Sorry, I don't have an ESXi instance to test and it didn't click with me that it was ESX when I read your post. All my patchset does is parses VMware's configuration and exposes VMware's network to your packer template so it'll use the proper address as per what VMware's configuration says instead of making assumptions about the network type. ESX actually does this differently, so my patches don't actually affect it. ;-) Sorry about that.

Regardless, I'll see if I can help anyways: Is that vmware.log suggesting that it's dumping a corefile? like ESXi is segfaulting when it's trying to run VNC? Was that the error that you were having before or just with the older version of packer 0.8.6?

balous commented 8 years ago

ESXi is probably not saving the core file somewhere but it seems that the VM process crashes (until I remove that 'remotedisplay.vnc.ip' line from the VMX file). This theory is supported by the GUI error message reading that 'There is no VMware process running' :-). When I use older packer version (0.10.1 or 0.8.6) the problem goes away.

But I think the problem is elsewhere - packer detects the IP address once at the beginning (while the installer runs) and doesn't follow the change when installed Debian requests different lease and gets different address.

arizvisa commented 8 years ago

Does the contents of remotedisplay.vnc.ip even matter? i.e. can you type in any name and ESXi will crash? Because even if packer incorrectly detects the ip address... ESXi is crashing. So, that means there's not even a way to fix or even troubleshoot your original problem.

Is adding the remotedisplay.vnc.ip variable in the .vmx file the incorrect way to enable vnc on an ESXi instance?

tsugliani commented 8 years ago

@balous I encountered the same exact issue. (same DHCP server)

I've nailed that debian installer, and debian default boot generates a different DHCP request which usually results into 2 different IP given for the same MAC Addr :-( (and it seems Packer doesn't expect that, once it gets the IP for the first time, (installer), it caches the IP that is then reused when the debian is normally booting after the install process (firstboot) and at that moment, the ip is different which causes packer to fail.

My testing process

Here is the dump of the edgemax (my router) after the full installation completes:

admin@EdgeMax:~$ show dhcp leases
IP address Hardware Address Lease expiration Pool Client Name
---------- ---------------- ---------------- ---- -----------
192.168.0.51 00:0c:29:df:b1:39 2016/04/26 21:17:17 LAN1
192.168.0.52 00:0c:29:df:b1:39 2016/04/26 21:22:16 LAN1

As you can see, 2 different IP for the same MAC, and leases were given for a day, so that wasn't the pb (the copy/paste is old, don't take into account those dates with the date of this comment)

Here is the DHCP dump (dhcpdump) utility from the install (netinstall) and then normal boot process of the debian in DHCP.

1) During the NETBOOT installation

TIME: 2016-04-25 23:17:12.350
IP: 0.0.0.0 (0:c:29:df:b1:39) > 255.255.255.255 (ff:ff:ff:ff:ff:ff)
OP: 1 (BOOTPREQUEST)
HTYPE: 1 (Ethernet)
HLEN: 6
HOPS: 0
XID: b0c5ce32
SECS: 0
FLAGS: 0
CIADDR: 0.0.0.0
YIADDR: 0.0.0.0
SIADDR: 0.0.0.0
GIADDR: 0.0.0.0
CHADDR: 00:0c:29:df:b1:39:00:00:00:00:00:00:00:00:00:00
SNAME: .
FNAME: .
OPTION: 53 ( 1) DHCP message type 1 (DHCPDISCOVER)
OPTION: 61 ( 7) Client-identifier 01:00:0c:29:df:b1:39
OPTION: 57 ( 2) Maximum DHCP message size 576
OPTION: 55 ( 7) Parameter Request List 1 (Subnet mask)
3 (Routers)
6 (DNS server)
12 (Host name)
15 (Domainname)
28 (Broadcast address)
42 (NTP servers)

OPTION: 60 ( 3) Vendor class identifier d-i
---------------------------------------------------------------------------

TIME: 2016-04-25 23:17:13.354
IP: 0.0.0.0 (0:c:29:df:b1:39) > 255.255.255.255 (ff:ff:ff:ff:ff:ff)
OP: 1 (BOOTPREQUEST)
HTYPE: 1 (Ethernet)
HLEN: 6
HOPS: 0
XID: b0c5ce32
SECS: 1
FLAGS: 0
CIADDR: 0.0.0.0
YIADDR: 0.0.0.0
SIADDR: 0.0.0.0
GIADDR: 0.0.0.0
CHADDR: 00:0c:29:df:b1:39:00:00:00:00:00:00:00:00:00:00
SNAME: .
FNAME: .
OPTION: 53 ( 1) DHCP message type 3 (DHCPREQUEST)
OPTION: 61 ( 7) Client-identifier 01:00:0c:29:df:b1:39
OPTION: 50 ( 4) Request IP address 192.168.0.51
OPTION: 54 ( 4) Server identifier 192.168.0.1
OPTION: 57 ( 2) Maximum DHCP message size 576
OPTION: 55 ( 7) Parameter Request List 1 (Subnet mask)
3 (Routers)
6 (DNS server)
12 (Host name)
15 (Domainname)
28 (Broadcast address)
42 (NTP servers)

OPTION: 60 ( 3) Vendor class identifier d-i
---------------------------------------------------------------------------

2) AFTER THE INSTALL normal Debian boot

TIME: 2016-04-25 23:22:11.112
IP: 0.0.0.0 (0:c:29:df:b1:39) > 255.255.255.255 (ff:ff:ff:ff:ff:ff)
OP: 1 (BOOTPREQUEST)
HTYPE: 1 (Ethernet)
HLEN: 6
HOPS: 0
XID: 1fe3bb44
SECS: 0
FLAGS: 0
CIADDR: 0.0.0.0
YIADDR: 0.0.0.0
SIADDR: 0.0.0.0
GIADDR: 0.0.0.0
CHADDR: 00:0c:29:df:b1:39:00:00:00:00:00:00:00:00:00:00
SNAME: .
FNAME: .
OPTION: 53 ( 1) DHCP message type 1 (DHCPDISCOVER)
OPTION: 55 ( 7) Parameter Request List 1 (Subnet mask)
28 (Broadcast address)
2 (Time offset)
3 (Routers)
15 (Domainname)
6 (DNS server)
12 (Host name)

---------------------------------------------------------------------------

TIME: 2016-04-25 23:22:12.118
IP: 0.0.0.0 (0:c:29:df:b1:39) > 255.255.255.255 (ff:ff:ff:ff:ff:ff)
OP: 1 (BOOTPREQUEST)
HTYPE: 1 (Ethernet)
HLEN: 6
HOPS: 0
XID: 1fe3bb44
SECS: 0
FLAGS: 0
CIADDR: 0.0.0.0
YIADDR: 0.0.0.0
SIADDR: 0.0.0.0
GIADDR: 0.0.0.0
CHADDR: 00:0c:29:df:b1:39:00:00:00:00:00:00:00:00:00:00
SNAME: .
FNAME: .
OPTION: 53 ( 1) DHCP message type 3 (DHCPREQUEST)
OPTION: 54 ( 4) Server identifier 192.168.0.1
OPTION: 50 ( 4) Request IP address 192.168.0.52
OPTION: 55 ( 7) Parameter Request List 1 (Subnet mask)
28 (Broadcast address)
2 (Time offset)
3 (Routers)
15 (Domainname)
6 (DNS server)
12 (Host name)

---------------------------------------------------------------------------

This is causing me a lot of problems because the install tool I'm using (Packer) is leveraging the first IP address the DHCP server gets, and once the debian is installed, the same request (dhcprequest with same mac addr) gets another ip.

My current workaround for now is forcing the dhcprequest to be done in the same way during installation and at boot time in the preseed "kickstart (preseed.cfg) :

d-i preseed/late_command string \
  echo "send dhcp-client-identifier 1:$(cat /sys/class/net/eth0/address);" > /target/etc/dhcp/dhclient.conf;

@balous can you try this workaround ? I'm pretty sure packer should update the IP, but for now this works for me.

rickard-von-essen commented 8 years ago

I'm sure this is simliar to issue #1746 which was fixed (for Parallels) in PR #1760.

It would be interesting to see the out put of the commands run by driver_esx5.go esxcli --formatter csv network vm list at different changes.

The test could just be to drop the lines 238-240 of driver_esx5.go so the driver will recheck the VM's IP each time it is called instead of always returning the first address that it got.

tsugliani commented 8 years ago

@rickard-von-essen Yes i've tested that during my troubleshooting:

esxcli --formatter csv network vm list actually showed clearly the change in IP address, I just wasn't sure where to try that change in the packer / esxi code.

I'll try that in local build of packer. Thanks for the information, If it works I'll report here.

balous commented 8 years ago

@rickard-von-essen I've tried to comment out those 3 lines of code and I can confirm that the problem has gone :-).

tsugliani commented 8 years ago

Oh yes sorry, I completely forgot to report back, works for me too.