ruzickap / packer-templates

Scripts and Templates used for generating Vagrant images
Apache License 2.0

Windows 10 (libvirt provider): hanging at 'Waiting for SSH to become available...' #132

Closed: gblewis1 closed this issue 4 years ago

gblewis1 commented 4 years ago

Vagrantfile:

Vagrant.configure("2") do |config|
  config.vm.box = "peru/windows-10-enterprise-x64-eval"
end

Running vagrant up appears to work until it reaches "==> default: Waiting for SSH to become available...", where it hangs.

At this point nothing happens, even after waiting 30 minutes. Running winrm quickconfig inside the guest (through the virt-manager console) shows that WinRM is running, but it warns that the WinRM firewall exception will not work because one of the network connection types on the machine is set to Public.

After I manually set the network type to Private, winrm quickconfig no longer complains. I'm not sure how to test further.
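One way to test further from the host is to check whether the guest's WinRM ports accept TCP connections at all. The Ruby sketch below assumes you already know the guest's IP address (for example from the virt-manager console); port_open? is a hypothetical helper name, and 5985/5986 are the standard WinRM HTTP/HTTPS ports, the same ones forwarded in the working run later in this thread. A "closed or filtered" result while quickconfig complains about a Public profile would be consistent with the firewall exception not applying, but it does not prove that on its own.

```ruby
require "socket"

# Hypothetical helper: true if a TCP connection to host:port succeeds
# within timeout_s seconds, false otherwise.
def port_open?(host, port, timeout_s: 3)
  # Socket.tcp yields the connected socket and closes it after the block,
  # returning the block's value.
  Socket.tcp(host, port, connect_timeout: timeout_s) { true }
rescue SystemCallError, SocketError, IOError
  # Connection refused, timed out, host unreachable, name resolution failure...
  false
end

# Only probe when an address is given on the command line.
if __FILE__ == $PROGRAM_NAME && ARGV.any?
  guest_ip = ARGV.first
  { 5985 => "WinRM over HTTP", 5986 => "WinRM over HTTPS" }.each do |port, label|
    state = port_open?(guest_ip, port) ? "open" : "closed or filtered"
    puts "#{guest_ip}:#{port} (#{label}): #{state}"
  end
end
```

Usage (file name hypothetical): ruby winrm_probe.rb GUEST_IP. Running it before and after switching the network profile to Private shows whether that change alone makes WinRM reachable from the host.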

System info:

vagrant up startup messages:

Bringing machine 'default' up with 'libvirt' provider...
==> default: Checking if box 'peru/windows-10-enterprise-x64-eval' version '20200707.01' is up to date...
==> default: Creating image (snapshot of base box volume).
==> default: Creating domain with the following settings...
==> default:  -- Name:              windows-vagrant_default
==> default:  -- Domain type:       kvm
==> default:  -- Cpus:              1
==> default:  -- Feature:           acpi
==> default:  -- Feature:           apic
==> default:  -- Feature:           pae
==> default:  -- Feature (HyperV):  name=relaxed, state=on
==> default:  -- Feature (HyperV):  name=stimer, state=on
==> default:  -- Feature (HyperV):  name=synic, state=on
==> default:  -- Feature (HyperV):  name=vapic, state=on
==> default:  -- Memory:            2048M
==> default:  -- Management MAC:
==> default:  -- Loader:
==> default:  -- Nvram:
==> default:  -- Base box:          peru/windows-10-enterprise-x64-eval
==> default:  -- Storage pool:      default
==> default:  -- Image:             /var/lib/libvirt/images/windows-vagrant_default.img (50G)
==> default:  -- Volume Cache:      default
==> default:  -- Kernel:
==> default:  -- Initrd:
==> default:  -- Graphics Type:     spice
==> default:  -- Graphics Port:     -1
==> default:  -- Graphics IP:       127.0.0.1
==> default:  -- Graphics Password: Not defined
==> default:  -- Video Type:        qxl
==> default:  -- Video VRAM:        9216
==> default:  -- Sound Type:    ich6
==> default:  -- Keymap:            en-us
==> default:  -- TPM Path:
==> default:  -- INPUT:             type=mouse, bus=ps2
==> default:  -- CHANNEL:             type=spicevmc, mode=
==> default:  -- CHANNEL:             target_type=virtio, target_name=com.redhat.spice.0
==> default:  -- CHANNEL:             type=unix, mode=
==> default:  -- CHANNEL:             target_type=virtio, target_name=org.qemu.guest_agent.0
==> default:  -- RNG device model:  random
==> default: Creating shared folders metadata...
==> default: Starting domain.
==> default: Waiting for domain to get an IP address...
==> default: Waiting for SSH to become available...

ruzickap commented 4 years ago

Hi.

I tried to reproduce the issue, but it seems to work fine for me:

$ grep PRETTY_NAME /etc/os-release
PRETTY_NAME="Ubuntu 20.04 LTS"

$ dpkg -l | grep vagrant
ii  vagrant                              1:2.2.9                               amd64        no description given

$ vagrant --version
Vagrant 2.2.9

$ vagrant plugin list
vagrant-libvirt (0.1.2, global)

$ vagrant up
Bringing machine 'default' up with 'libvirt' provider...
==> default: Checking if box 'peru/windows-10-enterprise-x64-eval' version '20200707.01' is up to date...
==> default: Creating image (snapshot of base box volume).
==> default: Creating domain with the following settings...
==> default:  -- Name:              windows-10-enterprise-x64-eval_default
==> default:  -- Domain type:       kvm
==> default:  -- Cpus:              1
==> default:  -- Feature:           acpi
==> default:  -- Feature:           apic
==> default:  -- Feature:           pae
==> default:  -- Feature (HyperV):  name=relaxed, state=off
==> default:  -- Feature (HyperV):  name=stimer, state=off
==> default:  -- Feature (HyperV):  name=synic, state=off
==> default:  -- Feature (HyperV):  name=vapic, state=off
==> default:  -- Memory:            2048M
==> default:  -- Management MAC:
==> default:  -- Loader:
==> default:  -- Nvram:
==> default:  -- Base box:          peru/windows-10-enterprise-x64-eval
==> default:  -- Storage pool:      default
==> default:  -- Image:             /var/lib/libvirt/images/windows-10-enterprise-x64-eval_default.img (50G)
==> default:  -- Volume Cache:      default
==> default:  -- Kernel:
==> default:  -- Initrd:
==> default:  -- Graphics Type:     spice
==> default:  -- Graphics Port:     -1
==> default:  -- Graphics IP:       127.0.0.1
==> default:  -- Graphics Password: Not defined
==> default:  -- Video Type:        qxl
==> default:  -- Video VRAM:        9216
==> default:  -- Sound Type:    ich6
==> default:  -- Keymap:            en-us
==> default:  -- TPM Path:
==> default:  -- INPUT:             type=mouse, bus=ps2
==> default:  -- CHANNEL:             type=spicevmc, mode=
==> default:  -- CHANNEL:             target_type=virtio, target_name=com.redhat.spice.0
==> default:  -- CHANNEL:             type=unix, mode=
==> default:  -- CHANNEL:             target_type=virtio, target_name=org.qemu.guest_agent.0
==> default:  -- RNG device model:  random
==> default: Creating shared folders metadata...
==> default: Starting domain.
==> default: Waiting for domain to get an IP address...
==> default: Waiting for SSH to become available...
==> default: Forwarding ports...
==> default: 3389 (guest) => 3389 (host) (adapter eth0)
==> default: 5986 (guest) => 5986 (host) (adapter eth0)
==> default: 5985 (guest) => 5985 (host) (adapter eth0)

(I turned off the Hyper-V features because my laptop's processor doesn't support them.)
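For reference, the libvirt provider exposes these flags through its hyperv_feature option, whose name/state pairs match the "Feature (HyperV)" lines in the output above. A minimal Vagrantfile sketch that turns them off, similar to the run above, might look like this. It assumes that the list set in your own Vagrantfile replaces the one shipped with the box rather than being added to it, so confirm the result in the vagrant up output.

```ruby
# Vagrantfile (sketch): same box, with the Hyper-V features switched off
Vagrant.configure("2") do |config|
  config.vm.box = "peru/windows-10-enterprise-x64-eval"

  config.vm.provider :libvirt do |libvirt|
    # Same feature names as in the "Feature (HyperV)" lines of the log
    %w[relaxed stimer synic vapic].each do |feature|
      libvirt.hyperv_feature name: feature, state: "off"
    end
  end
end
```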

Anyway, if you want to run Windows boxes over WinRM, you need to use upstream Vagrant. Only the upstream Vagrant builds bundle the Ruby WinRM support.

Here are the installation steps in Ansible:

https://github.com/ruzickap/packer-templates/blob/master/tools/create_remote_build_server/build_remote_ssh_ubuntu.yml#L125-L136

You can check that the WinRM files are present:

$ dpkg -L vagrant|grep -i winrm | wc -l
133

$ find /opt/vagrant/embedded/gems/2.2.9/gems/winrm-2.3.4
...
...many winrm files...
...

$ find /opt/vagrant/embedded/gems/2.2.9/gems/winrm-2.3.4 | wc -l
65
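If you prefer a check that is easier to script than the find/wc pipeline above, the following Ruby sketch lists the WinRM-related gem directories under an upstream Vagrant install. The helper name bundled_winrm_gems is hypothetical, and the default path assumes the /opt/vagrant/embedded/gems/<version>/gems layout shown above. An empty result would be consistent with a Vagrant package that does not bundle the WinRM gems.

```ruby
# Hypothetical helper: names of WinRM-related gem directories bundled with
# an upstream Vagrant install (e.g. "winrm-2.3.4"), sorted alphabetically.
def bundled_winrm_gems(root = "/opt/vagrant/embedded/gems")
  Dir.glob(File.join(root, "*", "gems", "winrm*"))
     .select { |path| File.directory?(path) }
     .map { |path| File.basename(path) }
     .sort
end

if __FILE__ == $PROGRAM_NAME
  gems = bundled_winrm_gems
  if gems.empty?
    puts "No bundled winrm gems found under /opt/vagrant/embedded/gems"
  else
    puts gems
  end
end
```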

Please let me know whether you are using Vagrant from upstream or the Vagrant package from Ubuntu.

Thank you...

gblewis1 commented 4 years ago

Thank you for the help! I'm using Vagrant from upstream. The winrm files were there when I ran the dpkg and find commands.

To try to match your environment more closely, I removed the vagrant-share plugin. I also ran sudo ufw disable to eliminate that variable. Then I tried again with vagrant up. Unfortunately, the desktop still came up without Vagrant seeing SSH. I then tried setting the network in the VM to Private and had the VM reboot, then logged in as vagrant. Still no change on the Vagrant side waiting for SSH.

In case it helps, here's my ip addr output with VM running and waiting for SSH:

1024: virbr1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 52:54:00:ff:91:98 brd ff:ff:ff:ff:ff:ff
    inet 192.168.121.1/24 brd 192.168.121.255 scope global virbr1
       valid_lft forever preferred_lft forever
1025: virbr1-nic: <BROADCAST,MULTICAST> mtu 1500 qdisc fq_codel master virbr1 state DOWN group default qlen 1000
    link/ether 52:54:00:ff:91:98 brd ff:ff:ff:ff:ff:ff
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: enp0s31f6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether e0:d5:5e:66:c3:9c brd ff:ff:ff:ff:ff:ff
    inet 10.72.4.199/24 brd 10.72.4.255 scope global dynamic enp0s31f6
       valid_lft 182026sec preferred_lft 182026sec
    inet6 fe80::e2d5:5eff:fe66:c39c/64 scope link
       valid_lft forever preferred_lft forever
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
    link/ether 02:42:7f:86:95:cd brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever
    inet6 fe80::42:7fff:fe86:95cd/64 scope link
       valid_lft forever preferred_lft forever
3738: virbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
    link/ether 52:54:00:58:4b:5e brd ff:ff:ff:ff:ff:ff
    inet 192.168.122.1/24 brd 192.168.122.255 scope global virbr0
       valid_lft forever preferred_lft forever
3743: vnet0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master virbr1 state UNKNOWN group default qlen 1000
    link/ether fe:54:00:06:fa:c0 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::fc54:ff:fe06:fac0/64 scope link
       valid_lft forever preferred_lft forever
1011: virbr0-nic: <BROADCAST,MULTICAST> mtu 1500 qdisc fq_codel master virbr0 state DOWN group default qlen 1000
    link/ether 52:54:00:58:4b:5e brd ff:ff:ff:ff:ff:ff
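The output above can be read mechanically: the VM's tap device vnet0 is attached ("master") to virbr1, which carries 192.168.121.1/24, so the guest presumably gets its address in that subnet. A small Ruby sketch that extracts this from ip addr text follows; the helper name bridge_subnet_for is hypothetical, and the parsing assumes the iproute2 text layout shown above.

```ruby
require "ipaddr"

# Hypothetical helper: given `ip addr` text and a tap device name (e.g. "vnet0"),
# return [bridge, "network/prefix"] for the bridge the device is attached to,
# [bridge, nil] if that bridge has no IPv4 address, or nil if no master is found.
def bridge_subnet_for(ip_addr_text, tap)
  masters = {} # interface name => master bridge name
  subnets = {} # interface name => first IPv4 network on it
  current = nil
  ip_addr_text.each_line do |line|
    if (m = line.match(/^\d+:\s+([^:@\s]+)/))
      # Interface header line, e.g. "3743: vnet0: <...> ... master virbr1 ..."
      current = m[1]
      master = line[/\bmaster\s+(\S+)/, 1]
      masters[current] = master if master
    elsif current && (m = line.match(%r{^\s+inet\s+(\d+\.\d+\.\d+\.\d+/\d+)}))
      # IPv4 address line; IPAddr masks it down to the network address
      subnets[current] ||= IPAddr.new(m[1])
    end
  end
  bridge = masters[tap]
  return nil unless bridge

  net = subnets[bridge]
  [bridge, net && "#{net}/#{net.prefix}"]
end
```

For example, bridge_subnet_for(`ip addr`, "vnet0") on the host above should return ["virbr1", "192.168.121.0/24"].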

The only major difference I can think of is that you're running Ubuntu 20.04 and I'm running 18.04.

ruzickap commented 4 years ago

Hmm... That's strange.

I installed my environment (a laptop) from ubuntu-20.04-live-server-amd64.iso and then ran the Ansible playbook (https://github.com/ruzickap/packer-templates/tree/master/tools/create_remote_build_server). It's just a few commands to install libvirt + Vagrant if you want to do it manually without Ansible.

Is there a chance you could install a fresh Ubuntu 20.04 Server and try it (like I did)? I can write up some steps if you want.

I'm not sure whether the problem is in the OS + Vagrant + libvirt + ... configuration or in the image (the image seems to be working fine).

Debugging the existing Ubuntu 18.04 environment may be quite hard, and I'm not sure I can help with it, because I lack deep knowledge of all the components.

If you want to try reinstalling your host from scratch with Ubuntu 20.04, then I can guide you, but I can't help you troubleshoot your existing machine :-(

gblewis1 commented 4 years ago

@ruzickap Unfortunately, I need to keep this build agent on 18.04, but I will see if I can reproduce this on 20.04 on a different machine. For now I've found that the VirtualBox provider works fine on 18.04, so that solves my immediate problem. I'll get back to you once I've tried 20.04.

gblewis1 commented 4 years ago

@ruzickap Works perfectly on a fresh 20.04 server install.

gblewis1 commented 4 years ago

EDIT: it worked on 18.04, see next comment

I just installed Ubuntu 18.04 Server on the same machine and set it up with the same steps (except that a package name may differ on 18.04), and I encountered the "waits forever" issue.

gblewis1 commented 4 years ago

Scratch that: it just connected; it simply took longer on this machine than I'm used to. Strange. This means you probably wouldn't be able to reproduce the issue. I'll try reinstalling my original machine and see if I can narrow down the issue. Thanks again for the help!

ruzickap commented 4 years ago

I'm glad you got it working on Ubuntu 20.04. Maybe Ubuntu 18.04 needs some additional steps that I don't remember. Sooner or later people will migrate their workloads to the latest Ubuntu LTS (20.04).

Anyway, if you have a working 20.04 machine and a non-working 18.04 machine, you can try to compare the two. If you find out why it does not work on 18.04, please post your findings to this ticket if you have time.

The first start takes quite a long time for all Windows boxes that are "prepared/cleaned" by sysprep, but only the first start. A long time ago I built images without sysprep; they started really fast, but they also contained some "hardcoded" identifiers, which caused privacy issues.

Take care...

dragon788 commented 4 years ago

The peru box may be missing this line, which forces all attached interfaces to Private so that WinRM quickconfig can succeed.

This seems to have broken with one of the recent Windows 10 releases: it wasn't an issue with 1703, but it started around 1803 or thereabouts.

https://github.com/chef/bento/pull/1255/files#diff-223395abbd3bc3e999f3c3d735cf982bR11

ruzickap commented 4 years ago

I'm not sure, because I'm not a very skilled Windows user, but this script probably does something similar: https://github.com/ruzickap/packer-templates/blob/2dbb8e24af8494b73e36e2fc742782830975a80a/scripts/win-common/fixnetwork.ps1

It is called from Autounattend.xml when the box is being built.

I'm not sure whether it is the same, but it should work. I tested the Windows boxes, and they should run fine with both libvirt and VirtualBox.

If not, please let me know what is wrong and I will look at it.

Any improvements are welcome :-)

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.