josenk / vagrant-vmware-esxi

A Vagrant plugin that adds VMware ESXi provider support.
GNU General Public License v3.0

network.service failed #40

Closed: antonipx closed this issue 6 years ago

antonipx commented 6 years ago

When creating a VM from the generic/centos7 box, I get the following error message:

Bringing machine 'centos1' up with 'vmware_esxi' provider...
==> centos1: Box 'generic/centos7' could not be found. Attempting to find and install...
    centos1: Box Provider: vmware_esxi, vmware, vmware_desktop, vmware_fusion, vmware_workstation
    centos1: Box Version: >= 0
==> centos1: Loading metadata for box 'generic/centos7'
    centos1: URL: https://vagrantcloud.com/generic/centos7
==> centos1: Adding box 'generic/centos7' (v1.5.0) for provider: vmware_desktop
    centos1: Downloading: https://vagrantcloud.com/generic/boxes/centos7/versions/1.5.0/providers/vmware_desktop.box
==> centos1: Successfully added box 'generic/centos7' (v1.5.0) for 'vmware_desktop'!
==> centos1: Virtual Machine will be built.
VMware ovftool 4.3.0 (build-7948156)
==> centos1: --- WARNING         : esxi_virtual_network[0] not set, using VM Network
==> centos1: ---   --- ESXi Summary ---
==> centos1: --- ESXi host       : 70.0.0.67
==> centos1: --- Virtual Network : ["VM Network"]
==> centos1: --- Disk Store      : HDD
==> centos1: --- Resource Pool   : /
==> centos1: ---  --- Guest Summary ---
==> centos1: --- VM Name         : centos1
==> centos1: --- Box             : generic/centos7
==> centos1: --- Box Ver         : 1.5.0
==> centos1: --- Memsize (MB)    : 4096
==> centos1: --- CPUS            : 2
==> centos1: --- Boot Disk Size  : 50GB
==> centos1: --- Storage (GB)    : [100]
==> centos1: --- Guest OS type   : centos-64
==> centos1: ---   --- Guest Build ---
Opening VMX source: /home/as/.vagrant.d/boxes/generic-VAGRANTSLASH-centos7/1.5.0/vmware_desktop/ZZZZ_centos1.vmx
Opening VI target: vi://root@70.0.0.67:443/
Deploying to VI: vi://root@70.0.0.67:443/
Transfer Completed                    
Completed successfully
==> centos1: --- VMID            : 18
==> centos1: --- Extend Boot dsk : 50GB
==> centos1: --- Creating Storage: disk_0.vmdk (100GB)
==> centos1: --- VM has been Powered On...
==> centos1: --- Waiting for state "running"
==> centos1: --- Success, state is now "running"
==> centos1: Setting hostname...
    centos1: 
    centos1: Vagrant insecure key detected. Vagrant will automatically replace
    centos1: this with a newly generated keypair for better security.
    centos1: 
    centos1: Inserting generated public key within guest...
    centos1: Removing insecure key from the guest if it's present...
    centos1: Key inserted! Disconnecting and reconnecting using new SSH key...
The following SSH command responded with a non-zero exit status.
Vagrant assumes that this means the command failed!

# Update sysconfig
sed -i 's/\(HOSTNAME=\).*/\1centos1/' /etc/sysconfig/network

# Update DNS
sed -i 's/\(DHCP_HOSTNAME=\).*/\1"centos1"/' /etc/sysconfig/network-scripts/ifcfg-*

# Set the hostname - use hostnamectl if available
echo 'centos1' > /etc/hostname
if command -v hostnamectl; then
  hostnamectl set-hostname --static 'centos1'
  hostnamectl set-hostname --transient 'centos1'
else
  hostname -F /etc/hostname
fi

# Prepend ourselves to /etc/hosts
grep -w 'centos1' /etc/hosts || {
  sed -i'' '1i 127.0.0.1\tcentos1\tcentos1' /etc/hosts
}

# Restart network
service network restart

Stdout from the command:

/bin/hostnamectl
Restarting network (via systemctl):  [FAILED]

Stderr from the command:

Job for network.service failed because the control process exited with error code. See "systemctl status network.service" and "journalctl -xe" for details.

Investigating further

journalctl -eu network

May 09 18:49:43 bazinga.localdomain network[865]: Bringing up loopback interface:  [  OK  ]
May 09 18:49:44 bazinga.localdomain network[865]: Bringing up interface ens33:  Error: Connection activation failed: No suitable dev
May 09 18:49:44 bazinga.localdomain network[865]: [FAILED]
May 09 18:49:44 bazinga.localdomain systemd[1]: network.service: control process exited, code=exited status=1
May 09 18:49:44 bazinga.localdomain systemd[1]: Failed to start LSB: Bring up/down networking.
May 09 18:49:44 bazinga.localdomain systemd[1]: Unit network.service entered failed state.
May 09 18:49:44 bazinga.localdomain systemd[1]: network.service failed.

Looking at the VM, there is no ens33 interface, only an ens32 interface:

[root@centos1 ~]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: ens32: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:0c:29:d2:89:48 brd ff:ff:ff:ff:ff:ff
    inet 70.0.178.114/16 brd 70.0.255.255 scope global dynamic ens32
       valid_lft 172158sec preferred_lft 172158sec
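The mismatch above (the box ships a config for ens33, but the NIC comes up as ens32) can be spotted before provisioning fails. Below is a minimal sketch, assuming a CentOS 7 guest where each file is named ifcfg-<device>; `stale_ifcfg_files` is a hypothetical helper, and a more careful check would read the `DEVICE=` line inside each file instead of trusting the file name.

```ruby
# Hypothetical helper: return the ifcfg-* file names whose device name
# (taken from the file-name suffix, e.g. "ifcfg-ens33" -> "ens33") does
# not match any interface actually present in the guest.
# Assumption: file name suffix == device name (DEVICE= may differ).
def stale_ifcfg_files(ifcfg_names, present_interfaces)
  ifcfg_names
    .map { |name| [name, name.sub(/\Aifcfg-/, "")] }
    .reject { |_name, dev| dev == "lo" || present_interfaces.include?(dev) }
    .map(&:first)
end

NETWORK_SCRIPTS = "/etc/sysconfig/network-scripts"

# When run directly inside a RHEL/CentOS guest, print the stale files.
if __FILE__ == $PROGRAM_NAME && Dir.exist?(NETWORK_SCRIPTS)
  ifcfgs  = Dir.children(NETWORK_SCRIPTS).grep(/\Aifcfg-/)
  present = Dir.children("/sys/class/net")  # one entry per kernel interface
  stale_ifcfg_files(ifcfgs, present).each do |f|
    puts File.join(NETWORK_SCRIPTS, f)
  end
end
```

On the VM shown above, this would be expected to flag ifcfg-ens33, since only lo and ens32 exist.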

There is a relevant Vagrant issue concerning the Fusion provider: https://github.com/hashicorp/vagrant/issues/7533

It describes a workaround where you specify a different PCI slot number, like this:

  config.vm.provider 'vmware_fusion' do |vf|
     vf.vmx['ethernet0.pcislotnumber'] = '33'
  end

I was able to apply a similar workaround with your provider:

    dom.vm.provider :vmware_esxi do |v|
        v.guest_custom_vmx_settings = [['ethernet0.pcislotnumber','33']]
    end
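Judging from this issue, the workaround works because the guest names its NIC after the PCI slot number (the default slot gave ens32, while slot 33 gives the ens33 that the box expects). The sketch below, assuming that "ens<slot>" pattern holds, turns the interface name a box expects into a `guest_custom_vmx_settings` entry. `pcislot_setting_for` is a hypothetical helper, not part of the provider.

```ruby
# Hypothetical helper: map the interface name a box's ifcfg file expects
# (e.g. "ens33") to the VMX setting that should make the guest use it.
# Assumption: on VMware, the guest names the NIC "ens<pcislotnumber>".
def pcislot_setting_for(expected_ifname, nic_index: 0)
  m = /\Aens(\d+)\z/.match(expected_ifname)
  raise ArgumentError, "not an ensN name: #{expected_ifname}" unless m
  ["ethernet#{nic_index}.pcislotnumber", m[1]]
end
```

Defined at the top of a Vagrantfile, it could be used inside the provider block as `v.guest_custom_vmx_settings = [pcislot_setting_for('ens33')]`, which yields the same `[['ethernet0.pcislotnumber','33']]` setting shown above.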

I was wondering if there is a way to make a more permanent fix.

josenk commented 6 years ago

Thanks for pointing this out, but this is a very well-known bug in the box, not a vmware-esxi provider bug.

Quote from hashicorp/vagrant#7533: The "service network restart" was introduced here: b91c167 but I can't say if something smart can be done to repair broken boxes ( file ifcfg-enp0s3, shouldn't be here ... )

BTW: There are several ways to avoid this problem.

1) Don't use broken boxes like generic/centos7.
2) Don't set the hostname (i.e. don't use the 'config.vm.hostname' option in the Vagrantfile).
3) Install CentOS 7 using any workaround method, run rm -fr /etc/sysconfig/network-scripts/ifcfg-e*, then repackage. Your new box should work fine.
4) Submit a bug report to the builder of generic/centos7.
5) Submitting a bug report to Vagrant or any of the VMware providers (Fusion or others) probably won't do much, because this is a box bug.
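Workaround 3 can be scripted as part of a box build. The following is a minimal sketch of the same cleanup as the rm command in step 3, assuming it runs as root inside the CentOS 7 guest just before repackaging; `remove_nic_configs` is a hypothetical helper, and the directory is a parameter only so the function can be exercised elsewhere.

```ruby
require "fileutils"

# Hypothetical helper: delete every ifcfg-e* file (ens33, enp0s3, eth0, ...)
# so the repackaged box no longer ships a config for a NIC name that may not
# exist on the next hypervisor. ifcfg-lo is left alone because it does not
# match the "ifcfg-e*" pattern. Returns the paths that were removed.
def remove_nic_configs(dir = "/etc/sysconfig/network-scripts")
  files = Dir.glob(File.join(dir, "ifcfg-e*"))
  FileUtils.rm_f(files)  # like rm -f: no error if a file is already gone
  files
end
```

The recursive `-r` flag from the original command is not needed here, since the matched entries are plain files.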

johnoooo commented 6 years ago

Is there a recommendation for known "good" boxes?

generic/centos7 is the box that is enabled in the sample files.

tenox7 commented 6 years ago

You also recommend generic/ boxes right in the Vagrantfile example in the README:

  # Here are some of the MANY examples....
  config.vm.box = 'generic/centos7'
  #config.vm.box = 'generic/centos6'
  #config.vm.box = 'generic/fedora27'
  #config.vm.box = 'generic/freebsd11'
  #config.vm.box = 'generic/ubuntu1710'
  #config.vm.box = 'generic/debian9'
  #config.vm.box = 'hashicorp/precise64'
  #config.vm.box = 'steveant/CentOS-7.0-1406-Minimal-x64'
  #config.vm.box = 'geerlingguy/centos7'
  #config.vm.box = 'geerlingguy/ubuntu1604'
  #config.vm.box = 'laravel/homestead'
  #config.vm.box = 'puphpet/debian75-x64'

Perhaps you could recommend something better that works with your plugin seamlessly?

Thanks!!

josenk commented 6 years ago

I don't really have a recommendation for the 'best' box... Each box contributor has their own recipe (most minimal, most dev friendly, most standard, etc.). You would have to try several to see which you like best. In general, I do like the 'generic/' boxes, except that centos7 has this bug.

I can change my docs, but I don't think my docs are the root of this problem... It's the buggy box. I didn't even write the code that modifies the hostname; that is Hashicorp/Vagrant's code. It's best to file an issue with the makers of generic/centos7. Maybe even file an issue with Hashicorp/Vagrant to allow 'service network restart' to fail and continue...
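The "allow it to fail and continue" suggestion could look roughly like the sketch below: run the restart, but downgrade a non-zero exit to a warning instead of aborting. This is a hypothetical helper for illustration only; it is not Vagrant's code, and the log at the top of this issue shows Vagrant aborting on the failure instead.

```ruby
# Hypothetical helper: run the network restart and treat failure as a
# warning. Kernel#system returns true on exit status 0, false on a
# non-zero exit, and nil if the command could not be run at all.
def restart_network_tolerantly(cmd = "service network restart")
  ok = system(cmd)
  warn "warning: '#{cmd}' failed; continuing" unless ok
  ok ? true : false
end
```

The trade-off is that a genuinely broken network config would then surface later (for example, as an SSH timeout) rather than at the hostname step.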

josenk commented 6 years ago

re-opened by mistake...

tenox7 commented 6 years ago

It's very well understood that this is a bug in Vagrant and/or some of the boxes. However, because VMs created by your awesome provider fall victim to it by virtue of being VMware guests, it would be nice if you provided some recommendation, or at least mentioned a workaround in your main README. Thank you!

johnoooo commented 6 years ago

I ran some long trials yesterday. Since I am looking at CentOS, the box that I found working is "bento/centos-7.4". Not feasible is

My trials include a multi-machine setup with each machine having a second (static) IP address and the hostname configured via vagrant.

josenk commented 6 years ago

From the feedback I've received so far (my feedback link https://goo.gl/forms/tY14mE77HJvhNvjj1 ), I found most admins are creating their own boxes. It's the only way to really know to your infra...

josenk commented 6 years ago

V2.3.0 has been released. The default enabled example box, 'generic/centos7', has been replaced by 'hashicorp/precise64'. I also added more detail in 'Known Issues' about setting hostnames on boxes.