saltstack / salt

Software to automate the management and configuration of any infrastructure or application at scale. Get access to the Salt software package repository here:
https://repo.saltproject.io/
Apache License 2.0

Creating a VM from template does not connect NIC (modified or new) - salt-cloud vmware provider #23270

Closed: indispeq closed this issue 9 years ago

indispeq commented 9 years ago

Hi! Thank you for the great work on this cloud provider.

I am running ESXi 5.5 Standard, but I am running into an issue after creating my VM, with both a static and a DHCP IP:

template: basic Ubuntu 14.04 LTS install converted into a template
vmtools: tried with vmware-tools as well as the recommended open-vm-tools

I noticed that:

    connectable = (vim.vm.device.VirtualDevice.ConnectInfo) {
       dynamicType = <unset>,
       dynamicProperty = (vmodl.DynamicProperty) [],
       startConnected = true,
       allowGuestControl = true,
       connected = false,
       status = <unset>
    }

has connected = false. I could change this in the vmware.py file, but even after setting it to true (and seeing the output show it as true), the "Connected" box was not actually checked on the VM cloned from the template.

My cloud.profiles config:

vmware-ubuntu-14.04-public:
  provider: my-vsphere
  clonefrom: ubuntu2
  cluster: cluster1.
  datastore: Datastore1
  datacenter: Home
  power_on: True

  devices:
    network:
      Network adapter 1:
        name: INTERNET
        ip: 1.2.3.30
        gateway: [ 1.2.3.17 ]
        subnet_mask: 255.255.255.240
        domain: cloud.com
        type: vmxnet3

  dns_servers:
    - 8.8.8.8
    - 8.8.4.4
  domain: cloud.com

  deploy: True
  ssh_username: ubuntu
  private_key: /root/.ssh/id_rsa
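
One quick pre-flight check for a static profile like this is that the IP and the gateway actually sit in the same subnet. A minimal Python sketch using only the standard-library `ipaddress` module; `check_static_nic` is a hypothetical helper, not something salt-cloud provides:

```python
import ipaddress


def check_static_nic(ip, gateways, subnet_mask):
    """Return True if every gateway falls inside the subnet of the host IP.

    Hypothetical pre-flight helper; salt-cloud does not call this.
    """
    # Build the network from the host IP and dotted-quad mask; strict=False
    # accepts a host address instead of requiring the network address.
    network = ipaddress.IPv4Network(f"{ip}/{subnet_mask}", strict=False)
    return all(ipaddress.IPv4Address(gw) in network for gw in gateways)


# Values taken from the "Network adapter 1" entry in the profile above.
print(check_static_nic("1.2.3.30", ["1.2.3.17"], "255.255.255.240"))  # True
```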

Please let me know what else you'd need to troubleshoot this. Thanks!

nmadhok commented 9 years ago

@indispeq I made some changes in #23251 and #23266: I added functionality to distinguish between standard switch networks and distributed port groups, and fixed some bugs. Would it be possible for you to try again using the latest vmware.py from salt develop? Also, note that your network adapter should have an additional field called switch_type, which determines whether it uses a standard switch network or a distributed port group. Set switch_type: standard or switch_type: distributed depending on which switch type your network uses. You will also need to rename the type field under your network adapter to adapter_type. Check the docs for updated information.
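
For anyone migrating an older profile, the two changes described above (renaming `type` to `adapter_type` and adding `switch_type`) can be sketched in Python as below. This assumes the profile has already been loaded into a dict (for example with a YAML parser); `migrate_adapter` is hypothetical, and defaulting to `standard` is an assumption, so pass `distributed` if your network is a distributed port group:

```python
def migrate_adapter(adapter, switch_type="standard"):
    """Return a copy of one NIC settings dict using the newer field names.

    Hypothetical helper: renames `type` to `adapter_type` and adds
    `switch_type` ("standard" or "distributed") if it is missing.
    """
    if switch_type not in ("standard", "distributed"):
        raise ValueError("switch_type must be 'standard' or 'distributed'")
    updated = dict(adapter)  # leave the caller's dict untouched
    if "type" in updated:
        # The adapter model (e.g. vmxnet3, e1000) now goes under adapter_type.
        updated["adapter_type"] = updated.pop("type")
    # Keep an explicit switch_type if the profile already has one.
    updated.setdefault("switch_type", switch_type)
    return updated


old = {"name": "INTERNET", "ip": "1.2.3.30", "type": "vmxnet3"}
print(migrate_adapter(old))
```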

Make sure your template has the RHEL/CentOS equivalent of /etc/sysconfig/network present, with the following content:

NETWORKING=yes
HOSTNAME=localhost.localdomain
PERSIST=yes

I fixed the problem you were having by adding that to the template, and after that I didn't have to run ifconfig eth0 up manually to bring the interface up.
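
To check a template before converting it, the simple KEY=value format above can be read with a few lines of Python. This is a simplified sketch, not a full shell parser (no variable expansion or line continuations), and both function names are hypothetical:

```python
def parse_sysconfig_network(text):
    """Parse simple KEY=value lines, ignoring blank lines and # comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip optional surrounding quotes, e.g. HOSTNAME="host".
        settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings


def template_network_ok(text):
    """True if the file enables networking (NETWORKING=yes)."""
    settings = parse_sysconfig_network(text)
    return settings.get("NETWORKING", "").lower() == "yes"


sample = "NETWORKING=yes\nHOSTNAME=localhost.localdomain\nPERSIST=yes\n"
print(template_network_ok(sample))  # True
```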

Finally, make sure VMware Tools is installed on your template and that you can see the IP in the vSphere client after the VM gets deployed. It usually takes about 90-150 seconds for the IP information to appear. Please try deploying it again in debug mode (-l debug) and paste the output so I can figure out if something else is going wrong.
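
The wait described above amounts to polling until VMware Tools reports a guest IP. A minimal Python sketch, assuming a caller-supplied `get_ip` function (hypothetical; how it queries vCenter is left out) that returns the IP string or None; the 300-second default is an assumption chosen to comfortably cover the 90-150 seconds mentioned:

```python
import time


def wait_for_ip(get_ip, timeout=300.0, interval=5.0,
                sleep=time.sleep, clock=time.monotonic):
    """Poll get_ip() until it returns an address or the timeout expires.

    get_ip is a hypothetical callable returning the guest IP as a string,
    or None while VMware Tools has not reported one yet.
    """
    deadline = clock() + timeout
    while True:
        ip = get_ip()
        if ip:
            return ip
        if clock() >= deadline:
            raise TimeoutError(f"no guest IP reported within {timeout} seconds")
        sleep(interval)
```

Injecting `sleep` and `clock` keeps the loop testable without real delays.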

rallytime commented 9 years ago

Thanks for this report @indispeq and for the response @nmadhok.

@indispeq Can you give the suggestions above a try and let us know how it goes?

nmadhok commented 9 years ago

@indispeq Can you also change adapter_type for the NIC from vmxnet3 to e1000 and try again?

nmadhok commented 9 years ago

@indispeq @syphernl Can I get an update from you both on this? Did you try the suggestions?

I created a template for Ubuntu 14.04 and deployed it using the same profile that you both had, and I did not experience any problems. The configuration was created properly in /etc/network/interfaces and the NIC was connected after boot. One thing that's different is that I didn't have any NIC on the template, and I had VMware Tools installed instead of open-vm-tools.

As long as the IP is visible in vCenter, salt-cloud should be able to get the IP and use it to ssh.

indispeq commented 9 years ago

Hi @nmadhok. I tested for a few hours last night and your suggestions seem to have worked. I think my issue was with the template and the tools. It worked with a NIC in the template, after I created a symlink /etc/dhcp3 pointing to /etc/dhcp in the template (a known issue in Ubuntu 14.04, per the VMware KB).
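
The symlink workaround above is normally a one-line `ln -s`; as a Python sketch that is safe to re-run (the function name and the `root` parameter for working on a mounted template image are assumptions):

```python
import os


def ensure_dhcp3_symlink(root="/"):
    """Create <root>/etc/dhcp3 -> dhcp if nothing exists at that path yet.

    Returns True if a link was created, False if the path already existed.
    Hypothetical helper; pass a mounted template root instead of "/" if needed.
    """
    link = os.path.join(root, "etc", "dhcp3")
    if os.path.lexists(link):
        return False
    # Relative target, so the link resolves to <root>/etc/dhcp wherever
    # the image is mounted.
    os.symlink("dhcp", link)
    return True
```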

Open-vm-tools does not work. In my opinion this is a problem, because when you install VMware Tools you get a prompt to use open-vm-tools instead. I will need to try with a CentOS template and also update my vCenter to the latest version. It does take about 7-8 minutes for the VM to provision, and the failures seem to come from the guest IP never showing up in the vSphere client, so I assume the salt-cloud driver never sees it either.

Thank you for the fixes, this is already very usable!

nmadhok commented 9 years ago

@indispeq Thank you for the confirmation! I will also try using open-vm-tools and see if I can reproduce the error you're having. It probably takes 7-8 minutes because your template might be running updates and installing VMware Tools/open-vm-tools. If you answer no when the VMware Tools installer asks whether you want automatic updates, the deployment and installation of salt will be very quick (under 2 minutes).

Glad to know this issue has been fixed for you! Please mark the issue as closed if you don't have any more questions! Thanks for testing and reporting the issue!

syphernl commented 9 years ago

What is odd is that if I manually clone a VM (with open-vm-tools and no dhcp symlink), the IP shows up pretty fast. The same goes for the "connect on boot" checkbox, which stays checked in the manually cloned machine (same as in the template).

I'll test it with no NIC, VMware Tools and the symlink.

syphernl commented 9 years ago

Yep, with the above changes it does appear to work just fine, although it's not ideal to have the "legacy" VMware Tools.

It took "only" 140s to get the IP by the way.

nmadhok commented 9 years ago

@syphernl Thanks for confirming. You can still have NIC on your template and get rid of the symlink. I think the problem arises only when you're using open-vm-tools instead of vmware tools. I will look into this and see if it can be fixed from our end.

Would it be possible for you to share your template with me? That would save me the time of creating a new template and installing open-vm-tools on it.

syphernl commented 9 years ago

@nmadhok A NIC + vmware-tools in the template results in the same problem: a NIC but not connected. The only way to get this to work is if the template has no NIC at all.

nmadhok commented 9 years ago

@syphernl I already tried with a NIC present in the template and didn't have any problem. This seems to have been the case for @indispeq as well, since he mentioned he tried with a NIC already present in his template. As long as the existing NIC in the template is marked "connect at power on", it will be connected, and as long as you can see the IP in the vCenter client, salt-cloud should be able to ssh in and install salt. There may be something else going on with the way your template is configured. Can you share the template with me so I can see what's wrong?

You may also need the /etc/dhcp3 directory to be present, or create a symlink to it. I don't have it in my template and it still works for me, but if it doesn't work for you, I would suggest creating it. Refer to the knowledge base article https://communities.vmware.com/message/2397050

syphernl commented 9 years ago

@nmadhok Since the template is several gigabytes in size, I'm afraid I cannot easily share it.

Perhaps there is a difference in VMware or vSphere versions/revisions you and @indispeq are using in comparison to our environment.

nmadhok commented 9 years ago

@syphernl Would it be possible for you to drop the file for me at https://filedrop.clemson.edu/dropbox/dropoff.php in case storage is an issue?

Also, can you run the get_vcenter_version() function from http://docs.saltstack.com/en/latest/ref/clouds/all/salt.cloud.clouds.vmware.html#salt.cloud.clouds.vmware.get_vcenter_version and paste the output?

root@nitin-develop:~$ salt-cloud -f get_vcenter_version vmware-vcenter03
vmware-vcenter03:
    ----------
    vmware:
        VMware vCenter Server 5.5.0 build-2001466
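
If you need to compare versions across environments, as with the different 5.5 builds reported in this thread, a small parser for the string shown above can help. A Python sketch that assumes exactly the `VMware vCenter Server X.Y.Z build-N` format in this output; other releases or tools may print the version differently, and `parse_vcenter_version` is hypothetical:

```python
import re

# Matches e.g. "VMware vCenter Server 5.5.0 build-2001466".
_VCENTER_RE = re.compile(r"VMware vCenter Server (\d+(?:\.\d+)*) build-(\d+)")


def parse_vcenter_version(text):
    """Return ((major, minor, patch, ...), build) parsed from a version string."""
    match = _VCENTER_RE.search(text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    version = tuple(int(part) for part in match.group(1).split("."))
    return version, int(match.group(2))


old = parse_vcenter_version("VMware vCenter Server 5.5.0 build-2001466")
new = parse_vcenter_version("VMware vCenter Server 5.5.0 build-2646482")
print(old < new)  # True: same version, newer build
```
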

indispeq commented 9 years ago

@nmadhok - Success! I got it working with open-vm-tools on Ubuntu 14.04; it actually needed more work on the VMware templating side. When creating the template, you need to follow the instructions here: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2075048

I updated my vCenter to 5.5 Update 2e. Then, in my template, I installed open-vm-tools together with open-vm-tools-deploypkg. I still had one NIC in the template. I also used Update Manager to bring my hosts to full compliance.

[INFO ] salt-cloud starting indispeq-vmware:

vmware:
    VMware vCenter Server 5.5.0 build-2646482

Do you need output or some other use cases, like no NIC, etc?

nmadhok commented 9 years ago

@indispeq That's great! Can you share the output? It would be great if you can also test this with some other cases such as No NIC etc.

@syphernl Can you try again after following the suggestions made by @indispeq?

syphernl commented 9 years ago

We're now on VMware vCenter Server 5.5.0.20500 Build 2646489, which is a few revisions newer (and is also labeled 2e).

The template has a NIC (this just makes updating the template easier, but if Salt can add a NIC when none exists, that works for me as well). The NIC was added, but it is not set to "Connect at power on".

During boot I see the VMware Tools status change from "Not running (3rd-party/Independent)" to "Running Scripts". A little while later the machine suddenly reboots, most likely after setting the hostname and applying the network configuration.

Once the machine is back up, the IP Address field in the vSphere client does contain the IP. A short while after that salt-cloud detects it as well and does its magic (which fails since it cannot authenticate but that is a separate issue).

So, in order to get this to provision properly, you'll need either:

Which option to use depends on the preferences of the admin and/or the requirements of the environment.

nmadhok commented 9 years ago

@syphernl The template may or may not have a NIC. If it doesn't have a NIC, it will be created and if it already has the NIC, it will be modified.
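
The add-or-modify behaviour described above can be illustrated with a short Python sketch that decides, per adapter label in the profile, whether the NIC would be edited or added. The `add`/`edit` strings mirror the operation values of vSphere's `VirtualDeviceConfigSpec`, but this only illustrates the logic and is not the driver's actual code; `plan_nic_operations` is hypothetical:

```python
def plan_nic_operations(existing, requested):
    """Map each requested adapter label to "edit" or "add".

    existing:  labels of NICs already on the template, e.g. {"Network adapter 1"}.
    requested: labels from the profile's devices -> network section.
    """
    existing = set(existing)
    # A NIC already on the template is modified; any other label is created.
    return {label: ("edit" if label in existing else "add") for label in requested}


print(plan_nic_operations({"Network adapter 1"},
                          ["Network adapter 1", "Network adapter 2"]))
```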

I have made some changes, now merged, so that salt-cloud detects the IP as soon as it's available in the vSphere client instead of waiting an additional 10-15 seconds. If you update to the latest vmware salt-cloud driver, you should see faster deployment and quicker IP retrieval.

There's something in the way open-vm-tools works which requires open-vm-tools-deploypkg to be present in the template/VM. I'm glad it worked out for both you and @indispeq!

indispeq commented 9 years ago

@nmadhok - I haven't updated to the latest salt-cloud driver yet, but I would like to give you the debug output. It's a lot to sanitize and rather lengthy, so do you have a place I can upload the files to? Thanks! It works without a NIC too. What happens if my template does not have an authorized_keys file in the user's folder: will salt generate one for that user, or is a user/key (or password) combination necessary for provisioning to run?

nmadhok commented 9 years ago

@indispeq Would it be possible for you to upload it as a public gist on gist.github.com and share the link on this issue?

If sharing it publicly is a concern, you can also email me the output as a text file, or drop the file off for me at https://filedrop.clemson.edu/dropbox/dropoff.php

You can either generate a keypair yourself and put the public key in the .ssh/authorized_keys file on your template/VM, then specify the private key so that salt-cloud can ssh in, install salt, and run a state you specify or a highstate. Or you can specify a password for the initial login and, after salt is installed, let it run a state that changes the system password for you.

If you specify neither, salt-cloud will not be able to ssh into the machine to install salt, upload the files you specify in your file_map, or run a highstate or any states you specify in your profile/map.
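
A pre-flight check along those lines might look like the Python sketch below. It assumes the profile is available as a dict; `private_key` appears in the profile earlier in this thread, while using `password` as the key name is an assumption to verify against the vmware driver docs, and `ssh_auth_settings` is hypothetical:

```python
import os


def ssh_auth_settings(profile):
    """Return "key" or "password" depending on what the profile provides.

    Hypothetical check: the first SSH login needs either a private key
    (whose public half is in the template's ~/.ssh/authorized_keys) or a
    password. The "password" key name is an assumption.
    """
    key = profile.get("private_key")
    if key:
        if not os.path.isfile(key):
            raise FileNotFoundError(f"private_key not found: {key}")
        return "key"
    if profile.get("password"):
        return "password"
    raise ValueError("set private_key or password so salt-cloud can log in")
```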

indispeq commented 9 years ago

Hi @nmadhok , sorry for the delay, here they are, hope they help:

nmadhok commented 9 years ago

@indispeq Awesome! Thanks for the debug output! If this issue has been resolved, feel free to close the issue!

indispeq commented 9 years ago

Thank you, closing it then as fixed!

msheiny commented 9 years ago

I'm still running into this issue. My caveat is that I'm running an older version of VMware, VMware vCenter Server 5.1.0 build-1123961, with salt-cloud version 2015.5.5. My particular symptom is that after the VM gets cloned, the NIC starts up as not connected. I have to manually open the console of the VM and toggle the NIC online, and then the salt-cloud script resumes once it establishes contact with the guest. The rest of the provisioning process goes fine after that. Would upgrading to ESX 5.5 resolve this?

Just to confirm, here is what I've tried on the machine I'm cloning from (a Ubuntu 14.04 machine):

My relevant profile config:

    network:
      Network adapter 1:
        name: VM Network
        switch_type: standard
        adapter_type: vmxnet3

EDIT -- Hope this helps anyone else running into this issue. Apparently, when the machine first boots, the NIC is disconnected and Ubuntu waits 60 seconds for networking during boot before timing out. Being the impatient person that I am, I was not waiting for it to get past this stage and kept shutting it down and trying different things. Once it gets past that timeout, VMware Tools fires up on the guest, the configuration changes are made and the guest is rebooted... afterwards the NIC is up. Thanks for making this post, guys.

discotroy commented 8 years ago

For what it's worth, I was having this problem recently. Latest salt as of the end of March, and Ubuntu 15.10.

I found that by leaving the NIC present in the template but deleting the configuration stanza from /etc/network/interfaces I was able to get the VM to boot up much faster as it wasn't waiting for 5 minutes for the network to come up.
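
Removing that stanza can be scripted. A simplified Python sketch, assuming the ifupdown layout where option lines under an `iface` line are indented; it works on the file's text and leaves writing the result back to you. `remove_iface_stanza` is hypothetical, and `eth0` in the usage is an assumption (newer Ubuntu releases may use names like `ens160`):

```python
def remove_iface_stanza(text, iface):
    """Drop one interface's `iface` block and its mention in auto/allow-hotplug lines."""
    kept = []
    in_stanza = False
    for line in text.splitlines():
        # Indented option lines belong to the stanza being removed.
        if in_stanza and line[:1] in (" ", "\t"):
            continue
        in_stanza = False
        words = line.split()
        if words[:2] == ["iface", iface]:
            in_stanza = True
            continue
        if words[:1] in (["auto"], ["allow-hotplug"]) and iface in words[1:]:
            # Keep other interfaces listed on the same line, e.g. "auto lo eth0".
            others = [w for w in words[1:] if w != iface]
            if others:
                kept.append(" ".join([words[0]] + others))
            continue
        kept.append(line)
    return "\n".join(kept) + "\n"


sample = ("auto lo\niface lo inet loopback\n\n"
          "auto eth0\niface eth0 inet static\n    address 1.2.3.30\n")
print(remove_iface_stanza(sample, "eth0"))
```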

Once it booted, salt-cloud picked up as normal. Of course, getting bootstrap-salt to work is an entirely different matter, which I'm still working through.