Personally, I don't see Vagrant as a deployment tool. There are much better tools out there to do that!
However, I can try to accommodate this issue. I have a feeling the tmp vmx file is getting deleted before all your parallel vagrant commands finish. If you enable debug mode, the tmp vmx file is not deleted. Can you please try adding `esxi.debug = 'true'` to the esxi section of your Vagrantfile to see if that resolves the issue...
@josenk Thanks for the quick reply. I tried your suggestion, however it did not make a difference; the tasks fail at random stages. Here is my Vagrantfile, maybe you can spot something stupid that I am doing, but basically it shows what I am trying to achieve.
```ruby
Vagrant.configure("2") do |config|
  config.vm.provision "shell", path: "build.sh"
  config.vm.synced_folder ".", "/vagrant", disabled: true

  esxi_ip = ['192.168.1.10', '192.168.1.11', '192.168.1.12', '192.168.1.13',
             '192.168.1.14', '192.168.1.15', '192.168.1.16', '192.168.1.17']

  esxi_ip.each do |ip|
    config.vm.define "node-#{ip}" do |node|
      node.vm.box = 'geerlingguy/ubuntu1604'
      config.vm.provider :vmware_esxi do |esxi|
        esxi.esxi_hostname = ip
        esxi.esxi_username = "root"
        esxi.esxi_password = "mypassword"
        esxi.vm_disk_store = "DS_1"
        esxi.virtual_network = ["VM Network"]
        esxi.vmname = "VM-#{ip}"
        esxi.memsize = "4096"
        esxi.numvcpus = "6"
        esxi.debug = "true"
      end
    end
  end
end
```
The above opens 8 sessions and starts deploying to the ESXi hosts. Some reach 60%, some 80%; most of them get stuck at 80% and eventually fail. I checked after enabling debug: the VMX file stayed on the filesystem, but the result is the same as before.
Can you try another box... Recently I tried geerlingguy/ubuntu1604 and found it randomly timing out; in vSphere I see the IP address come and go randomly. Can you try 'generic/ubuntu1604' or some other box to see if you get better results.
Also, make sure you are running vagrant-vmware-esxi plugin V1.3.2 or better... I fixed a timeout issue when the ovftool takes a long time to run.
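You can confirm which plugin version is installed with Vagrant's standard plugin command:

```
$ vagrant plugin list
```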
abandoned???
Sorry, I was on extended holidays. I did try different boxes, with the same results.
If you could please test this beta, it would be appreciated. Just uninstall the old plugin, and install this beta. Let me know how it works out...
```
$ wget https://dc002.jintegrate.co:/89ki37392937y4/vagrant-vmware-esxi-2.2.1beta2.gem
$ vagrant plugin uninstall vagrant-vmware-esxi
$ vagrant plugin install vagrant-vmware-esxi-2.2.1beta2.gem
```
Ok, I tested the plugin and it seems to work a bit better now, although I think I need to test a little more to check whether the builds work across multiple clusters. So thanks.
I have encountered a different problem, however; I'm not sure if it is a plugin issue or a Vagrant problem.
I build all hosts/clusters with Vagrant using your plugin. The plugin, Vagrant and the Vagrantfiles are installed on a jump box using Ansible. I can run vagrant up/destroy if I log in to the jump box via an SSH terminal; however, if I try to run vagrant up through Ansible from my remote lab, it fails each time. Here is the debug output where it fails.
```
INFO runner: Preparing hooks for middleware sequence...
INFO runner: 1 hooks defined.
INFO runner: Running action: machine_action_read_state #<Vagrant::Action::Builder:0x0000000003870070>
INFO warden: Calling IN action: #<VagrantPlugins::ESXi::Action::SetESXiPassword:0x00000000039346f0>
RUBY_PLATFORM: x86_64-linux
INFO interface: info: --- ESXi host access : password in Vagrantfile
INFO interface: info: ==> node-119: --- ESXi host access : password in Vagrantfile
==> node-119: --- ESXi host access : password in Vagrantfile
ERROR warden: Error occurred: There was an error talking to ESXi. Unable to connect to ESXi host!
INFO warden: Beginning recovery process...
INFO warden: Recovery complete.
```
It fails at the ESXi host connectivity test. The same Vagrantfile works just fine when I log in to the jump box via SSH, but it won't work when I use Ansible to trigger vagrant up.
I also checked whether it is actually trying to connect by running tcpdump, and it shows that it does try to connect to the ESXi server; however, it just bombs out with the error above.
I was wondering if you had any idea whether this is something to do with Ruby Net::SSH? Does it need some special conditions to run? Any help is most welcome. Thanks
What method are you using to store the ESXi password? Plaintext, file: or env: ???
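For reference, the three forms being referred to would look roughly like this inside the esxi section (illustrative only; the exact file:/env: syntax should be checked against the plugin README):

```ruby
config.vm.provider :vmware_esxi do |esxi|
  esxi.esxi_password = "mypassword"            # plaintext in the Vagrantfile
  # esxi.esxi_password = "file:esxi_password"  # read the password from a file
  # esxi.esxi_password = "env:ESXI_PASSWORD"   # read it from an environment variable
end
```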
I'm wondering if it's similar to a problem I had with Bamboo??? I had to set the HOME and VAGRANT_HOME environment variables.
Thanks,
Jonathan
I figured it out: Ansible uses SSH pipelining and sets the SSH options "ControlMaster" and "ControlPath". When Vagrant is triggered from within an Ansible SSH session, the SSH libraries used by the plugin inherit those session variables, causing it to bomb out each time.
The solution is to disable the ControlMaster and ControlPath options at the Ansible level so they are not set. After that, vagrant up works just fine via Ansible.
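For anyone hitting the same thing, one way to do this is to override Ansible's default SSH arguments on the control host so no control socket is created (shown here via environment variables; the same settings can go under [ssh_connection] in ansible.cfg, and the playbook name below is only an example):

```
$ export ANSIBLE_SSH_ARGS='-o ControlMaster=no -o ControlPath=none'
$ export ANSIBLE_SSH_PIPELINING=False
$ ansible-playbook deploy-vagrant.yml   # example playbook that triggers vagrant up on the jump box
```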
Thanks for your hard work
Oh, btw, I have been able to deploy to multiple ESXi hosts now. The limitation seems to be the number of target ESXi hosts and the speed of disk and network: for heavier images it takes ages, and for a larger set of ESXi hosts Vagrant just times out, but I think that has nothing to do with the plugin. It works fine for up to 5 ESXi hosts at a time.
First of all thank you for great plugin, it makes the ESXi provisioning so much easier.
I have been using it to provision on multiple ESXi servers and I realised there may be a nice feature/improvement that could be made. So here is the use case:
Let's suppose I want to deploy a VM to 10 ESXi servers (maybe Docker hosts, or replicas of another service). The base VM is the same for all the destination ESXi servers and has pretty much the same configuration (CPU/memory/disk etc.). The way the plugin works right now, it creates a vApp for each ESXi separately and deploys it to the ESXi servers. This works; however, it fails most of the time when the number of destination ESXi servers is large (I have had a high failure rate with 8).
Disabling parallel provisioning in Vagrant makes the same process even slower; however, it has a good success rate.
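For reference, parallelism is normally toggled with Vagrant's standard flag, assuming the provider honours it:

```
$ vagrant up --parallel      # let the provider create machines in parallel
$ vagrant up --no-parallel   # create machines one at a time
```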
My ask is that the plugin could detect (with a flag, or automatically) that as long as the base box is the same for a group of ESXi servers, the vApp should only be built once. This would be great, because the .vmx is already customizable. I think this would make deploying to more than one ESXi server much faster. Hope it makes sense and hope you can incorporate it as an improvement/feature.
Thank you again for the hard work.