Closed by x-yuri 9 years ago
You're using a custom inventory file (`provisioning/hosts.dev`), but Vagrant dynamically allocates the forwarded port for SSH connections.
Note that for simple use cases, I'd recommend relying on the auto-generated inventory by simply removing the `ansible.inventory_path` option from your `Vagrantfile`.
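A minimal `Vagrantfile` along those lines might look like this (a sketch only; the box name and playbook path are assumptions, not taken from the reporter's actual project):

```ruby
# -*- mode: ruby -*-
# vi: set ft=ruby :

Vagrant.configure("2") do |config|
  config.vm.box = "ubuntu/trusty64"                # assumed box

  config.vm.provision :ansible do |ansible|
    ansible.playbook = "provisioning/playbook.yml" # assumed path
    # No ansible.inventory_path here: Vagrant writes its own inventory
    # (under .vagrant/provisioners/ansible/) with the dynamically
    # forwarded SSH port already filled in.
  end
end
```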
If you still encounter problems, could you please gist your `Vagrantfile` and `provisioning/hosts.dev`, and maybe a playbook example that can reproduce the issue?
I'll try to set up a testing VM. Meanwhile, could you elaborate on the "but Vagrant dynamically allocates the forwarded port for SSH connections" part? Actually, I don't really understand what's happening. As far as I can tell, `vagrant` always sets up port forwarding like this: `2222 (host) => 22 (guest)`. At least for this VM. And `ansible` seems to connect using port `22`. What strikes me is: why does `ansible` even manage to connect? Shouldn't it be using port `2222`?
UPD Well, it must be because ansible connects over the host-only network. But I still don't understand how Vagrant "allocates the forwarded port for SSH connections."
> Well, it must be because ansible connects using host-only network
Yes indeed, my comment about port forwarding only referred to this host-only connection, which is Vagrant's default SSH path. In your case, you've defined an extra private (or public) network (192.168.88.10), where using the default port 22 is absolutely correct.
With a quick Internet search, I found that there are several SSH issues when running Mac OS X Mavericks (and maybe affecting Yosemite as well). I could not check them, as I only have an OS X 10.9 box, but please have a look at:
If you still encounter problems, could you please provide the following information:
Hint: for faster debugging, you can also execute the `ansible-playbook` command generated by Vagrant directly from the shell:

```
ANSIBLE_FORCE_COLOR=true ANSIBLE_HOST_KEY_CHECKING=false PYTHONUNBUFFERED=1 ansible-playbook --private-key=/home/yuri/.vagrant.d/insecure_private_key --user=vagrant --vault-password-file=provisioning/.vault_pass_dev --inventory-file=provisioning/hosts.dev -vvvv --limit='all' provisioning/playbook.yml
```

(You'll see this command when `ansible.verbose` is activated.)
@x-yuri I'm closing this until we receive more information from your part.
Sorry for the delay. So, with this setup it still fails occasionally:
```
$ cat ./Vagrantfile
# -*- mode: ruby -*-
# vi: set ft=ruby :

VAGRANTFILE_API_VERSION = "2"

Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
  config.vm.box = "ubuntu/trusty64"
  config.vm.network "private_network", ip: "192.168.88.11"
  config.vm.provision :ansible do |ansible|
    ansible.playbook = "playbook.yml"
    ansible.inventory_path = "hosts"
    ansible.verbose = 'vvvv'
  end
end

$ cat ./hosts
[default]
h1 ansible_ssh_host=192.168.88.11 ansible_ssh_user=vagrant

$ cat ./playbook.yml
---
- hosts: default
  tasks:
    - name: test task
      shell: echo test
```
> Hint: for faster debugging, you also can execute the ansible-playbook generated by Vagrant directly from the shell:
> `ANSIBLE_FORCE_COLOR=true ANSIBLE_HOST_KEY_CHECKING=false PYTHONUNBUFFERED=1 ansible-playbook --private-key=/home/yuri/.vagrant.d/insecure_private_key --user=vagrant --vault-password-file=provisioning/.vault_pass_dev --inventory-file=provisioning/hosts.dev -vvvv --limit='all' provisioning/playbook.yml` (You'll see these command when ansible.verbose is activated)
I thought the problem had to do with `ansible` running too early. Was I wrong? Anyway, I tested it this way: `vagrant destroy -f && vagrant up`.
> You're using a custom inventory file (provisioning/hosts.dev), but Vagrant dynamically allocates the forwarded port for SSH connections.
Still, what exactly do you mean by dynamically allocating ports?
The issues at those links have nothing to do with mine. The first has an `ssh: connect to host 127.0.0.1 port 22: Connection refused` error, and the second `fatal: [127.0.0.1] => to use the 'ssh' connection type with passwords, you must install the sshpass program`.
UPD I just confirmed that it must have to do with `ansible` running too early. I ran `vagrant destroy -f && vagrant up` until I got the error. Then I ran `ANSIBLE_FORCE_COLOR=true ANSIBLE_HOST_KEY_CHECKING=false PYTHONUNBUFFERED=1 ansible-playbook --private-key=/home/yuri/.vagrant.d/insecure_private_key --user=vagrant --limit='default' --inventory-file=hosts -vvvv playbook.yml` and it succeeded:
```
$ vagrant destroy -f && vagrant up
==> default: Forcing shutdown of VM...
==> default: Destroying VM and associated drives...
==> default: Running cleanup tasks for 'ansible' provisioner...
Bringing machine 'default' up with 'virtualbox' provider...
==> default: Importing base box 'ubuntu/trusty64'...
==> default: Matching MAC address for NAT networking...
==> default: Checking if box 'ubuntu/trusty64' is up to date...
==> default: Setting the name of the VM: 5_default_1417385755633_17573
==> default: Clearing any previously set forwarded ports...
==> default: Clearing any previously set network interfaces...
==> default: Preparing network interfaces based on configuration...
    default: Adapter 1: nat
    default: Adapter 2: hostonly
==> default: Forwarding ports...
    default: 22 => 2222 (adapter 1)
==> default: Booting VM...
==> default: Waiting for machine to boot. This may take a few minutes...
    default: SSH address: 127.0.0.1:2222
    default: SSH username: vagrant
    default: SSH auth method: private key
    default: Warning: Connection timeout. Retrying...
    default: Warning: Remote connection disconnect. Retrying...
    default: Warning: Remote connection disconnect. Retrying...
==> default: Machine booted and ready!
==> default: Checking for guest additions in VM...
==> default: Configuring and enabling network interfaces...
==> default: Mounting shared folders...
    default: /vagrant => /home/yuri/_/5
==> default: Running provisioner: ansible...
ANSIBLE_FORCE_COLOR=true ANSIBLE_HOST_KEY_CHECKING=false PYTHONUNBUFFERED=1 ansible-playbook --private-key=/home/yuri/.vagrant.d/insecure_private_key --user=vagrant --limit='default' --inventory-file=hosts -vvvv playbook.yml

PLAY [default] ****************************************************************

GATHERING FACTS ***************************************************************
<192.168.88.11> ESTABLISH CONNECTION FOR USER: vagrant
<192.168.88.11> REMOTE_MODULE setup
<192.168.88.11> EXEC ['ssh', '-C', '-tt', '-vvv', '-o', 'ControlMaster=auto', '-o', 'ControlPersist=60s', '-o', 'ControlPath=/home/yuri/.ansible/cp/ansible-ssh-%h-%p-%r', '-o', 'StrictHostKeyChecking=no', '-o', 'Port=22', '-o', 'IdentityFile="/home/yuri/.vagrant.d/insecure_private_key"', '-o', 'KbdInteractiveAuthentication=no', '-o', 'PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey', '-o', 'PasswordAuthentication=no', '-o', 'User=vagrant', '-o', 'ConnectTimeout=10', '192.168.88.11', "/bin/sh -c 'mkdir -p $HOME/.ansible/tmp/ansible-tmp-1417385811.38-111995198978464 && chmod a+rx $HOME/.ansible/tmp/ansible-tmp-1417385811.38-111995198978464 && echo $HOME/.ansible/tmp/ansible-tmp-1417385811.38-111995198978464'"]
fatal: [h1] => SSH encountered an unknown error. The output was:
OpenSSH_6.7p1, OpenSSL 1.0.1j 15 Oct 2014
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: auto-mux: Trying existing master
debug1: Control socket "/home/yuri/.ansible/cp/ansible-ssh-192.168.88.11-22-vagrant" does not exist
debug2: ssh_connect: needpriv 0
debug1: Connecting to 192.168.88.11 [192.168.88.11] port 22.
debug2: fd 3 setting O_NONBLOCK
debug1: connect to address 192.168.88.11 port 22: Connection timed out
ssh: connect to host 192.168.88.11 port 22: Connection timed out

TASK: [test task] *************************************************************
FATAL: no hosts matched or all hosts have already failed -- aborting

PLAY RECAP ********************************************************************
           to retry, use: --limit @/home/yuri/playbook.retry
h1                         : ok=0    changed=0    unreachable=1    failed=0

Ansible failed to complete successfully. Any error output should be
visible above. Please fix these errors and try again.

$ ANSIBLE_FORCE_COLOR=true ANSIBLE_HOST_KEY_CHECKING=false PYTHONUNBUFFERED=1 ansible-playbook --private-key=/home/yuri/.vagrant.d/insecure_private_key --user=vagrant --limit='default' --inventory-file=hosts -vvvv playbook.yml

PLAY [default] ****************************************************************

GATHERING FACTS ***************************************************************
<192.168.88.11> ESTABLISH CONNECTION FOR USER: vagrant
<192.168.88.11> REMOTE_MODULE setup
<192.168.88.11> EXEC ['ssh', '-C', '-tt', '-vvv', '-o', 'ControlMaster=auto', '-o', 'ControlPersist=60s', '-o', 'ControlPath=/home/yuri/.ansible/cp/ansible-ssh-%h-%p-%r', '-o', 'StrictHostKeyChecking=no', '-o', 'Port=22', '-o', 'IdentityFile="/home/yuri/.vagrant.d/insecure_private_key"', '-o', 'KbdInteractiveAuthentication=no', '-o', 'PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey', '-o', 'PasswordAuthentication=no', '-o', 'User=vagrant', '-o', 'ConnectTimeout=10', '192.168.88.11', "/bin/sh -c 'mkdir -p $HOME/.ansible/tmp/ansible-tmp-1417386104.06-113950359141030 && chmod a+rx $HOME/.ansible/tmp/ansible-tmp-1417386104.06-113950359141030 && echo $HOME/.ansible/tmp/ansible-tmp-1417386104.06-113950359141030'"]
<192.168.88.11> PUT /tmp/tmpiwjrvt TO /home/vagrant/.ansible/tmp/ansible-tmp-1417386104.06-113950359141030/setup
<192.168.88.11> EXEC ['ssh', '-C', '-tt', '-vvv', '-o', 'ControlMaster=auto', '-o', 'ControlPersist=60s', '-o', 'ControlPath=/home/yuri/.ansible/cp/ansible-ssh-%h-%p-%r', '-o', 'StrictHostKeyChecking=no', '-o', 'Port=22', '-o', 'IdentityFile="/home/yuri/.vagrant.d/insecure_private_key"', '-o', 'KbdInteractiveAuthentication=no', '-o', 'PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey', '-o', 'PasswordAuthentication=no', '-o', 'User=vagrant', '-o', 'ConnectTimeout=10', '192.168.88.11', u"/bin/sh -c 'LANG=C LC_CTYPE=C /usr/bin/python /home/vagrant/.ansible/tmp/ansible-tmp-1417386104.06-113950359141030/setup; rm -rf /home/vagrant/.ansible/tmp/ansible-tmp-1417386104.06-113950359141030/ >/dev/null 2>&1'"]
ok: [h1]

TASK: [test task] *************************************************************
<192.168.88.11> ESTABLISH CONNECTION FOR USER: vagrant
<192.168.88.11> REMOTE_MODULE command echo test #USE_SHELL
<192.168.88.11> EXEC ['ssh', '-C', '-tt', '-vvv', '-o', 'ControlMaster=auto', '-o', 'ControlPersist=60s', '-o', 'ControlPath=/home/yuri/.ansible/cp/ansible-ssh-%h-%p-%r', '-o', 'StrictHostKeyChecking=no', '-o', 'Port=22', '-o', 'IdentityFile="/home/yuri/.vagrant.d/insecure_private_key"', '-o', 'KbdInteractiveAuthentication=no', '-o', 'PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey', '-o', 'PasswordAuthentication=no', '-o', 'User=vagrant', '-o', 'ConnectTimeout=10', '192.168.88.11', "/bin/sh -c 'mkdir -p $HOME/.ansible/tmp/ansible-tmp-1417386115.24-57373581652877 && chmod a+rx $HOME/.ansible/tmp/ansible-tmp-1417386115.24-57373581652877 && echo $HOME/.ansible/tmp/ansible-tmp-1417386115.24-57373581652877'"]
<192.168.88.11> PUT /tmp/tmpr5GMav TO /home/vagrant/.ansible/tmp/ansible-tmp-1417386115.24-57373581652877/command
<192.168.88.11> EXEC ['ssh', '-C', '-tt', '-vvv', '-o', 'ControlMaster=auto', '-o', 'ControlPersist=60s', '-o', 'ControlPath=/home/yuri/.ansible/cp/ansible-ssh-%h-%p-%r', '-o', 'StrictHostKeyChecking=no', '-o', 'Port=22', '-o', 'IdentityFile="/home/yuri/.vagrant.d/insecure_private_key"', '-o', 'KbdInteractiveAuthentication=no', '-o', 'PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey', '-o', 'PasswordAuthentication=no', '-o', 'User=vagrant', '-o', 'ConnectTimeout=10', '192.168.88.11', u"/bin/sh -c 'LANG=C LC_CTYPE=C /usr/bin/python /home/vagrant/.ansible/tmp/ansible-tmp-1417386115.24-57373581652877/command; rm -rf /home/vagrant/.ansible/tmp/ansible-tmp-1417386115.24-57373581652877/ >/dev/null 2>&1'"]
changed: [h1] => {"changed": true, "cmd": "echo test", "delta": "0:00:00.023101", "end": "2014-11-30 22:22:28.782800", "rc": 0, "start": "2014-11-30 22:22:28.759699", "stderr": "", "stdout": "test"}

PLAY RECAP ********************************************************************
h1                         : ok=2    changed=1    unreachable=0    failed=0
```
UPD I'm running Arch Linux, `vagrant-1.6.5`, and `ansible-1.7.2`.
@x-yuri thanks for all the info. This is odd... What I see here is that with your Arch Linux you're using one of the most recent versions of OpenSSH (version 6.7 was released on October 6, 2014). Ubuntu 14.04 is using `6.6p1`, Homebrew for Mac provides `6.6.p1_1`.
What VM provider are you actually using? KVM? VirtualBox?
I'm reopening this issue and flagging it `hosts/arch` for now. I would recommend searching and asking in the Arch community as well (https://wiki.archlinux.org/index.php/Vagrant, https://bbs.archlinux.org/search.php).
@x-yuri I renamed the issue. Does the `vagrant ssh` command work?
I'm using `virtualbox`, and `vagrant ssh` works. It looks like `ansible` starts too early, while the `ssh` connection cannot yet be established.
Please, can you execute `VAGRANT_LOG=debug vagrant provision` and gist the full log output, and maybe also do the same for `vagrant ssh` (for comparison)?
@x-yuri the last example that you gisted does not fully correspond to the original problem report:
That said, I agree with your analysis that at least the "private network (192.168.88.11)" connectivity is not ready when the ansible provisioner is started.
Next, could you please:

1. Remove the `ansible.inventory_path` option (to use the auto-generated inventory), and try `vagrant destroy -f && VAGRANT_LOG=debug vagrant up`. I expect that everything will run fine, as the basic localhost network is also used by other vagrant internals to ssh into the VM during the bootstrap.
2. Re-configure back to `ansible.inventory_path = "hosts"` and again execute `vagrant destroy -f && VAGRANT_LOG=debug vagrant up`, to try to understand why or how the private network is slow to be enabled.

Note: I updated the issue title again (to help internet searches).
@x-yuri what are your exact versions for:
Well, if you expected me to gist `vagrant provision` while it fails, then first, I believe I couldn't possibly do that, because `vagrant provision` is run after `vagrant up`, and I didn't see it failing. However, I didn't test it thoroughly. Additionally, if the reason is `ansible` starting too early, we should inspect `vagrant up`, not `vagrant provision`.
> it now uses the auto-generated ansible inventory (--inventory-file=/home/yuri/_/5/.vagrant/provisioners/ansible/inventory)
Sorry, I was experimenting with commenting the corresponding line out and forgot to uncomment it.
> Re-configured back to ansible.inventory_path = "hosts" and again execute vagrant destroy -f && VAGRANT_LOG=debug vagrant up to try to understand why or how the private network is slow to be enabled.
To make things clear (I'm not sure you noticed the point), it's not just about re-executing `vagrant`: I've got to run it until it fails. I'm going to try this command this time: `for i in {1..10}; do vagrant destroy -f && VAGRANT_LOG=debug vagrant up || break; done`.
As for the versions, `virtualbox` is version `4.3.18`. Then, Arch Linux has no such thing as a version; the closest thing I can probably provide is the last time I ran `pacman -Syu`, which was on 2014-11-17 01:29. And the kernel is version `3.17.3`.
Also, I wanted to make it clear that I didn't do any tweaks to the kernel, and I made only minimal adjustments to the configuration of the software I'm using, just the things I needed. Also, I hope it's not only about Arch Linux :) But I've probably yet to prove that.
@x-yuri sorry if I was unclear, but no worries, we'll keep digging into it bravely :hospital:. Also, thank you for the software version clarifications; you're running only on the bleeding edge :rocket:, and that could be a source of "yet unknown" integration troubles, that's all.
> if the reason is ansible starting too early, we should inspect vagrant up, not vagrant provision
Correct, that was my mistake; I should have asked you to run `VAGRANT_LOG=debug vagrant up` instead, but I mixed myself up with copy-paste and so on. Sorry for that :crying_cat_face:.

Note that `vagrant up` automatically kicks off the provisioner(s) on the VM's initial bootstrap, hence the value of using `VAGRANT_LOG=debug vagrant up` (with a new VM to be created).
So, my questions are:

1. With your custom inventory `hosts`: does `vagrant destroy -f; vagrant up` systematically fail (ansible timeout) with your current host setup?
2. With the auto-generated inventory: does `vagrant destroy -f; vagrant up` succeed in running the ansible provisioner without a timeout error?

Just chiming in that I am experiencing this as well. I also have a custom inventory file, which contains the following: `vagrant ansible_ssh_host=192.168.33.40`. I'm doing this so I can set variables specific to development in `host_vars/vagrant`.
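For illustration, a setup like the one just described might look as follows (a sketch only; the box, playbook path, and inventory file name are assumptions, not taken from the commenter's actual project):

```ruby
# Vagrantfile (sketch)
Vagrant.configure("2") do |config|
  config.vm.box = "ubuntu/trusty64"                        # assumed box
  config.vm.network "private_network", ip: "192.168.33.40"

  config.vm.provision :ansible do |ansible|
    ansible.playbook = "playbook.yml"                      # assumed path
    # Custom inventory file (name assumed) containing:
    #   vagrant ansible_ssh_host=192.168.33.40
    # so that development-specific variables can live in host_vars/vagrant.
    ansible.inventory_path = "inventory"
  end
end
```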
The problem is intermittent and running vagrant provision manually works fine after the VM finishes booting.
@tkellen Which OS is it?
@gildegoma This is the output for failed and successful runs. It didn't occur to me that `less` would save the output with escape sequences, so I had to remove them with this script afterwards. As for making it fail with the auto-generated inventory, I couldn't reproduce that yet. I'll probably leave my computer trying to make it happen overnight. I hope I didn't miss any of your requests.
Multiple hosts. I've seen it on OS X and Ubuntu.
@tkellen Thanks for your feedback, which obviously broadens the scope of the problem (I'll update the issue title and labels again).
@x-yuri Awesome work fetching the logs, thanks a lot! I've started to check and compare the `vagrant-fail` and `vagrant-success` outputs. In both runs, the setup of the `192.168.88.x` private network succeeds and looks the same (VirtualBox commands and Ubuntu network reconfiguration), but maybe there is a random latency until `/sbin/ifup eth1` is really effective... and I don't think that at the moment Vagrant checks/waits until additional networks are operational before going further. This could explain why sometimes the ansible provisioner is fired too early, but I need to dig further into this (later, not today)...
But for the next investigation steps, I propose two kinds of possible "workarounds":

1. Increase the SSH connection timeout (e.g. `ansible.raw_arguments = ['--timeout=300']` via the `Vagrantfile`, or via a local `ansible.cfg` file if you prefer).

I also noticed that when `ansible-playbook` fails, we see this message (also reported in https://github.com/ansible/ansible/issues/7759):

```
debug1: auto-mux: Trying existing master
debug1: Control socket "/home/yuri/.ansible/cp/ansible-ssh-192.168.88.11-22-vagrant" does not exist
```

Usually, I don't think this should be a problem (SSH simply recreates the control socket file), but maybe it introduces some additional delay that possibly leads to the observed timeout error. So I think you also could try to:

2. Disable SSH connection persistence with `ansible.raw_ssh_args = ['ControlPersist=no']`.

It would be of great help if you could try these two custom configurations :sos:
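Put together in the thread's example `Vagrantfile`, the two workarounds might look like this (a sketch, not a verified fix; note that `raw_ssh_args` entries appear to be passed through as raw SSH arguments, so the option is written here with an `-o` prefix, which is an assumption about the correct form):

```ruby
Vagrant.configure("2") do |config|
  config.vm.box = "ubuntu/trusty64"
  config.vm.network "private_network", ip: "192.168.88.11"

  config.vm.provision :ansible do |ansible|
    ansible.playbook = "playbook.yml"
    ansible.inventory_path = "hosts"
    ansible.verbose = 'vvvv'
    # Workaround 1: raise ansible's SSH connection timeout (default: 10s)
    ansible.raw_arguments = ['--timeout=300']
    # Workaround 2: don't keep a persistent SSH control socket around
    ansible.raw_ssh_args = ['-o ControlPersist=no']
  end
end
```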
@tkellen To make it clear, was the error exactly "Connection timed out"?
> I've seen it on OS X and Ubuntu.
My coworkers had similar complaints, and that's exactly the OSes they use. It's just hard to make them confirm it.
As for `ansible.raw_arguments = ['--timeout=300']`, it succeeded `352` times. UPD Something might be off here. I need to recheck this.
The way it looks, I'll soon be an expert at testing `vagrant` :)
Just to have it stored somewhere, this command has proved to be easier to use:

```
for (( i = 1; 1; i++ )); do vagrant destroy -f && VAGRANT_LOG=debug vagrant up || break; echo $i > /dev/tty; done &> log
```
The thing is, this issue is hard to reproduce. I don't know how many times I should let it recreate the VM to be sure that the issue is gone. Supposedly the computer must be under heavy load for it to manifest itself. However, so far I haven't managed to come up with a reliable way to make it happen, let alone a fast one.
And I'm not so sure now, that the issue can't happen with auto-generated inventory.
So, do you have any suggestions for how to force it to happen? One of the options would be to do it from inside the VM. Also, could you point out where in the code `vagrant` waits for the network to be operational?
> The thing is, this issue is hard to reproduce.
@x-yuri You did more than pretty well, and I thank you again so much for your efforts. I don't think we need any more tests on your side :heart: (and sorry I asked for too many things; it is a bit hard to be complete in asynchronous communication).
> And I'm not so sure now, that the issue can't happen with auto-generated inventory.
- The auto-generated inventory is 100% safe, because it uses the "official" `ssh_info` returned by the VM provider. The same SSH connectivity is used by all vagrant core components (e.g. to configure your private network) during the VM bootstrap phase.
- When using an extra (private or public) network address in a custom Ansible inventory, it is possible to encounter some random timeout errors (the subject of this problem report). To avoid these errors, it is recommended to increase the SSH connection timeout of the `ansible-playbook` command with `ansible.raw_arguments = ['--timeout=30']` (the default value is 10 seconds, and this should be empirically adjusted to your environment's performance).

> could you point me out where in the code vagrant waits for network to be operational
Here are some parts that may interest you:
I'm closing this issue with the following possible enhancement ideas to be evaluated in the future:

- add an `ansible.timeout` option to the Ansible provisioner, with a higher default value (e.g. 30 seconds)

@mitchellh what do you think? Are you open to accepting pull requests implementing the above ideas?
Not sure if it's vagrant's fault:

This error happens occasionally (not every time). So, first `vagrant` says the machine is up and ready, but then `ansible` says that it can't connect. Can this be fixed in any way? Could you suggest anything to further investigate the issue?