Closed: kenorb closed this issue 7 years ago
Ok, solved the mystery.
The timeout in the SSH config was too short, because it was controlled by ServerAliveInterval/ServerAliveCountMax in the ~/.ssh/config file, so I had to increase it.
As for why one AMI failed while the other worked: the failing AMI already had a /vagrant folder, so rsync took longer to calculate the differences.
Here are the new settings in the ~/.ssh/config file on the client side:
```
Host *
    ServerAliveInterval 60
    ServerAliveCountMax 3
```
If the timeout is too short, the connection usually ends with a "Broken pipe" error.
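For context on what those two settings buy: ssh sends a keepalive probe every ServerAliveInterval seconds and gives up after ServerAliveCountMax consecutive unanswered probes, so the effective tolerance for a silent connection is their product. A trivial sanity check of that budget (arithmetic only, nothing Vagrant-specific):

```shell
# ServerAliveInterval: seconds between keepalive probes sent to the server
interval=60
# ServerAliveCountMax: unanswered probes tolerated before disconnecting
count_max=3
# Effective time ssh waits on a silent connection before giving up
echo $((interval * count_max))   # prints 180 (seconds, i.e. 3 minutes)
```

With the defaults the old timeout was evidently shorter than a slow rsync diff pass, which is consistent with the "Broken pipe" failure mode.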
Problem
The rsync fails when running:

```
vagrant up --no-provision --provider=aws
```

with the following errors. Output:
What I've tried
```
aws.user_data = 'sed -i\'.bak\' -e \'s/^\(Defaults\s\+requiretty\)/# \1/\' /etc/sudoers'
config.ssh.pty = true
```

which temporarily seemed to work yesterday, but not today when I was testing again. Running the rsync command manually, it seems to work correctly, so I suspect ControlMaster is the issue, but it seems I cannot disable it from the SSH config.
Extra info
Added
VAGRANT_LOG=debug
to see more details, and the rsync command is exactly the same between the working and failing scenarios (for two different AMIs). The rsync command is:
Extra comments
The issue mitchellh/vagrant/issues/6780 is similar, but I don't have the 'sudo: sorry, you must have a tty to run sudo' message, so mine seems related to ControlMaster instead.
In the attachment please find two Jenkins logs from runs set up in exactly the same way (the only difference is the AMI). This issue is repeatable. I've compared these two files (using DiffMerge) and they're almost exactly the same, apart from the ControlMaster error saying it cannot connect to the new control master.
I don't think it's a problem with the remote or the sudoers file, as Vagrant cannot even connect to the host using the control master file, although it can connect before (as per the
ubuntu@ip-172-30-2-117:~$
prompt being received). This is a quite weird issue and I don't know how to debug it further.
Check the attached log files: consoleText-11.txt consoleText-12.txt
Related: GH-340, mitchellh/vagrant/issues/6702 (but this happens on Ubuntu Linux machine).
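One thing worth noting about why disabling multiplexing from the SSH config may not work: ssh uses the first value it obtains for each option, and command-line -o flags are processed before ~/.ssh/config. So if Vagrant passes ControlMaster/ControlPath as -o options when invoking ssh/rsync, a client-side config like the following sketch would be silently overridden:

```
Host *
    ControlMaster no
    ControlPath none
```

That precedence would explain why the ControlMaster behavior appears impossible to disable from the config file alone, and why editing Vagrant's helper.rb (below) is the only place left to change it.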
The workaround could be to comment out this line:
from the
helper.rb
file. Although in my 1.8.1 I've got 3 lines to comment out. But then it fails with:
auth.log
file from remote
Here is the
auth.log
file from that EC2 instance:
/etc/sudoers
file from remote
But it's the same on both working and non-working instances.
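As a side note on what the aws.user_data workaround above actually does to that file: the sed expression comments out the "Defaults requiretty" line. A quick local illustration of the same expression (the sample file and /tmp path are only for demonstration, not the real /etc/sudoers; GNU sed assumed for the \s class):

```shell
# Build a sample file containing the line the workaround targets
printf 'Defaults    requiretty\n' > /tmp/sudoers.sample
# Same expression as in aws.user_data: prefix the matched line with '# '
sed -i.bak -e 's/^\(Defaults\s\+requiretty\)/# \1/' /tmp/sudoers.sample
cat /tmp/sudoers.sample   # prints "# Defaults    requiretty"
```

Since /etc/sudoers looks identical on the working and non-working instances, this requiretty angle seems unlikely to be the root cause here.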
SSH debug (-vvv)
Brief:
Based on the above, it sounds like SSH entered an interactive session and was successfully authenticated, but then hit the timeout, which is weird, as it took only 62 seconds for the error to appear. Possibly increasing the timeout may help?