kstenerud opened this issue 3 years ago.
If you wish to turn the system on which you're running ./algo into an AlgoVPN, enter localhost at this prompt:
Enter the IP address of your server: (or use localhost for local installation):
I've successfully installed Algo on a remote Ubuntu server (over passwordless SSH) before.
I can confirm that the issue being described is both reproducible and a departure from the expected behavior.
@tamsky Since @kstenerud mentioned setting up a new VPS first, it sounded like he may not have intended to install remotely over SSH.
I'm still able to install to a remote server over SSH. Could something have changed on your end?
I tried a few times just now on Lightsail, using its OS-only images of Ubuntu 18 and Ubuntu 20.
Both get stuck while installing to the remote server over SSH, at the same point described in the original issue.
After switching to a localhost install on either of those Lightsail OS versions, I can infer that Ansible is pausing within the following task:
TASK [Wait 600 seconds for target connection to become reachable/usable]
because that is the task output that immediately follows the IP_subject_alt_name debug output.
So, knowing that it waits 600 seconds, I left ./algo alone at the stalled step for more than 10 minutes, after which it printed the following error:
TASK [Wait 600 seconds for target connection to become reachable/usable] *******************************************************************************
failed: [localhost -> <elided>] (item=<elided>) => {"ansible_loop_var": "item", "changed": false, "elapsed": 807, "item": "<elided>", "msg": "timed out waiting for ping module test success: Failed to connect to the host via ssh: Warning: Permanently added '<elided>' (ECDSA) to the list of known hosts.\r\nubuntu@<elided>: Permission denied (publickey)."}
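The task name and the "timed out waiting for ping module test success" message suggest Ansible keeps retrying its connection test until the 600-second deadline passes, which is why ./algo looks hung instead of failing fast. A minimal Python sketch of that kind of poll-until-timeout loop, assuming a caller-supplied check function (wait_for and its parameters are hypothetical names, not Ansible's implementation):

```python
import time
from typing import Callable


def wait_for(check: Callable[[], bool], timeout: float = 600.0, sleep: float = 1.0,
             clock: Callable[[], float] = time.monotonic,
             pause: Callable[[float], None] = time.sleep) -> float:
    """Retry `check` until it returns True or `timeout` seconds have passed.

    Returns the elapsed time on success and raises TimeoutError otherwise.
    The deadline is only checked between attempts, so one slow attempt can
    push the elapsed time past `timeout`.
    """
    start = clock()
    while True:
        if check():
            return clock() - start
        elapsed = clock() - start
        if elapsed >= timeout:
            raise TimeoutError(f"timed out after {elapsed:.0f}s")
        pause(sleep)
```

Because the deadline is only checked between attempts, an attempt that is itself slow (here each SSH call carries ConnectionAttempts=30 and Ansible's retries = 30) may be one reason the failure reports "elapsed": 807 rather than 600.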
I can confirm that the target IPv4 address is reachable with passwordless SSH (via ssh-agent) using ssh ubuntu@<elided>, and that the host key was manually accepted before invoking ./algo.
Are there manual debug steps for the ping module?
My previous test was with Vultr, but Lightsail works for me as well. Here's how I'm testing:
- … root on Vultr or ubuntu on Lightsail) and upgrade all packages
- git clone of Algo on a local Ubuntu Server 20.04 system to configure the VPS via SSH
Can you still SSH into the system after Algo has failed?
Can you still SSH into the system after Algo has failed?
Yes.
This is looking more and more like the ssh client is not using my ssh-agent to perform authentication.
Permission denied (publickey)
Relevant snippet from ./algo -vvv:
<<ipv4_elided>> SSH: EXEC ssh -o ControlMaster=auto -o ControlPersist=60s -o UserKnownHostsFile=/dev/null
-o ConnectTimeout=6 -o ConnectionAttempts=30 -o IdentitiesOnly=yes -o StrictHostKeyChecking=no -o Port=22
-o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey
-o PasswordAuthentication=no -o 'User="ubuntu"' -o ConnectTimeout=60 -o ControlPath=/Users/<user_elided>/.ansible/cp/317d98769d
<ipv4_elided> '/bin/sh -c '"'"'echo ~ubuntu && sleep 0'"'"''
<<ipv4_elided>> (255, b'', b"Warning: Permanently added '<ipv4_elided>' (ECDSA) to the list of known hosts.\r\nubuntu@<ipv4_elided>: Permission denied (publickey).\r\n")
<<ipv4_elided>> ssh_retry: attempt: 4, ssh return code is 255. cmd ([b'ssh', b'-o', b'ControlMaster=auto', b'-o', b'ControlPersist=60s', b'-o', b'UserKnownHostsFile=/dev/null', b'-o', b'ConnectTimeout=6', b'-o', b'ConnectionAttempts=30', b'-o', b'IdentitiesOnly=yes', b'-o', b'StrictHostKeyChecking=no', b'-o', b'Port=22', b'-o', b'KbdInteractiveAuthentication=no', b'-o', b'PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey', b'-o', b'PasswordAuthentication=no', b'-o', b'User="ubuntu"', b'-o', b'ConnectTimeout=60', b'-o', b'ControlPath=/Users/<user_elided>/.ansible/cp/317d98769d', b'<ipv4_elided>', b"/bin/sh -c 'echo ~ubuntu && sleep 0'"]...), pausing for 7 seconds
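One way to confirm which option is responsible is to replay the command from the -vvv output by hand, once as Ansible ran it and once without IdentitiesOnly=yes. A rough Python sketch, assuming ssh is on PATH; host and user are placeholders, and BatchMode=yes is an addition (it is not in Algo's ssh_args) so that ssh fails instead of prompting:

```python
import subprocess

# A subset of the options from the -vvv output above. ControlMaster/ControlPath
# are left out so every probe opens a fresh connection instead of reusing a master.
ALGO_SSH_OPTS = [
    "UserKnownHostsFile=/dev/null",
    "ConnectTimeout=6",
    "IdentitiesOnly=yes",
    "StrictHostKeyChecking=no",
    "KbdInteractiveAuthentication=no",
    "PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey",
    "PasswordAuthentication=no",
]


def probe_argv(host: str, user: str, drop=()) -> list:
    """Build an ssh command that only runs `true`, omitting options named in `drop`."""
    argv = ["ssh", "-o", "BatchMode=yes"]
    for opt in ALGO_SSH_OPTS:
        if opt.split("=", 1)[0] not in drop:
            argv += ["-o", opt]
    argv += ["-l", user, host, "true"]
    return argv


def probe(host: str, user: str, drop=()) -> int:
    """Run the probe: 0 means login worked, 255 means ssh itself failed (e.g. auth)."""
    return subprocess.run(probe_argv(host, user, drop), capture_output=True).returncode
```

If probe(host, "ubuntu") returns 255 while probe(host, "ubuntu", drop={"IdentitiesOnly"}) returns 0, the agent-held key is being filtered out by IdentitiesOnly rather than rejected by the server.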
The following diff fixes the problem by removing -o IdentitiesOnly=yes:
diff -r 04aedbe6bfe0 ansible.cfg
--- a/ansible.cfg Fri Dec 11 12:57:27 2020 +0300
+++ b/ansible.cfg Mon Dec 21 15:18:21 2020 -0800
@@ -12,6 +12,6 @@
record_host_keys = False
[ssh_connection]
-ssh_args = -o ControlMaster=auto -o ControlPersist=60s -o UserKnownHostsFile=/dev/null -o ConnectTimeout=6 -o ConnectionAttempts=30 -o IdentitiesOnly=yes
+ssh_args = -o ControlMaster=auto -o ControlPersist=60s -o UserKnownHostsFile=/dev/null -o ConnectTimeout=6 -o ConnectionAttempts=30
scp_if_ssh = True
retries = 30
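The same edit can be made without hand-editing. A small Python sketch (drop_ssh_option is a hypothetical helper) that rewrites the ssh_args line of an ansible.cfg text and drops any -o IdentitiesOnly=... pair; it works line by line rather than through configparser so comments and layout survive, and it assumes ssh_args fits on one line and uses the spaced "-o Name=value" form, as in the diff:

```python
import shlex


def drop_ssh_option(cfg_text: str, option: str = "IdentitiesOnly") -> str:
    """Remove every `-o <option>=...` pair from the ssh_args line of an ansible.cfg."""
    out = []
    for line in cfg_text.splitlines(keepends=True):
        key, sep, value = line.partition("=")
        if sep and key.strip() == "ssh_args":
            tokens = shlex.split(value)
            kept = []
            i = 0
            while i < len(tokens):
                # ssh option names are case-insensitive, so compare lowercased.
                if (tokens[i] == "-o" and i + 1 < len(tokens)
                        and tokens[i + 1].split("=", 1)[0].lower() == option.lower()):
                    i += 2  # skip "-o" and its argument
                    continue
                kept.append(tokens[i])
                i += 1
            newline = "\n" if line.endswith("\n") else ""
            line = f"{key.rstrip()} = {shlex.join(kept)}{newline}"
        out.append(line)
    return "".join(out)
```

Running it a second time leaves the file unchanged, so it is safe to apply repeatedly (shlex.join needs Python 3.8 or later).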
Very interesting. Quoting the ssh_config man page:
IdentitiesOnly Specifies that ssh(1) should only use the configured authentication identity and certificate files (either the default files, or those explicitly configured in the ssh_config files or passed on the ssh(1) command-line), even if ssh-agent(1) or a PKCS11Provider or SecurityKeyProvider offers more identities. The argument to this keyword must be yes or no (the default). This option is intended for situations where ssh-agent offers many different identities.
In my testing I'm using default identity files (in my case ~/.ssh/id_ed25519 with Vultr and ~/.ssh/id_rsa with Lightsail), and these work with ssh-agent.
So are you using a non-default identity file?
I wonder if we can safely remove this option from ansible.cfg
Edited to add: IdentitiesOnly was added here, probably for a good reason.
Please keep in mind that the use case here is for option 12:
12. Install to existing Ubuntu 18.04 or 20.04 server (for more advanced users)
So are you using a non-default identity file?
This definitely depends on what your expectations are surrounding the word "default".
I'm definitely using the default name for the key/cert file generated and downloaded from the Lightsail console, namely ~/.ssh/LightsailDefaultKey-us-east-1.pem, which I then added to ssh-agent via:
ssh-add ~/.ssh/LightsailDefaultKey-us-east-1.pem
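To double-check that a key really is loaded, one can compare the public half of the key file (ssh-keygen -y -f <file> prints it) with the public keys the agent holds (ssh-add -L prints them). A minimal Python sketch of that comparison; the helper names are hypothetical, it matches on key type plus base64 blob because comments can differ, and it assumes an unencrypted key such as the Lightsail .pem (an encrypted key would make ssh-keygen ask for a passphrase):

```python
import subprocess


def key_blobs(text: str) -> set:
    """Extract (key type, base64 blob) pairs from OpenSSH public-key lines."""
    blobs = set()
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            blobs.add((parts[0], parts[1]))  # ignore the trailing comment
    return blobs


def agent_has_key(private_key_path: str) -> bool:
    """True if ssh-agent currently holds the key stored in `private_key_path`."""
    pub = subprocess.run(["ssh-keygen", "-y", "-f", private_key_path],
                         capture_output=True, text=True, check=True).stdout
    listed = subprocess.run(["ssh-add", "-L"], capture_output=True, text=True).stdout
    return bool(key_blobs(pub) & key_blobs(listed))
```

For example, agent_has_key(os.path.expanduser("~/.ssh/LightsailDefaultKey-us-east-1.pem")); the expanduser call matters because subprocess does not expand ~.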
I wonder if we can safely remove this option from ansible.cfg
I wonder if we can add support for dynamically removing IdentitiesOnly from the ssh_args value in ansible.cfg when option 12 is in use and:
- the ssh-agent environment variable is detected
- an ssh_key pathname prompt can be issued
IdentitiesOnly was added here, probably for a good reason.
I don't see a good reason, or a "probably", anywhere in that commit, or any of the issues connected to the commit (#152 #151 #112).
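One possible reading of the proposal above (drop IdentitiesOnly for option 12 when an agent is present) is sketched below. This is hypothetical, not Algo code: it assumes the agent is detected through the SSH_AUTH_SOCK variable that ssh-agent exports, and that when the user answers the proposed ssh_key prompt the option can stay, since ssh can then be pointed at that file explicitly.

```python
import os
from typing import Mapping, Optional

# Menu number quoted in this thread:
# "12. Install to existing Ubuntu 18.04 or 20.04 server (for more advanced users)"
EXISTING_SERVER_CHOICE = 12


def keep_identities_only(choice: int, ssh_key_path: Optional[str] = None,
                         env: Mapping[str, str] = os.environ) -> bool:
    """Hypothetical policy: should ansible.cfg keep -o IdentitiesOnly=yes?"""
    if choice != EXISTING_SERVER_CHOICE:
        return True  # keep the current behaviour for every other install option
    if ssh_key_path:
        return True  # key given explicitly; IdentitiesOnly still allows it via -i
    # No explicit key: if an agent is running, let ssh try every identity it offers.
    return not env.get("SSH_AUTH_SOCK")
```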
This definitely depends on what your expectations are surrounding the word "default".
My expectations are irrelevant; we're talking about OpenSSH. From the ssh man page on macOS:
-i identity_file Selects a file from which the identity (private key) for public key authentication is read. The default is ~/.ssh/id_dsa, ~/.ssh/id_ecdsa, ~/.ssh/id_ed25519 and ~/.ssh/id_rsa.
This explains why I wasn't able to reproduce your issue: I was using a default identity file, so it didn't get excluded by IdentitiesOnly.
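The behaviour described above can be captured in a simplified Python model: with IdentitiesOnly=yes, an agent-held key is only tried if it corresponds to one of the default identity files in the man page quote, or to a file configured explicitly (IdentityFile or -i). This is a sketch, not OpenSSH's logic: real ssh matches agent keys by public key rather than by path, and newer OpenSSH releases add further defaults such as the *_sk security-key files.

```python
from typing import Iterable

# Default identity files named in the ssh(1) man page quoted above.
DEFAULT_IDENTITY_FILES = {
    f"~/.ssh/{name}" for name in ("id_dsa", "id_ecdsa", "id_ed25519", "id_rsa")
}


def agent_key_is_tried(key_file: str, identities_only: bool,
                       configured: Iterable[str] = ()) -> bool:
    """Would ssh try the agent-held key loaded from `key_file`? (simplified model)"""
    if not identities_only:
        return True  # every identity the agent offers is tried
    return key_file in DEFAULT_IDENTITY_FILES or key_file in set(configured)
```

Under this model ~/.ssh/id_ed25519 and ~/.ssh/id_rsa pass, while ~/.ssh/LightsailDefaultKey-us-east-1.pem is skipped unless IdentitiesOnly is off or the file is configured explicitly, which matches the two results reported in this thread.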
So @jackivanov, here is the issue, I think:
- ssh_args defined in ansible.cfg includes the option IdentitiesOnly=yes.
- IdentitiesOnly=yes causes SSH to ignore identities in ssh-agent other than those from the default files.
- If a user relies on ssh-agent but uses a non-default name for their identity file, Algo will hang because that identity in ssh-agent will be ignored.
Is IdentitiesOnly=yes still needed?
If you wish to turn the system on which you're running ./algo into an AlgoVPN, enter localhost at this prompt:
Enter the IP address of your server: (or use localhost for local installation):
This works for me. I was trying to install it locally, not remotely.
Removing IdentitiesOnly=yes solved the issue for me.
I had exactly the situation @davidemyers described with a non-default passwordless SSH identity file.
This might help someone using a GCP Compute Engine instance.
This was happening to me because I was trying to install on a gcloud compute instance and was logging in with the gcloud compute ssh command rather than the plain ssh command.
After I connected directly with ssh user@ip, I was able to install Algo.
Describe the bug
When I ran the algo script, it asked me a few questions, and now it's hung.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
The script completes
Full log