ilri / rmg-ansible-public

Ansible playbooks for ILRI research-computing infrastructure
GNU General Public License v3.0

SSHD reload during initial play causes Ansible to lose connection #37

Closed alanorth closed 8 years ago

alanorth commented 8 years ago

Running the DSpace role on a clean Ubuntu 16.04 host goes very well until the SSHD handler is notified and reloads sshd, after which every subsequent handler fails.

RUNNING HANDLER [common : reload sshd] *****************************************
changed: [dspace]

RUNNING HANDLER [common : reload sysctl] ***************************************
No handlers could be found for logger "paramiko.transport"
fatal: [dspace]: UNREACHABLE! => {"changed": false, "msg": "Incompatible ssh server (no acceptable macs)", "unreachable": true}

RUNNING HANDLER [common : restart firewalld] ***********************************
No handlers could be found for logger "paramiko.transport"
fatal: [dspace]: UNREACHABLE! => {"changed": false, "msg": "Incompatible ssh server (no acceptable macs)", "unreachable": true}

RUNNING HANDLER [postgres : restart postgres] **********************************
No handlers could be found for logger "paramiko.transport"
fatal: [dspace]: UNREACHABLE! => {"changed": false, "msg": "Incompatible ssh server (no acceptable macs)", "unreachable": true}

RUNNING HANDLER [munin : restart munin-node] ***********************************
No handlers could be found for logger "paramiko.transport"
fatal: [dspace]: UNREACHABLE! => {"changed": false, "msg": "Incompatible ssh server (no acceptable macs)", "unreachable": true}

RUNNING HANDLER [dspace : reload nginx] ****************************************
No handlers could be found for logger "paramiko.transport"
fatal: [dspace]: UNREACHABLE! => {"changed": false, "msg": "Incompatible ssh server (no acceptable macs)", "unreachable": true}

RUNNING HANDLER [dspace : restart tomcat7] *************************************
No handlers could be found for logger "paramiko.transport"
fatal: [dspace]: UNREACHABLE! => {"changed": false, "msg": "Incompatible ssh server (no acceptable macs)", "unreachable": true}
        to retry, use: --limit @site.retry

PLAY RECAP *********************************************************************
dspace                     : ok=82   changed=61   unreachable=6    failed=0

I can manually SSH to the host, but any ansible actions say incompatible SSH server. I guess it's because of the bit about ECDSA/ED25519 host keys. Hmm, ansible 2.0+ made it really tricky to troubleshoot this, as I have no idea which line in my ~/.ssh/known_hosts this is.

alanorth commented 8 years ago

Seems to be related to MACs, and is from paramiko.

Temporarily using default MACs in /etc/ssh/sshd_config for now. I've never seen this issue before... hmm.

oguya commented 8 years ago

I've deployed these playbooks to several test & production Ubuntu 16.04 hosts using ansible 2.0.x, but I've never experienced this issue...hmmm

oguya commented 8 years ago

Is there any useful debug info if you use the -v option on ansible?

alanorth commented 8 years ago

I'm on Ansible 2.1.0.0 and paramiko 2.0.1, maybe it's something there. No, ansible's verbose mode is not helpful (has it ever been?). On a hunch I checked that my client SSH config and the server SSH config support matching MACs:

MACs hmac-sha2-512-etm@openssh.com,hmac-sha2-256-etm@openssh.com,hmac-sha2-512
MACs hmac-sha2-512-etm@openssh.com,hmac-sha2-256-etm@openssh.com,hmac-ripemd160-etm@openssh.com,umac-128-etm@openssh.com

Top is my client, bottom is our current SSHD template for Ubuntu 16.04.
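
For anyone who wants to repeat this check without eyeballing the lists, here is a minimal sketch that parses the two `MACs` lines quoted above and prints what they have in common. The function names `parse_macs` and `shared_macs` are made up for illustration, and the parsing is simplified (it only handles the plain `MACs a,b,c` form).

```python
def parse_macs(config_text):
    """Return the MAC names from the first `MACs` directive, in order.

    Simplified: handles only the `MACs a,b,c` form, skipping blank
    lines and comments. Keyword matching is case-insensitive.
    """
    for line in config_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        parts = stripped.split(None, 1)
        if len(parts) == 2 and parts[0].lower() == "macs":
            return [m.strip() for m in parts[1].split(",") if m.strip()]
    return []


def shared_macs(client, server):
    """MACs present in both lists, kept in the client's preference order."""
    server_set = set(server)
    return [m for m in client if m in server_set]


# The two lines quoted in this comment: client first, server template second.
client_cfg = "MACs hmac-sha2-512-etm@openssh.com,hmac-sha2-256-etm@openssh.com,hmac-sha2-512"
server_cfg = (
    "MACs hmac-sha2-512-etm@openssh.com,hmac-sha2-256-etm@openssh.com,"
    "hmac-ripemd160-etm@openssh.com,umac-128-etm@openssh.com"
)

if __name__ == "__main__":
    print(shared_macs(parse_macs(client_cfg), parse_macs(server_cfg)))
```

The OpenSSH client and the server template do overlap on the two SHA-2 `-etm@openssh.com` MACs, which fits with manual SSH working fine.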

alanorth commented 8 years ago

I'm going to try again on a new 16.04.1 image today, I'll let you know how it goes. My client OpenSSH is now 7.3(!).

alanorth commented 8 years ago

Works as expected now. Not sure what was up. Closing issue.

alanorth commented 8 years ago

I just hit this bug again on the same 16.04.1 instance that I mentioned was working last week. The provisioning user was deployed and the roles had been run successfully, but today I decided to add an aorth user. When I try to run the ssh-keys task as that user I get the same error as above about no acceptable macs. It seems the error Paramiko prints is wrong, perhaps, because I can ssh to the machine fine using a password.

To solve this problem I commented out all the server's custom MACs, Ciphers, and KexAlgorithms in /etc/ssh/sshd_config, restarted the sshd service, and ran the ssh-keys tag successfully. Once I had working password-less login with my keys, I reverted the MACs, Ciphers, and KexAlgorithms and restarted the server's sshd service. Why would the exact same task with the exact same SSHD config and paramiko version fail depending on whether password login is being used?

Anyways, then I had a new problem:

dspace1604 | UNREACHABLE! => {
    "changed": false,
    "msg": "(u'127.0.0.1', <paramiko.rsakey.RSAKey object at 0x1069ca710>, <paramiko.rsakey.RSAKey object at 0x1066dcb50>)",
    "unreachable": true
}

This message is actually saying (from past experience) that the host key no longer matches the record in ~/.ssh/known_hosts, which always happens when a new host uses the same IP as a previous host, such as with local development VMs.

To fix this you have to create an ansible.cfg with the following contents:

[defaults]
host_key_checking = False

Then it works. But after that you can remove the ansible.cfg (or comment out the line) and it still connects fine.
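
If you'd rather find the stale entry than turn off host key checking, the following is a rough sketch for locating which lines of known_hosts refer to a given host, including hashed entries. It assumes the usual OpenSSH hashed format (`|1|salt|HMAC-SHA1`, both base64), and the helper names `host_matches` and `find_known_host_lines` are hypothetical. Entries for non-default ports are stored as `[host]:port`, so pass that form if it applies.

```python
import base64
import hashlib
import hmac
import os


def host_matches(field, host):
    """True if a known_hosts host field refers to `host` (exact match only)."""
    if field.startswith("|1|"):
        # Hashed entry: |1|base64(salt)|base64(HMAC-SHA1(key=salt, msg=host))
        try:
            _, _, salt_b64, digest_b64 = field.split("|")
            salt = base64.b64decode(salt_b64)
            expected = base64.b64decode(digest_b64)
        except ValueError:
            return False  # malformed hashed field
        actual = hmac.new(salt, host.encode("utf-8"), hashlib.sha1).digest()
        return hmac.compare_digest(actual, expected)
    # Plain entry: comma-separated names; wildcard patterns are not handled.
    return host in field.split(",")


def find_known_host_lines(known_hosts_text, host):
    """Return the 1-based line numbers whose host field matches `host`."""
    matches = []
    for number, line in enumerate(known_hosts_text.splitlines(), start=1):
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue
        if fields[0].startswith("@"):
            # Skip a leading marker such as @cert-authority or @revoked.
            fields = fields[1:]
        if fields and host_matches(fields[0], host):
            matches.append(number)
    return matches


if __name__ == "__main__":
    path = os.path.expanduser("~/.ssh/known_hosts")
    if os.path.exists(path):
        with open(path) as handle:
            print(find_known_host_lines(handle.read(), "127.0.0.1"))
```

Once you know the line, you can delete it by hand, or let `ssh-keygen -R <host>` remove every entry for that host.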

Something very weird with Paramiko.

oguya commented 8 years ago

what if you try using a different connection type instead of paramiko? For example: --connection=ssh

alanorth commented 8 years ago

Good idea. I'll try to reproduce the MAC issue above and then switch to --connection=ssh.

alanorth commented 8 years ago

We need to add one (or both) of these MACs to our sshd_config: hmac-sha2-512 and hmac-sha2-256. These are the only MACs Paramiko supports that aren't using MD5 or SHA1. See paramiko/transport.py and the changelog for Paramiko 1.16.0 where they were added in November, 2015.

In my experience I only get the "incompatible macs" issue when I'm connecting with password authentication (after our sshd_config template has taken effect on the host, by which point you're supposed to have keys set up already). I think this is because key-based authentication ends up using the ChaCha20-Poly1305 or AES-GCM ciphers, which are Authenticated Encryption with Associated Data (AEAD) ciphers: they authenticate and encrypt without needing an external MAC algorithm.

So basically, we should add these two MACs to our sshd_configs: hmac-sha2-512 and hmac-sha2-256.
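
To make the failure concrete, here is a small sketch of how SSH picks a MAC: the first algorithm on the client's list that the server also offers (RFC 4253, section 7.1). The Paramiko list below is an assumption, written from memory of paramiko/transport.py around 2.0.x, so check it against your installed version. `SERVER_FIXED` is simply the current template with the two proposed MACs appended.

```python
# Assumption: MACs offered by Paramiko 2.0.x, in preference order.
# Recalled from paramiko/transport.py; verify against your version.
PARAMIKO_MACS = [
    "hmac-sha2-256",
    "hmac-sha2-512",
    "hmac-sha1",
    "hmac-md5",
    "hmac-sha1-96",
    "hmac-md5-96",
]

# Current Ubuntu 16.04 sshd_config template, as quoted earlier in this issue.
SERVER_TEMPLATE = [
    "hmac-sha2-512-etm@openssh.com",
    "hmac-sha2-256-etm@openssh.com",
    "hmac-ripemd160-etm@openssh.com",
    "umac-128-etm@openssh.com",
]

# Proposed fix: the same template plus the two plain SHA-2 MACs.
SERVER_FIXED = SERVER_TEMPLATE + ["hmac-sha2-512", "hmac-sha2-256"]


def negotiate(client_prefs, server_supported):
    """Return the first client algorithm the server also supports.

    Returns None when there is no overlap, which is the situation
    reported as "no acceptable macs".
    """
    for name in client_prefs:
        if name in server_supported:
            return name
    return None


if __name__ == "__main__":
    print("current template:", negotiate(PARAMIKO_MACS, SERVER_TEMPLATE))
    print("with fix:", negotiate(PARAMIKO_MACS, SERVER_FIXED))
```

Under that assumed list, the current template leaves Paramiko with nothing to agree on, while adding either plain SHA-2 MAC gives it a match.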

oguya commented 8 years ago

So on openssh 6.7+ you'd have something like:

MACs  all-etm-macs-first,hmac-sha2-512,hmac-sha2-256

oguya commented 8 years ago

OK, never mind! Just saw your fix.

alanorth commented 8 years ago

Yah, that's it. Our Ubuntu 14.04 and Debian 7 configs already have these MACs. Not sure why we dropped them. Oops.