stackhpc / ansible-slurm-appliance

A Slurm-based HPC workload management environment, driven by Ansible.

OOD shell prompts to accept hostkeys #193

Open sjpb opened 1 year ago

sjpb commented 1 year ago

Ticket: https://stackhpc.atlassian.net/browse/DEV-976

Using the browser OOD shell prompts the user to accept hostkeys (presumably it's the login node sshing into itself, but I didn't check). This is clunky, but the bigger problem is that the resulting acceptance in known_hosts is stored on (NFS) /home, so it survives reimaging the login node. After a reimage the login node then rejects itself, with no way for the user to accept the new key.

I think the original UM6P code had a hook in to fix this but I didn't understand the problem at the time.

sjpb commented 1 year ago

Example of failure after recreating OOD/login node:

@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@       WARNING: POSSIBLE DNS SPOOFING DETECTED!          @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
The ECDSA host key for dev-login-0 has changed,
and the key for the corresponding IP address 192.168.4.235
is unknown. This could either mean that
DNS SPOOFING is happening or the IP address for the host
and its host key have changed at the same time.
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that a host key has just been changed.
The fingerprint for the ECDSA key sent by the remote host is
SHA256:g/UWZ9ucYzai83BmbGj1HfEbF4Zy+vnqxhsUiJ5UC1w.
Please contact your system administrator.
Add correct host key in /home/steveb/.ssh/known_hosts to get rid of this message.
Offending ECDSA key in /home/steveb/.ssh/known_hosts:1
ECDSA host key for dev-login-0 has changed and you have requested strict checking.
Host key verification failed.
Your connection to the remote server has been terminated.

sjpb commented 1 year ago

This commit works, but the problem is you need to know the login IP address, as seen from the OOD node: https://gitlab.com/nesi1/flexihpc-slurm/-/commit/0655b84535420b154cd33171e46cb3e579dd6f34.

Might need dig to make it general e.g. https://stackoverflow.com/a/39083724/916373
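As a rough illustration of the lookup suggested above, here is a minimal Python sketch (standing in for `dig`) that resolves the login hostname to the address(es) visible from the node it runs on. The function name `resolve_ipv4` and its `resolver` parameter are hypothetical and are not taken from the linked commit. It uses the system resolver, so unlike `dig` it also honours /etc/hosts.

```python
import socket


def resolve_ipv4(hostname, resolver=socket.getaddrinfo):
    """Return the sorted, de-duplicated IPv4 addresses for `hostname`.

    The lookup runs on the local node, so the result is the address as seen
    from wherever this is executed (e.g. the OOD node).
    `resolver` is injectable only so the function can be exercised offline.
    """
    infos = resolver(hostname, None, socket.AF_INET, socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for IPv4 the sockaddr is an (address, port) tuple.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    # "localhost" is used here only as a placeholder hostname.
    print(resolve_ipv4("localhost"))
```

On a real cluster this would be called with the login node's hostname (e.g. `dev-login-0` from the error output above), and the result could then be written alongside the hostname in a known_hosts entry.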

sjpb commented 1 year ago

If only the hostname is specified, ssh automatically adds the IP to known_hosts as well. This fails if the IP remains the same but the hostkey changes, i.e. on rebuild.
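To make the rebuild failure concrete: once the login node has a new hostkey, any line in the user's known_hosts that names the old hostname or IP has to go before ssh will connect again. Below is a minimal Python sketch of that pruning step, similar in spirit to `ssh-keygen -R`. The function name `drop_host_entries` is hypothetical. It assumes plain (unhashed) entries and drops the whole line when any host on it matches, so hashed `|1|...` entries are left untouched.

```python
def drop_host_entries(known_hosts_text, stale_names):
    """Return known_hosts content without lines naming any of `stale_names`.

    Assumptions: entries are unhashed, and names are compared literally
    (no wildcard pattern matching). Comments and blank lines are kept.
    """
    stale = set(stale_names)
    kept = []
    for line in known_hosts_text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            kept.append(line)
            continue
        # Lines may begin with a marker such as @cert-authority or @revoked;
        # in that case the host list is the second field.
        if fields[0].startswith("@") and len(fields) > 1:
            host_field = fields[1]
        else:
            host_field = fields[0]
        if stale.intersection(host_field.split(",")):
            continue  # stale entry for the rebuilt node: drop it
        kept.append(line)
    return "\n".join(kept) + ("\n" if kept else "")


if __name__ == "__main__":
    example = (
        "dev-login-0,192.168.4.235 ecdsa-sha2-nistp256 AAAAold\n"
        "dev-compute-0 ssh-ed25519 AAAAother\n"
    )
    print(drop_host_entries(example, ["dev-login-0", "192.168.4.235"]), end="")
```

The key material in the example is placeholder text, not a real key. Both the hostname and the IP are passed in because ssh may have recorded either or both.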

sjpb commented 1 year ago

Maybe we SHOULD fix the internal (login) IP? I think we already do for the grafana root url anyway, as the control node needs to know the login IP.

sjpb commented 1 year ago

So full solution:

NB: the partial fix currently in NeSI (as above) is OK; it just needs site.yml rerunning, which it does anyway.