Open sjpb opened 1 year ago
Example of failure after recreating OOD/login node:
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@ WARNING: POSSIBLE DNS SPOOFING DETECTED! @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
The ECDSA host key for dev-login-0 has changed,
and the key for the corresponding IP address 192.168.4.235
is unknown. This could either mean that
DNS SPOOFING is happening or the IP address for the host
and its host key have changed at the same time.
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@ WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that a host key has just been changed.
The fingerprint for the ECDSA key sent by the remote host is
SHA256:g/UWZ9ucYzai83BmbGj1HfEbF4Zy+vnqxhsUiJ5UC1w.
Please contact your system administrator.
Add correct host key in /home/steveb/.ssh/known_hosts to get rid of this message.
Offending ECDSA key in /home/steveb/.ssh/known_hosts:1
ECDSA host key for dev-login-0 has changed and you have requested strict checking.
Host key verification failed.
Your connection to the remote server has been terminated.
This commit works, but the problem is that you need to know the login node's IP address as seen from the OOD node: https://gitlab.com/nesi1/flexihpc-slurm/-/commit/0655b84535420b154cd33171e46cb3e579dd6f34.
Might need `dig` to make it general, e.g. https://stackoverflow.com/a/39083724/916373
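As a rough sketch of the lookup step (not what the commit above does), the login IP could be resolved on the OOD node at run time instead of being hard-coded. The function name `resolve_ipv4` is hypothetical. Note this goes through the system resolver, so unlike `dig` it also honours `/etc/hosts`; whether that is what we want here is an assumption.

```python
import socket


def resolve_ipv4(hostname: str) -> list[str]:
    """Return the sorted, de-duplicated IPv4 addresses that `hostname` resolves to.

    Uses the system resolver (nsswitch, /etc/hosts, DNS), i.e. the same view
    of the name that ssh on this node would normally get.
    """
    infos = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for AF_INET the sockaddr is an (address, port) tuple.
    return sorted({info[4][0] for info in infos})
```

Run on the OOD node, `resolve_ipv4("dev-login-0")` would give the address(es) to pair with the hostname in a known_hosts entry.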
If only the hostname is specified, ssh will automatically add the IP to known_hosts. This fails if the IP remains the same but the host key changes, i.e. on rebuild.
Maybe we SHOULD fix the internal (login) IP? I think we do for the grafana root URL anyway, as control needs to know the login IP.
So full solution:
/home/root/<hostname>
<login-hostname>,<login-ip>
in `login:/etc/ssh/ssh_known_hosts`
NB: the partial fix at the moment in NeSI as above is OK, it just needs site.yml rerunning, which it does anyway.
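To make the `<login-hostname>,<login-ip>` entry concrete, here is a minimal Python sketch of writing it idempotently. The function names are hypothetical, and it assumes unhashed entries (hashed `|1|...` host fields will not match) and that the login node's public host key is already to hand. In the appliance this would presumably be an Ansible task instead.

```python
from pathlib import Path


def known_hosts_entry(hostname: str, ip: str, key_type: str, key_b64: str) -> str:
    """Format one known_hosts line covering both the hostname and the IP."""
    return f"{hostname},{ip} {key_type} {key_b64}"


def _hosts_field(line: str) -> list[str]:
    """Return the comma-separated host patterns of a line, or [] if not applicable."""
    stripped = line.strip()
    # Blank lines, comments and marker lines (@cert-authority, @revoked) are left alone.
    if not stripped or stripped.startswith(("#", "@")):
        return []
    return stripped.split()[0].split(",")


def upsert_known_hosts(path: Path, names: set[str], entry: str) -> None:
    """Drop existing unhashed lines naming any of `names`, then append `entry`."""
    lines = path.read_text().splitlines() if path.exists() else []
    kept = [ln for ln in lines if not names.intersection(_hosts_field(ln))]
    kept.append(entry)
    path.write_text("\n".join(kept) + "\n")
```

For the failure above, the names would be `{"dev-login-0", "192.168.4.235"}` and the key type `ecdsa-sha2-nistp256`; rerunning after a rebuild replaces the old line rather than adding a conflicting one.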
Ticket: https://stackhpc.atlassian.net/browse/DEV-976
Using the browser OOD shell prompts the user to accept host keys (presumably it's the login node sshing into itself, but I didn't check). This is clunky, but the bigger problem is that the resulting acceptance in known_hosts is stored on (NFS) /home, so reimaging the login node means that it rejects itself with no way for the user to accept.
I think the original UM6P code had a hook in to fix this but I didn't understand the problem at the time.
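I haven't looked at what the UM6P hook actually did, so the following is only a guess at the general shape: clear the stale login-node entries from a user's known_hosts with `ssh-keygen -R`, which also handles hashed entries. The function names are hypothetical. Exit statuses are returned rather than checked, since I haven't confirmed how `ssh-keygen` reports a host that is not present.

```python
import subprocess
from pathlib import Path


def ssh_keygen_remove_cmd(host: str, known_hosts: Path) -> list[str]:
    """Build the ssh-keygen command that removes all keys for `host` from a file."""
    return ["ssh-keygen", "-R", host, "-f", str(known_hosts)]


def forget_host(names: list[str], known_hosts: Path) -> dict[str, int]:
    """Remove every name (hostname and/or IP) from `known_hosts`.

    Returns a mapping of name -> ssh-keygen exit status. Does nothing and
    returns an empty mapping if the file does not exist.
    """
    results: dict[str, int] = {}
    if not known_hosts.exists():
        return results
    for name in names:
        proc = subprocess.run(ssh_keygen_remove_cmd(name, known_hosts), check=False)
        results[name] = proc.returncode
    return results
```

For the case above this would be something like `forget_host(["dev-login-0", "192.168.4.235"], Path("/home/steveb/.ssh/known_hosts"))`, run for each affected user after the login node is reimaged. `ssh-keygen -R` keeps a `.old` backup of the file it edits.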