input-output-hk / nixops-packet

NixOps Packet.net Plugin
GNU Lesser General Public License v3.0

instances with IPs and old fingerprint already in the known_hosts file will experience wait_for_ssh hang #25

Closed johnalotoski closed 3 years ago

johnalotoski commented 3 years ago

Since the public host key is not implemented yet (example: https://github.com/input-output-hk/nixops-packet/blob/master/nixops_packet/backends/device.py#L433), any operation that tries to overwrite an existing known host, such as a reinstall or deploying a new instance that happens to re-use an IP already in known_hosts, produces known_hosts ssh messages/errors. These messages are suppressed because nixops2 now uses ssh connection attempts for status probes, which would otherwise generate a large amount of noise on stdout/stderr (refs: https://github.com/NixOS/nixops/commit/0ed698c3c9f38cd3b50a8a59b87dd7de840060e9, https://github.com/NixOS/nixops/commit/362d049bdb087118b23472dc44ab23bf48a59bdb).

In this case the end user just sees wait_for_ssh never return, with no visible error, because of the known_hosts conflict:

machineDemo> waiting for SSH............................................

For an instance experiencing this deployment issue, a workaround until it is fixed is to manually remove the instance's IP from the known_hosts file; the deploy will then continue successfully.
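
The manual removal can be sketched as below. This is a minimal illustration, not part of nixops or nixops-packet: the function name remove_known_host is hypothetical, and it only matches plain-text host entries. Hashed entries (lines starting with |1|) are not handled here; the standard ssh-keygen -R <ip> command covers those as well and is usually the simpler choice.

```python
def remove_known_host(known_hosts_text: str, ip: str) -> str:
    """Return known_hosts content without plain-text entries for `ip`.

    Hypothetical helper for illustration only. A line is dropped whole if
    any of its comma-separated hosts matches, even if it lists other hosts.
    """
    kept = []
    for line in known_hosts_text.splitlines():
        stripped = line.strip()
        # Preserve blank lines and comments untouched.
        if not stripped or stripped.startswith("#"):
            kept.append(line)
            continue
        fields = stripped.split()
        # Lines may start with a marker such as @cert-authority or @revoked,
        # in which case the host list is the second field.
        if fields[0].startswith("@") and len(fields) > 1:
            hosts_field = fields[1]
        else:
            hosts_field = fields[0]
        hosts = hosts_field.split(",")
        # Match "1.2.3.4" and the non-default-port form "[1.2.3.4]:2222".
        if any(h == ip or h.startswith(f"[{ip}]:") for h in hosts):
            continue
        kept.append(line)
    return "\n".join(kept) + ("\n" if kept else "")
```

To apply it, read ~/.ssh/known_hosts, pass the text and the instance's public IP to the function, and write the result back (keeping a backup of the original file is advisable).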

johnalotoski commented 3 years ago

Support for updating known_hosts using the instance public_ipv4, which resolves this bug, was added in https://github.com/input-output-hk/nixops-packet/commit/9b9b0aca9bfa310d50d3ca22f45b80d4b129f407. Existing machines can get their state updated using the nixops packet update-provision -d $DEPLOYMENT $MACHINE CLI command.