hilbert / hilbert-cli

Backend management tools: CLI
Apache License 2.0

Various errors stopping a station #80

Open elondaits opened 6 years ago

elondaits commented 6 years ago

I stopped bigfoot60 and there were some errors near the end of the procedure:

Failed to power off system via logind: Interactive authentication required.

Failed to start poweroff.target: Interactive authentication required.
See system logs and 'systemctl status poweroff.target' for details.

Failed to open /dev/initctl: Permission denied
Failed to talk to init daemon.

Connection to 172.16.21.54 closed by remote host.

WARNING  [hilbert_cli_config.py:278]: Error exit code 255, while executing 'ssh -q -F /root/SSH/config 172.16.21.54 hilbert-station -v shutdown now'!

ERROR  [hilbert_cli_config.py:1435]: Could not run remote ssh command: 'ssh -q -F /root/SSH/config 172.16.21.54 hilbert-station -v shutdown now'! Return code: 255

WARNING  [hilbert_cli_config.py:2266]: Could not schedule immediate shutdown on the station '172.16.21.54'

Finished stopping station bigfoot60

Full log: station fails to stop.txt

malex984 commented 6 years ago
  1. The "Interactive authentication required." error is now fixed by the recent hilbert-station script.
  2. The trailing error output:
    Connection to 172.16.21.54 closed by remote host.
    WARNING  [hilbert_cli_config.py:278]: Error exit code 255, while executing 'ssh -q -F /root/SSH/config 172.16.21.54 hilbert-station -v shutdown now'!
    ERROR  [hilbert_cli_config.py:1435]: Could not run remote ssh command: 'ssh -q -F /root/SSH/config 172.16.21.54 hilbert-station -v shutdown now'! Return code: 255
    WARNING  [hilbert_cli_config.py:2266]: Could not schedule immediate shutdown on the station '172.16.21.54'

    is a bit misleading, since it is exactly the expected, correct behavior: shutdown now is supposed to cut the current ssh connection to the station immediately, and hilbert merely detects that the ssh client was terminated abruptly. I am going to use a slightly delayed shutdown instead of now, so that the remote execution can finish cleanly.

malex984 commented 6 years ago

It seems that the smallest delay is 1 minute (via +1), which is also the default delay for shutdown ("If no time argument is specified, "+1" is implied.")... @porst17 @elondaits, is it OK for hilbert stop to schedule the remote station shutdown one minute ahead, finish its own execution cleanly, and rely on the station's shutdown to actually schedule and perform the system shutdown?

Note that with a cut ssh connection we have a guarantee that the shutdown has actually started (we can detect the connection being cut)...

elondaits commented 6 years ago

A 1-minute delay for a "cosmetic" problem is not a good idea. Also, making the process more complex makes it more likely to fail, and if it fails after the delay you still get no notification. I'd just remove the three messages (WARNING/ERROR/WARNING) it now produces because, as you first said, it's not actually an error... so it shouldn't be reported as such.
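A minimal sketch of what silencing could look like, assuming the caller knows it issued an immediate shutdown. The names `shutdown_succeeded`, `immediate`, and the logger name are hypothetical and are not taken from hilbert_cli_config.py. It relies only on the documented ssh behavior that the client exits with 255 when the connection itself fails or is cut:

```python
import logging

# Hypothetical logger name, used only for this sketch.
log = logging.getLogger("hilbert_stop_sketch")

# ssh returns the remote command's exit status, or 255 if ssh itself
# hit an error (which includes the connection being cut by the remote host).
SSH_CONNECTION_ERROR = 255


def shutdown_succeeded(returncode, immediate):
    """Decide whether a remote shutdown command should count as successful.

    returncode -- exit status of the local ssh process
    immediate  -- True if the remote command was an immediate 'shutdown now'
    """
    if returncode == 0:
        return True
    if immediate and returncode == SSH_CONNECTION_ERROR:
        # Expected: the station cut the connection while going down.
        log.info("ssh connection was cut; assuming the shutdown has started")
        return True
    log.error("remote shutdown command failed with exit code %d", returncode)
    return False
```

One caveat of this sketch: 255 is also what ssh returns when it cannot connect at all, so on its own this check cannot tell "station went down" apart from "station was never reached".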

porst17 commented 6 years ago

You can schedule the delayed shutdown via nohup sh -c 'sleep 2s; shutdown -P now' & (if 2s is long enough). It is not as robust as shutdown -P +1, but the current shutdown -P now just cuts the ssh connection and leaves you with no information about whether the shutdown was scheduled correctly. So I would argue that the nohup method is on the same level of reliability as shutdown -P now.

If you don't want to implement it this way, I also think it is OK to just silence the error messages in case of a shutdown -P now.
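For illustration, the nohup variant could be assembled on the hilbert side roughly like this. It is only a sketch: `delayed_shutdown_argv` and its parameters are made-up names, and the `>/dev/null 2>&1` redirection is my addition (not part of the command above), on the assumption that ssh would otherwise keep the session open while the background job still holds its output streams. It also uses a plain `sleep 2` rather than `sleep 2s`, since the unit suffix is not supported by every sleep implementation.

```python
import shlex


def delayed_shutdown_argv(host, delay_seconds=2, ssh_config="/root/SSH/config"):
    """Build an ssh argv that schedules a detached, slightly delayed power-off.

    The remote shell starts the job in the background and returns right away,
    so ssh can exit normally before the station goes down.
    """
    if delay_seconds < 0:
        raise ValueError("delay_seconds must be non-negative")
    # Command run by the detached shell on the station.
    inner = "sleep {}; shutdown -P now".format(int(delay_seconds))
    # Detach from the ssh session: nohup, no inherited output streams, background.
    remote = "nohup sh -c {} >/dev/null 2>&1 &".format(shlex.quote(inner))
    return ["ssh", "-q", "-F", ssh_config, host, remote]
```

The resulting list can be passed to `subprocess.run(argv)`. An exit code of 0 then only means the job was launched, not that the shutdown itself went through, which matches the reliability trade-off described above.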