pyinfra-dev / pyinfra

pyinfra turns Python code into shell commands and runs them on your servers. Execute ad-hoc commands and write declarative operations. Target SSH servers, local machine and Docker containers. Fast and scales from one server to thousands.
https://pyinfra.com
MIT License
3.87k stars 378 forks source link

server.reboot: Never detect remote system is rebooted #1110

Closed matthijskooijman closed 1 month ago

matthijskooijman commented 4 months ago

Describe the bug

When using server.reboot, pyinfra reboots the target but then waits for it to come back until the timeout, even if the system comes up earlier.

To Reproduce

from pyinfra.operations server

server.reboot(name="Reboot")
pyinfra -vvv --debug 192.168.7.2 --ssh-user root reboot.py

Result:

    Traceback (most recent call last):
  File "/home/matthijs/.local/pipx/venvs/pyinfra/lib/python3.11/site-packages/pyinfra/api/operations.py", line 94, in _run_host_op
    status = command.execute(state, host, connector_arguments)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/matthijs/.local/pipx/venvs/pyinfra/lib/python3.11/site-packages/pyinfra/api/command.py", line 224, in execute
    return self.function(state, host, *self.args, **self.kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/matthijs/.local/pipx/venvs/pyinfra/lib/python3.11/site-packages/pyinfra/operations/server.py", line 101, in wait_and_reconnect
    raise Exception(
Exception: Server did not reboot in time (reboot_timeout=300s)

    [192.168.7.2] Unexpected error in Python callback: Exception('Server did not reboot in time (reboot_timeout=300s)',)

Expected behavior

The server should reboot, when it is rebooted pyinfra should continue (or exit with success in this example).

Analysis

I added some debug output, and it turns out that Host.connect is called repeatedly to try making a new connection, but host.connected is true, so no actual connection attempts are made: https://github.com/pyinfra-dev/pyinfra/blob/80aca6e3ea9e2c1e423505abf1f5ef9c2c4affdc/pyinfra/api/host.py#L365

Looking at the code, there is no way to make host.connected False again (except for creating a new Host object). So I wonder:

(and an unrelated observation: It seems the timeout is not properly observed, since currently the timeout is divided by the interval to get the number of retries, but this assumes connection attempts take zero time, which is not true, especially when a system is rebooting, they might take longer).

Meta

pyinfra --support

    If you are having issues with pyinfra or wish to make feature requests, please
    check out the GitHub issues at https://github.com/Fizzadar/pyinfra/issues .
    When adding an issue, be sure to include the following:

    System: Linux
      Platform: Linux-6.5.0-28-generic-x86_64-with-glibc2.38
      Release: 6.5.0-28-generic
      Machine: x86_64
    pyinfra: v3.0b0
    Executable: /home/matthijs/.local/bin/pyinfra
    Python: 3.11.6 (CPython, GCC 13.2.0)

Installed via pipx.

lemmi commented 4 months ago

I'm guessing the error was introduced during

https://github.com/pyinfra-dev/pyinfra/commit/331865ae62e25aaf5faf3475b481160962e60a59?diff=unified&w=0#diff-706bc934a0c72456c2c88f6435770ac3fe3e13f1232f7b5464e7fcc44a9fd664L84-R88

The changes remove the host.connection attribute in favor of some other attribute including host.connected. Yet, server.reboot still makes use of host.connection: https://github.com/pyinfra-dev/pyinfra/blob/a7be6892254e32271aca92bca47fbf6edd18e743/pyinfra/operations/server.py#L89-L95

A quick fix is to replace host.connection with host.connected in server.reboot.

Fizzadar commented 4 months ago

Great catch, agree reboot should disconnect properly here and we should fix the timeout issue too!