The -v (and -o) option of yastatus lists excerpts from the output logs on each active machine. The problem is that a machine's status sometimes changes right at the time of the listing, so the machine is no longer available. Errors like the one below then occur (this should be easy to handle).
..................................................ID2456 aiida-33160 at root@65.109.143.81:hetzner:data/tasks/20221231_044118_2456
INFO:backoff:Backing off create(...) for 0.8s (OSError: [Errno 113] Connect call failed ('65.109.143.81', 22))
INFO:backoff:Backing off create(...) for 0.5s (OSError: [Errno 113] Connect call failed ('65.109.143.81', 22))
INFO:backoff:Backing off create(...) for 1.9s (OSError: [Errno 113] Connect call failed ('65.109.143.81', 22))
INFO:backoff:Backing off create(...) for 2.5s (OSError: [Errno 113] Connect call failed ('65.109.143.81', 22))
INFO:backoff:Backing off create(...) for 4.8s (OSError: [Errno 113] Connect call failed ('65.109.143.81', 22))
INFO:backoff:Backing off create(...) for 6.3s (OSError: [Errno 113] Connect call failed ('65.109.143.81', 22))
INFO:backoff:Backing off create(...) for 4.5s (OSError: [Errno 113] Connect call failed ('65.109.143.81', 22))
INFO:backoff:Backing off create(...) for 4.2s (OSError: [Errno 113] Connect call failed ('65.109.143.81', 22))
INFO:backoff:Backing off create(...) for 1.9s (OSError: [Errno 113] Connect call failed ('65.109.143.81', 22))
INFO:backoff:Backing off create(...) for 3.7s (OSError: [Errno 113] Connect call failed ('65.109.143.81', 22))
INFO:backoff:Backing off create(...) for 1.1s (OSError: [Errno 113] Connect call failed ('65.109.143.81', 22))
ERROR:backoff:Giving up create(...) after 12 tries (OSError: [Errno 113] Connect call failed ('65.109.143.81', 22))
Traceback (most recent call last):
  File "/usr/local/bin/yastatus", line 8, in <module>
    sys.exit(check_status())
  File "/usr/local/lib/python3.9/dist-packages/yascheduler/utils.py", line 237, in check_status
    asyncio.run(_check_status())
  File "/usr/lib/python3.9/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/usr/lib/python3.9/asyncio/base_events.py", line 642, in run_until_complete
    return future.result()
  File "/usr/local/lib/python3.9/dist-packages/yascheduler/utils.py", line 148, in _check_status
    machine = await RemoteMachine.create(
  File "/usr/local/lib/python3.9/dist-packages/backoff/_async.py", line 151, in retry
    ret = await target(*args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/yascheduler/remote_machine/remote_machine.py", line 192, in create
    conn = await asyncssh.connection.connect(
  File "/usr/local/lib/python3.9/dist-packages/asyncssh/connection.py", line 7834, in connect
    return await asyncio.wait_for(
  File "/usr/lib/python3.9/asyncio/tasks.py", line 442, in wait_for
    return await fut
  File "/usr/local/lib/python3.9/dist-packages/asyncssh/connection.py", line 437, in _connect
    _, session = await loop.create_connection(
  File "/usr/lib/python3.9/asyncio/base_events.py", line 1056, in create_connection
    raise exceptions[0]
  File "/usr/lib/python3.9/asyncio/base_events.py", line 1041, in create_connection
    sock = await self._connect_sock(
  File "/usr/lib/python3.9/asyncio/base_events.py", line 955, in _connect_sock
    await self.sock_connect(sock, address)
  File "/usr/lib/python3.9/asyncio/selector_events.py", line 502, in sock_connect
    return await fut
  File "/usr/lib/python3.9/asyncio/selector_events.py", line 537, in _sock_connect_cb
    raise OSError(err, f'Connect call failed {address}')
OSError: [Errno 113] Connect call failed ('65.109.143.81', 22)
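One possible way to handle this, sketched under assumptions: wrap the connection step so that an unreachable machine is skipped with a warning instead of aborting the whole listing. The helper name connect_or_skip and the connect callable are hypothetical and not part of yascheduler; in _check_status the callable would presumably wrap the existing RemoteMachine.create(...) call, whose arguments are not shown in the traceback.

```python
import asyncio
import logging
from typing import Awaitable, Callable, Optional, TypeVar

T = TypeVar("T")
log = logging.getLogger(__name__)


async def connect_or_skip(
    label: str, connect: Callable[[], Awaitable[T]]
) -> Optional[T]:
    """Await connect(); return None instead of raising if the host is unreachable.

    `label` is only used in the warning message (e.g. the task ID or host).
    `connect` is a hypothetical zero-argument coroutine function standing in
    for the real connection call.
    """
    try:
        return await connect()
    except (OSError, asyncio.TimeoutError) as err:
        # OSError covers "[Errno 113] Connect call failed" from the traceback.
        log.warning("Skipping %s: machine is not reachable (%s)", label, err)
        return None
```

The caller would then check for None and move on to the next task, so one machine going away during the listing does not end the whole yastatus run with a traceback.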