Project Receptor is a flexible multi-service relayer with remote execution and orchestration capabilities linking controllers with executors across a mesh of nodes.
Node emits traceback when failing to connect to peer #131
When a node starts, it can be told to connect to a peer. It's expected that a node might be unable to connect to that peer, at least not right away. In that case, the node simply sleeps for a while and then retries the connection.
When a node fails to connect to a peer, I expect it to either handle this silently, or with a terse message like:
Failed to connect to 127.0.0.1:8889. Will retry in 5 seconds.
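A minimal sketch of the behaviour described above, assuming a plain asyncio retry loop. The names `connect_with_retry`, `retry_delay` and `max_attempts`, and the logger name, are hypothetical and not taken from receptor's code. Expected connection errors (all `OSError` subclasses, such as `ConnectionRefusedError`) produce one warning line in the suggested format, and the underlying error detail is demoted to DEBUG:

```python
import asyncio
import logging

# Hypothetical logger name; receptor's real logger layout may differ.
logger = logging.getLogger("receptor.connect_sketch")


async def connect_with_retry(host, port, *, retry_delay=5, max_attempts=None):
    """Open a TCP connection to a peer, retrying on expected failures.

    Hypothetical helper, not receptor's API. Returns the (reader, writer)
    pair from asyncio.open_connection once a connection succeeds.
    """
    attempt = 0
    while True:
        attempt += 1
        try:
            return await asyncio.open_connection(host, port)
        except OSError as exc:
            # ConnectionRefusedError, ConnectionResetError, etc. are OSError
            # subclasses, so an unreachable peer lands here.
            if max_attempts is not None and attempt >= max_attempts:
                raise  # give up: let the caller decide how loud to be
            # One terse line at WARNING, with no traceback (no exc_info).
            logger.warning(
                "Failed to connect to %s:%s. Will retry in %s seconds.",
                host, port, retry_delay,
            )
            # Keep the underlying error available for debugging.
            logger.debug("Connection error for %s:%s: %s", host, port, exc)
            await asyncio.sleep(retry_delay)
```

For example, `asyncio.run(connect_with_retry("127.0.0.1", 8889))` would log the terse warning every five seconds until something starts listening on that port, then return the connection.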
In reality, a node will handle this error by emitting a traceback:
$ receptor --data-dir="$(mktemp --directory)" node --peer='127.0.0.1:8889'
ERROR 2020-02-05 14:15:13,240 sock sock.connect
Traceback (most recent call last):
  File "/home/ichimonji10/code/receptor/receptor/connection/sock.py", line 40, in connect
    r, w = await asyncio.open_connection(host, port, loop=loop, ssl=ssl)
  File "/usr/lib/python3.8/asyncio/streams.py", line 52, in open_connection
    transport, _ = await loop.create_connection(
  File "/usr/lib/python3.8/asyncio/base_events.py", line 1021, in create_connection
    raise exceptions[0]
  File "/usr/lib/python3.8/asyncio/base_events.py", line 1006, in create_connection
    sock = await self._connect_sock(
  File "/usr/lib/python3.8/asyncio/base_events.py", line 920, in _connect_sock
    await self.sock_connect(sock, address)
  File "/usr/lib/python3.8/asyncio/selector_events.py", line 494, in sock_connect
    return await fut
  File "/usr/lib/python3.8/asyncio/selector_events.py", line 526, in _sock_connect_cb
    raise OSError(err, f'Connect call failed {address}')
ConnectionRefusedError: [Errno 111] Connect call failed ('127.0.0.1', 8889)
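A traceback like this is what Python's standard `logging` module prints whenever a log record carries exception info, for instance via `logger.exception(...)`, which logs at ERROR level with `exc_info` set. That receptor's `sock.connect` does exactly this is an assumption here. The sketch below, using the hypothetical helpers `log_connect_failure` and `capture`, contrasts that with a one-line warning for the same simulated error:

```python
import io
import logging


def log_connect_failure(logger, exc, terse=True):
    """Log a failed peer connection either tersely or with a full traceback."""
    if terse:
        # One line, no traceback: suitable for an expected, retried failure.
        logger.warning("Failed to connect to peer: %s", exc)
    else:
        # logger.exception() logs at ERROR level and appends the traceback.
        logger.exception("Failed to connect to peer")


def capture(terse):
    """Return what a single simulated connection failure looks like in the log."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)  # default format is "%(message)s"
    logger = logging.getLogger("connect-demo")  # hypothetical logger name
    logger.setLevel(logging.WARNING)
    logger.propagate = False  # keep the demo output off the root logger
    logger.addHandler(handler)
    try:
        try:
            # Simulate the error at the bottom of the traceback above.
            raise ConnectionRefusedError(111, "Connect call failed ('127.0.0.1', 8889)")
        except OSError as exc:
            log_connect_failure(logger, exc, terse=terse)
    finally:
        logger.removeHandler(handler)
    return stream.getvalue()
```

`capture(terse=True)` returns a single line ending in `[Errno 111] Connect call failed ('127.0.0.1', 8889)`, while `capture(terse=False)` returns the message followed by a full `Traceback (most recent call last):` block.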
IMO, emitting tracebacks for expected errors that are being caught and handled is problematic behaviour:
- Messages such as this are likely to alarm end users. This might manifest in any number of negative ways:
  - Users might file spurious bugs against receptor.
  - Users might conclude that receptor is a poor-quality product and abandon it, or become less satisfied with any product that makes use of it.
- Messages such as this lower the signal-to-noise ratio, making it harder for QE to zero in on more important information, and possibly obscuring other tracebacks that genuinely signal problems.
Can receptor respond to this expected operating condition in a less alarming and more terse manner?