mediacloud / sous-chef

Configurable Data Analytics Pipeline

Prefect Worker Robustness #20

Open pgulley opened 1 month ago

pgulley commented 1 month ago

The Prefect worker on Guerin sometimes crashes for opaque reasons. I'd love to get more insight into why this happens and figure out how to prevent the error, automatically restart the worker, or otherwise troubleshoot.

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/anyio/_core/_sockets.py", line 189, in connect_tcp
    addr_obj = ip_address(remote_host)
  File "/usr/lib/python3.10/ipaddress.py", line 54, in ip_address
    raise ValueError(f'{address!r} does not appear to be an IPv4 or IPv6 address')
ValueError: 'api.prefect.cloud' does not appear to be an IPv4 or IPv6 address

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/httpcore/_exceptions.py", line 10, in map_exceptions
    yield
  File "/usr/local/lib/python3.10/dist-packages/httpcore/_backends/anyio.py", line 114, in connect_tcp
    stream: anyio.abc.ByteStream = await anyio.connect_tcp(
  File "/usr/local/lib/python3.10/dist-packages/anyio/_core/_sockets.py", line 192, in connect_tcp
    gai_res = await getaddrinfo(
  File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/lib/python3.10/socket.py", line 955, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -3] Temporary failure in name resolution

This error occurs silently, roughly once a week, and never during sous-chef execution.
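Reading the traceback: the initial `ValueError` is only anyio checking whether the host is a literal IP address. The actual failure is the `socket.gaierror` raised by `getaddrinfo`, i.e. a DNS lookup for `api.prefect.cloud` that temporarily failed. One way to get more insight might be to probe DNS for that host independently of the worker and log the failures. A minimal sketch, not anything Prefect ships: the name `resolve_with_retry`, the port, and the retry parameters are all assumptions chosen for illustration.

```python
import logging
import socket
import time

logger = logging.getLogger("dns_probe")


def resolve_with_retry(host, port=443, attempts=3, delay=2.0,
                       resolver=socket.getaddrinfo, sleep=time.sleep):
    """Resolve `host`, retrying on socket.gaierror with a fixed delay.

    Returns the list of address tuples from the resolver, or re-raises the
    last gaierror once all attempts are used up. `resolver` and `sleep`
    are injectable (hypothetical parameters) so the function can be
    exercised without touching the network.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return resolver(host, port, proto=socket.IPPROTO_TCP)
        except socket.gaierror as exc:
            last_error = exc
            logger.warning(
                "DNS lookup for %s failed (attempt %d/%d): %s",
                host, attempt, attempts, exc,
            )
            if attempt < attempts:
                sleep(delay)
    raise last_error
```

Calling `resolve_with_retry("api.prefect.cloud")` on a schedule (for example from cron on Guerin) and keeping the log would show whether the host loses name resolution at other times too, or only when the worker happens to poll.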

pgulley commented 1 month ago

I've jumped into the prefect-community slack to see if they have any insight.

It's possible that the solution is just to run the worker as a systemd service with restart enabled, but I'd love to have more insight before accepting that as the fix.

pgulley commented 1 month ago

The systemd configuration process is documented here: https://discourse.prefect.io/t/how-to-run-a-prefect-2-worker-as-a-systemd-service-on-linux/1450
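For reference, a unit with restart enabled could look roughly like the following. This is a sketch under assumptions, not copied from the linked guide: the user `prefect`, the working directory, the path to the `prefect` executable, and the work pool name `my-pool` are all placeholders that would need to match the actual setup on Guerin.

```ini
# /etc/systemd/system/prefect-worker.service (hypothetical file name)
[Unit]
Description=Prefect worker
# Wait for the network to be up before starting, since the worker
# needs to reach api.prefect.cloud.
After=network-online.target
Wants=network-online.target

[Service]
# Placeholder user and paths; adjust to the real install.
User=prefect
WorkingDirectory=/home/prefect
ExecStart=/usr/local/bin/prefect worker start --pool my-pool
# Restart the worker whenever it exits, after a short pause so a
# transient DNS failure has time to clear.
Restart=always
RestartSec=30

[Install]
WantedBy=multi-user.target
```

With `Restart=always`, a crash like the one above would bring the worker back after `RestartSec` seconds, and `journalctl -u prefect-worker` would keep a record of each exit, which also helps with the troubleshooting side.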