jooola / earhorn

Listen, monitor and archive your Icecast streams!
GNU General Public License v3.0

S3 Upload failure doesn't get re-queued #128

Closed: paddatrapper closed this issue 2 years ago

paddatrapper commented 2 years ago

It seems that this exception isn't handled properly:

2022-10-07 08:19:14.900 | DEBUG    | earhorn.stream_archive_s3:ingest_segment:24 - uploading segment /tmp/earhorn-9z6xj4uy/segment.2022-10-07-08-09-14.mp3 to s3://uct-radio-archive
Exception in thread Thread-4 (wait_for_segments):
Traceback (most recent call last):
  File "/opt/venv/lib/python3.10/site-packages/urllib3/connection.py", line 174, in _new_conn
    conn = connection.create_connection(
  File "/opt/venv/lib/python3.10/site-packages/urllib3/util/connection.py", line 95, in create_connection
    raise err
  File "/opt/venv/lib/python3.10/site-packages/urllib3/util/connection.py", line 85, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/venv/lib/python3.10/site-packages/botocore/httpsession.py", line 448, in send
    urllib_response = conn.urlopen(
  File "/opt/venv/lib/python3.10/site-packages/urllib3/connectionpool.py", line 787, in urlopen 
    retries = retries.increment(
  File "/opt/venv/lib/python3.10/site-packages/urllib3/util/retry.py", line 525, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/opt/venv/lib/python3.10/site-packages/urllib3/packages/six.py", line 770, in reraise
    raise value
  File "/opt/venv/lib/python3.10/site-packages/urllib3/connectionpool.py", line 703, in urlopen 
    httplib_response = self._make_request(
  File "/opt/venv/lib/python3.10/site-packages/urllib3/connectionpool.py", line 398, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/opt/venv/lib/python3.10/site-packages/urllib3/connection.py", line 239, in request
    super(HTTPConnection, self).request(method, url, body=body, headers=headers)
  File "/usr/local/lib/python3.10/http/client.py", line 1282, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/opt/venv/lib/python3.10/site-packages/botocore/awsrequest.py", line 94, in _send_request
    rval = super()._send_request(
  File "/usr/local/lib/python3.10/http/client.py", line 1328, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/local/lib/python3.10/http/client.py", line 1277, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/opt/venv/lib/python3.10/site-packages/botocore/awsrequest.py", line 123, in _send_output
    self.send(msg)
  File "/opt/venv/lib/python3.10/site-packages/botocore/awsrequest.py", line 218, in send
    return super().send(str)
  File "/usr/local/lib/python3.10/http/client.py", line 975, in send
    self.connect()
  File "/opt/venv/lib/python3.10/site-packages/urllib3/connection.py", line 205, in connect
    conn = self._new_conn()
  File "/opt/venv/lib/python3.10/site-packages/urllib3/connection.py", line 186, in _new_conn
    raise NewConnectionError(
urllib3.exceptions.NewConnectionError: <botocore.awsrequest.AWSHTTPConnection object at 0x7f381c668460>: Failed to establish a new connection: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/venv/lib/python3.10/site-packages/earhorn/stream_archive.py", line 147, in wait_for_segments
    self.storage.ingest_segment(
  File "/opt/venv/lib/python3.10/site-packages/earhorn/stream_archive_s3.py", line 26, in ingest_segment
    self._client.upload_file(
  File "/opt/venv/lib/python3.10/site-packages/boto3/s3/inject.py", line 143, in upload_file
    return transfer.upload_file(
  File "/opt/venv/lib/python3.10/site-packages/boto3/s3/transfer.py", line 288, in upload_file
    future.result()
  File "/opt/venv/lib/python3.10/site-packages/s3transfer/futures.py", line 103, in result
    return self._coordinator.result()
  File "/opt/venv/lib/python3.10/site-packages/s3transfer/futures.py", line 266, in result
    raise self._exception
  File "/opt/venv/lib/python3.10/site-packages/s3transfer/tasks.py", line 139, in __call__
    return self._execute_main(kwargs)
  File "/opt/venv/lib/python3.10/site-packages/s3transfer/tasks.py", line 162, in _execute_main 
    return_value = self._main(**kwargs)
  File "/opt/venv/lib/python3.10/site-packages/s3transfer/tasks.py", line 348, in _main
    response = client.create_multipart_upload(
  File "/opt/venv/lib/python3.10/site-packages/botocore/client.py", line 514, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/opt/venv/lib/python3.10/site-packages/botocore/client.py", line 921, in _make_api_call 
    http, parsed_response = self._make_request( 
  File "/opt/venv/lib/python3.10/site-packages/botocore/client.py", line 944, in _make_request
    return self._endpoint.make_request(operation_model, request_dict)
  File "/opt/venv/lib/python3.10/site-packages/botocore/endpoint.py", line 119, in make_request 
    return self._send_request(request_dict, operation_model)
  File "/opt/venv/lib/python3.10/site-packages/botocore/endpoint.py", line 202, in _send_request
    while self._needs_retry(
  File "/opt/venv/lib/python3.10/site-packages/botocore/endpoint.py", line 354, in _needs_retry 
    responses = self._event_emitter.emit(
  File "/opt/venv/lib/python3.10/site-packages/botocore/hooks.py", line 412, in emit
    return self._emitter.emit(aliased_event_name, **kwargs)
  File "/opt/venv/lib/python3.10/site-packages/botocore/hooks.py", line 256, in emit
    return self._emit(event_name, kwargs)
  File "/opt/venv/lib/python3.10/site-packages/botocore/hooks.py", line 239, in _emit
    response = handler(**kwargs)
  File "/opt/venv/lib/python3.10/site-packages/botocore/retryhandler.py", line 207, in __call__ 
    if self._checker(**checker_kwargs):
  File "/opt/venv/lib/python3.10/site-packages/botocore/retryhandler.py", line 284, in __call__ 
    should_retry = self._should_retry(
  File "/opt/venv/lib/python3.10/site-packages/botocore/retryhandler.py", line 320, in _should_retry
    return self._checker(attempt_number, response, caught_exception)
  File "/opt/venv/lib/python3.10/site-packages/botocore/retryhandler.py", line 363, in __call__ 
    checker_response = checker(
  File "/opt/venv/lib/python3.10/site-packages/botocore/retryhandler.py", line 247, in __call__ 
    return self._check_caught_exception(
  File "/opt/venv/lib/python3.10/site-packages/botocore/retryhandler.py", line 416, in _check_caught_exception
    raise caught_exception
  File "/opt/venv/lib/python3.10/site-packages/botocore/endpoint.py", line 281, in _do_get_response
    http_response = self._send(request)
  File "/opt/venv/lib/python3.10/site-packages/botocore/endpoint.py", line 377, in _send
    return self.http_session.send(request)
  File "/opt/venv/lib/python3.10/site-packages/botocore/httpsession.py", line 477, in send
    raise EndpointConnectionError(endpoint_url=request.url, error=e)
botocore.exceptions.EndpointConnectionError: Could not connect to the endpoint URL: "<endpoint>/2022/10/07_08.09.14.mp3?uploads"

jooola commented 2 years ago

You don't have any retry strategy set up, right?

For now, I'd suggest setting AWS_MAX_ATTEMPTS and AWS_RETRY_MODE to mitigate this, but I'll look into how I could set up a default retry strategy. https://boto3.amazonaws.com/v1/documentation/api/latest/guide/retries.html
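
A minimal sketch of that mitigation, assuming the variables are set from Python before any boto3 session or client is created (botocore reads them when it resolves its configuration). The helper name `configure_default_retries` and the defaults of `standard` mode with 10 attempts are illustrative assumptions, not values earhorn ships; `setdefault` keeps anything the operator already exported.

```python
import os

# Hypothetical defaults, not earhorn's: "standard" retry mode, 10 attempts per request.
DEFAULT_RETRY_MODE = "standard"
DEFAULT_MAX_ATTEMPTS = "10"


def configure_default_retries(env=os.environ):
    """Set botocore retry defaults unless they are already present in env.

    Must run before any boto3 session or client is created, because botocore
    reads these variables when it builds its configuration.
    """
    env.setdefault("AWS_RETRY_MODE", DEFAULT_RETRY_MODE)
    env.setdefault("AWS_MAX_ATTEMPTS", DEFAULT_MAX_ATTEMPTS)
    return env["AWS_RETRY_MODE"], env["AWS_MAX_ATTEMPTS"]
```

Retry settings can also be passed in code, e.g. `boto3.client("s3", config=botocore.config.Config(retries={"mode": "standard", "max_attempts": 10}))`. Either way, retries only cover short outages: once the last attempt fails, `upload_file` still raises, so the calling thread has to handle the exception.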

Re-queuing failed uploads might be a good idea too.
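
A minimal sketch of re-queuing with a bounded number of attempts, assuming segments waiting for upload sit in a `queue.Queue`. `PendingSegment`, `process_next`, the `upload` callable and the limit of 5 attempts are all hypothetical names and values for illustration, not earhorn's API.

```python
import queue
from dataclasses import dataclass


@dataclass
class PendingSegment:
    """A segment waiting to be uploaded (hypothetical type, not earhorn's)."""

    path: str
    attempts: int = 0


def process_next(segments: queue.Queue, upload, max_attempts: int = 5) -> str:
    """Take one segment off the queue and try to upload it.

    Returns "uploaded" on success, "requeued" if the upload failed and the
    segment was put back at the end of the queue, or "abandoned" once it has
    failed max_attempts times (the caller should log it for a manual re-upload).
    """
    segment = segments.get()
    try:
        segment.attempts += 1
        upload(segment.path)
        return "uploaded"
    except Exception:
        if segment.attempts < max_attempts:
            segments.put(segment)  # retry later instead of losing it
            return "requeued"
        return "abandoned"
    finally:
        segments.task_done()  # balances the get() above
```

Re-queuing immediately, as here, can burn through all attempts quickly during an outage; pairing it with a delay between attempts, or with botocore's own retries, would likely be more practical.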

paddatrapper commented 2 years ago

It's more that it seems to crash the thread and so prevent any future uploads from happening

jooola commented 2 years ago

> It's more that it seems to crash the thread and so prevent any future uploads from happening

Oh right, yes, this is bad.
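
The thread dies because the exception escapes its target function (`wait_for_segments` in the traceback), and Python ends a thread whose target raises. A minimal sketch of the usual fix, assuming a queue-fed worker: catch and log exceptions per segment inside the loop so one failed upload cannot stop later ones. `upload_worker`, the `None` stop sentinel and the `ingest` callable are hypothetical stand-ins for `wait_for_segments` and `storage.ingest_segment`.

```python
import logging
import queue
import threading

logger = logging.getLogger(__name__)


def upload_worker(segments: queue.Queue, ingest) -> None:
    """Upload segments until a None sentinel is received.

    Exceptions raised by ingest() are logged and swallowed per segment, so a
    refused connection does not terminate the thread.
    """
    while True:
        path = segments.get()
        try:
            if path is None:
                return  # sentinel: shut down cleanly
            ingest(path)
        except Exception:
            # logger.exception records the full traceback at ERROR level.
            logger.exception("failed to upload segment %s", path)
        finally:
            segments.task_done()


if __name__ == "__main__":
    q = queue.Queue()
    worker = threading.Thread(target=upload_worker, args=(q, print), daemon=True)
    worker.start()
```

Catching `Exception` rather than only `botocore.exceptions.EndpointConnectionError` is deliberate in this sketch: any error from a single upload should be survivable. Whether the failed file is kept for a later retry is a separate decision.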

paddatrapper commented 2 years ago

Sorry, I probably should have worded the ticket better.

jooola commented 2 years ago

No worries. I think this issue raises a bigger problem: this tool cannot recover from a crash, because we create a new tmp dir at every startup. This makes it a bit more difficult for sysadmins to even re-upload the files by hand. For local archive storage this wasn't really a concern, but with S3 it is.
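
A minimal sketch of the recovery idea, assuming earhorn used a fixed, operator-configured working directory instead of a fresh tmp dir per start, so segments left over from a crash can be found and re-ingested at startup. The function name `prepare_workdir` and the directory layout are assumptions; the `segment.*.mp3` pattern follows the file name in the log above.

```python
from pathlib import Path


def prepare_workdir(workdir) -> list:
    """Create the working directory if needed and list leftover segments.

    Names like segment.2022-10-07-08-09-14.mp3 carry a zero-padded
    timestamp, so sorting by name is also chronological (oldest first).
    """
    path = Path(workdir)
    path.mkdir(parents=True, exist_ok=True)
    return sorted(path.glob("segment.*.mp3"))
```

The newest leftover file may be a partially written segment from the moment of the crash, which a real implementation would need to detect or handle separately.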

Here is what should be done:

This should also help with unexpected shutdowns, but that deserves a dedicated issue.

I'll push some quick fixes, but the big missing part will need some work.

paddatrapper commented 2 years ago

Sounds good. I can also ask @loydbanks to implement some of this next week if you want?

jooola commented 2 years ago

I already have a branch for this, so it's alright. But thanks.

jooola commented 2 years ago

Solved with #132