sillsdev / machine.py

Machine is a natural language processing library for Python that is focused on providing tools for processing resource-poor languages.
MIT License

S3 bucket connection reset - gracefully handle #17

Closed by johnml1135 1 year ago

johnml1135 commented 1 year ago

There should be at least 3, if not 10, automatic retries when the S3 connection is reset like this:

2023-06-08 16:31:00,036 - silnlp.common.environment - INFO - Uploading MT/experiments/FT-Ingush/NLLB_13_CHE_ING_3/val.trg.txt
2023-06-08 16:31:00,153 - silnlp.common.environment - INFO - Uploading MT/experiments/FT-Ingush/NLLB_13_CHE_ING_3/tokenizer.json
2023-06-08 12:31:13
Traceback (most recent call last):
  File "/root/.clearml/venvs-builds/3.8/task_repository/silnlp/.venv/lib/python3.8/site-packages/urllib3/connectionpool.py", line 703, in urlopen
    httplib_response = self._make_request(
  File "/root/.clearml/venvs-builds/3.8/task_repository/silnlp/.venv/lib/python3.8/site-packages/urllib3/connectionpool.py", line 398, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/root/.clearml/venvs-builds/3.8/task_repository/silnlp/.venv/lib/python3.8/site-packages/urllib3/connection.py", line 239, in request
    super(HTTPConnection, self).request(method, url, body=body, headers=headers)
  File "/usr/lib/python3.8/http/client.py", line 1256, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/root/.clearml/venvs-builds/3.8/task_repository/silnlp/.venv/lib/python3.8/site-packages/botocore/awsrequest.py", line 94, in _send_request
    rval = super()._send_request(
  File "/usr/lib/python3.8/http/client.py", line 1302, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.8/http/client.py", line 1251, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/root/.clearml/venvs-builds/3.8/task_repository/silnlp/.venv/lib/python3.8/site-packages/botocore/awsrequest.py", line 130, in _send_output
    self._handle_expect_response(message_body)
  File "/root/.clearml/venvs-builds/3.8/task_repository/silnlp/.venv/lib/python3.8/site-packages/botocore/awsrequest.py", line 176, in _handle_expect_response
    self._send_message_body(message_body)
  File "/root/.clearml/venvs-builds/3.8/task_repository/silnlp/.venv/lib/python3.8/site-packages/botocore/awsrequest.py", line 209, in _send_message_body
    self.send(message_body)
  File "/root/.clearml/venvs-builds/3.8/task_repository/silnlp/.venv/lib/python3.8/site-packages/botocore/awsrequest.py", line 218, in send
    return super().send(str)
  File "/usr/lib/python3.8/http/client.py", line 969, in send
    self.sock.sendall(datablock)
  File "/usr/lib/python3.8/ssl.py", line 1204, in sendall
    v = self.send(byte_view[count:])
  File "/usr/lib/python3.8/ssl.py", line 1173, in send
    return self._sslobj.write(data)
ConnectionResetError: [Errno 104] Connection reset by peer

johnml1135 commented 1 year ago

This happened in SILNLP, but it likely applies here as well. See the SILNLP issue: https://github.com/sillsdev/silnlp/issues/167
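
For illustration, the requested behavior could look something like the sketch below: retry the failing operation up to 10 times with exponential backoff. This is not code from machine.py or SILNLP. The helper name `retry_on_connection_error` and its parameters are hypothetical, and it assumes the error reaches the caller as a `ConnectionError` subclass (as `ConnectionResetError` is). If the S3 client wraps the error in its own exception type, that type would need to be passed in `retry_on`.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_on_connection_error(
    operation: Callable[[], T],
    max_attempts: int = 10,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation(), retrying with exponential backoff on connection errors.

    Hypothetical helper: the name, defaults, and signature are assumptions
    made for this sketch, not an existing API.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retry_on:
            # Out of attempts: let the last error propagate to the caller.
            if attempt == max_attempts:
                raise
            # Wait base_delay, 2 * base_delay, 4 * base_delay, ... between tries.
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

An upload would then be wrapped as `retry_on_connection_error(lambda: upload(path))`, where `upload` stands in for whatever function performs the S3 transfer. botocore also has its own retry settings (the `retries` option of `botocore.config.Config`); whether those cover a reset in the middle of sending the request body, as in the traceback above, is not confirmed here.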