skyplane-project / skyplane

🔥 Blazing fast bulk data transfers between any cloud 🔥
https://skyplane.org
Apache License 2.0

[bug] Error during large transfer: `urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='127.0.0.1', port=56675): Max retries exceeded with url: /api/v1/profile/compression` #706

Open sarahwooders opened 1 year ago

sarahwooders commented 1 year ago

Describe the bug: I was trying to transfer this dataset (https://github.com/fMoW/dataset) into my own bucket using the command:

skyplane cp -r s3://spacenet-dataset/Hosted-Datasets/fmow/fmow-rgb/ s3://sarah-skylark-us-east-1/fmow-rgb/ -n 4

Although the transfer completed, I got the following error:

Traceback (most recent call last):
  File "/Users/sarahwooders/repos/skyplane/env/lib/python3.9/site-packages/urllib3/connectionpool.py", line 703, in urlopen
    httplib_response = self._make_request(
  File "/Users/sarahwooders/repos/skyplane/env/lib/python3.9/site-packages/urllib3/connectionpool.py", line 449, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/Users/sarahwooders/repos/skyplane/env/lib/python3.9/site-packages/urllib3/connectionpool.py", line 444, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/local/anaconda3/lib/python3.9/http/client.py", line 1377, in getresponse
    response.begin()
  File "/usr/local/anaconda3/lib/python3.9/http/client.py", line 320, in begin
    version, status, reason = self._read_status()
  File "/usr/local/anaconda3/lib/python3.9/http/client.py", line 289, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"
http.client.RemoteDisconnected: Remote end closed connection without response

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/sarahwooders/repos/skyplane/skyplane/cli/cli_impl/cp_replicate.py", line 334, in launch_replication_job
    stats = rc.monitor_transfer(
  File "/Users/sarahwooders/repos/skyplane/skyplane/replicate/replicator_client.py", line 697, in monitor_transfer
    stats = self.http_pool.request("GET", f"{gateway.gateway_api_url}/api/v1/profile/compression")
  File "/Users/sarahwooders/repos/skyplane/env/lib/python3.9/site-packages/urllib3/request.py", line 74, in request
    return self.request_encode_url(
  File "/Users/sarahwooders/repos/skyplane/env/lib/python3.9/site-packages/urllib3/request.py", line 96, in request_encode_url
    return self.urlopen(method, url, **extra_kw)
  File "/Users/sarahwooders/repos/skyplane/env/lib/python3.9/site-packages/urllib3/poolmanager.py", line 376, in urlopen
    response = conn.urlopen(method, u.request_uri, **kw)
  File "/Users/sarahwooders/repos/skyplane/env/lib/python3.9/site-packages/urllib3/connectionpool.py", line 815, in urlopen
    return self.urlopen(
  File "/Users/sarahwooders/repos/skyplane/env/lib/python3.9/site-packages/urllib3/connectionpool.py", line 815, in urlopen
    return self.urlopen(
  File "/Users/sarahwooders/repos/skyplane/env/lib/python3.9/site-packages/urllib3/connectionpool.py", line 815, in urlopen
    return self.urlopen(
  File "/Users/sarahwooders/repos/skyplane/env/lib/python3.9/site-packages/urllib3/connectionpool.py", line 787, in urlopen
    retries = retries.increment(
  File "/Users/sarahwooders/repos/skyplane/env/lib/python3.9/site-packages/urllib3/util/retry.py", line 592, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='127.0.0.1', port=56675): Max retries exceeded with url: /api/v1/profile/compression (Caused by ProtocolError('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')))

HTTPConnectionPool(host='127.0.0.1', port=56675): Max retries exceeded with url: /api/v1/profile/compression (Caused by ProtocolError('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')))
✓ Deprovisioning instances (4/4) in 12.76s
/usr/local/anaconda3/lib/python3.9/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 2 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

To Reproduce: Run a transfer from s3://spacenet-dataset/Hosted-Datasets/fmow/fmow-rgb/.

Expected behavior: Multipart uploads should have successfully completed.
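
Because the transfer itself completed, one possible mitigation is to treat the post-transfer statistics request as best-effort instead of fatal. The sketch below is not Skyplane's code: `best_effort`, `gateway_gone`, and the parameter names are hypothetical, and the request is passed in as a callable so the example runs with only the standard library. With urllib3, a caller would likely pass `errors=(urllib3.exceptions.HTTPError,)`, since urllib3 wraps the underlying `RemoteDisconnected` in its own exception types.

```python
import logging
from http.client import RemoteDisconnected
from typing import Callable, Optional, Tuple, Type, TypeVar

T = TypeVar("T")
logger = logging.getLogger(__name__)


def best_effort(
    fetch: Callable[[], T],
    what: str,
    errors: Tuple[Type[BaseException], ...] = (OSError,),
) -> Optional[T]:
    """Run fetch(); on one of the listed errors, log a warning and return None."""
    try:
        return fetch()
    except errors as exc:
        logger.warning("Could not fetch %s: %r; continuing without it", what, exc)
        return None


def gateway_gone() -> dict:
    # Hypothetical stand-in for the stats request after the gateway has
    # already closed its end of the connection.
    raise RemoteDisconnected("Remote end closed connection without response")


# RemoteDisconnected is a subclass of OSError, so the default `errors` catches it.
stats = best_effort(gateway_gone, "compression profile")
```

Here `stats` ends up as `None` and a warning is logged, so the caller can report the transfer as complete and omit the compression statistics.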

sarahwooders commented 1 year ago

I also got this error when running: skyplane cp -r s3://sky-imagenet-data/datasets/ILSVRC2012/ s3://skyplane-broadcast/imagenet-images/ -n 4

The error happens during chunk dispatching:

Exception in thread Thread-51:
Traceback (most recent call last):
  File "/Users/sarahwooders/repos/skyplane/env/lib/python3.9/site-packages/urllib3/connectionpool.py", line 703, in urlopen
    httplib_response = self._make_request(
  File "/Users/sarahwooders/repos/skyplane/env/lib/python3.9/site-packages/urllib3/connectionpool.py", line 449, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/Users/sarahwooders/repos/skyplane/env/lib/python3.9/site-packages/urllib3/connectionpool.py", line 444, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/local/anaconda3/lib/python3.9/http/client.py", line 1377, in getresponse
    response.begin()
  File "/usr/local/anaconda3/lib/python3.9/http/client.py", line 320, in begin
    version, status, reason = self._read_status()
  File "/usr/local/anaconda3/lib/python3.9/http/client.py", line 289, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"
http.client.RemoteDisconnected: Remote end closed connection without response

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/anaconda3/lib/python3.9/threading.py", line 973, in _bootstrap_inner
    self.run()
  File "/Users/sarahwooders/repos/skyplane/skyplane/api/tracker.py", line 135, in run
    raise e
  File "/Users/sarahwooders/repos/skyplane/skyplane/api/tracker.py", line 123, in run
    for cr in cr_streams[job_uuid]:
  File "/Users/sarahwooders/repos/skyplane/skyplane/api/transfer_job.py", line 389, in dispatch
    reply = self.http_pool.request(
  File "/Users/sarahwooders/repos/skyplane/env/lib/python3.9/site-packages/urllib3/request.py", line 78, in request
    return self.request_encode_body(
  File "/Users/sarahwooders/repos/skyplane/env/lib/python3.9/site-packages/urllib3/request.py", line 170, in request_encode_body
    return self.urlopen(method, url, **extra_kw)
  File "/Users/sarahwooders/repos/skyplane/env/lib/python3.9/site-packages/urllib3/poolmanager.py", line 376, in urlopen
    response = conn.urlopen(method, u.request_uri, **kw)
  File "/Users/sarahwooders/repos/skyplane/env/lib/python3.9/site-packages/urllib3/connectionpool.py", line 787, in urlopen
    retries = retries.increment(
  File "/Users/sarahwooders/repos/skyplane/env/lib/python3.9/site-packages/urllib3/util/retry.py", line 550, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/Users/sarahwooders/repos/skyplane/env/lib/python3.9/site-packages/urllib3/packages/six.py", line 769, in reraise
    raise value.with_traceback(tb)
  File "/Users/sarahwooders/repos/skyplane/env/lib/python3.9/site-packages/urllib3/connectionpool.py", line 703, in urlopen
    httplib_response = self._make_request(
  File "/Users/sarahwooders/repos/skyplane/env/lib/python3.9/site-packages/urllib3/connectionpool.py", line 449, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/Users/sarahwooders/repos/skyplane/env/lib/python3.9/site-packages/urllib3/connectionpool.py", line 444, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/local/anaconda3/lib/python3.9/http/client.py", line 1377, in getresponse
    response.begin()
  File "/usr/local/anaconda3/lib/python3.9/http/client.py", line 320, in begin
    version, status, reason = self._read_status()
  File "/usr/local/anaconda3/lib/python3.9/http/client.py", line 289, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"
urllib3.exceptions.ProtocolError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
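
In this second traceback the error is re-raised from `retries.increment` after a single attempt, which is consistent with urllib3's default of not retrying read errors for non-idempotent methods such as POST. One way to make chunk dispatch more tolerant would be an explicit retry loop with exponential backoff around the request. The following is a minimal sketch under that assumption, not the project's implementation: `retry_with_backoff`, `flaky_dispatch`, and all parameter names are hypothetical, and retrying a dispatch is only safe if the gateway tolerates a duplicated chunk request.

```python
import time
from http.client import RemoteDisconnected
from typing import Callable, List, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    send: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 0.5,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call send() up to `attempts` times, doubling the delay after each failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return send()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")


# Demo with a hypothetical dispatch call that fails twice, then succeeds.
calls = {"n": 0}
delays: List[float] = []


def flaky_dispatch() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        # RemoteDisconnected is a ConnectionError subclass, so it is retried.
        raise RemoteDisconnected("Remote end closed connection without response")
    return "dispatched"


# Record the delays instead of actually sleeping.
result = retry_with_backoff(flaky_dispatch, sleep=delays.append)
```

With urllib3 the caller would likely pass `retry_on=(urllib3.exceptions.ProtocolError,)`, because that is the exception type that reaches `dispatch` in the traceback above.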