skyplane-project / skyplane

🔥 Blazing fast bulk data transfers between any cloud 🔥
https://skyplane.org
Apache License 2.0

[bug] API transfer for large numbers of files: `http.client.RemoteDisconnected: Remote end closed connection without response` #709

Open sarahwooders opened 1 year ago

sarahwooders commented 1 year ago

Describe the bug

When running the Skyplane API client for a large transfer, I get the error `http.client.RemoteDisconnected: Remote end closed connection without response`. This happens consistently for large transfers.

To Reproduce

Run the command:

python examples/aws_bucket_replication.py --src_region aws:ap-east-1 --dst_regions "aws:ap-southeast-2 aws:ap-south-1 aws:ap-northeast-3 aws:ap-northeast-2 aws:ap-northeast-1" --target_data s3://broadcast-experiment-ap-east-1/test_replication/test

Expected behavior

The transfer should complete successfully and the script should move on to its next step.

Error

Exception in thread Thread-181:
Traceback (most recent call last):
  File "/Users/sarahwooders/repos/skyplane/env/lib/python3.9/site-packages/urllib3/connectionpool.py", line 703, in urlopen
    httplib_response = self._make_request(
  File "/Users/sarahwooders/repos/skyplane/env/lib/python3.9/site-packages/urllib3/connectionpool.py", line 449, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/Users/sarahwooders/repos/skyplane/env/lib/python3.9/site-packages/urllib3/connectionpool.py", line 444, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/local/anaconda3/lib/python3.9/http/client.py", line 1377, in getresponse
    response.begin()
  File "/usr/local/anaconda3/lib/python3.9/http/client.py", line 320, in begin
    version, status, reason = self._read_status()
  File "/usr/local/anaconda3/lib/python3.9/http/client.py", line 289, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"
http.client.RemoteDisconnected: Remote end closed connection without response
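
A note on this traceback: `RemoteDisconnected` is raised when reading the response yields no data, meaning the remote end closed the connection before replying, so it can be transient (urllib3 callers usually see it wrapped in `urllib3.exceptions.ProtocolError`). One possible client-side mitigation is to retry with exponential backoff. The sketch below is a minimal, stdlib-only illustration and not Skyplane's code; `retry_on_disconnect` and all of its parameters are hypothetical names, and retrying is only safe when the request being resent is idempotent.

```python
import http.client
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_on_disconnect(
    send: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    retry_on: Tuple[Type[BaseException], ...] = (http.client.RemoteDisconnected,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call send() and retry with exponential backoff on disconnects.

    Hypothetical helper for illustration only. `send` must be safe to
    call more than once (i.e. the underlying request is idempotent).
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return send()
        except retry_on:
            # Out of attempts: surface the original error to the caller.
            if attempt == attempts - 1:
                raise
            # Wait base_delay, 2*base_delay, 4*base_delay, ... between tries.
            sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable")
```

When the request goes through urllib3, passing `retry_on=(urllib3.exceptions.ProtocolError,)` would be the analogous choice, since that is the exception type the caller actually receives.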

Transfer client log client.log


parasj commented 1 year ago

@sarahwooders I tried to reproduce this by running the following command several times: python examples/aws_bucket_replication.py --src_region aws:ap-east-1 --dst_regions "aws:ap-southeast-2 aws:ap-south-1 aws:ap-northeast-3 aws:ap-northeast-2 aws:ap-northeast-1" --target_data s3://skyplane-vmscalability-us-east-1/imagenet-bucket

However, I haven't gotten that error yet:

ap-northeast-3: Object replicated = 583 / 584
I1201 11:58:19.468634 8091034880 credentials.py:1311] Found credentials in shared credentials file: ~/.aws/credentials
ap-northeast-2: Object replicated = 584 / 584
I1201 11:58:21.063171 8091034880 credentials.py:1311] Found credentials in shared credentials file: ~/.aws/credentials
ap-northeast-1: Object replicated = 584 / 584
I1201 11:58:23.676292 8091034880 credentials.py:1311] Found credentials in shared credentials file: ~/.aws/credentials
ap-southeast-2: Object replicated = 584 / 584
I1201 11:58:25.197893 8091034880 credentials.py:1311] Found credentials in shared credentials file: ~/.aws/credentials
ap-south-1: Object replicated = 584 / 584
I1201 11:58:27.433638 8091034880 credentials.py:1311] Found credentials in shared credentials file: ~/.aws/credentials
ap-northeast-3: Object replicated = 584 / 584
I1201 11:58:28.766040 8091034880 credentials.py:1311] Found credentials in shared credentials file: ~/.aws/credentials
ap-northeast-2: Object replicated = 584 / 584
I1201 11:58:30.240510 8091034880 credentials.py:1311] Found credentials in shared credentials file: ~/.aws/credentials
ap-northeast-1: Object replicated = 584 / 584
All replication completed!
11:58:31 [ERROR] Deprovisioning dataplane
11:58:31 [WARN]  Before deprovisioning, waiting for jobs to finish: ['7b4fe60d-fe89-455a-bf96-7d59bf2ff963']
I1201 11:58:31.708284 6341865472 credentials.py:1311] Found credentials in shared credentials file: ~/.aws/credentials
I1201 11:58:31.943673 6325039104 credentials.py:1311] Found credentials in shared credentials file: ~/.aws/credentials
I1201 11:58:31.984677 6149992448 credentials.py:1311] Found credentials in shared credentials file: ~/.aws/credentials
I1201 11:58:31.997760 6166818816 credentials.py:1311] Found credentials in shared credentials file: ~/.aws/credentials
I1201 11:58:32.009566 10754224128 credentials.py:1311] Found credentials in shared credentials file: ~/.aws/credentials
I1201 11:58:32.300149 6375518208 credentials.py:1311] Found credentials in shared credentials file: ~/.aws/credentials
I1201 11:58:32.366064 6133166080 credentials.py:1311] Found credentials in shared credentials file: ~/.aws/credentials
I1201 11:58:32.383546 6392344576 credentials.py:1311] Found credentials in shared credentials file: ~/.aws/credentials
I1201 11:58:32.403326 6409170944 credentials.py:1311] Found credentials in shared credentials file: ~/.aws/credentials
I1201 11:58:32.417773 6116339712 credentials.py:1311] Found credentials in shared credentials file: ~/.aws/credentials
I1201 11:58:32.437181 6358691840 credentials.py:1311] Found credentials in shared credentials file: ~/.aws/credentials
I1201 11:58:32.466798 6425997312 credentials.py:1311] Found credentials in shared credentials file: ~/.aws/credentials
I1201 11:58:32.753158 6341865472 credentials.py:1311] Found credentials in shared credentials file: ~/.aws/credentials
I1201 11:58:33.444893 6325039104 credentials.py:1311] Found credentials in shared credentials file: ~/.aws/credentials
I1201 11:58:33.487623 6149992448 credentials.py:1311] Found credentials in shared credentials file: ~/.aws/credentials
I1201 11:58:33.550292 10754224128 credentials.py:1311] Found credentials in shared credentials file: ~/.aws/credentials
I1201 11:58:33.598582 6166818816 credentials.py:1311] Found credentials in shared credentials file: ~/.aws/credentials
I1201 11:58:33.623337 6375518208 credentials.py:1311] Found credentials in shared credentials file: ~/.aws/credentials
I1201 11:58:33.708380 6392344576 credentials.py:1311] Found credentials in shared credentials file: ~/.aws/credentials
I1201 11:58:33.716182 6425997312 credentials.py:1311] Found credentials in shared credentials file: ~/.aws/credentials
I1201 11:58:33.724666 6133166080 credentials.py:1311] Found credentials in shared credentials file: ~/.aws/credentials
I1201 11:58:33.736055 6116339712 credentials.py:1311] Found credentials in shared credentials file: ~/.aws/credentials
I1201 11:58:33.749280 6409170944 credentials.py:1311] Found credentials in shared credentials file: ~/.aws/credentials
I1201 11:58:33.760143 6358691840 credentials.py:1311] Found credentials in shared credentials file: ~/.aws/credentials
I1201 11:58:35.789879 6341865472 credentials.py:1311] Found credentials in shared credentials file: ~/.aws/credentials
I1201 11:58:35.978958 6392344576 credentials.py:1311] Found credentials in shared credentials file: ~/.aws/credentials
I1201 11:58:36.163058 6375518208 credentials.py:1311] Found credentials in shared credentials file: ~/.aws/credentials
I1201 11:58:36.721050 6409170944 credentials.py:1311] Found credentials in shared credentials file: ~/.aws/credentials
I1201 11:58:36.766060 10754224128 credentials.py:1311] Found credentials in shared credentials file: ~/.aws/credentials
I1201 11:58:36.801144 6425997312 credentials.py:1311] Found credentials in shared credentials file: ~/.aws/credentials
I1201 11:58:36.823471 6358691840 credentials.py:1311] Found credentials in shared credentials file: ~/.aws/credentials
I1201 11:58:36.938296 6325039104 credentials.py:1311] Found credentials in shared credentials file: ~/.aws/credentials
I1201 11:58:36.999013 6149992448 credentials.py:1311] Found credentials in shared credentials file: ~/.aws/credentials
I1201 11:58:37.015353 6133166080 credentials.py:1311] Found credentials in shared credentials file: ~/.aws/credentials
I1201 11:58:37.201780 6116339712 credentials.py:1311] Found credentials in shared credentials file: ~/.aws/credentials
I1201 11:58:37.244402 6166818816 credentials.py:1311] Found credentials in shared credentials file: ~/.aws/credentials
I1201 11:58:38.830965 6133166080 credentials.py:1311] Found credentials in shared credentials file: ~/.aws/credentials
I1201 11:58:38.831243 6116339712 credentials.py:1311] Found credentials in shared credentials file: ~/.aws/credentials
I1201 11:58:40.732422 8091034880 credentials.py:1311] Found credentials in shared credentials file: ~/.aws/credentials
ap-east-1
ap-southeast-2
I1201 11:58:42.347029 8091034880 credentials.py:1311] Found credentials in shared credentials file: ~/.aws/credentials
ap-south-1
I1201 11:58:43.839463 8091034880 credentials.py:1311] Found credentials in shared credentials file: ~/.aws/credentials
ap-northeast-3
I1201 11:58:46.640977 8091034880 credentials.py:1311] Found credentials in shared credentials file: ~/.aws/credentials
ap-northeast-2
I1201 11:58:47.956257 8091034880 credentials.py:1311] Found credentials in shared credentials file: ~/.aws/credentials
ap-northeast-1
I1201 11:58:49.525029 8091034880 credentials.py:1311] Found credentials in shared credentials file: ~/.aws/credentials
aws_replication_1669924177.csv
Successfully detached policy arn:aws:iam::376324600572:policy/skyplane-bucket-replication-policy-1669924191
Successfully detached policy arn:aws:iam::376324600572:policy/batchskyplane-bucket-replication-policy-1669924191
Deleted role skyplane-bucket-replication-role-1669924191

em1208 commented 5 months ago

I'm trying to copy data from GCP storage to AWS S3 and I'm hitting the same issue.

Exception in thread Thread-67:
Traceback (most recent call last):
  File "/home/<replaced>/environments/skyplane/lib/python3.11/site-packages/urllib3/connectionpool.py", line 790, in urlopen
    response = self._make_request(
               ^^^^^^^^^^^^^^^^^^^
  File "/home/<replaced>/environments/skyplane/lib/python3.11/site-packages/urllib3/connectionpool.py", line 536, in _make_request
    response = conn.getresponse()
               ^^^^^^^^^^^^^^^^^^
  File "/home/<replaced>/environments/skyplane/lib/python3.11/site-packages/urllib3/connection.py", line 461, in getresponse
    httplib_response = super().getresponse()
                       ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/http/client.py", line 1374, in getresponse
    response.begin()
  File "/usr/lib/python3.11/http/client.py", line 318, in begin
    version, status, reason = self._read_status()
                              ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/http/client.py", line 287, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"
http.client.RemoteDisconnected: Remote end closed connection without response

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/<replaced>/environments/skyplane/lib/python3.11/site-packages/skyplane/api/tracker.py", line 153, in run
    for chunk in chunk_stream:
  File "/home/<replaced>/environments/skyplane/lib/python3.11/site-packages/skyplane/api/transfer_job.py", line 644, in dispatch
    reply = self.http_pool.request(
            ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/<replaced>/environments/skyplane/lib/python3.11/site-packages/urllib3/_request_methods.py", line 118, in request
    return self.request_encode_body(
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/<replaced>/environments/skyplane/lib/python3.11/site-packages/urllib3/_request_methods.py", line 217, in request_encode_body
    return self.urlopen(method, url, **extra_kw)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/<replaced>/environments/skyplane/lib/python3.11/site-packages/urllib3/poolmanager.py", line 444, in urlopen
    response = conn.urlopen(method, u.request_uri, **kw)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/<replaced>/environments/skyplane/lib/python3.11/site-packages/urllib3/connectionpool.py", line 844, in urlopen
    retries = retries.increment(
              ^^^^^^^^^^^^^^^^^^
  File "/home/<replaced>/environments/skyplane/lib/python3.11/site-packages/urllib3/util/retry.py", line 470, in increment
    raise reraise(type(error), error, _stacktrace)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/<replaced>/environments/skyplane/lib/python3.11/site-packages/urllib3/util/util.py", line 38, in reraise
    raise value.with_traceback(tb)
  File "/home/<replaced>/environments/skyplane/lib/python3.11/site-packages/urllib3/connectionpool.py", line 790, in urlopen
    response = self._make_request(
               ^^^^^^^^^^^^^^^^^^^
  File "/home/<replaced>/environments/skyplane/lib/python3.11/site-packages/urllib3/connectionpool.py", line 536, in _make_request
    response = conn.getresponse()
               ^^^^^^^^^^^^^^^^^^
  File "/home/<replaced>/environments/skyplane/lib/python3.11/site-packages/urllib3/connection.py", line 461, in getresponse
    httplib_response = super().getresponse()
                       ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/http/client.py", line 1374, in getresponse
    response.begin()
  File "/usr/lib/python3.11/http/client.py", line 318, in begin
    version, status, reason = self._read_status()
                              ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/http/client.py", line 287, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"
urllib3.exceptions.ProtocolError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.11/threading.py", line 1038, in _bootstrap_inner
    self.run()
  File "/home/<replaced>/environments/skyplane/lib/python3.11/site-packages/skyplane/api/tracker.py", line 166, in run
    UsageClient.log_exception(
  File "/home/<replaced>/environments/skyplane/lib/python3.11/site-packages/skyplane/api/usage.py", line 147, in log_exception
    stats = client.make_error(
            ^^^^^^^^^^^^^^^^^^
  File "/home/<replaced>/environments/skyplane/lib/python3.11/site-packages/skyplane/api/usage.py", line 304, in make_error
    dest_regions = [tag.split(":")[1] for tag in dest_region_tags]
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/<replaced>/environments/skyplane/lib/python3.11/site-packages/skyplane/api/usage.py", line 304, in <listcomp>
    dest_regions = [tag.split(":")[1] for tag in dest_region_tags]
                    ~~~~~~~~~~~~~~^^^
IndexError: list index out of range
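
The last traceback shows a second problem: while reporting the failure, `make_error` evaluates `tag.split(":")[1]` and raises `IndexError`, which means at least one entry in `dest_region_tags` contained no `:`. That secondary error then obscures the original `RemoteDisconnected`. A tolerant parse would avoid this. What follows is a minimal sketch, assuming tags are normally `provider:region` strings; `region_from_tag` and `regions_from_tags` are hypothetical helpers, not part of Skyplane.

```python
from typing import List


def region_from_tag(tag: str) -> str:
    """Return the region part of a 'provider:region' tag.

    Hypothetical helper: if the tag has no ':' separator, return the
    tag unchanged instead of raising IndexError.
    """
    _provider, sep, region = tag.partition(":")
    return region if sep else tag


def regions_from_tags(tags: List[str]) -> List[str]:
    """Apply region_from_tag to every tag, preserving order."""
    return [region_from_tag(tag) for tag in tags]
```

With this, a well-formed tag such as `aws:ap-east-1` yields `ap-east-1`, and a tag without a separator is passed through, so the error-reporting path cannot fail on the parse itself.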