skyplane-project / skyplane

🔥 Blazing fast bulk data transfers between any cloud 🔥
https://skyplane.org
Apache License 2.0

[bug] keep getting Remote end closed connection without response after few data transfer from GCP to AWS #920

Open meanii opened 12 months ago

meanii commented 12 months ago

Describe the bug A clear and concise description of what the bug is.

To Reproduce

Steps to reproduce the behavior (please include the full Skyplane command you ran):

Error encountered in:

skyplane cp -r gs://AAAAAAA/ s3://BBBBBBBBB/ -n 8

Transfer client log

Traceback (most recent call last):
  File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.8/dist-packages/skyplane/api/tracker.py", line 166, in run
    UsageClient.log_exception(
  File "/usr/local/lib/python3.8/dist-packages/skyplane/api/usage.py", line 147, in log_exception
    stats = client.make_error(
  File "/usr/local/lib/python3.8/dist-packages/skyplane/api/usage.py", line 304, in make_error
    dest_regions = [tag.split(":")[1] for tag in dest_region_tags]
  File "/usr/local/lib/python3.8/dist-packages/skyplane/api/usage.py", line 304, in <listcomp>
    dest_regions = [tag.split(":")[1] for tag in dest_region_tags]

$ skyplane cp -r gs://AAAAAAA/ s3://BBBBBBBBB/ -n 8
...
Storing debug information for transfer in /tmp/skyplane/transfer_logs/...

==> /tmp/skyplane/transfer_logs/20230902_122923-171c948d <==
tail: error reading '/tmp/skyplane/transfer_logs/20230902_122923-171c948d': Is a directory
tail: /tmp/skyplane/transfer_logs/20230902_122923-171c948d: cannot follow end of this type of file; giving up on this name

==> /tmp/skyplane/transfer_logs/20230902_122931 <==
tail: error reading '/tmp/skyplane/transfer_logs/20230902_122931': Is a directory
tail: /tmp/skyplane/transfer_logs/20230902_122931: cannot follow end of this type of file; giving up on this name
tail: no files remaining
...
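
The `tail` errors above indicate that the paths under /tmp/skyplane/transfer_logs/ are directories, so `tail` cannot read them directly. A minimal sketch for reading the last lines of whatever files sit inside such a directory follows. It assumes the directory holds plain-text log files; the file names are not shown in this report, and `tail_files` is a hypothetical helper, not part of Skyplane.

```python
from collections import deque
from pathlib import Path


def tail_files(log_dir, n=20):
    """Return {file path: last n lines} for every regular file under log_dir.

    Hypothetical helper for illustration; it only assumes the directory
    contains text files and makes no assumption about their names.
    """
    tails = {}
    for path in sorted(Path(log_dir).rglob("*")):
        if path.is_file():
            # errors="replace" keeps going if a file contains non-UTF-8 bytes
            with path.open("r", errors="replace") as fh:
                # deque with maxlen keeps only the last n lines in memory
                tails[str(path)] = [line.rstrip("\n") for line in deque(fh, maxlen=n)]
    return tails
```

Calling `tail_files("/tmp/skyplane/transfer_logs/20230902_122931")` would return a dictionary that can be printed file by file, which may surface the error message missing from the truncated traceback.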

Environment info (please complete the following information):

sarahwooders commented 12 months ago

Hi @meanii, thanks for reporting this. If you don't mind sharing, about how many files are you trying to transfer, and what size are they? Also, how long did it take before the crash happened?

meanii commented 12 months ago

Over 170 TB of data

sarahwooders commented 11 months ago

Do you know what the file sizes tend to be? For small files, you may want to use n=1 for the number of VMs, since the bottleneck will be in listing the files.
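
One way to answer the file-size question is to summarize an object listing. The sketch below is only an illustration: it assumes the listing text has the shape printed by `gsutil ls -l` (size in bytes, timestamp, object URL on each line, plus a trailing `TOTAL:` line), and `summarize_sizes` is a hypothetical helper, not part of Skyplane.

```python
import statistics


def summarize_sizes(listing_text):
    """Summarize object sizes from a listing of '<bytes> <timestamp> <url>' lines.

    Lines that do not start with an integer size (headers, blank lines,
    the trailing 'TOTAL:' summary) are skipped.
    """
    sizes = []
    for line in listing_text.splitlines():
        parts = line.split(None, 2)  # split into at most size, timestamp, url
        if len(parts) == 3 and parts[0].isdigit():
            sizes.append(int(parts[0]))
    if not sizes:
        return {"count": 0, "total_bytes": 0, "median_bytes": 0}
    return {
        "count": len(sizes),
        "total_bytes": sum(sizes),
        "median_bytes": statistics.median(sizes),
    }
```

If the median turns out to be small, the suggestion above would correspond to rerunning the reported command with a single VM, for example `skyplane cp -r gs://AAAAAAA/ s3://BBBBBBBBB/ -n 1`.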