skyplane-project / skyplane

🔥 Blazing fast bulk data transfers between any cloud 🔥
https://skyplane.org
Apache License 2.0

[bug] Multipart upload to multiple buckets in the same region fails #840

Open lynnliu030 opened 1 year ago

lynnliu030 commented 1 year ago

Describe the bug

Running the following code to transfer 15 GB of data from one source object to two destination buckets in the same region (aws:us-east-1) fails:

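import skyplane
# ProgressBarTransferHook, force_deprovision, and console are assumed to be
# imported from Skyplane's CLI modules; their import lines were not in the report.

client = skyplane.SkyplaneClient()
pipeline = client.pipeline()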
src_bucket = "s3://read-us-east-1/reshard-model_part-0.pt"
dst_bucket = "s3://write-us-east-1/reshard-model_part-0.pt"
dst_bucket2 = "s3://fetch-us-east-1/reshard-model_part-1.pt"
pipeline.queue_copy(src_bucket, [dst_bucket, dst_bucket2], recursive=False)

dp = pipeline.create_dataplane(debug=True)
with dp.auto_deprovision():
    try:
        dp.provision(spinner=True)
        dp.run(pipeline.jobs_to_dispatch, hooks=ProgressBarTransferHook(dp.topology.dest_region_tags))
    except KeyboardInterrupt:
        dp.copy_gateway_logs()
        try:
            force_deprovision(dp)
        except Exception as e:
            console.print(e)
    except Exception as e:
        force_deprovision(dp)

Gateway Log

2023-05-09T03:17:56.595177698Z 03:17:56 [ERROR]  Exception: An error occurred (NoSuchUpload) when calling the 
2023-05-09T03:17:56.595192951Z UploadPart operation: The specified upload does not exist. The upload ID may be 
2023-05-09T03:17:56.595194689Z invalid, or the upload may have been aborted or completed.

This is most likely because the multipart upload ID is currently initialized once per region, so two destination buckets in the same region end up sharing a single upload ID. It should instead be initialized per region, per bucket, and per key: https://github.com/skyplane-project/skyplane/blob/54f8a4cee2e2149ab25352460620a1958a095316/skyplane/api/transfer_job.py#L114
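
To illustrate the proposed keying, here is a minimal sketch, not Skyplane's actual code. `UploadIdRegistry` and `fake_initiate` are hypothetical names introduced only for this example, and `fake_initiate` stands in for whatever call actually starts a multipart upload on the destination store. The point is only that the cache key is the full (region, bucket, key) tuple rather than the region alone:

```python
from typing import Callable, Dict, Tuple

# Hypothetical cache key: (region_tag, bucket, object_key)
UploadKey = Tuple[str, str, str]


class UploadIdRegistry:
    """Hypothetical helper that keeps one multipart upload ID per (region, bucket, key)."""

    def __init__(self, initiate: Callable[[str, str, str], str]) -> None:
        # `initiate` is an assumed callback that starts a multipart upload
        # and returns its upload ID.
        self._initiate = initiate
        self._ids: Dict[UploadKey, str] = {}

    def get(self, region_tag: str, bucket: str, key: str) -> str:
        cache_key = (region_tag, bucket, key)
        if cache_key not in self._ids:
            # Start a new upload only the first time this exact destination is seen.
            self._ids[cache_key] = self._initiate(region_tag, bucket, key)
        return self._ids[cache_key]


def fake_initiate(region_tag: str, bucket: str, key: str) -> str:
    # Stand-in for a real "create multipart upload" call; returns a fake ID.
    return f"upload-{bucket}-{key}"


registry = UploadIdRegistry(fake_initiate)
id_a = registry.get("aws:us-east-1", "write-us-east-1", "reshard-model_part-0.pt")
id_b = registry.get("aws:us-east-1", "fetch-us-east-1", "reshard-model_part-1.pt")
```

With a region-only key, `id_b` would overwrite or reuse `id_a`, and parts sent to the second bucket would reference an upload that does not exist there, which is consistent with the NoSuchUpload error in the gateway log.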