storj / edge

Storj edge services (including multi-tenant, S3-compatible server to interact with the Storj network)
GNU Affero General Public License v3.0

Verify that the common case of UploadPartCopy is what AWS CLI does #338

Closed amwolff closed 9 months ago

amwolff commented 1 year ago

Goal

The objective of this task is to verify that, when copying objects larger than 5 GB, AWS CLI uses the UploadPartCopy API in a way that preserves the original multipart upload structure, rather than copying arbitrary byte ranges or using some other method. We can't support the latter, so the aim is to find out whether we can conform to S3's internal limitation through Storj's S3-compatible API without reconfiguring all clients.

Acceptance Criteria

Links

nergdron commented 11 months ago

got a transparent proxy set up to run awscli through, connected to a personal bucket through the gateway. now uploading a sufficiently large file to test copying with.

nergdron commented 11 months ago

ok, testing with minio. I've identified:

that looks like the whole multipart upload workflow.

@amwolff is there anything else you're looking for here?

amwolff commented 11 months ago

@nergdron after you upload with multipart upload, you need to check that UploadPartCopy copies parts in the same manner that MPU uploaded them (so issue a copy of an object big enough to trigger UploadPartCopy)
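
One way to make the check described above concrete is a minimal sketch like the following. It assumes you have already extracted two things from the capture by hand: the sizes of the parts from the original multipart upload, and the `x-amz-copy-source-range` header values (format `bytes=first-last`) from the UploadPartCopy requests. The helper names (`part_boundaries`, `parse_copy_range`, `copies_follow_parts`) are hypothetical and not part of any tool mentioned in this thread.

```python
import re
from itertools import accumulate

# x-amz-copy-source-range uses the form "bytes=first-last" (inclusive offsets).
_RANGE_RE = re.compile(r"^bytes=(\d+)-(\d+)$")


def part_boundaries(part_sizes):
    """Return the inclusive (first, last) byte offsets of each original part."""
    ends = list(accumulate(part_sizes))
    starts = [0] + ends[:-1]
    return [(start, end - 1) for start, end in zip(starts, ends)]


def parse_copy_range(header_value):
    """Parse one x-amz-copy-source-range value into (first, last)."""
    match = _RANGE_RE.match(header_value.strip())
    if match is None:
        raise ValueError(f"unexpected range header: {header_value!r}")
    return int(match.group(1)), int(match.group(2))


def copies_follow_parts(part_sizes, copy_range_headers):
    """True if the copied ranges are exactly the original parts.

    Ranges are sorted before comparing, because a client may issue the
    copy requests in parallel and therefore out of order.
    """
    copied = sorted(parse_copy_range(h) for h in copy_range_headers)
    return copied == part_boundaries(part_sizes)
```

For example, with original parts of 8, 8 and 4 bytes, the ranges `bytes=0-7`, `bytes=8-15` and `bytes=16-19` pass the check, while `bytes=0-9` and `bytes=10-19` do not.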

nergdron commented 10 months ago

alright, on an s3 copy, we get the following:

so it appears to be using the same workflow as the original upload, but all the transfers are done server side, as the total packet capture for the copy is only 2.5MiB, instead of the ~12GiB for the original upload.

@amwolff, is this the behaviour you were looking for here? it's what I'd expect from reading the AWS docs.

amwolff commented 10 months ago

Do you have the log of requests/responses? It would be useful to see how many UploadPartCopies it's doing.
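
A rough sketch of how such a log could be tallied, assuming the requests have already been exported from the proxy as `(method, url, headers)` tuples (that input shape and the function names are assumptions for illustration). It relies on how S3 distinguishes the operation: UploadPartCopy is a `PUT` carrying `partNumber` and `uploadId` query parameters together with an `x-amz-copy-source` header, whereas a plain UploadPart has the same query parameters but no copy-source header.

```python
from urllib.parse import parse_qs, urlsplit


def is_upload_part_copy(method, url, headers):
    """Return True if the request looks like an S3 UploadPartCopy call."""
    query = parse_qs(urlsplit(url).query)
    # HTTP header names are case-insensitive, so normalise before checking.
    header_names = {name.lower() for name in headers}
    return (
        method.upper() == "PUT"
        and "partNumber" in query
        and "uploadId" in query
        and "x-amz-copy-source" in header_names
    )


def count_upload_part_copies(requests):
    """Count UploadPartCopy calls in an iterable of (method, url, headers)."""
    return sum(
        1
        for method, url, headers in requests
        if is_upload_part_copy(method, url, headers)
    )
```

Comparing this count with the number of parts in the original upload gives a quick first signal before looking at the individual copy ranges.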

nergdron commented 10 months ago

This is the packet capture from doing a copy operation on a ~10G file inside of the same bucket. aws.pcap.gz

amwolff commented 10 months ago

from sprint planning: Paul reports that the CLI is doing what rclone does (#337) in the most common case.