peak / s5cmd

Parallel S3 and local filesystem execution tool.

Does not support Cloudflare R2 multi part uploads #525

Closed. Lusitaniae closed this issue 2 weeks ago.

Lusitaniae commented 1 year ago

Similar to https://github.com/s3tools/s3cmd/issues/1273

Cloudflare is returning a different ETag value, which breaks S3 multipart upload compatibility.

```
./s5cmd version
v2.0.0-83ce8bc
$ ./s5cmd --credentials-file=/etc/s5cmd-r2.cfg --endpoint-url https://[endpoint].r2.cloudflarestorage.com cp test.tar s3://[bucket-name]
ERROR "cp test.tar.zst s3://[bucket-name]/test.tar": MultipartUpload: upload multipart failed upload id: APWqZzKqCR72sxCp2YBxWv4Ws4+6j8dqrntAIyj5E6tHhrLX6QyjPmRz1vHA+xqCxaQD/fA8mjiYebA49C8ourPxGZs+KAPNIqTBqmSCFfBE3ZcEpWVL/PM1ayCfWaBEhRnNY8mJzb+p6keD5QoZO9HvzHAe8M0fPlUg7Gy9g4g2BEcyb2FUBD5fqEkV6r8zbTAtYGRvqE9IqvyDWnUdrFyuT7+BoX+vCpPewty7Utruzs8pS2YD5XIuuUeb4VtnF/ksL7eEuP/ISW275iV4NuBjIS0uimORr25Il+uI0lI2NhVgZJPxHhVGLiwp+fRe1w== caused by: SignatureDoesNotMatch: The request signature we calculated does not match the signature you provided. Check your secret access key and signing method.  status code: 403, request id: , host id:
```

(don't mind the error message, ls and simple uploads work fine)
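Under the hood, s5cmd drives multipart uploads through aws-sdk-go (see the aws-sdk-go remarks further down in this thread). For context, a minimal stand-alone sketch of the same kind of upload against an R2 endpoint looks roughly like this; the endpoint, bucket, part size, and concurrency below are placeholders, not s5cmd's actual settings:

```go
// Sketch of a multipart upload against an S3-compatible endpoint such as R2,
// using aws-sdk-go's s3manager. Endpoint, bucket, and tuning values are
// illustrative only; credentials come from the default chain (env vars or
// ~/.aws/credentials).
package main

import (
	"log"
	"os"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3/s3manager"
)

func main() {
	sess := session.Must(session.NewSession(&aws.Config{
		Region:           aws.String("auto"), // R2 uses the "auto" region
		Endpoint:         aws.String("https://<account-id>.r2.cloudflarestorage.com"),
		S3ForcePathStyle: aws.Bool(true),
	}))

	f, err := os.Open("test.tar")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// The uploader splits the file into parts and uploads them concurrently,
	// which is the code path that fails in this issue.
	uploader := s3manager.NewUploader(sess, func(u *s3manager.Uploader) {
		u.PartSize = 16 * 1024 * 1024 // 16 MiB parts
		u.Concurrency = 5
	})

	if _, err := uploader.Upload(&s3manager.UploadInput{
		Bucket: aws.String("<bucket-name>"),
		Key:    aws.String("test.tar"),
		Body:   f,
	}); err != nil {
		log.Fatalf("multipart upload failed: %v", err)
	}
}
```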

Frederik-Baetens commented 1 year ago

This may not be the same issue as in s3cmd: as far as I know, s5cmd doesn't perform the same MD5 check, which is what breaks compatibility between s3cmd and R2.

It also seems that if I choose a new filename every time, I can get the multipart upload to succeed. Keeping the concurrency below 20 also seems to help, but I'm not sure whether that's relevant. Once an upload to a filename has failed with this error, most subsequent attempts to overwrite that file also fail; choosing a new filename usually works.

What I suspect is happening is that something goes wrong in the retry process when an UploadPart request fails. Subsequent uploads to the same filename might be attempting to resume the same in-progress multipart upload, which would explain why they also fail. There seems to be some randomness to it, though; sometimes uploads to the same filename do succeed, so maybe it has nothing to do with that.
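One way to test that stale-upload theory by hand (this is not something s5cmd does for you) would be to list the bucket's in-progress multipart uploads and abort them before retrying. A sketch using aws-sdk-go directly, with placeholder endpoint and bucket names:

```go
// Sketch: list and abort in-progress multipart uploads for a bucket, to rule
// out stale uploads interfering with retries. Names are placeholders.
package main

import (
	"fmt"
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
)

func main() {
	sess := session.Must(session.NewSession(&aws.Config{
		Region:   aws.String("auto"),
		Endpoint: aws.String("https://<account-id>.r2.cloudflarestorage.com"),
	}))
	client := s3.New(sess)
	bucket := aws.String("<bucket-name>")

	out, err := client.ListMultipartUploads(&s3.ListMultipartUploadsInput{Bucket: bucket})
	if err != nil {
		log.Fatal(err)
	}
	for _, u := range out.Uploads {
		fmt.Printf("aborting %s (upload id %s)\n", *u.Key, *u.UploadId)
		if _, err := client.AbortMultipartUpload(&s3.AbortMultipartUploadInput{
			Bucket:   bucket,
			Key:      u.Key,
			UploadId: u.UploadId,
		}); err != nil {
			log.Fatal(err)
		}
	}
}
```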

Perhaps it's related to how s5cmd handles 429 responses, and that causes it to send a malformed request afterwards?

Edit: upon further examination, this seems heavily related to the retry logic. When setting --log=debug, I can see that whenever I get a 429, the upload fails with that SignatureDoesNotMatch error, while it never fails if I don't get a 429 first.
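For anyone reproducing this outside s5cmd, aws-sdk-go's own debug log levels print every retry attempt and the error that triggered it, which makes the 429-then-403 sequence visible. A small sketch (the endpoint is a placeholder; this is plain SDK configuration, not s5cmd code):

```go
// Sketch: enable aws-sdk-go's retry/error logging to watch a 429 being
// retried and then failing with SignatureDoesNotMatch.
package main

import (
	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
)

func main() {
	sess := session.Must(session.NewSession(&aws.Config{
		Region:     aws.String("auto"),
		Endpoint:   aws.String("https://<account-id>.r2.cloudflarestorage.com"),
		MaxRetries: aws.Int(10),
		LogLevel: aws.LogLevel(aws.LogDebugWithRequestRetries |
			aws.LogDebugWithRequestErrors),
	}))

	// Any client built from this session now logs each retry attempt and the
	// error that triggered it, e.g. via client.ListBuckets(&s3.ListBucketsInput{}).
	client := s3.New(sess)
	_ = client
}
```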

Frederik-Baetens commented 1 year ago

Hmm, I've also been able to get these SignatureDoesNotMatch errors in RClone (albeit very rarely), so this might actually be an incompatibility between aws-sdk-go and R2.

Frederik-Baetens commented 1 year ago

I found this RClone issue, which has now been solved. Might s5cmd suffer from a similar issue? https://github.com/rclone/rclone/issues/5422

salim-b commented 9 months ago

I think this issue was fixed by Cloudflare on June 21, 2023. The corresponding documentation now says:

The ETags for objects uploaded via multipart are different than those uploaded with PutObject.

For uploads created after June 21, 2023, R2’s multipart ETags now mimic the behavior of S3. The ETag of each individual part is the MD5 hash of the contents of the part. The ETag of the completed multipart object is the hash of the MD5 sums of each of the constituent parts concatenated together followed by a hyphen and the number of parts uploaded.

(...)
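For reference, the multipart ETag rule quoted above can be reproduced locally. A minimal sketch in Go, assuming a fixed part size (the file name and 5 MiB part size are just examples):

```go
// Sketch: compute an S3-style multipart ETag for a local file, assuming a
// fixed part size. Names here are illustrative, not s5cmd internals.
package main

import (
	"crypto/md5"
	"fmt"
	"io"
	"os"
)

func multipartETag(path string, partSize int64) (string, error) {
	f, err := os.Open(path)
	if err != nil {
		return "", err
	}
	defer f.Close()

	var partSums []byte // concatenated MD5 digests of each part
	parts := 0
	for {
		h := md5.New()
		n, err := io.CopyN(h, f, partSize)
		if n > 0 {
			partSums = append(partSums, h.Sum(nil)...)
			parts++
		}
		if err == io.EOF {
			break
		}
		if err != nil {
			return "", err
		}
	}
	// Final ETag: MD5 of the concatenated part digests, then "-<part count>".
	return fmt.Sprintf("%x-%d", md5.Sum(partSums), parts), nil
}

func main() {
	etag, err := multipartETag("10mb.file", 5*1024*1024)
	if err != nil {
		panic(err)
	}
	fmt.Println(etag)
}
```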

leng-yue commented 3 months ago

The issue is still there

bercknash commented 2 weeks ago

We've now fixed the issue where a retry gets a 403 SignatureDoesNotMatch. Some of the SDKs set Expect: 100-continue on retries and also include that header in the SigV4 signature, and we did not properly process signatures that included the Expect header.
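To illustrate the mechanism described above: when a request carries Expect: 100-continue, aws-sdk-go's SigV4 signer includes that header among the signed headers, which is what the server has to account for when recomputing the signature. A small sketch with dummy credentials and a placeholder URL (not R2 or s5cmd internals):

```go
// Sketch: sign a request carrying Expect: 100-continue with aws-sdk-go's
// SigV4 signer; the resulting Authorization header lists "expect" in
// SignedHeaders. Credentials and URL are dummies.
package main

import (
	"fmt"
	"net/http"
	"strings"
	"time"

	"github.com/aws/aws-sdk-go/aws/credentials"
	v4 "github.com/aws/aws-sdk-go/aws/signer/v4"
)

func main() {
	creds := credentials.NewStaticCredentials("AKIDEXAMPLE", "secret", "")
	signer := v4.NewSigner(creds)

	payload := "part payload"
	req, err := http.NewRequest(http.MethodPut,
		"https://<account-id>.r2.cloudflarestorage.com/<bucket>/test.tar",
		strings.NewReader(payload))
	if err != nil {
		panic(err)
	}
	req.Header.Set("Expect", "100-continue")

	// Sign for the S3 service in R2's "auto" region.
	if _, err := signer.Sign(req, strings.NewReader(payload), "s3", "auto", time.Now()); err != nil {
		panic(err)
	}

	// SignedHeaders in the Authorization header now includes "expect",
	// which is the part R2 previously failed to verify on retries.
	fmt.Println(req.Header.Get("Authorization"))
}
```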

igungor commented 2 weeks ago

@bercknash Thanks for chiming in!

I've verified that it works as expected:

```
# s5cmd --profile r2 --endpoint-url https://<account-id>.r2.cloudflarestorage.com cp 10mb.file s3://<bucket>/
cp 10mb.file s3://<bucket>/10mb.file

$ echo $?
0
```