transloadit / uppy

The next open source file uploader for web browsers :dog:
https://uppy.io
MIT License
28.91k stars 1.98k forks source link

Support additional Amazon S3 checksum algorithms for @uppy/aws-s3-multipart and @uppy/companion #3885

Open daghendrik opened 2 years ago

daghendrik commented 2 years ago

Amazon S3’s PutObject function now lets you specify the use of any one of four widely used checksum algorithms (SHA-1, SHA-256, CRC-32, and CRC-32C) when you upload each of your objects to S3, in addition to MD5. (More details on Amazon's blog post from February 2022)

It would be great if Uppy could support this for S3 Multipart uploads, letting us choose from other algorithms than just MD5, depending on our needs (speed, security, etc.).

A gist of how this works has been created by @hguillermo (comment) in issue https://github.com/transloadit/uppy/issues/3391

What I believe needs to be changed in the codebase is:

Companion

  1. The params in function createMultipartUpload in uppy/companion/src/server/controllers/s3.js would have to allow for ChecksumAlgorithm to be set ( values: "CRC32"|"CRC32C"|"SHA1"|"SHA256"|string; )
  2. The functions handling the signing of the parts in s3.js (signPartUpload and batchSignPartsUpload) would have add support for the keys ChecksumCRC32, ChecksumCRC32C, ChecksumSHA1, ChecksumSHA256 in the object passed to s3Client. The checksum values would then be sent in as a parameter from the Uppy Client.

Uppy Client

On the client side the browser would have to calculate a the checksum (CRC32, CRC32C, SHA1 or SHA256) of each individual chunk (just like this suggestion for calculateMD5), pass it to the companion, and once a signed url is received (from Companion or a custom backend) I believe you also have do add the chunk's checksum value as a header sent along the S3 PUT ("x-amz-checksum-crc32", "x-amz-checksum-crc32c", etc)

Another requirement for this to be possible is that individual headers can be sent along each chunk, which is already addressed in the case for being able to verify the integrity using the "Content-MD5" header. See issue https://github.com/transloadit/uppy/issues/3881 by @kevin-west-10x

Murderlon commented 2 years ago

Thanks for the detailed issue. This helps a lot.

daghendrik commented 1 year ago

@Murderlon , Thanks for adding the possibility to send unique headers along with each chunk in @uppy/aws-s3-multipart in Uppy 3.x (#3895).

I haven't upgraded my web app to Uppy 3 yet, but from first glance it seems that I can use the new S3 Multipart custom header support to pass along other checksums than "Content-MD5" to S3 for each upload-part, for instance"x-amz-checksum-crc32", "x-amz-checksum-crc32c", etc.

Since I'm using a custom backend to recreate the companion API for the Uppy client, I'll be able to set ChecksumAlgorithm in my backend code to my desired value ("CRC32"|"CRC32C"|"SHA1"|"SHA256"|string; ). I'll try to find time to experiment with this next week. However, it would be even better if this was something that Companion supported natively. From what I can see, it would require these changes:

  1. The params in function createMultipartUpload in uppy/companion/src/server/controllers/s3.js would have to allow for ChecksumAlgorithm to be set ( values: "CRC32"|"CRC32C"|"SHA1"|"SHA256"|string; )
  2. The functions handling the signing of the parts in s3.js (signPartUpload and batchSignPartsUpload) would have to add support for the keys ChecksumCRC32, ChecksumCRC32C, ChecksumSHA1, ChecksumSHA256 in the object passed to s3Client.
  3. The Uppy client functions for createMultiPartUpload and prepareUploadParts would have to add optional parameters for checksum algorithm.

Do you think there's a possibility of this being implemented in @uppy/companion & @uppy/aws-s3-multipart anytime soon?

Murderlon commented 1 year ago

Hi, thanks for the update. We currently have two focus months with some priorities (not just Uppy, also other orgs like Tus) and @uppy/aws-s3-multipart is not part of that. So if I'm totally honest I don't think we'll add this soon. If you feel this should be added sooner, we can help along if you create a PR for it.

daghendrik commented 1 year ago

Thanks for the quick and honest reply!

pcriv commented 2 months ago

@daghendrik Did you manage to make your backend work with the extra checksum algorithms?

daghendrik commented 2 months ago

No, only "Content-MD5".

Best,

Dag Hendrik Lerdal

On Thu, 20 Jun 2024 at 10:50, Pablo Crivella @.***> wrote:

@daghendrik https://github.com/daghendrik Did you manage to make your backend work with the extra checksum algorithms?

— Reply to this email directly, view it on GitHub https://github.com/transloadit/uppy/issues/3885#issuecomment-2180155255, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAN6ZCUQ5LR7DRXEHKQKKATZIKJVLAVCNFSM6AAAAABJTPYZ7CVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCOBQGE2TKMRVGU . You are receiving this because you were mentioned.Message ID: @.***>