rlauer6 / perl-amazon-s3

A portable client library for working with and managing Amazon S3 buckets and keys.
http://search.cpan.org/dist/Amazon-S3/

s3 response Etag and md5 do not always match exactly #18

Open YuseiUeno opened 2 weeks ago

YuseiUeno commented 2 weeks ago

What I'm having trouble with

Amazon::S3::Bucket#add_key_filename sometimes croaks.

This happens because the ETag in the S3 response does not always match the MD5 digest of the object exactly.

docs

https://docs.aws.amazon.com/AmazonS3/latest/userguide/checking-object-integrity.html

The entity tag (ETag) for an object represents a specific version of that object. Keep in mind that the ETag reflects changes only to the content of an object, not to its metadata. If only the metadata of an object changes, the ETag remains the same.

Depending on the object, the ETag of the object might be an MD5 digest of the object data:

If an object is created by the PutObject, PostObject, or CopyObject operation, or through the AWS Management Console, and that object is also plaintext or encrypted by server-side encryption with Amazon S3 managed keys (SSE-S3), that object has an ETag that is an MD5 digest of its object data.

If an object is created by the PutObject, PostObject, or CopyObject operation, or through the AWS Management Console, and that object is encrypted by server-side encryption with customer-provided keys (SSE-C) or server-side encryption with AWS Key Management Service (AWS KMS) keys (SSE-KMS), that object has an ETag that is not an MD5 digest of its object data.

If an object is created by either the Multipart Upload or Part Copy operation, the object's ETag is not an MD5 digest, regardless of the method of encryption. If an object is larger than 16 MB, the AWS Management Console uploads or copies that object as a multipart upload, and therefore the ETag isn't an MD5 digest.
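As a rough illustration of the rules quoted above, the sketch below classifies an ETag by its shape. The function name `classify_etag` and the category labels are hypothetical and not part of Amazon::S3. Shape alone cannot confirm that a value is an MD5 digest, because the quoted documentation says the encryption type also matters.

```python
import re

# Plain MD5-shaped ETag: exactly 32 hex digits.
_MD5_ETAG = re.compile(r"[0-9a-fA-F]{32}")
# Multipart-style ETag: 32 hex digits, a dash, and a part count.
_MULTIPART_ETAG = re.compile(r"[0-9a-fA-F]{32}-[0-9]+")


def classify_etag(etag: str) -> str:
    """Return 'md5-shaped', 'multipart', or 'opaque' for a raw ETag header value."""
    # The ETag header value normally arrives wrapped in double quotes.
    value = etag.strip().strip('"')
    if _MD5_ETAG.fullmatch(value):
        return "md5-shaped"
    if _MULTIPART_ETAG.fullmatch(value):
        return "multipart"
    return "opaque"
```

Only an "md5-shaped" value is a candidate for an MD5 comparison. A "multipart" or "opaque" value should not be compared against a plain MD5 of the file.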

YuseiUeno commented 2 weeks ago

https://github.com/rlauer6/perl-amazon-s3/blob/00139a1a287c3a73094c3d81a503eec6fae44488/src/main/perl/lib/Amazon/S3/Bucket.pm.in#L695-L706

YuseiUeno commented 2 weeks ago

Net::Amazon::S3 no longer checks the ETag.

https://github.com/rustyconover/net-amazon-s3/issues/109

rlauer6 commented 2 weeks ago

I think an approach might be to detect that the object being fetched was encrypted by looking at either the request headers or the response headers rather than throwing out the concept of checking the integrity of the download altogether.

The response may contain the following headers:

x-amz-server-side-encryption

The server-side encryption algorithm used when you store this object in Amazon S3 (for example, AES256, aws:kms, aws:kms:dsse).

x-amz-server-side-encryption-customer-algorithm

If server-side encryption with a customer-provided encryption key was requested, the response will include this header to confirm the encryption algorithm that's used.

x-amz-server-side-encryption-customer-key-MD5

If server-side encryption with a customer-provided encryption key was requested, the response will include this header to provide the round-trip message integrity verification of the customer-provided encryption key.

So in either case we can't do anything regarding validation of the object content except possibly return the headers and allow the caller to do an integrity check themselves (assuming they also uploaded a checksum).

So I think the approach is to avoid checking the ETag against the MD5 value if those headers are present or if it is a multipart upload.
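A minimal sketch of that decision, in Python for illustration only and not the eventual Amazon::S3 fix. The helper names `md5_check_applies` and `verify_download` are hypothetical. One assumption differs slightly from the wording above: following the AWS documentation quoted earlier in this thread, objects with `x-amz-server-side-encryption: AES256` (SSE-S3) are still treated as having an MD5 ETag, while any other value skips the check.

```python
import hashlib
from typing import Optional


def md5_check_applies(headers: dict) -> bool:
    """Decide whether comparing the ETag to an MD5 digest is meaningful."""
    # HTTP header names are case-insensitive, so normalize them first.
    h = {k.lower(): v for k, v in headers.items()}

    # SSE-C: the ETag is not an MD5 digest.
    if "x-amz-server-side-encryption-customer-algorithm" in h:
        return False

    # SSE-KMS / DSSE-KMS: not an MD5 digest. AES256 (SSE-S3) is assumed
    # to keep the MD5 ETag, per the documentation quoted in this issue.
    sse = h.get("x-amz-server-side-encryption")
    if sse is not None and sse != "AES256":
        return False

    # Multipart uploads produce an ETag with a "-<parts>" suffix.
    etag = h.get("etag", "").strip().strip('"')
    if not etag or "-" in etag:
        return False
    return True


def verify_download(body: bytes, headers: dict) -> Optional[bool]:
    """Return True/False when the check applies, or None when it is skipped."""
    if not md5_check_applies(headers):
        return None
    h = {k.lower(): v for k, v in headers.items()}
    etag = h["etag"].strip().strip('"').lower()
    return hashlib.md5(body).hexdigest() == etag
```

Returning `None` when the check is skipped lets the caller tell "not verified" apart from "verified and failed", and leaves room to run its own integrity check using the returned headers.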

If that makes sense to you I will prep a fix.

YuseiUeno commented 1 week ago

To quote this comment: https://github.com/rustyconover/net-amazon-s3/issues/109#issuecomment-953368331

Having given this further thought, the HTTP spec (https://datatracker.ietf.org/doc/html/rfc7232#section-2.3) defines an etag as

An entity-tag is an opaque validator for differentiating between multiple representations of the same resource, regardless of whether those multiple representations are due to resource state changes over time, content negotiation resulting in multiple representations being valid at the same time, or both.

etags are not meant for validating the content was fetched correctly.

But I don't care as long as I can download it.

If you want to check a multipart ETag, you can also refer to the algorithm used by s3etag.
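For reference, here is a sketch of the commonly described multipart ETag scheme: take the MD5 of each part, concatenate the binary digests, take the MD5 of that, and append the part count. It is assumed, not confirmed, that s3etag implements the same scheme, and AWS does not document this as a guaranteed contract. The function name `multipart_etag` is hypothetical, and `part_size` must match the part size used by the original upload, which the ETag itself does not reveal.

```python
import hashlib


def multipart_etag(data: bytes, part_size: int) -> str:
    """Compute a multipart-style ETag for non-empty data split into fixed-size parts."""
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    if not data:
        raise ValueError("data must be non-empty")

    # Binary MD5 digest of each part, in upload order.
    part_digests = [
        hashlib.md5(data[offset:offset + part_size]).digest()
        for offset in range(0, len(data), part_size)
    ]

    # MD5 of the concatenated part digests, followed by the part count.
    combined = hashlib.md5(b"".join(part_digests)).hexdigest()
    return f"{combined}-{len(part_digests)}"
```

If the computed value differs from the ETag S3 returns, that does not prove corruption by itself. A wrong `part_size` guess or an encrypted object would also produce a mismatch.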