google / go-cloud

The Go Cloud Development Kit (Go CDK): A library and tools for open cloud development in Go.
https://gocloud.dev/
Apache License 2.0
9.57k stars 812 forks source link

The blob/s3blob "BeforeWrite" callback returns a "aws-sdk-go-v2/service/s3.PutObjectInput" instance with the incorrect bucket. #3471

Closed thisisaaronland closed 3 months ago

thisisaaronland commented 3 months ago

Describe the bug

The blob/s3blob BeforeWrite callback returns a aws-sdk-go-v2/service/s3.PutObjectInput instance with the incorrect bucket.

To Reproduce

I have not been able to write a standalone test case for this (I will do that soon) but I have code with creates (2) buckets: a bucket to read data from (source) and a bucket to write data to (target).

The source bucket writer needs to set an (S3) ACL on the objects it writes so it does that here:

https://github.com/go-iiif/go-iiif/blob/s3writes/cache/blob.go#L192-L232

Per the aws-sdk-go-v2/service/s3 docs the (gocloud) asFunc tests that the input is a PutObjectInput. This operation succeeds but the object has the following errors:

Perhaps related, he value of the PutObjectInput.Body property is nil (in fact the only properties that are not nil are "Bucket" and "Key").

Expected behavior

The value of PutObjectInput.Bucket should match the bucket that created the writer whose WriterOptions contains the BeforeWrite callback.

If I manually assign the Bucket and Body properties of the PutObjectInput then the code works as expected (writes succeed and are written to the correct bucket).

Version

https://github.com/go-iiif/go-iiif/blob/s3writes/go.mod

Additional context

The buckets in question are being created using two helper packages.

The first is a simple helper function I wrote to prevent metadata files from being created:

https://github.com/aaronland/gocloud-blob/blob/main/bucket/bucket.go#L15

The second defines a custom URLOpener to allow creating S3 buckets with "named" AWS session credentials. This package is using the OpenBucketV2 method:

https://github.com/aaronland/gocloud-blob/blob/main/s3/s3blob.go

I don't think either of these packages are the problem but I mention them for thoroughness.

I will write a simplified test shortly and link to it here when I do.

vangent commented 3 months ago

The source bucket writer needs to set an (S3) ACL on the objects it writes

You mean the target bucket, right? Are you sure you're passing the right bucket?

Perhaps related, he value of the PutObjectInput.Body property is nil

This is expected; the callback is BeforeWrite, so it is called before you've written anything. The Body is set later.

thisisaaronland commented 3 months ago

You mean the target bucket, right? Are you sure you're passing the right bucket?

My bad. That is correct. The BeforeWrite callback for the target bucket is yielding a PutObjectInput instance with the source bucket. And yes, I've made sure to log "all the things" to double check the correct buckets are being passed around.

This is expected; the callback is BeforeWrite, so it is called before you've written anything. The Body is set later.

Got it, thanks. I will try to post a test script before the day it out.

thisisaaronland commented 3 months ago

Test script for reproducing the problem can be found here:

https://github.com/go-iiif/go-iiif/blob/s3writes/cmd/test-bucket-write/main.go

For example, when reading and writing to the same bucket:

> ./bin/test-bucket-write -source-bucket-uri 's3blob://sfomuseum-media?prefix=media/&region=REGION&credentials=CREDS' -target-bucket-uri 's3blob://sfomuseum-media?prefix=misc/&region=REGION&credentials=CREDS' -source-key '193/072/428/1/1930724281_oMv1xNF0ATSqjVydik0GZVDSGxlkNRVU_b.jpg' -target-key 'test.jpg'

2024/08/15 10:14:42 INFO Create source bucket uri="s3blob://sfomuseum-media?prefix=media/&region=REGION&credentials=CREDS"
2024/08/15 10:14:42 INFO Create target bucket uri="s3blob://sfomuseum-media?prefix=misc/&region=REGION&credentials=CREDS"
2024/08/15 10:14:43 INFO Create target writer key=test.jpg
2024/08/15 10:14:43 INFO Before write bucket test target=sfomuseum-media "put object"=sfomuseum-media key=misc/test.jpg

Versus reading and writing to different buckets:

> ./bin/test-bucket-write -source-bucket-uri 's3blob://sfomuseum-media?prefix=media/&region=REGION&credentials=CREDS' -target-bucket-uri 's3blob://sfomuseum-test?prefix=misc/&region=REGION&credentials=CREDS' -source-key '193/072/428/1/1930724281_oMv1xNF0ATSqjVydik0GZVDSGxlkNRVU_b.jpg' -target-key 'test.jpg'

2024/08/15 10:14:57 INFO Create source bucket uri="s3blob://sfomuseum-media?prefix=media/&region=REGION&credentials=CREDS"
2024/08/15 10:14:57 INFO Create target bucket uri="s3blob://sfomuseum-test?prefix=misc/&region=REGION&credentials=CREDS"
2024/08/15 10:14:58 INFO Create target writer key=test.jpg
2024/08/15 10:14:58 INFO Before write bucket test target=sfomuseum-test "put object"=sfomuseum-media key=misc/test.jpg
2024/08/15 10:14:58 Failed to copy source key to target key, blob (key "test.jpg") (code=Unknown): Request bucket does not match target bucket

A couple things to note:

vangent commented 3 months ago

It would be helpful if you could make that testcase work without the dependencies; if they are simple wrappers that shouldn't be too hard.

Can you print the source and target buckets just before you call NewWriter?

vangent commented 3 months ago

Your https://github.com/aaronland/gocloud-blob/blob/main/s3/s3blob.go is the problem. You create a single lazySessionOpener when you register the s3blob schema, and that struct uses a sync.Once to create a single URLOpener that has the bucket in it. The actual OpenBucketURL ignores the URL that is passed in and always uses the first bucket.

thisisaaronland commented 3 months ago

Gah! That is definitely my bad so apologies for the distraction.

I guess what I don't understand is why this problem doesn't materialize when using "v1" of the AWS SDK. It's sort of moot now that "v1" has been put on life support but... it's curious.