nyaruka / gocommon

Common utility library for the TextIt platform.

S3 storage urls should include the bucket region #48

Closed KaitCrawford closed 3 years ago

KaitCrawford commented 3 years ago

The URL returned for stored files does not include the bucket region.

Attachments are stored in the correct bucket, but the URL that is returned and stored on the message is incorrect. The URL is defined here: https://github.com/nyaruka/gocommon/blob/main/storage/s3.go#L19

As per the S3 naming conventions, the bucket region should form part of the endpoint URL: https://docs.aws.amazon.com/general/latest/gr/s3.html
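For illustration, a region-aware URL in the virtual-hosted style could be built roughly like this. This is only a sketch: `objectURL` and its parameters are hypothetical names, not the code in `storage/s3.go`, and it assumes the key needs no URL escaping.

```go
package main

import (
	"fmt"
	"strings"
)

// objectURL is a hypothetical helper (not the gocommon implementation).
// It builds a virtual-hosted-style S3 URL that includes the bucket region:
//   https://<bucket>.s3.<region>.amazonaws.com/<key>
// Assumption: key contains no characters that need URL escaping.
func objectURL(bucket, region, key string) string {
	// Avoid a double slash if the key was given with a leading "/".
	key = strings.TrimPrefix(key, "/")
	return fmt.Sprintf("https://%s.s3.%s.amazonaws.com/%s", bucket, region, key)
}

func main() {
	// Example values only; the bucket name is made up.
	fmt.Println(objectURL("my-bucket", "af-south-1", "attachments/photo.jpg"))
}
```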

rowanseymour commented 3 years ago

Hi @KaitCrawford, it's pretty straightforward to include the region (this is the change: https://github.com/nyaruka/gocommon/pull/50), but I'm wondering what the reason for this is. When I view uploaded files in one of our buckets in the AWS console, the URL they provide doesn't have the region. Did you run into a problem with this?

KaitCrawford commented 3 years ago

Thanks @rowanseymour. Yeah, we have a project running that's connected to a bucket in the Africa region. We needed to forward media attachments to another service and found that the URLs in the attachment objects were wrong. At the time we worked around it by manually adding the region to the URLs, like this: @(replace(attachment_parts(input.attachments[0]).url, "s3.amazonaws", "s3.af-south-1.amazonaws"))
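The same workaround could be sketched in Go, rewriting only the host of a legacy-style URL rather than doing a plain string replace. `addRegion` is a hypothetical helper, and the sketch assumes the URL uses the virtual-hosted style with the `.s3.amazonaws.com` suffix; anything else is returned unchanged.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// legacySuffix is the region-less host suffix discussed in this thread.
const legacySuffix = ".s3.amazonaws.com"

// addRegion is a hypothetical helper: it rewrites
// https://<bucket>.s3.amazonaws.com/... to
// https://<bucket>.s3.<region>.amazonaws.com/...
// URLs whose host does not end in the legacy suffix are returned as-is.
func addRegion(rawURL, region string) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	if !strings.HasSuffix(u.Host, legacySuffix) {
		return rawURL, nil
	}
	bucket := strings.TrimSuffix(u.Host, legacySuffix)
	u.Host = bucket + ".s3." + region + ".amazonaws.com"
	return u.String(), nil
}

func main() {
	// Example values only; the bucket name is made up.
	fixed, err := addRegion("https://my-bucket.s3.amazonaws.com/a/b.jpg", "af-south-1")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(fixed)
}
```

Touching only the host avoids accidentally rewriting a key that happens to contain the text "s3.amazonaws".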

rowanseymour commented 3 years ago

Ah ok the endpoint without region only works for us-east-1...

KaitCrawford commented 3 years ago

Yeah. My understanding is that buckets in us-east-1 don't need to have the region in the endpoint for legacy reasons, because S3 used to be available only there.

nicpottier commented 3 years ago

So I don't think this is true.. here's an attachment in eu-west-1 that works fine without the region: https://dl-rapidpro-io.s3.amazonaws.com/attachments/1/36473/steps/3e2ade8f-9c92-4d54-9d3e-b3780cfb5a5e.jpg

It also works with the region: https://dl-rapidpro-io.s3.eu-west-1.amazonaws.com/attachments/1/36473/steps/3e2ade8f-9c92-4d54-9d3e-b3780cfb5a5e.jpg

To me there's actually an advantage to not having the region in the URL, as it lets you move a bucket to a different region if needed.

rowanseymour commented 3 years ago

Ha that's funny I was just looking at the same file - maybe it's true for us-east-1 and eu-west-1 ?

KaitCrawford commented 3 years ago

That's odd. According to the documentation I linked above, only us-east-1 has s3.amazonaws.com as a standard endpoint.

nicpottier commented 3 years ago

On https://docs.aws.amazon.com/AmazonS3/latest/userguide/VirtualHosting.html#virtual-hosted-style-access I do see they say "For some Regions, the legacy global endpoint can be used to construct requests that do not specify a Region-specific endpoint." So ya, maybe it's just some.

Ok, ya, seems we need the region in there.

rowanseymour commented 3 years ago

Apologies for a big picture of my face but testing this with a bucket in South America..

URL AWS gives us: https://south-america-test.s3.sa-east-1.amazonaws.com/rowan.png

Requesting https://south-america-test.s3.amazonaws.com/rowan.png redirects to https://south-america-test.s3-sa-east-1.amazonaws.com/rowan.png
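Note that the two regional URLs above differ in form: one host uses `s3.sa-east-1` (dot) and the redirect target uses `s3-sa-east-1` (dash). A rough sketch of pulling the bucket and region out of either form follows. `regionFromHost` and the pattern are assumptions for illustration, covering only the two host shapes seen in this thread.

```go
package main

import (
	"fmt"
	"regexp"
)

// regionalHost matches <bucket>.s3.<region>.amazonaws.com and
// <bucket>.s3-<region>.amazonaws.com. It is an illustrative pattern,
// not an exhaustive description of S3 endpoint formats.
var regionalHost = regexp.MustCompile(`^(.+)\.s3[.-]([a-z0-9-]+)\.amazonaws\.com$`)

// regionFromHost is a hypothetical helper that returns the bucket and
// region encoded in a regional virtual-hosted-style host. ok is false for
// hosts without a region, such as <bucket>.s3.amazonaws.com.
func regionFromHost(host string) (bucket, region string, ok bool) {
	m := regionalHost.FindStringSubmatch(host)
	if m == nil {
		return "", "", false
	}
	return m[1], m[2], true
}

func main() {
	hosts := []string{
		"south-america-test.s3.sa-east-1.amazonaws.com",
		"south-america-test.s3-sa-east-1.amazonaws.com",
		"south-america-test.s3.amazonaws.com",
	}
	for _, h := range hosts {
		bucket, region, ok := regionFromHost(h)
		fmt.Printf("%s -> bucket=%q region=%q ok=%v\n", h, bucket, region, ok)
	}
}
```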

nicpottier commented 3 years ago

Looks like it is based on when the bucket was created:

"Virtual Hosted-Style Requests for Other Regions

The legacy global endpoint is also used for virtual hosted-style requests in other supported Regions. If you create a bucket in a Region that was launched before March 20, 2019 and use the legacy global endpoint, Amazon S3 updates the DNS to reroute the request to the correct location, which might take time. In the meantime, the default rule applies, and your virtual hosted-style request goes to the US East (N. Virginia) Region. Amazon S3 then redirects it with an HTTP 307 redirect to the correct Region. For S3 buckets in Regions launched after March 20, 2019, the DNS doesn't route your request directly to the AWS Region where your bucket resides. It returns an HTTP 400 Bad Request error instead. For more information, see Making requests."