spring-attic / spring-cloud-aws

All development has moved to https://github.com/awspring/spring-cloud-aws Integration for Amazon Web Services APIs with Spring
https://awspring.io/
Apache License 2.0
590 stars 376 forks source link

SimpleStorageResource doesn't use recommended URL for S3 resources #686

Open fransflippo opened 3 years ago

fransflippo commented 3 years ago

Describe the bug

In: spring-cloud-aws-core 2.2.3-RELEASE

The URL returned by SimpleStorageResource.getURL() is produced with the following code:

        Region region = this.amazonS3.getRegion().toAWSRegion();
        String encodedObjectName = URLEncoder.encode(this.objectName,
                StandardCharsets.UTF_8.toString());
        return new URL("https", region.getServiceEndpoint(AmazonS3Client.S3_SERVICE_NAME),
                "/" + this.bucketName + "/" + encodedObjectName);

For a bucket named mybucket.example.com and an object named path/file.txt, this results in https://s3.eu-west-1.amazonws.com/mybucket.example.com/path/file.txt .

This is what Amazon calls "path-style" addressing model: the host name is generic (just specific to a region) and the bucket name is part of the path-part of the URL. In SimpleStorageResource, the region is either set by Spring Cloud based on the cloud.aws.region.static system property, or determined based on the region the Spring Boot app is running in (when running on an AWS EC2 instance and cloud.aws.stack.auto is true). The region that the S3 bucket being accessed resides in is not necessarily that in which the EC2 instance is running. In fact, an app might access several S3 buckets in different regions, depending on the use case. So the above mechanism for determining the URL is incorrect unless the region set for the Spring Boot app happens to be the one the S3 bucket is located in.

But that's just part of the problem.

In https://aws.amazon.com/blogs/aws/amazon-s3-path-deprecation-plan-the-rest-of-the-story/ , Amazon explains that the path-style addressing model is being deprecated. The result is that a GET request to the URL produced by SimpleStorageResource will result in a 301 response with no Location header and a body something like this:

    <?xml version="1.0" encoding="UTF-8"?>
    <Error>
      <Code>PermanentRedirect</Code>
      <Message>The bucket you are attempting to access must be addressed using the specified endpoint. Please send all future requests to this endpoint.</Message>
      <Endpoint>mybucket.example.com.s3.eu-west-3.amazonaws.com</Endpoint>
      <Bucket>mybucket.example.com</Bucket>
      <RequestId>0A4CCF4B40988153</RequestId> 
      <HostId>oJz4RiTfjOgh+KlNZUJnLnxd5CZgQPTDi5hGUGf60b1UMa4x01w4wyFQ41ulNlNqMtqEzrc66hk=</HostId>
    </Error>

In the blog post at https://aws.amazon.com/blogs/aws/amazon-s3-path-deprecation-plan-the-rest-of-the-story/ and the AWS documentation at https://docs.aws.amazon.com/AmazonS3/latest/dev/Redirects.html explains (with some internal inconsistencies) that the request should go to bucketname.s3.amazonws.com and their DNS servers will issue an IP address that ensures the request goes to the correct region.

In short, SimpleStorageResource.getURL() should return https://bucketname.s3.amazonws.com/path (e.g. https://mybucket.example.com.s3.amazonws.com/path/file.txt) instead of https://s3.region.amazonws.com/bucketname/path.

maciejwalkowiak commented 3 years ago

Thanks for reporting. Due to the breaking nature of this lets put it to 3.0 version.