'bucket does not exist ' error, when host contains region

sbt / sbt-s3

sbt-s3 is a simple sbt plugin to manipulate objects on Amazon S3

Apache License 2.0

63 stars 29 forks source link

'bucket does not exist ' error, when host contains region #21

Closed kevin-lee closed 8 years ago

kevin-lee commented 9 years ago

When the host contains region, sbt-s3 is broken and it results in

[error] (root/*:s3Upload) com.amazonaws.services.s3.model.AmazonS3Exception: The specified bucket does not exist (Service: Amazon S3; Status Code: 404; Error Code: NoSuchBucket;

The host may look like this when it contains region

Bucket: my-bucket Region: ap-southeast-2 Host: my-bucket.s3-ap-southeast-2.amazonaws.com

For more details about valid region/s endpoint location, please check out http://www.bucketexplorer.com/documentation/amazon-s3--amazon-s3-buckets-and-regions.html

I think the cause of this is because getBucket() method uses a hard-coded S3 URL that is ".s3.amazonaws.com" and this cannot be used for the host containing region.

cunei commented 9 years ago

Hello @Kevin-Lee, thanks for looking into this matter. The existing version of getBucket() does not prevent users from using region-based hostnames; the call removeEndIgnoreCase() will return the source string, if it doesn't end with the specified suffix; therefore, it is sufficient to specify the qualified bucket name, including the (region-containing) suffix. The reason why getBucket() tries to strip the .s3.amazonaws.com suffix is that it is possible to define CNAMEs referring to buckets. For instance, see: http://docs.aws.amazon.com/AmazonS3/latest/dev/VirtualHosting.html#VirtualHostingCustomURLs The assumption in getBucket() was that the CNAME suffix will be .s3.amazonaws.com. That is incorrect, as you showed; the automatic stripping was to be a nicety for the users, but it seems it turned out to be just confusing; it might be a good idea just to deprecate it for the future. In any event, bucket names may contain dots (images.johnsmith.net.s3.amazonaws.com.), therefore your proposed fix in https://github.com/Kevin-Lee/sbt-s3/commit/88cfa5b0ce93650b92506e73d85ff2fc8c80c11f wouldn't work, in that case. I would be instead inclined to remove the removeEndIgnoreCase() call altogether, which would solve the present ambiguities.

kevin-lee commented 9 years ago

@cunei I see. Then what about 9b1c251d9b59705f8886139ec98bed88387569f9 ?

getBucket("images.johnsmith.net.s3.amazonaws.com.")

returns

images.johnsmith.net

and

getBucket("images.johnsmith.net.s3-ap-southeast-2.amazonaws.com.")

returns

images.johnsmith.net

cunei commented 9 years ago

Hi @Kevin-Lee, I this the way the host name is inspected could be improved. Although improbable, it is perfectly legal for a bucket name to contain ".s3." or ".s3-". It might be better to use a pattern matcher, as in:

import java.util.regex.Pattern
import java.util.regex.Pattern.CASE_INSENSITIVE
Pattern.compile("\\.s3[^\\.]*\\.amazonaws\\.com$",Pattern.CASE_INSENSITIVE).split(host)(0)

That makes sure that only the trailing part is removed, even if a hostname contains "s3" somewhere. For instance:

scala> Pattern.compile("\\.s3[^\\.]*\\.amazonaws\\.com$",Pattern.CASE_INSENSITIVE).split("aaa.s3.bbb.s3-22.amazonaws.com")(0)
res2: String = aaa.s3.bbb

I will commit this change in a few weeks; I am going on vacation for a bit. In the meantime, thank you for your suggestions and your contribution!

kevin-lee commented 9 years ago

@cunei Yeah, mine is too naive. That regex looks good to me, but then, wouldn't it be better and simpler to ask the users to specify their bucket names and regions? Then sbt-s3 can build the hosts out of them.

Why does it need to ask them any more information than what they need to give?

So,

no region is found => just bucket.name.s3.amazonaws.com
region is given => bucket.name.s3-region.amazonaws.com

It may ask the users,

bucketName in upload := "some.bucket.name.containing.s3"
region in upload := "ap-southeast-2"

The host generated by sbt-s3

some.bucket.name.containing.s3.s3-ap-southeast-2.amazonaws.com

Without region:

bucketName in upload := "some.bucket.name.containing.s3"

The host generated by sbt-s3

some.bucket.name.containing.s3.s3.amazonaws.com

Anyway, your solution is good and solves my issue so thank you. :+1:

kevin-lee commented 9 years ago

@cunei Oh, and... have a good holiday! :smile: