Closed kevin-lee closed 8 years ago
Hello @Kevin-Lee, thanks for looking into this matter.
The existing version of getBucket()
does not prevent users from using region-based hostnames; the call removeEndIgnoreCase()
will return the source string, if it doesn't end with the specified suffix; therefore, it is sufficient to specify the qualified bucket name, including the (region-containing) suffix.
The reason why getBucket()
tries to strip the .s3.amazonaws.com
suffix is that it is possible to define CNAMEs referring to buckets. For instance, see: http://docs.aws.amazon.com/AmazonS3/latest/dev/VirtualHosting.html#VirtualHostingCustomURLs
The assumption in getBucket()
was that the CNAME suffix will be .s3.amazonaws.com
. That is incorrect, as you showed; the automatic stripping was to be a nicety for the users, but it seems it turned out to be just confusing; it might be a good idea just to deprecate it for the future. In any event, bucket names may contain dots (images.johnsmith.net.s3.amazonaws.com.
), therefore your proposed fix in https://github.com/Kevin-Lee/sbt-s3/commit/88cfa5b0ce93650b92506e73d85ff2fc8c80c11f wouldn't work, in that case.
I would be instead inclined to remove the removeEndIgnoreCase()
call altogether, which would solve the present ambiguities.
@cunei I see. Then what about 9b1c251d9b59705f8886139ec98bed88387569f9 ?
getBucket("images.johnsmith.net.s3.amazonaws.com.")
returns
images.johnsmith.net
and
getBucket("images.johnsmith.net.s3-ap-southeast-2.amazonaws.com.")
returns
images.johnsmith.net
Hi @Kevin-Lee, I this the way the host name is inspected could be improved. Although improbable, it is perfectly legal for a bucket name to contain ".s3." or ".s3-". It might be better to use a pattern matcher, as in:
import java.util.regex.Pattern
import java.util.regex.Pattern.CASE_INSENSITIVE
Pattern.compile("\\.s3[^\\.]*\\.amazonaws\\.com$",Pattern.CASE_INSENSITIVE).split(host)(0)
That makes sure that only the trailing part is removed, even if a hostname contains "s3" somewhere. For instance:
scala> Pattern.compile("\\.s3[^\\.]*\\.amazonaws\\.com$",Pattern.CASE_INSENSITIVE).split("aaa.s3.bbb.s3-22.amazonaws.com")(0)
res2: String = aaa.s3.bbb
I will commit this change in a few weeks; I am going on vacation for a bit. In the meantime, thank you for your suggestions and your contribution!
@cunei Yeah, mine is too naive. That regex looks good to me, but then, wouldn't it be better and simpler to ask the users to specify their bucket names and regions? Then sbt-s3 can build the hosts out of them.
Why does it need to ask them any more information than what they need to give?
So,
bucket.name.s3.amazonaws.com
bucket.name.s3-region.amazonaws.com
It may ask the users,
bucketName in upload := "some.bucket.name.containing.s3"
region in upload := "ap-southeast-2"
The host generated by sbt-s3
some.bucket.name.containing.s3.s3-ap-southeast-2.amazonaws.com
Without region:
bucketName in upload := "some.bucket.name.containing.s3"
The host generated by sbt-s3
some.bucket.name.containing.s3.s3.amazonaws.com
Anyway, your solution is good and solves my issue so thank you. :+1:
@cunei Oh, and... have a good holiday! :smile:
When the
host
contains region, sbt-s3 is broken and it results inThe host may look like this when it contains region
Bucket:
my-bucket
Region:ap-southeast-2
Host:my-bucket.s3-ap-southeast-2.amazonaws.com
For more details about valid
region/s endpoint location
, please check out http://www.bucketexplorer.com/documentation/amazon-s3--amazon-s3-buckets-and-regions.htmlI think the cause of this is because
getBucket()
method uses a hard-coded S3 URL that is ".s3.amazonaws.com" and this cannot be used for the host containing region.