Open BenKanouse opened 1 week ago
Thank you for this, and for the good description!
I'm sorry I've been very busy in other areas last week and this, but this is on my radar. I will try to look at it today, if not, know that I haven't forgotten you, but if you don't hear from me for a bit, please feel free to ping again on this issue!
OK, so host
-- you are right that when I implemented/documented it I hadn't totally thought it ought and it didn't quite work in that way.
But I have ended up using it for when you have a CloudFront distribution (or any other CDN) on top of S3. For public unsigned urls, you just need to change the hostname to be your CDN hostname. And this is a convenient tool to do it, in a speedy way (much speedier than AWS SDK), applying appropriate (a bit idiosyncratic) S3 escaping rules for keys, etc.
So I think we do need to keep the host
argument there, working as it did -- at least for unsigned URLs. Perhaps host and endpoint are mutually exclusive, and if you specify both you get an ArgumentException? Or we just doc which takes priority.
OK thanks for this!
Splitting it out into a separate class is nice and readable code organization wise. (I'd probably call the class BaseURI instead of just URI).
BUT. The whole point of this thing is performance. We need to make sure splitting it into a different class, or these changes in general, don't harm performance significanty -- especially in cases where this new feature is NOT being used.
Would you be interested in trying to run the in-repo benchmarking stuff against this new one -- maybe find a way to run the old unchanged code against the new code -- to see how it effects performance?
Thank you!
Also add a CHANGELOG.md entry in PR please, with a URL to this PR!
Thanks for getting back to me! I can get back to this early next week.
It sounds like I have the following TODOs:
I am still a little confusing about the host
feature. Does that mean when you use it you do specify the bucket name in the host
string you provide? If so that makes a lot of sense, and I could update the specs to match that pattern.
Is the difference between endpoint and host just that endpoint allows you to include a port too? I am having trouble understanding what's up here -- can you give me illustrative examples of what you would pass in as endpoint and where you would get it?
Hey if I understand what endpoint is doing, I think this could and shoudl actually be a lot simpler.
Endpoint is just to allow you to specify http
instead of https
, and a port in addition to a host?
This code can be much simpler. If Endpoint is specified, just use the string given to construct the url, no?
Places in the code that have "https://" + self.host + path
just need to be changed to have logic like:
if self.endpoint
self.endpoint + path
else
"https://" + self.host + path
end
Even better, if on initialization a host is given and not an endpoint, just SET the endpoint to "https://#{host}", and then all subsequent logic can just be in terms of endpoint -- host
is just a shortcut for setting endpoint
that assumes https
and no port.
I think this can just be a few lines of change, and doesn't need a separate class, or instantiating a stdlib URI
object -- things I was worried about their effects on performance.
My use of host
. I have an S3 Bucket. So something in the S3 bucket might be at the public URL "https://#my-bucket.s3.amazonaws.com/somepath/something.jpg". No signature, it's a public URL.
But i have an AWS Cloudfront CDN in front of it, so to access it through AWS Cloudfront I actually need to access it at https://d987adfa07df.cloudfront.net/somepath/something.jpg
, so I set host to d987adfa07df.cloudfront.net
.
I started down that expect path, and I ran into 2 problems.
canonical_headers
needs to not include the http
or https
. So when I have an endpoint
param I need to be able to strip the scheme
off of it.Both the path
and the canonical_headers
are in the string_to_sign
so it is important that those variables are exactly correct for the signature. So it makes it all it little more complicated than just changing the resulting url.
I do think you have a great idea about using the host to set an endpoint, which might simplify the logic! I didn't fully understand that host variable before, I can look into this next week. Thanks!
https://github.com/jrochkind/faster_s3_url/issues/9
Currently the
Builder
's initializer accepts ahost
param which assumes https. I like to use minio for local development and it isn't ideal to run it withhttps
.The normal Aws::S3::Client gem accepts an
:endpoint
option to by-pass this issue.Here is a section of the docs that explains this option:
https://docs.aws.amazon.com/sdk-for-ruby/v3/api/Aws/S3/Client.html