greghendershott / aws

Racket support for Amazon Web Services.
BSD 2-Clause "Simplified" License

Add anonymous access to s3 storage #65

Open Nyanraltotlapun opened 4 years ago

Nyanraltotlapun commented 4 years ago

Perhaps I am missing something, but I cannot find a way to query a public bucket.

Amazon states that "Every interaction with Amazon S3 is either authenticated or anonymous": https://docs.aws.amazon.com/en_us/AmazonS3/latest/dev/MakingRequests.html

greghendershott commented 4 years ago

When a bucket is configured to allow anonymous public access, I think the intent is for the usual plain old HTTP HEAD or GET requests to work, from the usual generic tools like curl, wget, or a web browser.

In that case -- where an Authorization header with an Amazon v4 signature is not required -- things from aws/s3 like ls and get/bytes don't add much value compared to simply using net/url or net/http-client.

As a result, I don't think it even occurred to me to support the non-authenticated scenario.

It is an interesting enhancement idea. I'll tag it with that label.

(I'm not sure how many people would use it... so I'm not sure it's worth the time or the risk of breaking something where authentication is desired... and I'm not sure when/if this might get added. So in the meantime, if you need to do this, I'd suggest using net/url or net/http-client or similar.)
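For the anonymous-read case, here is a minimal sketch using net/url. The bucket and object path are made-up placeholders, not a real bucket:

```racket
#lang racket
(require net/url racket/port)

;; Hypothetical public object URL; substitute a real bucket and key.
(define u
  (string->url "https://example-bucket.s3.amazonaws.com/path/to/object"))

;; Plain GET with no Authorization header. This only succeeds when the
;; bucket policy allows anonymous reads.
(define (fetch-public u)
  (port->bytes (get-pure-port u)))
```

That is, the whole "anonymous S3 client" is just an HTTP GET.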

Nyanraltotlapun commented 4 years ago

Thank you.

It is logical to access an object store through the corresponding program API. Parsing these things may seem easy, but writing such code over and over again introduces errors and takes time that could be spent elsewhere. This is why I (and I think lots of other people) really appreciate libraries like this.

I have not looked inside the lib just yet, but I can suggest implementing this by adding a (credentials-anonymous!) method.

greghendershott commented 4 years ago

Do you have an example public bucket in mind? If so, have you gone ahead and tried to use the library? (I don't and haven't.)

I ask because maybe it already works?! The private and public keys default to "". So if you don't call any credentials-from-XXX function, probably an Authorization header with sigv4 using those values will be added. And maybe S3 will ignore that header when the bucket allows public/anon access?

[Even if that happens to work, it would be good to document it and preserve that behavior going forward. i.e. I'm asking if you can help by doing a quick experiment -- not proposing this as the final answer.]

greghendershott commented 4 years ago

Oh never mind. I forgot. The code has many ensure-have-keys calls that check for the keys being "" and error -- to help users understand the situation where things won't work because they haven't set the keys.

That's the kind of thing I was talking about initially. The package currently assumes authenticated access, and uses that assumption to e.g. provide helpful error messages. Of course it would be possible to somehow preserve that and also support anonymous access. It's not rocket science. It's "only" time to do it, update docs, and find and fix resulting bugs.

greghendershott commented 4 years ago

I did a few quick hacks to experiment with not supplying any Authorization header at all, when the public or private keys are blank.

It works, but the only thing that S3 allows anonymously seems to be "getter" functions like get/bytes. Not "listers" like ls. (And obviously not "putters".)


If you know a bucket name and object path, forming the URI is simply:

(string-append "https://" bucket "." endpoint "/" path)

where endpoint is e.g. "s3.amazonaws.com" or maybe a specific location endpoint.

And you give that URI to net/url or net/http-client or curl or wget or whatever and... that's all you need to do.
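As a concrete sketch of that string-append recipe (the bucket and path here are assumed example values, not a real bucket):

```racket
#lang racket
(require net/url racket/port)

;; Assumed example values -- substitute your own.
(define bucket   "my-public-bucket")
(define endpoint "s3.amazonaws.com")
(define path     "some/object.txt")

;; Virtual-hosted-style URI, as described above.
(define uri (string-append "https://" bucket "." endpoint "/" path))

;; Anonymous GET, exactly as with curl or wget:
;; (port->bytes (get-pure-port (string->url uri)))
```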

Requiring this whole AWS package just for that string-append seems like... not the best use case to spend time supporting?

greghendershott commented 4 years ago

p.s. If you think it would be helpful, I'd be happy to add to the documentation something like: "Tip: If you only need to get things from a public S3 bucket, then you don't need this package; instead simply "?

Nyanraltotlapun commented 4 years ago

> Do you have an example public bucket in mind? If so, have you gone ahead and tried to use the library? (I don't and haven't.)

https://s3-eu-west-1.amazonaws.com/public.bitmex.com?delimiter=/&prefix=data/

> I ask because maybe it already works?!

I tried:

#lang racket
(require aws/s3)

(s3-host "s3.eu-west-1.amazonaws.com")
(s3-region "eu-west-1")
(s3-scheme "https")
(ls "public.bitmex.com/data/trade")

This gives me:

open-input-file: cannot open input file
path: /home/test/.aws/credentials
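Until anonymous access lands in aws/s3, that listing can be fetched without the library at all, since the bucket allows anonymous requests. A sketch using net/url against the URL quoted earlier in this thread (S3 answers such a request with ListBucketResult XML):

```racket
#lang racket
(require net/url xml)

;; The public listing URL from the comment above.
(define list-url
  (string->url
   "https://s3-eu-west-1.amazonaws.com/public.bitmex.com?delimiter=/&prefix=data/"))

;; Uncomment to fetch and parse the XML listing over the network:
;; (read-xml (get-pure-port list-url))
```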

pschmied commented 2 years ago

If it is useful, there are petabytes of open data in various formats in open S3 buckets listed at https://registry.opendata.aws/