michalc / sqlite-s3-query

Python functions to query SQLite files stored on S3
MIT License
251 stars 15 forks

allow boto3 as a HTTP data (range) provider #11

Closed candrsn closed 3 years ago

candrsn commented 3 years ago

If the httpx client for the S3-compatible bucket was abstracted just a bit into its own class, AWS boto3 could be swapped in when needed.

I am starting work on such an abstraction in the repo / branch https://github.com/candrsn/sqlite-s3-query/tree/feature/boto3_client

It is not ready for a merge, but I want to start discussions before I am ready for a PR.

The abstraction would move the current S3 API elements into a class that had minimal symmetry with the boto3 S3 client API:

x = boto3.client('s3')

and also

session = boto3.Session()
s3 = session.client('s3')

That would simplify the S3 API access methods.

michalc commented 3 years ago

I'm very tempted not to make it so boto3 can be swapped in via an abstraction layer. This is a fairly low-level library (at least as far as a lot of Python libraries go, I think), with what I hope is a minimal set of dependencies, and I would like to keep it that way for flexibility and performance reasons. I know it's very typical for boto3 to be required for Python code to talk to AWS, but for S3 (or an S3-compatible service), I think it's fairly unnecessary.

(I've even been thinking about how to remove the HTTP client as a dependency... but realistically that won't happen, at least not any time soon)

But... if you really want to use this library with boto3 right now, you probably can. You just need to present an httpx-like interface via get_http_client, and convert

michalc commented 3 years ago

After thinking a bit more, I'm happy to not add any deliberate support for boto3, and just have the abstraction be at the http(s) level.
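
For anyone landing here later, a rough sketch of what an httpx-like adapter over boto3 might look like, to be passed in via get_http_client. It is only an illustration: the class names Boto3HttpClient and Boto3Response are hypothetical, the set of methods the library actually calls on the client (stream, raise_for_status, iter_bytes, headers) is an assumption that should be checked against the library source, and the URL is assumed to be path-style (https://host/bucket/key).

```python
from contextlib import contextmanager
from urllib.parse import urlparse


class Boto3Response:
    """Hypothetical httpx-like response wrapping a boto3 result dict."""

    def __init__(self, result):
        self._result = result
        # boto3 puts the HTTP status and headers under ResponseMetadata
        metadata = result.get('ResponseMetadata', {})
        self.status_code = metadata.get('HTTPStatusCode')
        self.headers = metadata.get('HTTPHeaders', {})

    def raise_for_status(self):
        # boto3 raises its own exception on failed calls, so by the time
        # a response object exists there is nothing left to check
        pass

    def iter_bytes(self, chunk_size=65536):
        body = self._result.get('Body')  # absent for head_object results
        if body is not None:
            yield from body.iter_chunks(chunk_size)


class Boto3HttpClient:
    """Hypothetical adapter: exposes stream() and delegates to boto3."""

    def __init__(self, s3_client):
        self._s3 = s3_client

    @contextmanager
    def stream(self, method, url, params=(), headers=()):
        # Assumption: path-style URL, so the path is /<bucket>/<key>
        bucket, _, key = urlparse(url).path.lstrip('/').partition('/')
        kwargs = {'Bucket': bucket, 'Key': key}

        # Assumption: an object version, if any, arrives as a query parameter
        version_id = dict(params).get('versionId')
        if version_id:
            kwargs['VersionId'] = version_id

        lowered = {k.lower(): v for k, v in dict(headers).items()}
        if method == 'HEAD':
            yield Boto3Response(self._s3.head_object(**kwargs))
        else:
            # Pass the HTTP Range header straight through to get_object
            if 'range' in lowered:
                kwargs['Range'] = lowered['range']
            yield Boto3Response(self._s3.get_object(**kwargs))

    def close(self):
        pass
```

Under those assumptions it would be wired in with something like get_http_client=lambda: Boto3HttpClient(boto3.client('s3')). Any request-signing headers passed to stream() are simply ignored here, since boto3 signs requests itself.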