Closed lostmygithubaccount closed 2 months ago
Possibly could build off of https://github.com/laughingman7743/PyAthena? I remember suggesting this a long time ago, but there were concerns about how it could be tested without, you know, having an AWS account in CI, etc.
This feature would be valuable to me too. It'd probably be good to reuse functionality that's already common and built out in other AWS maintained packages.
For example, the AWS SDK for Pandas uses boto3 sessions for authentication. That includes a default session, which is a nice way to connect to AWS as long as the environment is already configured for other AWS tools such as the AWS CLI. I tend to rely on boto3's credential search order to autoload from a credentials/config file made with the AWS CLI and to refresh any session tokens, but I know others may prefer refreshing the standard AWS environment variables instead.
One other pro for this approach is that the config/credentials files used by boto3 sessions are also what pyarrow uses for its authentication into AWS when reading/writing parquet files on S3. So this may work nicely with to_parquet/read_parquet and S3 file systems as well. PyAthena, mentioned above, uses them too. In practice this is also just nice to work with in my experience: get the AWS authentication working once, and then I can use the same configs for multiple packages (AWS SDK for Pandas, PyArrow, boto3, PyAthena, etc.).
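To make the credential-chain point concrete, here is a minimal sketch, assuming boto3 is installed and the environment is already configured (for example via the AWS CLI). The helper names `default_session` and `frozen_credentials` are hypothetical and only for illustration; `boto3.Session`, `Session.get_credentials()` and `get_frozen_credentials()` are the boto3/botocore calls it leans on.

```python
def default_session(profile_name=None, region_name=None):
    """Build a boto3 session that resolves credentials via boto3's default chain.

    Hypothetical helper. boto3 is imported lazily so this module can be
    imported even where boto3 is not installed.
    """
    import boto3

    # With no explicit keys, boto3 searches its usual sources
    # (environment variables, shared credentials/config files, ...).
    return boto3.Session(profile_name=profile_name, region_name=region_name)


def frozen_credentials(session):
    """Return a snapshot of the session's resolved credentials as a plain dict."""
    creds = session.get_credentials()
    if creds is None:
        raise RuntimeError("no AWS credentials found in the default chain")
    frozen = creds.get_frozen_credentials()
    return {
        "aws_access_key_id": frozen.access_key,
        "aws_secret_access_key": frozen.secret_key,
        "aws_session_token": frozen.token,  # None for long-lived keys
    }
```

Something like `frozen_credentials(default_session())` would then give one dict that could be handed to whichever package needs explicit keys, though most of the packages listed above can pick up the same configuration on their own.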
Separate from the authentication topic, the AWS SDK for Pandas might make for a good SQL backend for Athena as well, as it implements the standard SQL operators directly in the Athena and Glue services. That would likely mean the Ibis connection object needs to cover some config options, the main one being how data gets from AWS back to the Python session, since the different approaches there have a big impact on performance. But if all we need is authentication, then the SQL dialect in Trino (what Athena is based on) ought to get us pretty close too.
Hope the references above are helpful if this gets picked up, thanks!
Agreed that we should work towards making it easier to add support for backends that are ostensibly derivative of existing systems.
It's very likely that we won't get to this until after #7580 (or a sequence of its changes) is merged and released, as we'd like to get away from SQLAlchemy before supporting more backends.
If someone wants to try handing a pyathena DB-API connection object to the trino.from_connection constructor and see if that's tractable, we can look at what else might be required to make this work. The docs on PyAthena reference dumping query results as CSV to a bucket and then downloading that CSV. If that's the pattern, we would probably hold off on adding this until there is proper ADBC support.
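For anyone trying this, a rough sketch of what "DB-API connection object" means in practice: any PEP 249 connection exposes `cursor()`, `execute()` and `fetchall()`, so a small helper can stay driver-agnostic. The helper name `fetch_rows` is hypothetical, and `sqlite3` is only a stand-in here so the example runs without an AWS account; a connection returned by `pyathena.connect(...)` would be the object to swap in. Whether `trino.from_connection` accepts such an object is exactly the open question, so it is not exercised here.

```python
import sqlite3


def fetch_rows(conn, sql, params=None):
    """Run one query through any DB-API 2.0 (PEP 249) connection.

    Parameter placeholder style differs between drivers, so `params`
    is passed through untouched and omitted entirely when None.
    """
    cursor = conn.cursor()
    try:
        if params is None:
            cursor.execute(sql)
        else:
            cursor.execute(sql, params)
        return cursor.fetchall()
    finally:
        cursor.close()


# sqlite3 stands in for an Athena connection in this sketch.
conn = sqlite3.connect(":memory:")
rows = fetch_rows(conn, "SELECT 1 AS x UNION ALL SELECT 2 ORDER BY x")
conn.close()
```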
Consider how to support generic AWS authentication and backends for services, namely but not limited to Athena
Hi @lostmygithubaccount,
Thanks for the reply, and sorry for the delayed response here. I was a bit occupied with other work, so I wasn't able to spend time on this.
I had a look at the postgresql backend, but I'm wondering how to make a connection to Athena using postgresql. At the moment all I have is AWS credentials like AccessKeyId and SecretAccessKey. I am not sure how to pass these in the args.
If possible, could you please post a sample code snippet for making a connection to AWS Athena using the postgresql backend?
Originally posted by @uramith in https://github.com/ibis-project/ibis/discussions/7229#discussioncomment-7649716
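On the question of passing AccessKeyId and SecretAccessKey: as far as I know, Athena is reached through the AWS API rather than the PostgreSQL wire protocol, so the postgres backend's connection arguments would not apply. A minimal sketch using PyAthena directly, assuming it is installed and an S3 staging location for query results exists; the helper names `athena_connect_kwargs` and `connect_athena` are hypothetical, while the keyword names follow the `pyathena.connect` examples in the PyAthena README.

```python
def athena_connect_kwargs(access_key_id, secret_access_key,
                          s3_staging_dir, region_name, session_token=None):
    """Assemble keyword arguments for pyathena.connect from explicit keys."""
    if not s3_staging_dir.startswith("s3://"):
        raise ValueError("s3_staging_dir must be an s3:// URI")
    kwargs = {
        "aws_access_key_id": access_key_id,
        "aws_secret_access_key": secret_access_key,
        "s3_staging_dir": s3_staging_dir,
        "region_name": region_name,
    }
    # Temporary (STS) credentials also carry a session token.
    if session_token is not None:
        kwargs["aws_session_token"] = session_token
    return kwargs


def connect_athena(**kwargs):
    """Open a PyAthena DB-API connection (lazy import; needs AWS access)."""
    from pyathena import connect

    return connect(**kwargs)
```

The bucket path and region are placeholders to fill in, e.g. `connect_athena(**athena_connect_kwargs(key_id, secret, "s3://YOUR_S3_BUCKET/path/to/", "us-west-2"))`. Leaving the keys out and relying on the default boto3 credential chain is usually the less error-prone option.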