laughingman7743 / PyAthena

PyAthena is a Python DB API 2.0 (PEP 249) client for Amazon Athena.
MIT License
463 stars 105 forks source link

Add Endpoint_URL param to SQLAlchemy #495

Closed Xl1001 closed 10 months ago

Xl1001 commented 11 months ago

Working with this library and the SQL Alchemy version I have found that it is not possible to make the connection without the private DNS format, which limits the use of the library. On the other hand, the base version and boto3 do allow to work with the VPCE FQDN.

Private DNS format: athena.{region_name}.amazonaws.com

VPCE format: {VPC_Endpoint_ID}.athena.{region_name}.vpce.amazonaws.com

If you try to use the fromato of VPC interface endpoint it indicates wrong region but the endpoint is correct.

laughingman7743 commented 11 months ago

https://github.com/laughingman7743/PyAthena/blob/master/pyathena/sqlalchemy/base.py#L996-L1006 It would need to be fixed to be able to handle the case of VPCE.

laughingman7743 commented 10 months ago

https://boto3.amazonaws.com/v1/documentation/api/latest/reference/core/session.html#boto3.session.Session.client https://github.com/laughingman7743/PyAthena/blob/master/pyathena/connection.py#L44 https://github.com/laughingman7743/PyAthena/blob/master/pyathena/connection.py#L155-L157 The SQLAlchemy URL is not associated with a VPC endpoint URL. You should be able to use VPCE by specifying the VPCE URL in the parameter endpoint_url in the SQLAlchemy URL as follows:

conn_str = "awsathena+rest://{aws_access_key_id}:{aws_secret_access_key}@athena.{region_name}.amazonaws.com:443/"\
           "{schema_name}?s3_staging_dir={s3_staging_dir}&endpoint_url={endpoint_url}"
engine = create_engine(conn_str.format(
    aws_access_key_id="YOUR_ACCESS_KEY_ID",
    aws_secret_access_key="YOUR_SECRET_ACCESS_KEY",
    region_name="us-west-2",
    schema_name="default",
    s3_staging_dir="s3://YOUR_S3_BUCKET/path/to/",
    endpoint_url="VPC_ENDPOINT_ID.athena.us-west-2.vpce.amazonaws.com"))