laughingman7743 / PyAthena

PyAthena is a Python DB API 2.0 (PEP 249) client for Amazon Athena.
MIT License
463 stars 105 forks source link

how to get QueryExecutionId from cursor when using sqlalchemy + pyathena #339

Open mdeshmu opened 2 years ago

mdeshmu commented 2 years ago

I am using sqlalchemy + pyathena

Here is an example code:

from urllib.parse import quote_plus from sqlalchemy import create_engine, inspect conn_str = "awsathena+rest://@athena.{region_name}.amazonaws.com:443/{schema_name}?s3_staging_dir={s3_staging_dir}" engine = create_engine(conn_str.format(region_name="us-east-1", schema_name="default", s3_staging_dir=quote_plus("s3://aws-athena-query-results-bucket/"))) conn = engine.raw_connection() cursor = conn.cursor() cursor.execute("""SELECT * FROM "database"."table""")

for sqlalchemy + pyathena setup, I want to know if there is a way to get QueryExecutionId of a running query from its cursor so that it can be later used to cancel the query using cancel method.

laughingman7743 commented 2 years ago

The cursor.execute method blocks until the query execution is finished. Cursor has a cancel method, so it can be canceled from another thread. The query ID can also be found in cursor.query_id. https://github.com/laughingman7743/PyAthena/blob/master/tests/test_cursor.py#L474-L491 https://github.com/laughingman7743/PyAthena/blob/master/pyathena/cursor.py#L120-L124 AsyncCursor makes query execution asynchronous, so it is easy to get the query ID. However, it may be difficult to use in combination with SQLAlchemy. https://github.com/laughingman7743/PyAthena#asynccursor https://github.com/laughingman7743/PyAthena/blob/master/tests/test_async_cursor.py#L179-L195

mdeshmu commented 2 years ago

Are query_id and cancel methods available with PyAthenaJDBC ?

laughingman7743 commented 2 years ago

No, it is not available; the JDBC version is also not asynchronous in query execution.

mdeshmu commented 2 years ago

Does that mean, with PyAthena + SQL Alchemy, when a cursor is executing a query, the same cursor cannot be used to call cursor.query_id?

laughingman7743 commented 2 years ago

You can easily get the query ID from the cursor if you execute the query in a separate thread.

jack-miller-xc commented 10 months ago

Following up on this @laughingman7743.

Quick context. I have a FastAPI application that facilitates data from RDS and Athena backends using asynchronous calls. Currently to do this async for Athena I use PyAthenas Connector and AsyncCursor. I’d love to be able to just create another async SQL Alchemy session like I do for RDS queries but that doesn’t appear to be possible ATM.

Before spending time investigating further is an async dialect for SQL Alchemy completely impossible? If so is this from a design choice from PyAthena, a limitation within SQL Alchemy, a blocker from Athena, or some combination of the three?

laughingman7743 commented 10 months ago

AsyncCursor is a cursor that I created with a self-defined interface, not the interface as summarized in PEP 249, which makes it difficult to use SQLAlchemy. I believe SQLAlchemy only supports the DB-API of the PEP249 specification.

paavanb commented 7 months ago

Hi! Running into a similar usecase, the SQLAlchemy interface has fantastic ergonomics but I'm having trouble parallelizing multiple requests to cut down on response time. Is it possible to grab the underlying query built up using SQLAlchemy and then use AsyncCursor to execute it instead? Or would that not even help since execution blocks?