Open mdeshmu opened 2 years ago
The cursor.execute
method blocks until the query execution is finished.
Cursor has a cancel method, so it can be canceled from another thread. The query ID can also be found in cursor.query_id.
https://github.com/laughingman7743/PyAthena/blob/master/tests/test_cursor.py#L474-L491
https://github.com/laughingman7743/PyAthena/blob/master/pyathena/cursor.py#L120-L124
AsyncCursor makes query execution asynchronous, so it is easy to get the query ID. However, it may be difficult to use in combination with SQLAlchemy.
https://github.com/laughingman7743/PyAthena#asynccursor
https://github.com/laughingman7743/PyAthena/blob/master/tests/test_async_cursor.py#L179-L195
Are query_id and cancel methods available with PyAthenaJDBC ?
No, it is not available; the JDBC version is also not asynchronous in query execution.
Does that mean, with PyAthena + SQL Alchemy, when a cursor is executing a query, the same cursor cannot be used to call cursor.query_id?
You can easily get the query ID from the cursor if you execute the query in a separate thread.
Following up on this @laughingman7743.
Quick context. I have a FastAPI application that facilitates data from RDS and Athena backends using asynchronous calls. Currently to do this async for Athena I use PyAthena
s Connector
and AsyncCursor
. I’d love to be able to just create another async SQL Alchemy session like I do for RDS queries but that doesn’t appear to be possible ATM.
Before spending time investigating further is an async dialect for SQL Alchemy completely impossible? If so is this from a design choice from PyAthena, a limitation within SQL Alchemy, a blocker from Athena, or some combination of the three?
AsyncCursor is a cursor that I created with a self-defined interface, not the interface as summarized in PEP 249, which makes it difficult to use SQLAlchemy. I believe SQLAlchemy only supports the DB-API of the PEP249 specification.
Hi! Running into a similar usecase, the SQLAlchemy interface has fantastic ergonomics but I'm having trouble parallelizing multiple requests to cut down on response time. Is it possible to grab the underlying query built up using SQLAlchemy and then use AsyncCursor to execute it instead? Or would that not even help since execution blocks?
I am using sqlalchemy + pyathena
Here is an example code:
from urllib.parse import quote_plus from sqlalchemy import create_engine, inspect conn_str = "awsathena+rest://@athena.{region_name}.amazonaws.com:443/{schema_name}?s3_staging_dir={s3_staging_dir}" engine = create_engine(conn_str.format(region_name="us-east-1", schema_name="default", s3_staging_dir=quote_plus("s3://aws-athena-query-results-bucket/"))) conn = engine.raw_connection() cursor = conn.cursor() cursor.execute("""SELECT * FROM "database"."table""")
for sqlalchemy + pyathena setup, I want to know if there is a way to get QueryExecutionId of a running query from its cursor so that it can be later used to cancel the query using cancel method.