hi - thanks for submitting this issue and especially for the reproduction. Taking a look.

I can confirm the async query gets cancelled anyway after 5 minutes, even with the additional 'heartbeat' queries happening every 2 minutes. Apparently, a single query will not be recognized as a heartbeat. We have a private function that sends a request to Snowflake to keep the connection alive (this happens when client_session_keep_alive is set) - will check how it can be utilized in this scenario to properly send requests to /session/heartbeat.
edit: replaced the SELECT 'heartbeat' with conn.rest._heartbeat(), which, confirmed from the logs, correctly reaches the heartbeat endpoint but still did not prevent the SQL from being canceled after 5 minutes.
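For reference, the keep-alive loop with that edit looks roughly like this (a sketch; conn is the connection that started the async query and query_id its sfqid):

```python
import time

# Keep-alive loop from the repro, with the heartbeat query swapped for the
# connector's private helper, which hits /session/heartbeat directly.
while conn.is_still_running(conn.get_query_status(query_id)):
    conn.rest._heartbeat()  # private API; reaches the endpoint, yet the query is still aborted
    time.sleep(120)
```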
We need to investigate further.
Tested various methods to no avail, until my colleague @sfc-gh-hachouraria figured out that sending a request to /queries/<queryId>/results allows the session to continue past the 5-minute mark. This is done by issuing cur.query_result(query_id); tested it with your repro program, and indeed these calls to query_result, issued instead of the heartbeat calls, allowed the detached async query to live past 5 minutes even with ABORT_DETACHED_QUERY=TRUE and run to completion.

Of course there aren't really any query results in query_result until the query actually finishes, so it probably needs to be wrapped in some logic when used as an alternative 'heartbeat' to prevent the detached async query from being aborted, but perhaps it could help in your situation.
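A rough sketch of such a wrapper; the polling interval and the broad exception handling are assumptions, since query_result has nothing meaningful to return while the query is still in progress:

```python
import time

def keep_alive_via_query_result(cur, query_id, interval=120):
    """Periodically hit /queries/<queryId>/results via query_result() while
    an async query is still running (sketch only)."""
    conn = cur.connection
    while conn.is_still_running(conn.get_query_status(query_id)):
        try:
            cur.query_result(query_id)  # no real result set until the query finishes
        except Exception:
            pass  # the in-progress response may not parse as a result; retry next tick
        time.sleep(interval)
```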
Can you please try it and let us know if it works for you?
Switching out cur.execute("SELECT 'heartbeat'") for cur.query_result(query_id) alone didn't work for me. I observed a KeyError in _init_result_and_meta() on ret.get('data')['rowtype'], because the endpoint response looks like:
```python
{
    'code': '333333',
    'data': {
        'getResultUrl': '/queries/01b6f149-030a-5b62-76fd-87016f53f77f/result',
        'progressDesc': None,
        'queryAbortsAfterSecs': 300,
        'queryId': '01b6f149-030a-5b62-76fd-87016f53f77f',
    },
    'message': 'Query execution in progress. Query will be aborted unless request for result is received within 300 seconds.',
    'success': True,
}
```
In addition, the not ret.get("success") case will raise a ProgrammingError with the message in the response.

Using cur.connection.rest.request(url=f"/queries/{query_id}/result", method="get") as the heartbeat (i.e. cur.query_result(query_id) without the response handling) does seem to work at first try. I'll keep experimenting a bit more.

Since the endpoint takes 30-50 sec to respond, I'm thinking of running it in a thread with daemon=True to "fire and forget" the heartbeat, rather than be blocked waiting for an unused response. Just a little worried about potential bad resource cleanup though.
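Roughly the shape I have in mind; the helper name is made up and the response is deliberately discarded:

```python
import threading

def fire_and_forget_heartbeat(cur, query_id):
    """Issue the semi-blocking GET in a daemon thread so the caller is not
    blocked for the 30-50 sec the endpoint may take to respond."""
    def _request():
        # Same raw call as above; the response is ignored on purpose.
        cur.connection.rest.request(url=f"/queries/{query_id}/result", method="get")

    threading.Thread(target=_request, daemon=True).start()

# Called every couple of minutes from the main polling loop, e.g.:
# fire_and_forget_heartbeat(cur, query_id)
```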
Hi, we have confirmed that when ABORT_DETACHED_QUERY is set to True, the only way to keep the query alive is to call cur.query_result(query_id) to fetch its result. But this turns it into a sync query, which is not affected by the parameter. That being said, no async query can always be kept alive when ABORT_DETACHED_QUERY is set to True.

In the meantime, we will update our docs to make the usage of ABORT_DETACHED_QUERY clearer.

If you do want an async query to stay alive even when ABORT_DETACHED_QUERY is set to True, please file a ticket with us. Thank you!
In my testing, that GET returns early if the query completes and otherwise blocks Snowflake-side for up to ~40 sec before returning "Query execution in progress". This means the query does remain async overall, and this is a valid (but clunky) workaround.

I would like to be able to keep async queries alive even if ABORT_DETACHED_QUERY is set to True, and there are a handful of possible options.

With zero Snowflake support:
- Use cur.connection.rest.request() to hit the "semi-blocking heartbeat", possibly via an alternative non-blocking HTTP client.

Various things Snowflake could support:
- Officially provide a non-blocking HTTP heartbeat request in the Snowflake Python connector, e.g. conn.heartbeat().

Which (or any other) option can be considered?
Hi @Kache, both of the ways you mentioned look good to me; I'll file a ticket to the server side and let them decide what they should do. This is going to be a server-side change that Snowpark has no control over. Thanks for your advice!
Python version
Python 3.11.4 (main, Jan 10 2024, 15:34:31) [Clang 15.0.0 (clang-1500.1.0.2.5)]
Operating system and processor architecture
Linux-6.10.6-orbstack-00249-g92ad2848917c-x86_64-with-debian-10.13
Installed packages
What did you do?
I want to log the query id without needing to wait for the query to complete.
Starting from sync usage:
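Something along these lines; the credentials and the long-running statement are placeholders:

```python
import snowflake.connector

conn = snowflake.connector.connect(account="...", user="...", password="...")
cur = conn.cursor()

# Synchronous execute() blocks until the query finishes, so the query id
# can only be logged after the query has already completed.
cur.execute("SELECT SYSTEM$WAIT(10, 'MINUTES')")  # placeholder long-running query
print("query id:", cur.sfqid)
```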
Attempt: use async
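Something like this, reusing conn and cur from the sync sketch above:

```python
import time

# execute_async() returns immediately, so the query id can be logged up front.
cur.execute_async("SELECT SYSTEM$WAIT(10, 'MINUTES')")
query_id = cur.sfqid
print("query id:", query_id)

# Poll until the query finishes, then fetch its results.
while conn.is_still_running(conn.get_query_status(query_id)):
    time.sleep(10)
cur.get_results_from_sfqid(query_id)
print(cur.fetchall())
```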
However, the above only works for fast queries. We use ABORT_DETACHED_QUERY = TRUE because we want queries to passively stop if/when the tasks/processes driving the original query stop/terminate, so queries > 5 mins get canceled:

Although the docs say:

It also says:

Sounds like we just need a way to either keep the connection alive or re-connect.
Attempt: use async, ABORT_DETACHED_QUERY, and active heartbeat queries to "keep alive"
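Roughly like this, again with placeholder statements; the heartbeat query and the 2-minute interval match the description above:

```python
import time

cur.execute("ALTER SESSION SET ABORT_DETACHED_QUERY = TRUE")
cur.execute_async("SELECT SYSTEM$WAIT(10, 'MINUTES')")
query_id = cur.sfqid

# Issue a trivial query on the same connection every 2 minutes while the
# async query is running, hoping it counts as session activity.
while conn.is_still_running(conn.get_query_status(query_id)):
    cur.execute("SELECT 'heartbeat'")
    time.sleep(120)
```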
Using a "re-connect" strategy by using a new
connection
every couple minutes (rather than using the sameconnection
) is similarly ineffective.What did you expect to see?
Expected to be able to prevent an async query from being canceled even though ABORT_DETACHED_QUERY = TRUE, by either actively keeping the connection alive or by actively re-connecting. Keeping ABORT_DETACHED_QUERY = TRUE is desirable because we want queries to passively stop if/when the tasks/processes driving the original query stop/terminate.

Can you set logging to DEBUG and collect the logs?

n/a