snowflakedb / snowflake-connector-python

Snowflake Connector for Python
https://pypi.python.org/pypi/snowflake-connector-python/
Apache License 2.0
578 stars 467 forks source link

SNOW-333688: Cannot get `execute_async` working with `fetch_pandas_all` #672

Closed info-rchitect closed 1 year ago

info-rchitect commented 3 years ago

Please answer these questions before submitting your issue. Thanks!

  1. What version of Python are you using (python --version)? 3.8.5

  2. What operating system and processor architecture are you using (python -c 'import platform; print(platform.platform())')?

Windows-10-10.0.19041-SP0

  1. What are the component versions in the environment (pip freeze)?
anyio @ file:///D:/bld/anyio_1614388928981/work/dist
argon2-cffi==20.1.0
asn1crypto==1.4.0
astroid==2.4.2
async-generator==1.10
attrs==20.3.0
azure-common==1.1.26
azure-core==1.13.0
azure-storage-blob==12.8.0
azure-storage-common==2.1.0
Babel @ file:///home/conda/feedstock_root/build_artifacts/babel_1605182336601/work
backcall==0.2.0
backports.functools-lru-cache==1.6.1
bleach==3.3.0
bokeh==2.3.0
boto3==1.11.17
botocore==1.14.17
bqplot==0.12.23
brotlipy==0.7.0
certifi==2020.12.5
cffi==1.13.2
chardet==3.0.4
chart-studio==1.1.0
colorama==0.4.4
colorcet==2.0.6
cryptography==2.8
cycler==0.10.0
decorator==4.4.2
defusedxml==0.7.1
dill==0.3.3
docutils==0.15.2
entrypoints==0.3
greenlet==1.0.0
holoviews==1.14.2
hvplot==0.7.1
idna==2.8
ijson==2.6.1
importlib-metadata @ file:///D:/bld/importlib-metadata_1615778896299/work
ipykernel==5.5.0
ipypivot==0.2.7
ipython==7.21.0
ipython-genutils==0.2.0
ipywidgets==7.6.3
isodate==0.6.0
isort==5.7.0
itables==0.3.0
jedi==0.18.0
Jinja2==2.11.3
jmespath==0.10.0
joblib==1.0.1
json5 @ file:///home/conda/feedstock_root/build_artifacts/json5_1600692310011/work
jsonschema==3.2.0
jupyter==1.0.0
jupyter-client==6.1.12
jupyter-console==6.4.0
jupyter-core==4.7.1
jupyter-packaging @ file:///home/conda/feedstock_root/build_artifacts/jupyter-packaging_1613054948399/work
jupyter-server==1.5.0
jupyterlab @ file:///home/conda/feedstock_root/build_artifacts/jupyterlab_1616195786210/work
jupyterlab-pygments==0.1.2
jupyterlab-server @ file:///home/conda/feedstock_root/build_artifacts/jupyterlab_server_1613760084674/work
jupyterlab-widgets==1.0.0
kiwisolver==1.3.1
lazy-object-proxy==1.4.3
Mako==1.1.4
Markdown==3.3.4
MarkupSafe==1.1.1
matplotlib==3.3.4
mccabe==0.6.1
mistune==0.8.4
mplcursors==0.4
msrest==0.6.19
multiprocess==0.70.11.1
nbclassic @ file:///home/conda/feedstock_root/build_artifacts/nbclassic_1610352513187/work
nbclient==0.5.3
nbconvert==6.0.7
nbformat==5.1.2
nest-asyncio==1.5.1
notebook==6.3.0
numpy==1.20.1
oauthlib==3.1.0
oscrypto==1.2.1
packaging==20.9
pandas==1.2.3
pandocfilters==1.4.3
panel==0.11.1
param==1.10.1
parso==0.8.1
pathos==0.2.7
pickleshare==0.7.5
Pillow==8.1.2
pivottablejs==0.9.0
plotly==4.14.3
pox==0.2.9
ppft==1.6.6.3
prometheus-client==0.9.0
prompt-toolkit==3.0.18
pyarrow==3.0.0
pycparser==2.20
pycryptodomex==3.9.8
pyct==0.4.8
Pygments==2.8.1
PyInquirer==1.0.3
PyJWT==1.7.1
pylint==2.6.0
pyOpenSSL==19.1.0
pyparsing==2.4.7
PyPika==0.48.0
pyrsistent==0.17.3
PySocks @ file:///D:/bld/pysocks_1610291742246/work
python-dateutil==2.8.1
pytz==2020.5
pyviz-comms==2.0.1
pywin32==300
pywinpty==0.5.7
PyYAML==5.4.1
pyzmq==22.0.3
qgrid @ file:///C:/ci/qgrid_1599278602235/work
qtconsole==5.0.3
QtPy==1.9.0
queick==1.2.1
requests==2.22.0
requests-oauthlib==1.3.0
retrying==1.3.3
s3transfer==0.3.3
scikit-learn==0.24.1
scipy==1.6.1
seaborn==0.11.1
Send2Trash==1.5.0
six==1.15.0
sklearn==0.0
sniffio @ file:///D:/bld/sniffio_1610318539925/work
snowflake-connector-python==2.4.2
snowflake-sqlalchemy==1.2.4
SQLAlchemy==1.4.5
sqlparse==0.4.1
stringcase==1.2.0
terminado==0.9.3
testpath==0.4.4
threadpoolctl==2.1.0
toml==0.10.2
tornado==6.1
tqdm==4.59.0
traitlets==5.0.5
traittypes==0.2.1
typing-extensions==3.7.4.3
urllib3==1.25.11
wcwidth==0.2.5
webencodings==0.5.1
widgetsnbextension==3.5.1
win-inet-pton @ file:///D:/bld/win_inet_pton_1610147566231/work
wincertstore==0.2
wrapt==1.12.1
zipp @ file:///home/conda/feedstock_root/build_artifacts/zipp_1614945704755/work
  1. What did you do?
# My SQL is a string variable that has been formatted to make it look pretty with newlines and proper indentation
 sql = sqlparse.split(sql)
 for statement in sql:
     cursor.execute_async(statement)
while connector.is_still_running(connector.get_query_status_throw_if_error(cursor.sfqid):
    print('.')
    time.sleep(1)
cursor.get_results_from_sfqid(cursor.sfqid)
df = cursor.fetch_pandas_all()
  1. What did you expect to see?

A pandas dataframe

  1. What did you see instead?
---------------------------------------------------------------------------
NotSupportedError                         Traceback (most recent call last)
C:\notebook_analytics\nemawashi\nemawashi_lib\user_interfaces\query.py in __submit(self, *args)

--> 917                 df = cursor.fetch_pandas_all()

~\.conda\envs\nemawashi\lib\site-packages\snowflake\connector\cursor.py in fetch_pandas_all(self, **kwargs)
    852         self.check_can_use_pandas()
    853         if self._query_result_format != "arrow":
--> 854             raise NotSupportedError
    855         return self._result._fetch_pandas_all(**kwargs)
    856 

NotSupportedError: Unknown error
  1. Can you set logging to DEBUG and collect the logs?
# Here is the last log output before the error above
2021-04-05 07:18:35,680 - MainThread connection.py:1263 - _get_query_status() - DEBUG - get_query_status sf_qid='019b6222-0b00-eaf9-0000-2aa90d56ca36'
DEBUG:snowflake.connector.connection:get_query_status sf_qid='019b6222-0b00-eaf9-0000-2aa90d56ca36'
2021-04-05 07:18:35,682 - MainThread network.py:1077 - _use_requests_session() - DEBUG - Active requests sessions: 1, idle: 0
DEBUG:snowflake.connector.network:Active requests sessions: 1, idle: 0
2021-04-05 07:18:35,683 - MainThread network.py:752 - _request_exec_wrapper() - DEBUG - remaining request timeout: None, retry cnt: 1
DEBUG:snowflake.connector.network:remaining request timeout: None, retry cnt: 1
2021-04-05 07:18:35,684 - MainThread network.py:733 - add_request_guid() - DEBUG - Request guid: 24449dc2-27f7-43d2-8560-914e310795a3
DEBUG:snowflake.connector.network:Request guid: 24449dc2-27f7-43d2-8560-914e310795a3
2021-04-05 07:18:35,685 - MainThread network.py:911 - _request_exec() - DEBUG - socket timeout: 60
DEBUG:snowflake.connector.network:socket timeout: 60
2021-04-05 07:18:35,884 - MainThread connectionpool.py:433 - _make_request() - DEBUG - https://mycompany01.east-us-2.azure.snowflakecomputing.com:443 "GET /monitoring/queries/019b6222-0b00-eaf9-0000-2aa90d56ca36?request_guid=24449dc2-27f7-43d2-8560-914e310795a3 HTTP/1.1" 200 None
DEBUG:snowflake.connector.vendored.urllib3.connectionpool:https://mycompany01.east-us-2.azure.snowflakecomputing.com:443 "GET /monitoring/queries/019b6222-0b00-eaf9-0000-2aa90d56ca36?request_guid=24449dc2-27f7-43d2-8560-914e310795a3 HTTP/1.1" 200 None
2021-04-05 07:18:35,926 - MainThread network.py:940 - _request_exec() - DEBUG - SUCCESS
DEBUG:snowflake.connector.network:SUCCESS
2021-04-05 07:18:35,929 - MainThread network.py:1093 - _use_requests_session() - DEBUG - Active requests sessions: 0, idle: 1
DEBUG:snowflake.connector.network:Active requests sessions: 0, idle: 1
2021-04-05 07:18:35,931 - MainThread connection.py:1263 - _get_query_status() - DEBUG - get_query_status sf_qid='019b6222-0b00-eaf9-0000-2aa90d56ca36'
DEBUG:snowflake.connector.connection:get_query_status sf_qid='019b6222-0b00-eaf9-0000-2aa90d56ca36'
2021-04-05 07:18:35,932 - MainThread network.py:1077 - _use_requests_session() - DEBUG - Active requests sessions: 1, idle: 0
DEBUG:snowflake.connector.network:Active requests sessions: 1, idle: 0
2021-04-05 07:18:35,933 - MainThread network.py:752 - _request_exec_wrapper() - DEBUG - remaining request timeout: None, retry cnt: 1
DEBUG:snowflake.connector.network:remaining request timeout: None, retry cnt: 1
2021-04-05 07:18:35,935 - MainThread network.py:733 - add_request_guid() - DEBUG - Request guid: 72994348-ad3c-45bd-874e-a55e04c78087
DEBUG:snowflake.connector.network:Request guid: 72994348-ad3c-45bd-874e-a55e04c78087
2021-04-05 07:18:35,936 - MainThread network.py:911 - _request_exec() - DEBUG - socket timeout: 60
DEBUG:snowflake.connector.network:socket timeout: 60
2021-04-05 07:18:36,135 - MainThread connectionpool.py:433 - _make_request() - DEBUG - https://mycompany01.east-us-2.azure.snowflakecomputing.com:443 "GET /monitoring/queries/019b6222-0b00-eaf9-0000-2aa90d56ca36?request_guid=72994348-ad3c-45bd-874e-a55e04c78087 HTTP/1.1" 200 None
DEBUG:snowflake.connector.vendored.urllib3.connectionpool:https://mycompany01.east-us-2.azure.snowflakecomputing.com:443 "GET /monitoring/queries/019b6222-0b00-eaf9-0000-2aa90d56ca36?request_guid=72994348-ad3c-45bd-874e-a55e04c78087 HTTP/1.1" 200 None
2021-04-05 07:18:36,138 - MainThread network.py:940 - _request_exec() - DEBUG - SUCCESS
DEBUG:snowflake.connector.network:SUCCESS
2021-04-05 07:18:36,139 - MainThread network.py:1093 - _use_requests_session() - DEBUG - Active requests sessions: 0, idle: 1
DEBUG:snowflake.connector.network:Active requests sessions: 0, idle: 1

Before I moved to execute_async I used this code to create my dataframe:

sql = sqlparse.split(arg)
for statement in sql:
    cs = cursor.execute(statement)
    rs = cs.fetchall()
df = pd.DataFrame.from_records(rs, columns=[x[0] for x in cs.description], coerce_float=True)

However I cannot use the from_records function anymore because cs is a dict versus an object that has a description attribute.

info-rchitect commented 3 years ago

BTW, I found a workaround to make the asynchronous queries work:

# This code is done in my Snowflake class
cursor.execute_async(statement)
job_id = cursor.sfqid

# The rest of the code is called within a function that is the target of a thread
# that is not joined so it does not block
polling2.poll(
            lambda: connector.is_still_running(connector.get_query_status_throw_if_error(job_id)) == False,
            step=5,
            poll_forever=True
)
cursor.query_result(job_id) # This is the workaround.  By putting this after a threaded polling function, it ensures that the results are available.  The `query_result` function does not guarantee that like the `get_results_from_sfqid` function does
df = cursor.fetch_pandas_all()  
github-actions[bot] commented 1 year ago

To clean up and re-prioritize bugs and feature requests we are closing all issues older than 6 months as of March 1, 2023. If there are any issues or feature requests that you would like us to address, please re-create them. For urgent issues, opening a support case with this link Snowflake Community is the fastest way to get a response