SNOW-1556467: Querying table works from Snowflake UI, but fetch_pandas_all hangs and crashes kernel

giacomo-mason commented 1 month ago

Python version

Python 3.10.2 (main, Apr 4 2022, 11:53:00) [Clang 13.1.6 (clang-1316.0.21.2)]

Operating system and processor architecture

macOS-14.5-arm64-arm-64bit

Installed packages

anyio==4.4.0
appdirs==1.4.4
appnope==0.1.4
argon2-cffi==23.1.0
argon2-cffi-bindings==21.2.0
arrow==1.3.0
asn1crypto==1.5.1
asttokens==2.4.1
async-lru==2.0.4
attrs==23.2.0
autopep8==1.6.0
Babel==2.15.0
backcall==0.2.0
beautifulsoup4==4.12.3
black==22.12.0
bleach==6.1.0
bytecode==0.15.1
cattrs==23.2.3
certifi==2024.7.4
cffi==1.16.0
cfgv==3.4.0
chardet==5.2.0
charset-normalizer==3.3.2
click==8.1.7
cloudpickle==3.0.0
colorama==0.4.6
comm==0.2.2
contourpy==1.1.1
cryptography==42.0.8
cycler==0.12.1
Cython==3.0.0a10
ddsketch==3.0.1
ddtrace==1.19.0
debugpy==1.8.2
decorator==5.1.1
defusedxml==0.7.1
Deprecated==1.2.14
diff-cover==7.7.0
distlib==0.3.8
envier==0.5.1
exceptiongroup==1.2.2
executing==2.0.1
fastjsonschema==2.20.0
filelock==3.15.4
flake8==4.0.1
fonttools==4.53.1
fqdn==1.5.1
h11==0.14.0
httpcore==1.0.5
httpx==0.27.0
identify==2.6.0
idna==3.7
importlib_metadata==8.1.0
iniconfig==2.0.0
ipykernel==6.29.5
ipython==8.12.3
ipywidgets==8.1.3
isoduration==20.11.0
isort==5.13.2
jaraco.classes==3.4.0
jedi==0.19.1
Jinja2==3.1.4
joblib==1.4.2
json5==0.9.25
jsonpointer==3.0.0
jsonschema==4.23.0
jsonschema-specifications==2023.12.1
jupyter-events==0.10.0
jupyter-lsp==2.2.5
jupyter_client==8.6.2
jupyter_core==5.7.2
jupyter_server==2.14.2
jupyter_server_terminals==0.5.3
jupyterlab==4.2.4
jupyterlab_pygments==0.3.0
jupyterlab_server==2.27.3
jupyterlab_widgets==3.0.11
keyring==24.3.1
kiwisolver==1.4.5
llvmlite==0.41.1
MarkupSafe==2.1.5
matplotlib==3.7.5
matplotlib-inline==0.1.7
mccabe==0.6.1
mistune==3.0.2
more-itertools==10.3.0
mypy-extensions==1.0.0
nbclient==0.10.0
nbconvert==7.16.4
nbformat==5.10.4
nbqa==1.8.5
nest-asyncio==1.6.0
nodeenv==1.9.1
notebook==7.2.1
notebook_shim==0.2.4
numba==0.58.1
numexpr==2.10.0
numpy==1.24.4
opentelemetry-api==1.16.0
overrides==7.7.0
packaging==24.1
pandas==2.0.3
pandocfilters==1.5.1
parso==0.8.4
pathspec==0.12.1
patsy==0.5.6
pexpect==4.9.0
pickleshare==0.7.5
pillow==10.4.0
platformdirs==4.2.2
pluggy==1.5.0
pre-commit==3.5.0
prometheus_client==0.20.0
prompt_toolkit==3.0.47
protobuf==5.27.0
psutil==6.0.0
ptyprocess==0.7.0
pure_eval==0.2.3
pyarrow==17.0.0
pycodestyle==2.8.0
pycparser==2.22
pyflakes==2.4.0
Pygments==2.18.0
PyJWT==2.8.0
pyOpenSSL==24.2.1
pyparsing==3.1.2
pytest==8.3.1
python-dateutil==2.9.0.post0
python-json-logger==2.0.7
pytz==2024.1
PyYAML==6.0.1
pyzmq==26.0.3
referencing==0.35.1
regex==2024.5.15
requests==2.32.3
rfc3339-validator==0.1.4
rfc3986-validator==0.1.1
rpds-py==0.19.0
scikit-learn==1.3.2
scipy==1.9.3
seaborn==0.13.2
Send2Trash==1.8.3
shap==0.44.1
six==1.16.0
slicer==0.0.7
sniffio==1.3.1
snowflake-connector-python==3.12.0
sortedcontainers==2.4.0
soupsieve==2.5
sqlfluff==2.3.5
stack-data==0.6.3
statsmodels==0.14.1
tabulate==0.9.0
tblib==3.0.0
terminado==0.18.1
threadpoolctl==3.5.0
tinycss2==1.3.0
tokenize-rt==5.2.0
toml==0.10.2
tomli==2.0.1
tomlkit==0.13.0
tornado==6.4.1
tqdm==4.66.4
traitlets==5.14.3
types-python-dateutil==2.9.0.20240316
typing_extensions==4.12.2
tzdata==2024.1
uri-template==1.3.0
urllib3==1.26.19
virtualenv==20.26.3
wcwidth==0.2.13
webcolors==24.6.0
webencodings==0.5.1
websocket-client==1.8.0
widgetsnbextension==4.0.11
wrapt==1.16.0
xgboost==2.1.0
xmltodict==0.13.0
zipp==3.19.2

What did you do?

import logging
import os

import pandas as pd
import snowflake.connector

for logger_name in ('snowflake.connector',):
    logger = logging.getLogger(logger_name)
    logger.setLevel(logging.DEBUG)
    ch = logging.StreamHandler()
    ch.setLevel(logging.DEBUG)
    ch.setFormatter(logging.Formatter('%(asctime)s - %(threadName)s %(filename)s:%(lineno)d - %(funcName)s() - %(levelname)s - %(message)s'))
    logger.addHandler(ch)

# Connect to Snowflake
conn = snowflake.connector.connect(
    user="***",
    password='',
    account='***',
    warehouse='***',
    authenticator='externalbrowser'
)

# Create a cursor object
cursor = conn.cursor()

# Execute the query
cursor.execute("SELECT * FROM my_table")

# Fetch all the results into a pandas DataFrame
df = cursor.fetch_pandas_all()

# Close the cursor and connection
cursor.close()
conn.close()

# Print the DataFrame
df.head()

What did you expect to see?

I have a relatively large Snowflake table (4,155,216 rows and 177 columns).

I want to pull the entire table into a Pandas dataframe.
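For a rough sense of scale, the table dimensions above can be turned into a lower-bound memory estimate. This is only a sketch: the 8 bytes per cell is an assumption (as if every column were int64/float64), not something measured on this table, and string or object columns would normally need considerably more.

```python
# Back-of-the-envelope lower bound for the in-memory size of a result set.
# ASSUMPTION: every cell is a fixed-width 8-byte value (int64/float64).
# This is not measured; string/object columns usually take much more.
BYTES_PER_CELL = 8


def estimate_bytes(rows: int, cols: int, bytes_per_cell: int = BYTES_PER_CELL) -> int:
    """Return rows * cols * bytes_per_cell, ignoring index and object overhead."""
    return rows * cols * bytes_per_cell


full_table = estimate_bytes(4_155_216, 177)  # dimensions quoted in this report
sample_10k = estimate_bytes(10_000, 177)     # a 10,000-row slice for comparison

print(f"full table : {full_table / 2**30:.2f} GiB")
print(f"10,000 rows: {sample_10k / 2**20:.2f} MiB")
```

Under that assumption the full table needs at least about 5.48 GiB, while a 10,000-row slice is about 13.50 MiB.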

From the Snowflake UI, I can successfully run

SELECT * FROM my_table;

When running the same query from a Jupyter notebook (see above), I was expecting the df to contain the data from the table.

Instead, the script runs for a bit and then hangs. The Python kernel dies and needs to be restarted.

I get the same behavior with anything but the smallest samples from that table. For example, a LIMIT 1000 works fine, but a LIMIT 10000 runs into the same issue.
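One way to narrow down where the fetch stops is to consume the result batch by batch instead of all at once. The sketch below assumes the cursor's fetch_pandas_batches() method, which yields one DataFrame per batch; the helper name probe_batches is hypothetical and only for illustration. It relies on nothing but each batch's .shape attribute, so it does not build the full DataFrame.

```python
from typing import Any, Iterable, List, Tuple


def probe_batches(batches: Iterable[Any]) -> List[Tuple[int, int]]:
    """Consume result batches one at a time and record each batch's shape.

    `batches` is expected to yield DataFrame-like objects exposing `.shape`,
    e.g. the iterator returned by cursor.fetch_pandas_batches().
    """
    shapes: List[Tuple[int, int]] = []
    for i, batch in enumerate(batches):
        rows, cols = batch.shape
        shapes.append((rows, cols))
        # flush=True so the last message is visible even if the kernel dies next
        print(f"batch {i}: {rows} rows x {cols} columns", flush=True)
    return shapes


# Usage against a live cursor (not executed here):
# cursor.execute("SELECT * FROM my_table LIMIT 10000")
# probe_batches(cursor.fetch_pandas_batches())
```

If the kernel still dies, the last printed batch index gives a hint about how far the fetch got before the crash.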

I have attached the debug logs from the code above (with the initial part of the logs omitted to remove confidential information).

connector_logs2.txt

Can you set logging to DEBUG and collect the logs?

import logging
import os

for logger_name in ('snowflake.connector',):
    logger = logging.getLogger(logger_name)
    logger.setLevel(logging.DEBUG)
    ch = logging.StreamHandler()
    ch.setLevel(logging.DEBUG)
    ch.setFormatter(logging.Formatter('%(asctime)s - %(threadName)s %(filename)s:%(lineno)d - %(funcName)s() - %(levelname)s - %(message)s'))
    logger.addHandler(ch)
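When the kernel dies, notebook output can be lost, so it may help to send the same DEBUG output to a file instead of the stream. The following is a minimal sketch using only the standard logging module; the function name log_connector_to_file and the file name in the usage line are hypothetical.

```python
import logging


def log_connector_to_file(path: str, logger_name: str = "snowflake.connector") -> logging.Handler:
    """Attach a DEBUG-level file handler to the given logger and return it."""
    logger = logging.getLogger(logger_name)
    logger.setLevel(logging.DEBUG)
    handler = logging.FileHandler(path, mode="w", encoding="utf-8")
    handler.setLevel(logging.DEBUG)
    # Same format string as the stream handler in the snippet above
    handler.setFormatter(logging.Formatter(
        "%(asctime)s - %(threadName)s %(filename)s:%(lineno)d - "
        "%(funcName)s() - %(levelname)s - %(message)s"
    ))
    logger.addHandler(handler)
    return handler


# Usage (file name is illustrative):
# log_connector_to_file("connector_debug.log")
```

FileHandler flushes after each record it emits, so the file should contain everything logged up to the point of a crash.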

sfc-gh-yixie commented 1 month ago

@giacomo-mason did this happen in a Python stored procedure?

giacomo-mason commented 1 month ago

No, this happened while simply using the connector to pull data from Snowflake into memory for local use in a Jupyter notebook.
