snowflakedb / snowflake-sqlalchemy

Snowflake SQLAlchemy
https://pypi.python.org/pypi/snowflake-sqlalchemy/
Apache License 2.0

SNOW-1726519: Full schema introspection breaks writing if unsupported datatype exists in the schema #534

Open fmcardoso opened 1 month ago

fmcardoso commented 1 month ago

Please answer these questions before submitting your issue. Thanks!

  1. What version of Python are you using?

    3.10.9

  2. What operating system and processor architecture are you using?

    macOS-14.6.1-arm64-arm-64bit

  3. What are the component versions in the environment (pip freeze)?

aiobotocore==2.4.2
aiohttp==3.9.5
aioitertools==0.11.0
aiosignal==1.3.1
alabaster==0.7.16
alembic==1.13.1
altair==5.3.0
aniso8601==9.0.1
annotated-types==0.7.0
anyio==4.4.0
appdirs==1.4.4
appnope==0.1.4
argcomplete==3.1.6
argon2-cffi==23.1.0
argon2-cffi-bindings==21.2.0
arrow==1.3.0
asn1crypto==1.5.1
asttokens==2.4.1
astunparse==1.6.3
async-lru==2.0.4
async-timeout==4.0.3
attrs==23.2.0
Babel==2.15.0
backoff==2.2.1
bcrypt==4.1.3
beautifulsoup4==4.12.3
bleach==6.1.0
blinker==1.8.2
bokeh==2.4.3
boto3==1.24.59
botocore==1.27.59
build==1.2.2
CacheControl==0.14.0
cachetools==5.3.3
category-encoders==2.6.3
certifi==2024.8.30
cffi==1.16.0
cfgv==3.4.0
charset-normalizer==3.3.2
cleo==2.1.0
click==8.1.7
cloudpickle==3.0.0
colorama==0.4.6
comm==0.2.2
commitizen==3.9.1
contourpy==1.2.1
coverage==7.5.3
crashtest==0.4.1
cryptography==43.0.1
cycler==0.12.1
debugpy==1.8.1
decli==0.6.2
decorator==5.1.1
defusedxml==0.7.1
Deprecated==1.2.14
deprecation==2.1.0
dill==0.3.8
distlib==0.3.8
distro==1.9.0
docker==7.1.0
docutils==0.21.2
duckdb==0.10.0
duckdb_engine==0.11.1
dulwich==0.21.7
dynaconf==3.2.6
entrypoints==0.4
evidently==0.4.37
exceptiongroup==1.2.1
execnb==0.1.6
execnet==2.1.1
executing==2.0.1
Faker==28.4.1
fastcore==1.5.46
fastjsonschema==2.20.0
filelock==3.15.1
Flask==3.0.3
fonttools==4.53.0
fqdn==1.5.1
frozenlist==1.4.1
fsspec==2022.11.0
future==1.0.0
ghapi==1.0.5
ghp-import==2.1.0
gitdb==4.0.11
GitPython==3.1.43
graphene==3.3
graphql-core==3.2.3
graphql-relay==3.2.0
griffe==0.46.0
gunicorn==22.0.0
h11==0.14.0
httpcore==1.0.5
httptools==0.6.1
httpx==0.27.0
huggingface-hub==0.17.3
hyperopt==0.2.7
hypothesis==6.103.2
hypothesis-jsonschema==0.23.1
identify==2.5.36
idna==3.7
imagesize==1.4.1
imbalanced-learn==0.12.3
imblearn==0.0
importlib-metadata==6.11.0
iniconfig==2.0.0
installer==0.7.0
ipykernel==6.29.4
ipython==8.25.0
ipython-genutils==0.2.0
ipython-sql==0.4.1
ipywidgets==8.1.3
isoduration==20.11.0
iterative-telemetry==0.0.8
itsdangerous==2.2.0
jaraco.classes==3.4.0
jedi==0.19.1
Jinja2==3.1.4
jmespath==1.0.1
joblib==1.4.2
json5==0.9.25
jsonpointer==3.0.0
jsonschema==4.22.0
jsonschema-specifications==2023.12.1
jupyter==1.0.0
jupyter-cache==1.0.0
jupyter-console==6.6.3
jupyter-events==0.10.0
jupyter-lsp==2.2.5
jupyter_client==8.6.2
jupyter_core==5.7.2
jupyter_server==2.14.1
jupyter_server_terminals==0.5.3
jupyterlab==4.2.2
jupyterlab_pygments==0.3.0
jupyterlab_server==2.27.2
jupyterlab_widgets==3.0.11
jupytext==1.16.2
keyring==24.3.1
kiwisolver==1.4.5
litestar==2.11.0
llvmlite==0.43.0
loguru==0.7.2
lxml==5.2.2
Mako==1.3.5
Markdown==3.6
markdown-it-py==3.0.0
MarkupSafe==2.1.5
matplotlib==3.9.0
matplotlib-inline==0.1.7
mdit-py-plugins==0.4.1
mdurl==0.1.2
mergedeep==1.3.4
mistune==3.0.2
mkdocs==1.6.0
mkdocs-autorefs==1.0.1
mkdocs-gen-files==0.5.0
mkdocs-get-deps==0.2.0
mkdocs-git-revision-date-localized-plugin==1.2.6
mkdocs-jupyter==0.24.7
mkdocs-material==9.5.27
mkdocs-material-extensions==1.3.1
mkdocstrings==0.25.1
mkdocstrings-python==1.10.3
mlflow==2.14.0
mlflow-skinny==2.14.0
more-itertools==10.5.0
moto==4.2.14
mpmath==1.3.0
msgpack==1.0.8
msgspec==0.18.6
multidict==6.0.5
multimethod==1.10
mypy-extensions==1.0.0
myst-nb==1.1.0
myst-parser==3.0.1
nbclient==0.10.0
nbconvert==7.16.4
nbdev==2.3.25
nbformat==5.10.4
nest-asyncio==1.6.0
networkx==3.3
nltk==3.9.1
nodeenv==1.9.1
notebook==7.2.1
notebook_shim==0.2.4
numba==0.60.0
numpy==1.26.4
opentelemetry-api==1.25.0
opentelemetry-sdk==1.25.0
opentelemetry-semantic-conventions==0.46b0
overrides==7.7.0
packaging==24.1
paginate==0.5.6
pandarallel==1.6.5
pandas==1.5.3
pandas-datareader==0.10.0
pandas-stubs==2.2.2.240603
pandera==0.19.3
pandocfilters==1.5.1
parso==0.8.4
pathspec==0.12.1
patsy==0.5.6
pexpect==4.9.0
pillow==10.3.0
pkginfo==1.11.1
platformdirs==4.2.2
plotly==5.18.0
pluggy==1.5.0
poetry==1.8.3
poetry-core==1.9.0
poetry-plugin-export==1.8.0
polyfactory==2.16.2
pprintpp==0.4.0
pre-commit==3.7.1
prettytable==0.7.2
prometheus_client==0.20.0
prompt_toolkit==3.0.47
protobuf==4.25.3
psutil==5.9.8
psycopg2-binary==2.9.9
ptyprocess==0.7.0
pure-eval==0.2.2
py-partiql-parser==0.5.0
py4j==0.10.9.7
pyarrow==14.0.2
pycountry==24.6.1
pycountry-convert==0.7.2
pycparser==2.22
pydantic==2.7.4
pydantic-settings==2.3.3
pydantic_core==2.18.4
pydeck==0.9.1
Pygments==2.18.0
PyJWT==2.8.0
pymdown-extensions==10.8.1
pyOpenSSL==24.1.0
pyparsing==3.1.2
pyproject_hooks==1.1.0
pyprojroot==0.3.0
pytest==7.4.4
pytest-cov==4.1.0
pytest-mock==3.14.0
pytest-sugar==1.0.0
pytest-xdist==3.6.1
python-dateutil==2.9.0.post0
python-dotenv==1.0.1
python-gitlab==4.6.0
python-json-logger==2.0.7
pytz==2024.1
PyYAML==6.0.1
pyyaml_env_tag==0.1
pyzmq==26.0.3
qtconsole==5.5.2
QtPy==2.4.1
querystring-parser==1.2.4
questionary==1.10.0
rapidfuzz==3.9.7
ray==2.20.0
referencing==0.35.1
regex==2024.5.15
repoze.lru==0.7
requests==2.32.3
requests-toolbelt==1.0.0
responses==0.25.3
rfc3339-validator==0.1.4
rfc3986-validator==0.1.1
rich==13.7.1
rich-click==1.8.3
rpds-py==0.18.1
ruff==0.4.9
s3fs==2022.11.0
s3transfer==0.6.2
safetensors==0.4.3
scikit-learn==1.2.2
scipy==1.13.1
seaborn==0.13.2
Send2Trash==1.8.3
sentence-transformers==2.7.0
setuptools-scm==8.1.0
shap==0.42.1
shellingham==1.5.4
six==1.16.0
slicer==0.0.7
smart-open==6.4.0
smmap==5.0.1
sniffio==1.3.1
snowballstemmer==2.2.0
snowflake-connector-python==3.10.1
snowflake-sqlalchemy==1.5.3
sortedcontainers==2.4.0
soupsieve==2.5
Sphinx==7.3.7
sphinxcontrib-applehelp==1.0.8
sphinxcontrib-devhelp==1.0.6
sphinxcontrib-htmlhelp==2.0.5
sphinxcontrib-jsmath==1.0.1
sphinxcontrib-qthelp==1.0.7
sphinxcontrib-serializinghtml==1.1.10
SQLAlchemy==1.4.52
sqlparse==0.5.0
stack-data==0.6.3
statsmodels==0.14.2
streamlit==1.35.0
sympy==1.12.1
tabulate==0.9.0
tenacity==8.4.2
tensorboardX==2.6.2.2
termcolor==2.4.0
terminado==0.18.1
testcontainers==3.7.1
threadpoolctl==3.5.0
tinycss2==1.3.0
tokenizers==0.15.2
toml==0.10.2
tomli==2.0.1
tomlkit==0.12.5
toolz==0.12.1
torch==2.3.1
tornado==6.4.1
tqdm==4.66.4
traitlets==5.14.3
transformers==4.35.2
trove-classifiers==2024.7.2
typeguard==4.3.0
typer==0.12.5
types-python-dateutil==2.9.0.20240316
types-pytz==2024.1.0.20240417
types-PyYAML==6.0.12.20240311
types-requests==2.31.0.6
types-setuptools==70.0.0.20240524
types-urllib3==1.26.25.14
typing-inspect==0.9.0
typing_extensions==4.12.2
ujson==5.10.0
ulid-py==1.1.0
uri-template==1.3.0
urllib3==1.26.20
uvicorn==0.30.6
uvloop==0.20.0
virtualenv==20.26.2
watchdog==4.0.1
watchfiles==0.24.0
wcwidth==0.2.13
webcolors==24.6.0
webencodings==0.5.1
websocket-client==1.8.0
websockets==13.0.1
Werkzeug==3.0.3
widgetsnbextension==4.0.11
wrapt==1.16.0
xattr==1.1.0
xgboost==2.0.3
xmltodict==0.13.0
yarl==1.9.4
zipp==3.19.2
  4. What did you do?

Writing to Snowflake through pandas.DataFrame.to_sql when a table with datatype VECTOR exists in the schema.

  5. What did you expect to see? What should have happened and what happened instead?

I expected the write to succeed. Instead, a 'NullType' object is not callable error is raised, and it is impossible to write any table through pandas.DataFrame.to_sql while a table with a VECTOR column exists in the schema.

If any table in the Snowflake schema has a column of datatype VECTOR, every write to Snowflake through pandas.DataFrame.to_sql fails during introspection. Because the method get_columns reads the whole schema and one column has an unsupported datatype, the method _get_schema_columns raises the error.

One possible solution would be to pass the table name into the query built in _get_schema_columns, instead of filtering only at the method's return statement, whenever introspection of the whole schema is not necessary.
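
To make the failure mode and the suggested fix concrete, here is a minimal, dependency-free sketch. It is not the snowflake-sqlalchemy code: NullType, NULLTYPE, ISCHEMA_NAMES, SCHEMA_ROWS and resolve_columns are hypothetical stand-ins, and the assumption is that an unknown type name falls back to a NullType instance which is then called like a class.

```python
class NullType:
    """Hypothetical stand-in for a 'no known type' placeholder."""


class Text:
    """Hypothetical stand-in for a supported text type."""


class Number:
    """Hypothetical stand-in for a supported numeric type."""


# Assumption: the fallback is an *instance*, not a class, so calling it fails.
NULLTYPE = NullType()

# Assumption: a lookup from database type names to type classes.
# VECTOR is deliberately missing, mirroring an unsupported datatype.
ISCHEMA_NAMES = {"TEXT": Text, "NUMBER": Number}

# Simulated result of a schema-wide column query: (table, column, type name).
SCHEMA_ROWS = [
    ("orders", "id", "NUMBER"),
    ("orders", "note", "TEXT"),
    ("embeddings", "vec", "VECTOR"),
]


def resolve_columns(rows, table_name, filter_early):
    """Return (column, type instance) pairs for table_name.

    filter_early=False resolves types for every table in the schema and
    filters only at the end, so one unsupported column breaks every table.
    filter_early=True restricts the rows to table_name first, which is the
    approach suggested in this issue.
    """
    if filter_early:
        rows = [row for row in rows if row[0] == table_name]

    resolved = {}
    for table, column, type_name in rows:
        col_type = ISCHEMA_NAMES.get(type_name, NULLTYPE)
        # Calling the NULLTYPE instance raises TypeError here.
        resolved.setdefault(table, []).append((column, col_type()))
    return resolved.get(table_name, [])


if __name__ == "__main__":
    try:
        resolve_columns(SCHEMA_ROWS, "orders", filter_early=False)
    except TypeError as exc:
        print("schema-wide introspection failed:", exc)

    columns = resolve_columns(SCHEMA_ROWS, "orders", filter_early=True)
    print("filtered introspection:", [name for name, _ in columns])
```

In this sketch, filtering early only avoids the error for tables that do not themselves contain a VECTOR column. A table with such a column would still need the unknown type handled without calling the placeholder.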

  6. Can you set logging to DEBUG and collect the logs?

I believe the log is not necessary for this issue.

sfc-gh-dszmolka commented 1 month ago

hi - thank you for raising this, and the detailed analysis. VECTOR is documented to be unsupported (for now) but I fully agree it shouldn't break anything still. We'll take a look.