snowflakedb / snowflake-connector-python

Snowflake Connector for Python
https://pypi.python.org/pypi/snowflake-connector-python/
Apache License 2.0
586 stars 468 forks source link

SNOW-298756: executemany - INSERT regex wrong #654

Closed shahargl closed 1 year ago

shahargl commented 3 years ago

Please answer these questions before submitting your issue. Thanks!

  1. What version of Python are you using (python --version)? Python 3.7.3

  2. What operating system and processor architecture are you using (python -c 'import platform; print(platform.platform())')? Darwin-19.5.0-x86_64-i386-64bit

  3. What are the component versions in the environment (pip freeze)?

alembic==1.5.5
appnope==0.1.2
asn1crypto==1.4.0
attrs==20.3.0
azure-common==1.1.26
azure-core==1.11.0
azure-storage-blob==12.7.1
backcall==0.2.0
boto3==1.17.15
botocore==1.20.15
cachetools==4.2.1
certifi==2020.12.5
cffi==1.14.5
chardet==3.0.4
click==7.1.2
clickclick==20.10.2
connexion==2.7.0
cryptography==3.2.1
curlify==2.2.1
decorator==4.4.2
Flask==1.1.1
Flask-Cors==3.0.10
Flask-Testing==0.8.0
google-api-core==1.22.0
google-api-python-client==1.9.3
google-auth==1.27.0
google-auth-httplib2==0.0.4
google-cloud-core==1.6.0
google-cloud-kms==1.4.0
google-cloud-pubsub==2.4.0
google-cloud-scheduler==2.1.0
google-cloud-secret-manager==1.0.0
google-cloud-storage==1.29.0
google-cloud-tasks==1.5.0
google-resumable-media==0.5.1
googleapis-common-protos==1.52.0
graphql-core==3.1.3
greenlet==1.0.0
grpc-google-iam-v1==0.12.3
grpcio==1.35.0
gunicorn==20.0.4
httplib2==0.19.0
hypothesis==5.49.0
hypothesis-graphql==0.3.3
hypothesis-jsonschema==0.19.0
idna==2.8
importlib-metadata==1.7.0
inflection==0.5.1
iniconfig==1.1.1
ipython==7.21.0
ipython-genutils==0.2.0
isodate==0.6.0
itsdangerous==1.1.0
jedi==0.18.0
Jinja2==2.11.3
jmespath==0.10.0
jsonpatch==1.28
jsonpointer==2.0
jsonschema==3.2.0
junit-xml==1.9
jwcrypto==0.8
keyring==22.0.1
libcst==0.3.17
Mako==1.1.4
marisa-trie==0.7.5
MarkupSafe==1.1.1
msrest==0.6.21
multidict==5.1.0
mypy-extensions==0.4.3
ndg-httpsclient==0.5.1
oauthlib==3.1.0
openapi-schema-validator==0.1.2
openapi-spec-validator==0.3.0
oscrypto==1.2.1
packaging==20.9
parso==0.8.1
pexpect==4.8.0
pickleshare==0.7.5
pluggy==0.13.1
prompt-toolkit==3.0.17
proto-plus==1.13.0
protobuf==3.15.2
ptyprocess==0.7.0
pusher==3.0.0
py==1.10.0
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycparser==2.20
pycryptodomex==3.10.1
Pygments==2.8.1
PyJWT==2.0.1
PyNaCl==1.4.0
pyOpenSSL==19.1.0
pyparsing==2.4.7
pyrsistent==0.17.3
pytest==6.2.2
pytest-subtests==0.4.0
python-dateutil==2.6.0
python-dotenv==0.15.0
python-editor==1.0.4
python-json-logger==2.0.1
python-jwt==3.3.0
pytz==2020.5
PyYAML==5.4.1
requests==2.23.0
requests-oauthlib==1.3.0
rsa==4.7.2
s3transfer==0.3.4
schemathesis==3.1.0
six==1.15.0
snowflake-connector-python==2.3.10
snowflake-sqlalchemy==1.2.4
sortedcontainers==2.3.0
SQLAlchemy==1.3.12
SQLAlchemy-Utils==0.36.8
starlette==0.13.8
strict-rfc3339==0.7
swagger-ui-bundle==0.0.8
toml==0.10.2
traitlets==5.0.5
typing-extensions==3.7.4.3
typing-inspect==0.6.0
uritemplate==3.0.1
urllib3==1.25.11
wcwidth==0.2.5
Werkzeug==1.0.1
yarl==1.6.3
zipp==3.4.0
  1. What did you do? If possible, provide a recipe for reproducing the error. A complete runnable program is good.

I'm trying to insert multiple rows with VARIANT column. The problem is that it currently not supported both in python connector and sqlalchemy extension.

So my way to overcome this is to use SQLAlchemy events to achieve it:

@compiles(Insert)
def patch_insert(insert, compiler, **kw):
    # Patches INSERT SQL queries so sqlalchemy ORM will support Snowflake VARIANT
    s = compiler.visit_insert(insert, **kw)
    s = s.replace("VALUES (%(raw)s)", "(SELECT PARSE_JSON(column1) as raw from values (%(raw)s))")
    return s

This works perfectly for one row. But when I try to commit multiple rows at once, there is a redundant ")" after every row.

Digging in, the problem is the INSERT_SQL_VALUES_RE regex in snowflake/connector/cursor.py:

INSERT_SQL_VALUES_RE = re.compile(r'.*VALUES\s*(\(.*\)).*',
                                      re.IGNORECASE | re.MULTILINE | re.DOTALL)

The problem is that this regex is greedy and catches also the second ")", so in line 800,

fmt = m.group(1)

makes fmt to be '(%(raw)s))' instead of just '(%(raw)s)', which in turn breaks the query.

  1. What did you expect to see? The INSERT_SQL_VALUES_RE should be NOT greedy and stop after the first ")", so the query won't be broken.

  2. What did you see instead? INSERT_SQL_VALUES_RE is greedy and takes also the second ")", which relevant to the (SELECT PARSE_JSON(column1) and not to the specific row/value.

  3. Can you set logging to DEBUG and collect the logs?

import logging
import os

for logger_name in ['snowflake.sqlalchemy', 'snowflake.connector', 'botocore']:
    logger = logging.getLogger(logger_name)
    logger.setLevel(logging.DEBUG)
    ch = logging.StreamHandler()
    ch.setLevel(logging.DEBUG)
    ch.setFormatter(logging.Formatter('%(asctime)s - %(threadName)s %(filename)s:%(lineno)d - %(funcName)s() - %(levelname)s - %(message)s'))
    logger.addHandler(ch)
sfc-gh-stan commented 3 years ago

Hi @shahargl, thank you for reaching out, we really appreciate your investigation! Would you like to submit a PR to fix the Regex? If not, I will start working on this. Again, thank you for bringing this to our attention! We will introduce some tests to make sure this issue is caught.

shahargl commented 3 years ago

Hi @sfc-gh-stan, thanks for the quick response

I've opened a PR that fixes the formatting bug - https://github.com/snowflakedb/snowflake-connector-python/pull/657 I've tested it locally and it does the work for me.

In the solution, I've assumpted that %(VARNAME)s is the convention (e.g. https://realpython.com/python-f-strings/)

github-actions[bot] commented 1 year ago

To clean up and re-prioritize bugs and feature requests we are closing all issues older than 6 months as of March 1, 2023. If there are any issues or feature requests that you would like us to address, please re-create them. For urgent issues, opening a support case with this link Snowflake Community is the fastest way to get a response