snowflakedb / snowflake-connector-python

Snowflake Connector for Python
https://pypi.python.org/pypi/snowflake-connector-python/
Apache License 2.0

SNOW-449747: Using different roles with the same query may result in Snowflake being unable to parse the query result #851

Closed: z27fang closed this 3 years ago

z27fang commented 3 years ago

Please answer these questions before submitting your issue. Thanks!

  1. What version of Python are you using? Python 3.7.11

  2. What operating system and processor architecture are you using?

    Platform: Darwin-19.6.0-x86_64-i386-64bit

  3. What are the component versions in the environment (pip freeze)?

    alembic==1.7.1
    anyio==3.3.0
    apache-airflow==2.1.3
    apache-airflow-providers-databricks==2.0.0
    apache-airflow-providers-ftp==2.0.0
    apache-airflow-providers-http==2.0.0
    apache-airflow-providers-imap==2.0.0
    apache-airflow-providers-slack==4.0.0
    apache-airflow-providers-sqlite==2.0.0
    apispec==3.3.2
    argcomplete==1.12.3
    asn1crypto==1.4.0
    attrs==20.3.0
    azure-common==1.1.27
    azure-core==1.17.0
    azure-storage-blob==12.8.1
    Babel==2.9.1
    blinker==1.4
    boto3==1.9.253
    botocore==1.12.253
    cached-property==1.5.2
    cattrs==1.5.0
    certifi==2021.5.30
    cffi==1.14.6
    chardet==4.0.0
    charset-normalizer==2.0.4
    click==7.1.2
    clickclick==20.10.2
    colorama==0.4.4
    colorlog==6.4.1
    common==1.0
    commonmark==0.9.1
    croniter==1.0.15
    cryptography==3.4.8
    defusedxml==0.7.1
    dill==0.3.4
    dnspython==2.1.0
    docutils==0.15.2
    email-validator==1.1.3
    Flask==1.1.4
    Flask-AppBuilder==3.3.2
    Flask-Babel==1.0.0
    Flask-Caching==1.10.1
    Flask-JWT-Extended==3.25.1
    Flask-Login==0.4.1
    Flask-OpenID==1.2.5
    Flask-SQLAlchemy==2.5.1
    Flask-WTF==0.14.3
    future==0.18.2
    graphviz==0.17
    gunicorn==20.1.0
    h11==0.12.0
    httpcore==0.13.6
    httpx==0.19.0
    idna==3.2
    ijson==3.1.4
    importlib-metadata==4.8.1
    importlib-resources==1.5.0
    inflection==0.5.1
    iso8601==0.1.16
    isodate==0.6.0
    itsdangerous==1.1.0
    Jinja2==2.11.3
    jmespath==0.10.0
    jsonschema==3.2.0
    lazy-object-proxy==1.6.0
    lockfile==0.12.2
    Mako==1.1.5
    Markdown==3.3.4
    MarkupSafe==1.1.1
    marshmallow==3.13.0
    marshmallow-enum==1.5.1
    marshmallow-oneofschema==3.0.1
    marshmallow-sqlalchemy==0.23.1
    msrest==0.6.21
    numpy==1.21.2
    oauthlib==3.1.1
    openapi-schema-validator==0.1.5
    openapi-spec-validator==0.3.1
    oscrypto==1.2.1
    pandas==1.2.5
    pendulum==2.1.2
    prison==0.2.1
    psutil==5.8.0
    pyarrow==5.0.0
    pycparser==2.20
    pycryptodomex==3.10.1
    Pygments==2.10.0
    PyJWT==1.7.1
    pyOpenSSL==20.0.1
    pyrsistent==0.18.0
    python-daemon==2.3.0
    python-dateutil==2.8.2
    python-nvd3==0.15.0
    python-slugify==4.0.1
    python3-openid==3.2.0
    pytz==2021.1
    pytzdata==2020.1
    PyYAML==5.4.1
    requests==2.26.0
    requests-oauthlib==1.3.0
    rfc3986==1.5.0
    rich==10.9.0
    s3transfer==0.2.1
    setproctitle==1.2.2
    six==1.16.0
    slack-sdk==3.10.1
    sniffio==1.2.0
    snowflake==0.0.3
    snowflake-connector-python==2.6.0
    SQLAlchemy==1.3.24
    SQLAlchemy-JSONField==1.0.0
    SQLAlchemy-Utils==0.37.8
    swagger-ui-bundle==0.0.8
    tabulate==0.8.9
    tenacity==6.2.0
    termcolor==1.1.0
    text-unidecode==1.3
    typing-extensions==3.10.0.2
    unicodecsv==0.14.1
    urllib3==1.25.11
    websocket-client==0.47.0
    Werkzeug==1.0.1
    WTForms==2.3.3
    zipp==3.5.0
  4. What did you do? In the Snowflake database I have one user with two different roles: one named operator and the other named monitor.

    The operator role can access all tables but not the credit history, whereas the monitor role can access the Snowflake credit consumption history.

    The query I ran:

    with raw as (
        select
            sum(credits_used) as credit,
            sum(credits_used_compute) as credit_compute,
            sum(credits_used_cloud_services) as credit_cloud,
            to_date(end_time) as date,
            warehouse_name
        from table(information_schema.warehouse_metering_history(
            date_range_start => dateadd(days, -175, {0}),
            date_range_end => {0}
        ))
        group by 4, 5
        order by 4
    ),
    mean_std as (
        select distinct
            stddev(credit) over (partition by warehouse_name) as deviation,
            avg(credit) over (partition by warehouse_name) as mean,
            warehouse_name
        from raw
    ),
    today as (
        select credit, warehouse_name
        from raw
        where date = {0}
    ),
    final as (
        select
            t1.mean + 2 * t1.deviation as threshold,
            iff(t2.credit is null, 0, t2.credit) as credit,
            t1.warehouse_name
        from mean_std as t1
        left join today as t2
            on t1.warehouse_name = t2.warehouse_name
    )
    select threshold, credit, warehouse_name from final where threshold < credit

    Basically, it selects credit consumption across a range of dates, does some calculation, and finds outliers based on a simple rule.
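The outlier rule in the query (flag a day whose credit exceeds mean + 2 × stddev for its warehouse, as computed in the mean_std and final CTEs) can be sketched in plain Python. The history data below is made up for illustration:

```python
import statistics

def flag_outliers(daily_credits):
    """Flag days whose credit exceeds mean + 2 * stddev, mirroring the
    mean_std/final CTEs of the query above (for a single warehouse)."""
    credits = [credit for _, credit in daily_credits]
    mean = statistics.mean(credits)
    deviation = statistics.stdev(credits)  # sample stddev, like Snowflake's STDDEV
    threshold = mean + 2 * deviation
    return [(day, credit) for day, credit in daily_credits if credit > threshold]

# Made-up history: eight quiet days and one spike.
history = [("quiet", 10.0)] * 8 + [("spike", 100.0)]
print(flag_outliers(history))  # → [('spike', 100.0)]
```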

While using the monitor role, the query runs fine. But when using the operator role (which does not have access to the credit history), the program throws an error:

[2021-09-02 17:30:35,666] {result_batch.py:180} ERROR - Don't know how to construct ResultBatches from response: {'parameters': [{'name': 'CLIENT_PREFETCH_THREADS', 'value': 4}, {'name': 'TIMESTAMP_OUTPUT_FORMAT', 'value': 'YYYY-MM-DD HH24:MI:SS.FF3 TZHTZM'}, {'name': 'TIME_OUTPUT_FORMAT', 'value': 'HH24:MI:SS'}, {'name': 'CLIENT_RESULT_CHUNK_SIZE', 'value': 160}, {'name': 'TIMESTAMP_TZ_OUTPUT_FORMAT', 'value': ''}, {'name': 'CLIENT_SESSION_KEEP_ALIVE', 'value': False}, {'name': 'CLIENT_OUT_OF_BAND_TELEMETRY_ENABLED', 'value': False}, {'name': 'CLIENT_METADATA_USE_SESSION_DATABASE', 'value': False}, {'name': 'ENABLE_STAGE_S3_PRIVATELINK_FOR_US_EAST_1', 'value': False}, {'name': 'CLIENT_RESULT_PREFETCH_THREADS', 'value': 1}, {'name': 'TIMESTAMP_NTZ_OUTPUT_FORMAT', 'value': 'YYYY-MM-DD HH24:MI:SS.FF3'}, {'name': 'CLIENT_METADATA_REQUEST_USE_CONNECTION_CTX', 'value': False}, {'name': 'CLIENT_HONOR_CLIENT_TZ_FOR_TIMESTAMP_NTZ', 'value': True}, {'name': 'CLIENT_MEMORY_LIMIT', 'value': 1536}, {'name': 'CLIENT_TIMESTAMP_TYPE_MAPPING', 'value': 'TIMESTAMP_LTZ'}, {'name': 'TIMEZONE', 'value': 'America/Los_Angeles'}, {'name': 'CLIENT_RESULT_PREFETCH_SLOTS', 'value': 2}, {'name': 'CLIENT_TELEMETRY_ENABLED', 'value': True}, {'name': 'CLIENT_DISABLE_INCIDENTS', 'value': True}, {'name': 'CLIENT_USE_V1_QUERY_API', 'value': True}, {'name': 'CLIENT_RESULT_COLUMN_CASE_INSENSITIVE', 'value': False}, {'name': 'BINARY_OUTPUT_FORMAT', 'value': 'HEX'}, {'name': 'CLIENT_ENABLE_LOG_INFO_STATEMENT_PARAMETERS', 'value': False}, {'name': 'CLIENT_CONSENT_CACHE_ID_TOKEN', 'value': False}, {'name': 'DATE_OUTPUT_FORMAT', 'value': 'YYYY-MM-DD'}, {'name': 'CLIENT_FORCE_PROTECT_ID_TOKEN', 'value': True}, {'name': 'CLIENT_STAGE_ARRAY_BINDING_THRESHOLD', 'value': 65280}, {'name': 'CLIENT_SESSION_KEEP_ALIVE_HEARTBEAT_FREQUENCY', 'value': 3600}, {'name': 'AUTOCOMMIT', 'value': True}, {'name': 'CLIENT_SESSION_CLONE', 'value': False}, {'name': 'TIMESTAMP_LTZ_OUTPUT_FORMAT', 'value': ''}], 'rowtype': 
[{'name': 'THRESHOLD', 'database': '', 'schema': '', 'table': 'SYS$_VIEW_153', 'scale': None, 'precision': None, 'byteLength': None, 'type': 'real', 'nullable': True, 'collation': None, 'length': None}, {'name': 'CREDIT', 'database': '', 'schema': '', 'table': 'SYS$_VIEW_153', 'scale': 9, 'precision': 38, 'byteLength': None, 'type': 'fixed', 'nullable': True, 'collation': None, 'length': None}, {'name': 'WAREHOUSE_NAME', 'database': '', 'schema': '', 'table': 'SYS$_VIEW_153', 'scale': None, 'precision': None, 'byteLength': 16777216, 'type': 'text', 'nullable': True, 'collation': None, 'length': 16777216}], 'rowsetBase64': '', 'total': 0, 'returned': 0, 'queryId': '019eb00a-0503-04e0-0000-085d05c3420e', 'databaseProvider': None, 'finalDatabaseName': 'AUDIENCE', 'finalSchemaName': 'PUBLIC', 'finalWarehouseName': 'ETL_WAREHOUSE', 'finalRoleName': 'OPERATOR', 'numberOfBinds': 0, 'arrayBindSupported': False, 'statementTypeId': 4096, 'version': 1, 'sendResultTime': 1630618235620, 'queryResultFormat': 'arrow'}

This error happens right after I call cur.execute(query).

  5. What did you expect to see?

    The expected behaviour might be a warning that the current role doesn't support the query, or an informative error, instead of telling the user that the connector is unable to construct the result. (I am not sure whether the above error is expected, as it is coded into the connector.) By doing some digging, I noticed that the response body has 'rowsetBase64': '' and 'queryResultFormat': 'arrow'. This causes https://github.com/snowflakedb/snowflake-connector-python/blob/a3f4cfa407c9f3e9ede234dc6d856b41c8a4037c/src/snowflake/connector/cursor.py#L783 to set _query_result_format to 'arrow', and https://github.com/snowflakedb/snowflake-connector-python/blob/a3f4cfa407c9f3e9ede234dc6d856b41c8a4037c/src/snowflake/connector/result_batch.py#L105 to set rowset_b64 to the empty string ''. Finally it reaches this error: https://github.com/snowflakedb/snowflake-connector-python/blob/a3f4cfa407c9f3e9ede234dc6d856b41c8a4037c/src/snowflake/connector/result_batch.py#L179-L190
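As a hedged illustration (this is NOT the connector's actual code, only a simplified sketch of the branching the permalinks above point at), the problem path looks roughly like this: a zero-row arrow result arrives with an empty inline rowset and no chunk list, and falls through to the error log.

```python
import base64
import logging

logger = logging.getLogger(__name__)

def create_batches_sketch(data):
    """Simplified sketch (not the connector's real code) of how a result
    response is turned into batches, per the trace above."""
    if data.get("queryResultFormat") == "arrow":
        rowset_b64 = data.get("rowsetBase64") or ""
        if rowset_b64:
            return [base64.b64decode(rowset_b64)]  # inline first chunk
        if data.get("chunks"):
            return data["chunks"]                  # remote chunks to download
        # Zero-row arrow result: empty inline rowset and no chunks. In
        # 2.6.0 this path logged the "Don't know how to construct
        # ResultBatches" error even though an empty result is valid.
        logger.error("Don't know how to construct ResultBatches from response: %s", data)
        return []
    return data.get("rowset", [])                  # json result format

# A zero-row arrow response shaped like the one in this report:
print(create_batches_sketch({"queryResultFormat": "arrow", "rowsetBase64": "", "total": 0}))  # → []
```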

  6. Can you set logging to DEBUG and collect the logs?

[2021-09-02 17:46:49,784] {cursor.py:610} DEBUG - executing SQL/command
[2021-09-02 17:46:49,784] {cursor.py:645} DEBUG - binding: [with raw as ( select sum(credits_used) as credit, sum(credits_used_compute) as c...] with input=[None], processed=[{}]
[2021-09-02 17:46:49,784] {cursor.py:678} INFO - query: [with raw as ( select sum(credits_used) as credit, sum(credits_used_compute) as c...]
[2021-09-02 17:46:49,784] {connection.py:1212} DEBUG - sequence counter: 3
[2021-09-02 17:46:49,785] {cursor.py:440} DEBUG - Request id: foo
[2021-09-02 17:46:49,785] {cursor.py:443} DEBUG - running query [with raw as ( select sum(credits_used) as credit, sum(credits_used_compute) as c...]
[2021-09-02 17:46:49,785] {cursor.py:452} DEBUG - is_file_transfer: False
[2021-09-02 17:46:49,785] {connection.py:935} DEBUG - _cmd_query
[2021-09-02 17:46:49,785] {connection.py:962} DEBUG - sql=[with raw as ( select sum(credits_used) as credit, sum(credits_used_compute) as c...], sequence_id=[3], is_file_transfer=[False]
[2021-09-02 17:46:49,786] {network.py:1104} DEBUG - Session status for SessionPool 'mysecret.snowflakecomputing.com', SessionPool 1/1 active sessions
[2021-09-02 17:46:49,786] {network.py:823} DEBUG - remaining request timeout: None, retry cnt: 1
[2021-09-02 17:46:49,786] {network.py:801} DEBUG - Request guid: foo
[2021-09-02 17:46:49,786] {network.py:970} DEBUG - socket timeout: 60
[2021-09-02 17:46:50,906] {connectionpool.py:461} DEBUG - https://mysecret.snowflakecomputing.com:443 "POST /queries/v1/query-request?requestId=foo&request_guid=bar HTTP/1.1" 200 1070
[2021-09-02 17:46:50,907] {network.py:999} DEBUG - SUCCESS
[2021-09-02 17:46:50,907] {network.py:1110} DEBUG - Session status for SessionPool 'rg55507.us-east-1.snowflakecomputing.com', SessionPool 0/1 active sessions
[2021-09-02 17:46:50,907] {network.py:710} DEBUG - ret[code] = None, after post request
[2021-09-02 17:46:50,908] {network.py:732} DEBUG - Query id: foo
[2021-09-02 17:46:50,908] {cursor.py:700} DEBUG - sfqid: bar
[2021-09-02 17:46:50,908] {cursor.py:702} INFO - query execution done
[2021-09-02 17:46:50,908] {cursor.py:704} DEBUG - SUCCESS
[2021-09-02 17:46:50,908] {cursor.py:707} DEBUG - PUT OR GET: None
[2021-09-02 17:46:50,908] {cursor.py:807} DEBUG - Query result format: arrow
[2021-09-02 17:46:50,909] {result_batch.py:180} ERROR - Don't know how to construct ResultBatches from response: {'parameters': [{'name': 'CLIENT_PREFETCH_THREADS', 'value': 4}, {'name': 'TIMESTAMP_OUTPUT_FORMAT', 'value': 'YYYY-MM-DD HH24:MI:SS.FF3 TZHTZM'}, {'name': 'TIME_OUTPUT_FORMAT', 'value': 'HH24:MI:SS'}, {'name': 'CLIENT_RESULT_CHUNK_SIZE', 'value': 160}, {'name': 'TIMESTAMP_TZ_OUTPUT_FORMAT', 'value': ''}, {'name': 'CLIENT_SESSION_KEEP_ALIVE', 'value': False}, {'name': 'CLIENT_OUT_OF_BAND_TELEMETRY_ENABLED', 'value': False}, {'name': 'CLIENT_METADATA_USE_SESSION_DATABASE', 'value': False}, {'name': 'ENABLE_STAGE_S3_PRIVATELINK_FOR_US_EAST_1', 'value': False}, {'name': 'CLIENT_RESULT_PREFETCH_THREADS', 'value': 1}, {'name': 'TIMESTAMP_NTZ_OUTPUT_FORMAT', 'value': 'YYYY-MM-DD HH24:MI:SS.FF3'}, {'name': 'CLIENT_METADATA_REQUEST_USE_CONNECTION_CTX', 'value': False}, {'name': 'CLIENT_HONOR_CLIENT_TZ_FOR_TIMESTAMP_NTZ', 'value': True}, {'name': 'CLIENT_MEMORY_LIMIT', 'value': 1536}, {'name': 'CLIENT_TIMESTAMP_TYPE_MAPPING', 'value': 'TIMESTAMP_LTZ'}, {'name': 'TIMEZONE', 'value': 'America/Los_Angeles'}, {'name': 'CLIENT_RESULT_PREFETCH_SLOTS', 'value': 2}, {'name': 'CLIENT_TELEMETRY_ENABLED', 'value': True}, {'name': 'CLIENT_DISABLE_INCIDENTS', 'value': True}, {'name': 'CLIENT_USE_V1_QUERY_API', 'value': True}, {'name': 'CLIENT_RESULT_COLUMN_CASE_INSENSITIVE', 'value': False}, {'name': 'BINARY_OUTPUT_FORMAT', 'value': 'HEX'}, {'name': 'CLIENT_ENABLE_LOG_INFO_STATEMENT_PARAMETERS', 'value': False}, {'name': 'CLIENT_CONSENT_CACHE_ID_TOKEN', 'value': False}, {'name': 'DATE_OUTPUT_FORMAT', 'value': 'YYYY-MM-DD'}, {'name': 'CLIENT_FORCE_PROTECT_ID_TOKEN', 'value': True}, {'name': 'CLIENT_STAGE_ARRAY_BINDING_THRESHOLD', 'value': 65280}, {'name': 'CLIENT_SESSION_KEEP_ALIVE_HEARTBEAT_FREQUENCY', 'value': 3600}, {'name': 'AUTOCOMMIT', 'value': True}, {'name': 'CLIENT_SESSION_CLONE', 'value': False}, {'name': 'TIMESTAMP_LTZ_OUTPUT_FORMAT', 'value': ''}], 'rowtype': 
[{'name': 'THRESHOLD', 'database': '', 'schema': '', 'table': 'SYS$_VIEW_153', 'type': 'real', 'byteLength': None, 'scale': None, 'precision': None, 'length': None, 'nullable': True, 'collation': None}, {'name': 'CREDIT', 'database': '', 'schema': '', 'table': 'SYS$_VIEW_153', 'type': 'fixed', 'byteLength': None, 'scale': 9, 'precision': 38, 'length': None, 'nullable': True, 'collation': None}, {'name': 'WAREHOUSE_NAME', 'database': '', 'schema': '', 'table': 'SYS$_VIEW_153', 'type': 'text', 'byteLength': 16777216, 'scale': None, 'precision': None, 'length': 16777216, 'nullable': True, 'collation': None}], 'rowsetBase64': '', 'total': 0, 'returned': 0, 'queryId': '019eb01a-0503-04ee-0000-085d05c3287a', 'databaseProvider': None, 'finalDatabaseName': 'AUDIENCE', 'finalSchemaName': 'PUBLIC', 'finalWarehouseName': 'ETL_WAREHOUSE', 'finalRoleName': 'OPERATOR', 'numberOfBinds': 0, 'arrayBindSupported': False, 'statementTypeId': 4096, 'version': 1, 'sendResultTime': 1630619210875, 'queryResultFormat': 'arrow'}
[2021-09-02 17:46:50,909] {result_set.py:71} DEBUG - beginning to schedule result batch downloads

The program terminates at the last line.

z27fang commented 3 years ago

Some updates with this issue.

I noticed this usually happens when the result was cached. Today I sent the same query multiple times with the same role/account; though it returned the result, it still popped the Don't know how to construct ResultBatches... message. I will share some details on this error later.

sfc-gh-mkeller commented 3 years ago

Hi @z27fang, this is actually a bug that was already addressed in #852, but it just happens that after logging this error the code still works properly. You can safely ignore it until 2.6.1 is released (which will include #852), or downgrade to 2.5.1 in the meantime. Please re-open this ticket if you believe it was closed incorrectly.
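A small helper, derived only from the version boundaries stated in this comment (2.6.0 emits the spurious ERROR log, 2.6.1 carries the #852 fix, and 2.5.1 predates the regression), for deciding whether an installed connector version is affected:

```python
def affected_by_result_batch_log(version: str) -> bool:
    """True for connector versions that emit the spurious
    "Don't know how to construct ResultBatches" ERROR log:
    introduced in 2.6.0, fixed in 2.6.1 (per this comment)."""
    parts = tuple(int(p) for p in version.split("."))
    return (2, 6, 0) <= parts < (2, 6, 1)

print(affected_by_result_batch_log("2.6.0"))  # → True: the log can be ignored
print(affected_by_result_batch_log("2.5.1"))  # → False
```

Pinning snowflake-connector-python to 2.5.1 (or upgrading to 2.6.1 once released) sidesteps the log entirely, as the comment suggests.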