snowflakedb / snowpark-python

Snowflake Snowpark Python API
Apache License 2.0
244 stars 102 forks source link

SNOW-1474920: Cannot Insert `NULL` Value into Numeric Columns With `.sql` When Using Qmark Bindings #1760

Open AidanShipperley opened 4 weeks ago

AidanShipperley commented 4 weeks ago

Description of Issue

Any attempt to insert a value of None into a column with any of the numeric data types in Snowflake via the .sql Snowpark function and qmark parameter bindings results in the following error:

Traceback (most recent call last):
  Worksheet, line 46, in main
  File "snowflake/snowpark/_internal/telemetry.py", line 144, in wrap
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "snowflake/snowpark/dataframe.py", line 597, in collect
    return self._internal_collect_with_tag_no_telemetry(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "snowflake/snowpark/dataframe.py", line 645, in _internal_collect_with_tag_no_telemetry
    return self._session._conn.execute(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "snowflake/snowpark/_internal/server_connection.py", line 510, in execute
    result_set, result_meta = self.get_result_set(
                              ^^^^^^^^^^^^^^^^^^^^
  File "snowflake/snowpark/_internal/analyzer/snowflake_plan.py", line 191, in wrap
    raise ne.with_traceback(tb) from None
  File "snowflake/snowpark/_internal/analyzer/snowflake_plan.py", line 122, in wrap
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "snowflake/snowpark/_internal/server_connection.py", line 612, in get_result_set
    result = self.run_query(
             ^^^^^^^^^^^^^^^
  File "snowflake/snowpark/_internal/server_connection.py", line 123, in wrap
    raise ex
  File "snowflake/snowpark/_internal/server_connection.py", line 117, in wrap
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "snowflake/snowpark/_internal/server_connection.py", line 417, in run_query
    raise ex
  File "snowflake/snowpark/_internal/server_connection.py", line 402, in run_query
    results_cursor = self.execute_and_notify_query_listener(
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "snowflake/snowpark/_internal/server_connection.py", line 354, in execute_and_notify_query_listener
    results_cursor = self._cursor.execute(query, **kwargs)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "snowflake/connector/cursor.py", line 1016, in execute
    Error.errorhandler_wrapper(
  File "snowflake/connector/errors.py", line 232, in errorhandler_wrapper
    handed_over = Error.hand_to_other_handler(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "snowflake/connector/errors.py", line 287, in hand_to_other_handler
    cursor.errorhandler(connection, cursor, error_class, error_value)
  File "snowflake/connector/errors.py", line 165, in default_errorhandler
    raise error_class(
snowflake.snowpark.exceptions.SnowparkSQLException: (1304): 01b4f2b0-0305-3d30-0005-edc700089236: 100038 (22018): Numeric value 'None' is not recognized

How to Reproduce

I am running my Python code inside of a Python Worksheet that I am accessing on Snowflake itself (app.snowflake.com), my account is through my job. I apologize if that makes reproducing results difficult, and I am using snowflake-snowpark-python==1.16.0 because that is the latest that is available to me on my Snowflake worksheets.

What version of Python are you using?

3.11

What operating system and processor architecture are you using?

Linux-5.4.181-99.354.amzn2.aarch64-aarch64-with-glibc2.34

What are the component versions in the environment (pip freeze)?

bottleneck==1.3.7
cloudpickle==2.2.1
numexpr==2.8.4
numpy==1.24.3
pandas==2.2.1
pyarrow==14.0.2
python-dateutil==2.8.3+snowflake1
pytz==2024.1
pyyaml==6.0.1
setuptools==69.5.1
six==1.16.0
snowflake-connector-python==0.28.0
snowflake-snowpark-python==1.16.0
typing-extensions==4.11.0
tzdata==2023.3
wheel==0.43.0

What did you do?

Below is the simplest code snippet I could create to reproduce the error, and it will run inside of a Snowflake worksheet as soon as you change the my_db and my_schema parameters at the top to point to your Database and Schema. I've written a comment so you can see which line triggers the error.

import snowflake.snowpark as snowpark
from snowflake.snowpark.functions import col

def main(session: snowpark.Session):

    # Define these to match your env, table name can be anything
    my_db = "MY_DATABASE"
    my_schema = "MY_SCHEMA"
    my_table_name = "TEST_TABLE_NULLABILITY"

    # Create the table with a NUMBER column that can take nulls
    drop_table_sql = f"DROP TABLE IF EXISTS {my_db}.{my_schema}.{my_table_name};"
    _ = session.sql(drop_table_sql).collect()

    create_table_sql = f"""
        CREATE TABLE {my_db}.{my_schema}.{my_table_name} (
            ID NUMBER NULL
        );
    """
    _ = session.sql(create_table_sql).collect()

    # Confirm that the column is nullable
    check_nullability_sql = f"""
        SELECT IS_NULLABLE
        FROM {my_db}.INFORMATION_SCHEMA.COLUMNS
        WHERE TABLE_NAME = '{my_table_name}'
        AND TABLE_SCHEMA = '{my_schema}';
    """

    is_nullable = session.sql(check_nullability_sql).collect()[0][0]

    if is_nullable == "NO":
        raise ValueError("Column is not nullable, this should not happen.")

    # Attempt to write SQL that inserts a None into the table
    null_insert_sql = f"""
        INSERT INTO {my_db}.{my_schema}.{my_table_name} (ID)
        VALUES (?);
    """
    parameters = [None]

    # THIS LINE WILL ERROR!!!
    _ = session.sql(null_insert_sql, params=parameters).collect()
    # THIS LINE WILL ERROR!!!

    return "SUCCESS!"

What did you expect to see?

All numeric data types (NUMBER, FLOAT, INT, DOUBLE) result in the code erroring out, but this should not be happening. Running the exact same script with VARCHAR as the data type instead of NUMBER results in the code succeeding and successfully inserting a null value into the table for the ID column. Additionally, if you just hardcode NULL instead of using the safe qmark parameter bindings for the .sql call, the code does not error and successfully inserts a null value into the table for the ID column:

import snowflake.snowpark as snowpark
from snowflake.snowpark.functions import col

def main(session: snowpark.Session):

    my_db = "MY_DATABASE"
    my_schema = "MY_SCHEMA"
    my_table_name = "TEST_TABLE_NULLABILITY"

    # Create the table with a NUMBER column that can take nulls
    drop_table_sql = f"DROP TABLE IF EXISTS {my_db}.{my_schema}.{my_table_name};"
    _ = session.sql(query=drop_table_sql).collect()

    create_table_sql = f"""
        CREATE TABLE {my_db}.{my_schema}.{my_table_name} (
            ID NUMBER NULL
        );
    """
    _ = session.sql(query=create_table_sql).collect()

    # Confirm that the column is nullable
    check_nullability_sql = f"""
        SELECT IS_NULLABLE
        FROM {my_db}.INFORMATION_SCHEMA.COLUMNS
        WHERE TABLE_NAME = '{my_table_name}'
        AND TABLE_SCHEMA = '{my_schema}';
    """

    is_nullable = session.sql(query=check_nullability_sql).collect()[0][0]

    if is_nullable == "NO":
        raise ValueError("Column is not nullable, this should not happen.")

    # Attempt to write SQL that inserts a None into the table
    null_insert_sql = f"""
        INSERT INTO {my_db}.{my_schema}.{my_table_name} (ID)
        VALUES (NULL);
    """

    # THIS LINE WILL NOW SUCCEED!!!
    _ = session.sql(query=null_insert_sql).collect()
    # THIS LINE WILL NOW SUCCEED!!!

    return "SUCCESS!"

Can you set logging to DEBUG and collect the logs?

I've attached a log from the execution of my above script that errors out:

2024-06-11 16:35:47,821 - Dummy-1 server_connection.py:405 - run_query() - DEBUG - Execute query [queryID: 01b4f2a7-0305-3d26-0005-edc7000813c6] DROP TABLE IF EXISTS MY_DATABASE.MY_SCHEMA.TEST_TABLE_NULLABILITY;
2024-06-11 16:35:47,821 - Dummy-1 result_set.py:65 - result_set_iterator() - DEBUG - beginning to schedule result batch downloads
2024-06-11 16:35:47,821 - Dummy-1 result_set.py:128 - _report_metrics() - DEBUG - Reporting metrics not supported
2024-06-11 16:35:47,822 - Dummy-1 cursor.py:880 - execute() - DEBUG - executing SQL/command
2024-06-11 16:35:47,822 - Dummy-1 cursor.py:953 - execute() - INFO - query: [CREATE TABLE MY_DATABASE.MY_SCHEMA.TEST_TABLE_NULLABILITY ( ID NUMBER NULL );...]
2024-06-11 16:35:47,822 - Dummy-1 connection.py:509 - _next_sequence_counter() - DEBUG - sequence counter: 3
2024-06-11 16:35:47,823 - Dummy-1 cursor.py:591 - _execute_helper() - DEBUG - Request id: 04cd6893-9593-4c24-84ad-139e24fad72a
2024-06-11 16:35:47,823 - Dummy-1 cursor.py:594 - _execute_helper() - DEBUG - running query [CREATE TABLE MY_DATABASE.MY_SCHEMA.TEST_TABLE_NULLABILITY ( ID NUMBER NULL );...]
2024-06-11 16:35:47,823 - Dummy-1 cursor.py:603 - _execute_helper() - DEBUG - is_file_transfer: False
2024-06-11 16:35:47,823 - Dummy-1 cursor.py:643 - _execute_helper() - DEBUG - Failed to set SIGINT handler. Not in main thread. Ignored...
2024-06-11 16:35:47,971 - Dummy-1 cursor.py:665 - _execute_helper() - DEBUG - Failed to reset SIGINT handler. Not in main thread. Ignored...
2024-06-11 16:35:47,971 - Dummy-1 cursor.py:960 - execute() - DEBUG - sfqid: 01b4f2a7-0305-3d26-0005-edc7000813ca
2024-06-11 16:35:47,971 - Dummy-1 cursor.py:966 - execute() - INFO - query execution done
2024-06-11 16:35:47,971 - Dummy-1 cursor.py:971 - execute() - DEBUG - SUCCESS
2024-06-11 16:35:47,971 - Dummy-1 cursor.py:1076 - _init_result_and_meta() - DEBUG - Query result format: json
2024-06-11 16:35:47,971 - Dummy-1 result_batch.py:444 - _parse() - DEBUG - parsing for result batch id: 1
2024-06-11 16:35:47,971 - Dummy-1 server_connection.py:405 - run_query() - DEBUG - Execute query [queryID: 01b4f2a7-0305-3d26-0005-edc7000813ca] 
        CREATE TABLE MY_DATABASE.MY_SCHEMA.TEST_TABLE_NULLABILITY (
            ID NUMBER NULL
        );

2024-06-11 16:35:47,971 - Dummy-1 result_set.py:65 - result_set_iterator() - DEBUG - beginning to schedule result batch downloads
2024-06-11 16:35:47,971 - Dummy-1 result_set.py:128 - _report_metrics() - DEBUG - Reporting metrics not supported
2024-06-11 16:35:47,972 - Dummy-1 cursor.py:880 - execute() - DEBUG - executing SQL/command
2024-06-11 16:35:47,973 - Dummy-1 cursor.py:953 - execute() - INFO - query: [SELECT IS_NULLABLE FROM MY_DATABASE.INFORMATION_SCHEMA.COLUMNS WHERE T...]
2024-06-11 16:35:47,973 - Dummy-1 connection.py:509 - _next_sequence_counter() - DEBUG - sequence counter: 4
2024-06-11 16:35:47,973 - Dummy-1 cursor.py:591 - _execute_helper() - DEBUG - Request id: 380c9e62-ca42-45dc-86eb-e3d93e0c9c39
2024-06-11 16:35:47,973 - Dummy-1 cursor.py:594 - _execute_helper() - DEBUG - running query [SELECT IS_NULLABLE FROM MY_DATABASE.INFORMATION_SCHEMA.COLUMNS WHERE T...]
2024-06-11 16:35:47,973 - Dummy-1 cursor.py:603 - _execute_helper() - DEBUG - is_file_transfer: False
2024-06-11 16:35:47,973 - Dummy-1 cursor.py:643 - _execute_helper() - DEBUG - Failed to set SIGINT handler. Not in main thread. Ignored...
2024-06-11 16:35:48,946 - Dummy-1 cursor.py:665 - _execute_helper() - DEBUG - Failed to reset SIGINT handler. Not in main thread. Ignored...
2024-06-11 16:35:48,946 - Dummy-1 cursor.py:960 - execute() - DEBUG - sfqid: 01b4f2a7-0305-3d26-0005-edc7000813ce
2024-06-11 16:35:48,946 - Dummy-1 cursor.py:966 - execute() - INFO - query execution done
2024-06-11 16:35:48,946 - Dummy-1 cursor.py:971 - execute() - DEBUG - SUCCESS
2024-06-11 16:35:48,946 - Dummy-1 cursor.py:1076 - _init_result_and_meta() - DEBUG - Query result format: arrow
2024-06-11 16:35:48,946 - Dummy-1 server_connection.py:405 - run_query() - DEBUG - Execute query [queryID: 01b4f2a7-0305-3d26-0005-edc7000813ce] 
        SELECT IS_NULLABLE
        FROM MY_DATABASE.INFORMATION_SCHEMA.COLUMNS
        WHERE TABLE_NAME = 'TEST_TABLE_NULLABILITY'
        AND TABLE_SCHEMA = 'DL';

2024-06-11 16:35:48,946 - Dummy-1 result_batch.py:73 - _create_nanoarrow_iterator() - DEBUG - Using nanoarrow as the arrow data converter
2024-06-11 16:35:48,946 - Dummy-1 result_set.py:65 - result_set_iterator() - DEBUG - beginning to schedule result batch downloads
2024-06-11 16:35:48,946 - Dummy-1 result_set.py:128 - _report_metrics() - DEBUG - Reporting metrics not supported
2024-06-11 16:35:48,947 - Dummy-1 cursor.py:880 - execute() - DEBUG - executing SQL/command
2024-06-11 16:35:48,947 - Dummy-1 connection.py:751 - _process_params_qmarks() - DEBUG - idx: 1, type: ANY
2024-06-11 16:35:48,947 - Dummy-1 cursor.py:953 - execute() - INFO - query: [INSERT INTO MY_DATABASE.MY_SCHEMA.TEST_TABLE_NULLABILITY (ID) VALUES (?);]
2024-06-11 16:35:48,947 - Dummy-1 connection.py:509 - _next_sequence_counter() - DEBUG - sequence counter: 5
2024-06-11 16:35:48,947 - Dummy-1 cursor.py:591 - _execute_helper() - DEBUG - Request id: 4d2e84bf-a367-45b8-a645-d99d2bc58e28
2024-06-11 16:35:48,948 - Dummy-1 cursor.py:594 - _execute_helper() - DEBUG - running query [INSERT INTO MY_DATABASE.MY_SCHEMA.TEST_TABLE_NULLABILITY (ID) VALUES (?);]
2024-06-11 16:35:48,948 - Dummy-1 cursor.py:603 - _execute_helper() - DEBUG - is_file_transfer: False
2024-06-11 16:35:48,948 - Dummy-1 cursor.py:643 - _execute_helper() - DEBUG - Failed to set SIGINT handler. Not in main thread. Ignored...
2024-06-11 16:35:49,197 - Dummy-1 cursor.py:665 - _execute_helper() - DEBUG - Failed to reset SIGINT handler. Not in main thread. Ignored...
2024-06-11 16:35:49,197 - Dummy-1 cursor.py:960 - execute() - DEBUG - sfqid: 01b4f2a7-0305-3d26-0005-edc7000813d2
2024-06-11 16:35:49,197 - Dummy-1 cursor.py:966 - execute() - INFO - query execution done
2024-06-11 16:35:49,197 - Dummy-1 cursor.py:1004 - execute() - DEBUG - {'data': {'internalError': False, 'errorCode': '100038', 'age': 0, 'sqlState': '22018', 'queryId': '01b4f2a7-0305-3d26-0005-edc7000813d2'}, 'code': '100038', 'message': "Numeric value 'None' is not recognized", 'success': False, 'headers': None}
2024-06-11 16:35:49,199 - Dummy-1 errors.py:145 - exception_telemetry() - DEBUG - Sending exception telemetry failed

Please let me know if there is anything else that I can provide.

AidanShipperley commented 3 weeks ago

Small update, this is also affecting all date and time data types as well. The complete list of data types that this issue appears to be affecting currently is:

[
    "NUMBER", 
    "DECIMAL", 
    "NUMERIC", 
    "INT", 
    "INTEGER", 
    "BIGINT", 
    "SMALLINT", 
    "TINYINT", 
    "BYTEINT", 
    "FLOAT", 
    "FLOAT4", 
    "FLOAT8", 
    "DOUBLE", 
    "DOUBLE PRECISION", 
    "REAL", 
    "DATE", 
    "DATETIME", 
    "TIME", 
    "TIMESTAMP", 
    "TIMESTAMP_LTZ", 
    "TIMESTAMP_NTZ", 
    "TIMESTAMP_TZ"
]
sfc-gh-sghosh commented 3 weeks ago

Hello @AidanShipperley ,

Thanks for raising the issue, we are looking into it, will update.

Regards, Sujan

sfc-gh-sghosh commented 3 weeks ago

Hello @AidanShipperley ,

The same code works in Jupyter with snowpark. session.sql("insert into TEST_TABLE_NULLABILITY (ID) VALUES (?)", params=[None]).collect()

It seems the issue with Python worksheet where it's not treating the query as Direct SQL Embedding Example: INSERT INTO TEST_TABLE_NULLABILITY (ID) VALUES (NULL)

Snowpark for Python expects direct SQL syntax with the appropriate values embedded directly or bound using Snowflake's DataFrame API.

Example: data = [(None,)] schema = StructType([StructField("ID", IntegerType())]) df = session.create_dataframe(data, schema=schema) df.write.save_as_table("TEST_TABLE_NULLABILITY", mode="append") result = session.table("TEST_TABLE_NULLABILITY").collect() for row in result: print(row) `

We are checking and will update further.

Regards, Sujan

Output: `Row(ID=None)