snowflakedb / snowpark-python

Snowflake Snowpark Python API
Apache License 2.0
245 stars 104 forks source link

SNOW-1513884: Local Testing Issues with dateadd and current_timestamp Functions in Snowpark Python #1859

Closed hima-gopisetty closed 1 week ago

hima-gopisetty commented 3 weeks ago

Please answer these questions before submitting your issue. Thanks!

1. What version of Python are you using?

   Python 3.11.8 

2. What are the Snowpark Python and pandas versions in the environment?

  pandas==1.5.3
  snowflake-snowpark-python==1.14.0

3. What did you do?

I was using the Snowflake Snowpark Python library to work with DataFrames, specifically adding current timestamps and manipulating date values. Below is the complete, executable code that led to the issues

from snowflake.snowpark.functions import dateadd, lit, current_timestamp
from snowflake.snowpark.types import StringType, StructField, StructType, TimestampType
from snowflake.snowpark import Session

# Create a session - ensure you set up your own configuration appropriately
session = Session.builder.config('local_testing', True).create()

# Create a Snowflake DataFrame
sf_df = session.create_dataframe(
    data = [
        ["RL_TEST_ROLE_GRANTS", "ABC123", "2000-01-01 00:00:00"],
        ["RL_TEST_ROLE_SUPPORT", "ABC123", "2000-01-01 00:00:00"]
    ],
    schema=StructType([
        StructField("ROLE_NAME", StringType()),
        StructField("GIT_COMMIT_VALUE", StringType()),
        StructField("RECORD_CREATED_TS", TimestampType()),
    ])
)

# Add current_timestamp to the DataFrame
sf_ts = sf_df.with_column("current_timestamp", current_timestamp())
#sf_ts.show()

# Attempt to add 10 days to each date in the DataFrame using dateadd
sf_df_with_added_days = sf_df.with_column("date_plus_10_days", dateadd("day", lit(10), current_timestamp()))
sf_df_with_added_days.show()

4. What did you expect to see?

5. What happened instead?

The following table illustrates the issue with the CURRENT_TIMESTAMP column, where one row unexpectedly contains a NaT value:

| ROLE_NAME            | GIT_COMMIT_VALUE | RECORD_CREATED_TS   | CURRENT_TIMESTAMP          |
|----------------------|------------------|---------------------|----------------------------|
| RL_TEST_ROLE_GRANTS  | ABC123           | 2000-01-01 00:00:00 | 2024-07-01 02:18:38.944660 |
| RL_TEST_ROLE_SUPPORT | ABC123           | 2000-01-01 00:00:00 | NaT                        |
Traceback (most recent call last):
  File "test.py", line 27, in <module>
    sf_df_with_added_days.show()
  File "telemetry.py", line 139, in wrap
    result = func(*args, **kwargs)
  File "dataframe.py", line 3050, in show
    self._show_string(
  File "dataframe.py", line 3168, in _show_string
    result, meta = self._session._conn.get_result_and_metadata(
  File "_connection.py", line 649, in get_result_and_metadata
    res = execute_mock_plan(plan)
  File "_plan.py", line 462, in execute_mock_plan
    column_series = calculate_expression(
  File "_plan.py", line 1318, in calculate_expression
    return calculate_expression(exp.child, input_data, analyzer, expr_to_alias)
  File "_plan.py", line 1320, in calculate_expression
    return handle_function_expression(exp, input_data, analyzer, expr_to_alias)
  File "_plan.py", line 388, in handle_function_expression
    return _MOCK_FUNCTION_IMPLEMENTATION_MAP[func_name](*to_pass_args)
  File "_functions.py", line 1146, in mock_dateadd
    res = datetime_expr.combine(
  File "series.py", line 3396, in combine
    new_values[i] = func(lv, rv)
  File "_functions.py", line 1147, in <lambda>
    value_expr, lambda date, duration: func(cast(date), duration)
  File "_functions.py", line 1107, in add_timedelta
    return date + datetime.timedelta(**{f"{unit}s": duration * scalar})
numpy.core._exceptions._UFuncBinaryResolutionError: ufunc 'add' cannot use operands with types dtype('<M8[ns]') and dtype('O')

Additional Note:

It's important to mention that the operations described above work as expected when executed in a standard Snowflake session. The issues outlined occur exclusively during local testing with the Snowpark library. This suggests that the mock environment might not fully replicate the behavior of the actual Snowflake environment or there might be issues specific to the local testing implementation.

sfc-gh-sghosh commented 3 weeks ago

Hello @hima-gopisetty ,

Thanks for raising the issue, we are looking into it, will update.

Regards, Sujan

sfc-gh-sghosh commented 2 weeks ago

Hello @hima-gopisetty ,

I tried the code with local testing and its working as expected. There are no Null or Nat values getting inserted, and you need to change the code to add 10 days to column 'RECORD_CREATED_TS'

For the current_timestamp addition:

Q1. I expected that adding the current timestamp would successfully append a new column current_timestamp to each row, showing the exact current timestamp, with no null or NaT values

Ans: Its not adding null or NaT values, proper current_timestamp is getting added to the dataframe.

Q2 For the date_plus_10_days calculation:

The expectation was that the dateadd function would add 10 days to the dates in the RECORD_CREATED_TS column without any errors, accurately reflecting the new dates.

Ans: The code needs to be modified, please use the following code

sf_df_with_added_days = sf_df.with_column("date_plus_10_days", dateadd("day", lit(10), col("RECORD_CREATED_TS"))) sf_df_with_added_days.show()

Please find the updated code and output

`from snowflake.snowpark.functions import dateadd, lit, current_timestamp from snowflake.snowpark.types import StringType, StructField, StructType, TimestampType from snowflake.snowpark import Session

Create a session - ensure you set up your own configuration appropriately

session = Session.builder.config('local_testing', True).create()

Create a Snowflake DataFrame

sf_df = session.create_dataframe( data = [ ["RL_TEST_ROLE_GRANTS", "ABC123", "2000-01-01 00:00:00"], ["RL_TEST_ROLE_SUPPORT", "ABC123", "2000-01-01 00:00:00"] ], schema=StructType([ StructField("ROLE_NAME", StringType()), StructField("GIT_COMMIT_VALUE", StringType()), StructField("RECORD_CREATED_TS", TimestampType()), ]) ) sf_df.show()

Add current_timestamp to the DataFrame

sf_ts = sf_df.with_column("current_timestamp", current_timestamp()) sf_ts.show()

Attempt to add 10 days to each date in the DataFrame using dateadd

sf_df_with_added_days = sf_df.with_column("date_plus_10_days", dateadd("day", lit(10), current_timestamp())) sf_df_with_added_days.show()

Add 10 days to the RECORD_CREATED_TS column using dateadd

sf_df_with_added_days = sf_df.with_column("date_plus_10_days", dateadd("day", lit(10), col("RECORD_CREATED_TS"))) sf_df_with_added_days.show()


|"ROLE_NAME" |"GIT_COMMIT_VALUE" |"RECORD_CREATED_TS" |

|RL_TEST_ROLE_GRANTS |ABC123 |2000-01-01 00:00:00 | |RL_TEST_ROLE_SUPPORT |ABC123 |2000-01-01 00:00:00 |


|"ROLE_NAME" |"GIT_COMMIT_VALUE" |"RECORD_CREATED_TS" |"CURRENT_TIMESTAMP" |

|RL_TEST_ROLE_GRANTS |ABC123 |2000-01-01 00:00:00 |2024-07-04 23:10:52.426186 | |RL_TEST_ROLE_SUPPORT |ABC123 |2000-01-01 00:00:00 |2024-07-04 23:10:52.426186 |


|"ROLE_NAME" |"GIT_COMMIT_VALUE" |"RECORD_CREATED_TS" |"DATE_PLUS_10_DAYS" |

|RL_TEST_ROLE_GRANTS |ABC123 |2000-01-01 00:00:00 |2024-07-14 23:10:52.436700 | |RL_TEST_ROLE_SUPPORT |ABC123 |2000-01-01 00:00:00 |2024-07-14 23:10:52.436700 |


|"ROLE_NAME" |"GIT_COMMIT_VALUE" |"RECORD_CREATED_TS" |"DATE_PLUS_10_DAYS" |

|RL_TEST_ROLE_GRANTS |ABC123 |2000-01-01 00:00:00 |2000-01-11 00:00:00 | |RL_TEST_ROLE_SUPPORT |ABC123 |2000-01-01 00:00:00 |2000-01-11 00:00:00 |

`

Regards, Sujan

hima-gopisetty commented 2 weeks ago

Hi @sfc-gh-sghosh ,

Could you let me know if you are using the same snowflake-snowpark-python==1.14.0 version as mine? When I run it again locally, I encounter the same issue where it's adding NaT to rows. In fact, it's adding NaT to all rows except the first one

-------------------------------------------------------------------------------------------------
|"ROLE_NAME"            |"GIT_COMMIT_VALUE"  |"RECORD_CREATED_TS"  |"CURRENT_TIMESTAMP"         |
-------------------------------------------------------------------------------------------------
|RL_TEST_ROLE_GRANTS    |ABC123              |2000-01-01 00:00:00  |2024-07-04 21:53:13.171780  |
|RL_TEST_ROLE_SUPPORT   |ABC123              |2000-01-01 00:00:00  |NaT                         |
|RL_TEST_ROLE_SUPPORT2  |ABC123              |2000-01-01 00:00:00  |NaT                         |
|RL_TEST_ROLE_SUPPORT3  |ABC123              |2000-01-01 00:00:00  |NaT                         |
-------------------------------------------------------------------------------------------------
sfc-gh-sghosh commented 2 weeks ago

Hello @hima-gopisetty ,

Yes, you are right. The issue is with snowpark python 1.14.0. I am also getting NaT. But with 1.16.0 or with the latest 1.19.0, there is no such error, which means the issue has been fixed already, so please use the latest Snowpark python connector, 1.19.0.

Regards, Sujan