snowflakedb / snowpark-python

Snowflake Snowpark Python API
Apache License 2.0

SNOW-1623378: Support mocking functions that are not supported in snowpark #2076

Open GuyOmer opened 2 months ago

GuyOmer commented 2 months ago

What is the current behavior?

Trying to mock a function that Snowpark does not implement (such as greatest_ignore_nulls) raises an exception.

from typing import Iterable

from snowflake.snowpark.mock import ColumnEmulator, patch

@patch("greatest_ignore_nulls")
def mock_GREATEST_IGNORE_NULLS(*columns: Iterable[ColumnEmulator]) -> ColumnEmulator:
    ...

# Results in
# NotImplementedError: [Local Testing] Function greatest_ignore_nulls is not supported in snowpark-python.

What is the desired behavior?

The same mocking API (@patch) should also work for functions that Snowpark does not implement.

If this is not an existing feature in snowflake-snowpark-python, how would this impact/improve non-local testing mode?

This would complement the use of call_function.

References, Other Background

Additionally, it is not possible to mock call_function.

sfc-gh-aling commented 2 months ago

Hey @GuyOmer, thanks for reaching out.

This is because greatest_ignore_nulls is not implemented in the Snowpark Python library. For now, patch only works for functions available in snowflake.snowpark.functions.

One workaround is to manually add a fake greatest_ignore_nulls function to the functions module, as below:

import snowflake.snowpark.functions
from snowflake.snowpark import Session, Column
from snowflake.snowpark._internal.type_utils import ColumnOrName
from snowflake.snowpark.functions import call_function
from snowflake.snowpark.mock import patch, ColumnEmulator, ColumnType
from snowflake.snowpark.types import IntegerType

session = Session.builder.configs({"local_testing": True}).create()

# this is just a placeholder, implementation does not matter
def fake_greatest_ignore_nulls(*columns: ColumnOrName) -> Column:
    return None

@patch("greatest_ignore_nulls")
def mock_GREATEST_IGNORE_NULLS(*columns) -> ColumnEmulator:
    return ColumnEmulator([1], sf_type=ColumnType(IntegerType(), nullable=False))

# dynamically add function to module so that local test can find it
snowflake.snowpark.functions.greatest_ignore_nulls = fake_greatest_ignore_nulls

df = session.create_dataframe(
    [1, 2, 3, 4],
    schema=["a"]
)
df.select(call_function("greatest_ignore_nulls", df["a"])).show()

I agree we should provide a better way to patch functions that are not defined in the functions.py module.