snowflakedb / snowpark-python

Snowflake Snowpark Python API
Apache License 2.0

SNOW-742421 daydiff method to match Spark datediff method #730

Open sfc-gh-mrojas opened 1 year ago

sfc-gh-mrojas commented 1 year ago

What is the current behavior?

The Snowpark datediff function was built to match Snowflake's SQL DATEDIFF function: https://docs.snowflake.com/en/sql-reference/functions/datediff

Spark's datediff differs from Snowflake's datediff, so migrating code can require manual changes. Spark's datediff returns the difference in days between two dates, and its parameters come in a different order.
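To make the argument-order difference concrete, here is a minimal pure-Python model of the two conventions for day-level differences, using only the standard library. The function names spark_style_datediff and snowflake_style_datediff are hypothetical; they only mimic the documented argument orders of Spark's datediff(end, start) and Snowflake's DATEDIFF(date_or_time_part, start, end), and only the 'day' part is modeled.

```python
from datetime import date


def spark_style_datediff(end: date, start: date) -> int:
    """Hypothetical model of Spark's datediff(end, start): days from start to end."""
    return (end - start).days


def snowflake_style_datediff(part: str, start: date, end: date) -> int:
    """Hypothetical model of Snowflake's DATEDIFF(part, start, end).

    Only the 'day' part is modeled in this sketch; other parts are rejected.
    """
    if part.lower() != "day":
        raise NotImplementedError("this sketch only models the 'day' part")
    return (end - start).days


d1, d2 = date(2015, 4, 8), date(2015, 5, 10)
# Same result, but the dates are passed in opposite order,
# and the Snowflake-style call needs an explicit date part.
print(spark_style_datediff(d2, d1))             # 32
print(snowflake_style_datediff("day", d1, d2))  # 32
```

Porting a Spark call therefore means both swapping the two date arguments and inserting the 'day' part, which is the manual change described above.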

What is the desired behavior?

Provide a new function, daydiff, that matches Spark's datediff more closely, making it a simple replacement.

How would this improve snowflake-snowpark-python?

It accelerates and simplifies the transition for people coming from Spark.

References, Other Background

sfc-gh-sfan commented 1 year ago

Could you provide an example that shows the difference between the two implementations?

sfc-gh-mrojas commented 1 year ago

Sure.

from pyspark.sql.functions import datediff

# assumes an existing SparkSession named spark
df = spark.createDataFrame([('2015-04-08','2015-05-10')], ['d1', 'd2'])
df.select(datediff(df.d2, df.d1).alias('diff')).collect()
# [Row(diff=32)]

In Spark, datediff(end, start) returns the number of days from start to end, so the result is df.d2 - df.d1.

In Snowpark's datediff (https://docs.snowflake.com/ko/developer-guide/snowpark/reference/python/api/snowflake.snowpark.functions.datediff.html), the parameters come in a different order and the date part is required:

from snowflake.snowpark.functions import datediff
df.select(datediff('day', df.d1, df.d2).alias('diff')).collect()

This function is used heavily, so a Spark migration may require changes in dozens of files.

A daydiff function that uses the same parameter order as Spark's datediff would be a more direct replacement.