Open sfc-gh-mrojas opened 1 year ago
Could you provide an example that shows the difference between the two implementations?
Sure.
from pyspark.sql.functions import datediff
df = spark.createDataFrame([('2015-04-08','2015-05-10')], ['d1', 'd2'])
df.select(datediff(df.d2, df.d1).alias('diff')).collect()
[Row(diff=32)]
In Spark, datediff takes the end date first and the start date second, so the result here is df.d2 - df.d1, in days.
In Snowpark, the parameters are processed in a different order, and the date part is required:
datediff('day',df.d1,df.d2)
This function is heavily used, so a Spark migration might require changes in dozens of files.
The proposed daydiff uses the same parameter order as Spark's datediff, so it is a more direct replacement.
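To make the ordering difference concrete, here is a minimal pure-Python sketch using only `datetime`. It does not call Spark or Snowpark; the names `spark_style_datediff`, `snowflake_style_datediff` and `daydiff_sketch` are hypothetical stand-ins that only model the argument orders described above, and the sketch assumes whole-day differences on plain dates.

```python
from datetime import date


def spark_style_datediff(end: date, start: date) -> int:
    """Stand-in for the Spark order: (end, start), always counted in days."""
    return (end - start).days


def snowflake_style_datediff(part: str, start: date, end: date) -> int:
    """Stand-in for the Snowflake order: (date part, start, end).

    Only the 'day' part is modelled in this sketch.
    """
    if part.lower() != "day":
        raise ValueError("this sketch only models the 'day' part")
    return (end - start).days


def daydiff_sketch(end: date, start: date) -> int:
    """Hypothetical daydiff: Spark's argument order layered on the Snowflake-style call."""
    return snowflake_style_datediff("day", start, end)


d1 = date(2015, 4, 8)
d2 = date(2015, 5, 10)

print(spark_style_datediff(d2, d1))             # Spark-style call from the example
print(snowflake_style_datediff("day", d1, d2))  # same result, different argument order
print(daydiff_sketch(d2, d1))                   # drop-in for the Spark-style call
```

With a wrapper like this, a migrated call site would only need the function name changed, not its arguments reordered.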
What is the current behavior?
The Snowflake datediff was built to match the Snowflake platform function: https://docs.snowflake.com/en/sql-reference/functions/datediff
The Spark datediff differs from the Snowflake datediff, which can require manual changes during a migration. The Spark version only returns a difference in number of days, and the order of its parameters is different.
What is the desired behavior?
A new method ‘daydiff’ is provided that is more compatible with the Spark method, which makes it a simple replacement.
How would this improve snowflake-snowpark-python?
It accelerates and simplifies the transition for people coming from Spark.
References, Other Background