Closed valdAsync closed 1 month ago
Hello @valdAsync ,
Thanks for raising the issue. As of now, the Snowpark DataFrame API doesn't have an equivalent of assertDataFrameEqual. Please see if the workaround below works for you.
Example: convert the Snowpark DataFrames to pandas DataFrames and use the equals method.
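```python
# Assumes an existing Snowpark Session named `session`.
# Ask the connector to return query results in Arrow format.
session.sql("alter session set PYTHON_CONNECTOR_QUERY_RESULT_FORMAT=ARROW").collect()

df1 = session.table("sampletable1")
df2 = session.table("sampletable2")

# Pull both tables into local pandas DataFrames.
pandasDF1 = df1.to_pandas()
pandasDF2 = df2.to_pandas()

# equals() returns True only if shape, values, and dtypes all match.
areEqual = pandasDF1.equals(pandasDF2)
print(areEqual)
```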
Regards, Sujan
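One caveat with the workaround above: pandas `equals` compares values position by position, so two tables holding the same rows in a different order compare as unequal, and a dtype difference also fails the check. The sketch below shows one way around that, assuming row order should be ignored. The helper name `assert_pandas_frames_match` is hypothetical, and the two small frames stand in for the results of `df1.to_pandas()` and `df2.to_pandas()`.

```python
import pandas as pd
from pandas.testing import assert_frame_equal


def assert_pandas_frames_match(left: pd.DataFrame, right: pd.DataFrame) -> None:
    """Hypothetical helper: compare two pandas frames while ignoring row order."""
    # Check column names first so both frames are sorted by the same keys.
    assert list(left.columns) == list(right.columns), (
        f"column mismatch: {list(left.columns)} != {list(right.columns)}"
    )
    cols = list(left.columns)
    # Sort by every column and reset the index so positions line up.
    left_sorted = left.sort_values(by=cols).reset_index(drop=True)
    right_sorted = right.sort_values(by=cols).reset_index(drop=True)
    # Raises AssertionError describing the difference if the frames differ.
    # check_dtype=False is a choice made for this sketch, not a requirement.
    assert_frame_equal(left_sorted, right_sorted, check_dtype=False)


# Stand-ins for df1.to_pandas() and df2.to_pandas(): same rows, different order.
pandas_df1 = pd.DataFrame({"ID": [1, 2, 3], "NAME": ["a", "b", "c"]})
pandas_df2 = pd.DataFrame({"ID": [3, 1, 2], "NAME": ["c", "a", "b"]})

print(pandas_df1.equals(pandas_df2))  # order-sensitive comparison
assert_pandas_frames_match(pandas_df1, pandas_df2)  # order-insensitive, passes
```

Sorting by all columns is only one possible normalization; if the tables have a natural key, sorting by that key alone may be cheaper and clearer.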
Hello @valdAsync ,
An enhancement request has been raised to add support for this in the Snowpark Python DataFrame APIs.
Regards, Sujan
Hello @sfc-gh-sghosh,
Thank you for your comment and the provided workaround; it will come in handy. However, a more straightforward solution would be appreciated.
If you consider this a good first issue, I would like to give it a shot myself.
Hello @valdAsync ,
We are currently addressing the feature request and will keep you updated on its progress in this thread. In the meantime, feel free to utilize the workaround provided above until the feature is fully implemented.
Regards, Sujan
Hi @sfc-gh-sghosh, I'm waiting for this feature to be implemented. Do you have any update?
Sorry for the late update. This testing function will be added in the next Snowpark release.
What is the current behavior?
There is no simple way for a user to compare equality of two DataFrames.
What is the desired behavior?
A simple function for testing equality of two DataFrames.
How would this improve snowflake-snowpark-python?
Currently, there is no straightforward way to test snowflake-snowpark-python DataFrames for equality. The new function would assert DataFrame equality and would provide a user-friendly error message in the case of unequal DataFrames.
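To make the request concrete, here is a minimal sketch of the kind of behavior being asked for. It is not the Snowpark API; `assert_rows_equal` and its `check_row_order` parameter are hypothetical names chosen for illustration. It works on plain row tuples (for example, the rows returned by `DataFrame.collect()` converted with `tuple(row)`) and assumes every value in a row is hashable.

```python
from collections import Counter
from typing import Iterable, Sequence


def assert_rows_equal(
    actual: Iterable[Sequence],
    expected: Iterable[Sequence],
    check_row_order: bool = False,
) -> None:
    """Hypothetical helper: assert two row collections are equal, with a readable error."""
    actual_rows = [tuple(r) for r in actual]
    expected_rows = [tuple(r) for r in expected]

    if check_row_order:
        # Report the first position at which the two sequences disagree.
        for i, (a, e) in enumerate(zip(actual_rows, expected_rows)):
            if a != e:
                raise AssertionError(f"Row {i} differs: actual={a!r}, expected={e!r}")
        if len(actual_rows) != len(expected_rows):
            raise AssertionError(
                f"Row count differs: actual={len(actual_rows)}, "
                f"expected={len(expected_rows)}"
            )
        return

    # Order-insensitive: compare as multisets so duplicate rows still count.
    missing = Counter(expected_rows) - Counter(actual_rows)
    unexpected = Counter(actual_rows) - Counter(expected_rows)
    if missing or unexpected:
        raise AssertionError(
            "DataFrames are not equal.\n"
            f"  Rows missing from actual: {sorted(missing.elements(), key=repr)}\n"
            f"  Unexpected rows in actual: {sorted(unexpected.elements(), key=repr)}"
        )


# Same rows in a different order pass by default.
assert_rows_equal([(1, "a"), (2, "b")], [(2, "b"), (1, "a")])
```

Comparing as multisets rather than sets keeps duplicate rows significant, which matters because tables can legitimately contain repeated rows.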
References, Other Background
PySpark 3.5.0 update https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.testing.assertDataFrameEqual.html