snowflakedb / snowpark-python

Snowflake Snowpark Python API
Apache License 2.0

SNOW-1247349: Add a function to test for DataFrames equality #1320

Closed valdAsync closed 1 month ago

valdAsync commented 5 months ago

What is the current behavior?

There is no simple way for a user to compare equality of two DataFrames.

What is the desired behavior?

A simple function for testing equality of two DataFrames.

How would this improve snowflake-snowpark-python?

Currently, there is no straightforward way to test snowflake-snowpark-python DataFrames for equality. The new function would assert that two DataFrames are equal and raise a user-friendly error message when they are not.
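To make the request concrete, here is a minimal sketch of the kind of check described above. It is not Snowpark API: `assert_rows_equal` is a hypothetical name, and it assumes the rows of both DataFrames have already been collected into plain, hashable tuples. It ignores row order and reports which rows differ.

```python
from collections import Counter
from typing import Iterable, Tuple


def assert_rows_equal(actual: Iterable[Tuple], expected: Iterable[Tuple]) -> None:
    """Hypothetical helper: compare two row collections, ignoring row order."""
    actual_counts = Counter(actual)
    expected_counts = Counter(expected)
    # Counter subtraction keeps only positive counts, so each difference lists
    # the rows that occur more often on one side than on the other.
    missing = expected_counts - actual_counts
    unexpected = actual_counts - expected_counts
    if missing or unexpected:
        raise AssertionError(
            "DataFrames are not equal.\n"
            f"Rows missing from actual: {sorted(missing.elements(), key=repr)}\n"
            f"Unexpected rows in actual: {sorted(unexpected.elements(), key=repr)}"
        )
```

Counting rows rather than using sets keeps duplicate rows significant, so two inputs that differ only in how often a row appears still fail the check.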

References, Other Background

PySpark 3.5.0 added `assertDataFrameEqual`: https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.testing.assertDataFrameEqual.html

sfc-gh-sghosh commented 5 months ago

Hello @valdAsync ,

Thanks for raising the issue. As of now, the Snowpark DataFrame doesn't have an `assertDataFrameEqual` API. Please see if the workaround below works for you.

Example: convert the Snowpark DataFrames to pandas DataFrames and use the `equals` method.

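`# Assumes an existing Snowpark Session object named session.
# Ask the connector to return query results in Arrow format.
session.sql("alter session set PYTHON_CONNECTOR_QUERY_RESULT_FORMAT=ARROW").collect()
df1 = session.table("sampletable1")
df2 = session.table("sampletable2")

# Convert both Snowpark DataFrames to pandas DataFrames.
pandasDF1 = df1.to_pandas()
pandasDF2 = df2.to_pandas()

# pandas DataFrame.equals returns True only if shape and elements match.
areEqual = pandasDF1.equals(pandasDF2)

print(areEqual)`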
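One caveat with this workaround: pandas `equals` compares values position by position, so two tables holding the same rows in a different order compare as unequal. A possible variation, sketched below on plain pandas DataFrames, sorts both frames before comparing and uses `pandas.testing.assert_frame_equal` to get a descriptive error. The helper name `assert_pandas_frames_equal_unordered` is hypothetical and not part of Snowpark or pandas, and the sketch assumes the column values are sortable.

```python
import pandas as pd
from pandas.testing import assert_frame_equal


def assert_pandas_frames_equal_unordered(left: pd.DataFrame, right: pd.DataFrame) -> None:
    """Hypothetical helper: compare two pandas DataFrames, ignoring row order."""
    cols = list(left.columns)
    # Fail early with a readable message if the columns differ.
    if cols != list(right.columns):
        raise AssertionError(f"Column mismatch: {cols} != {list(right.columns)}")
    # Sort by every column so that row order does not affect the result,
    # then drop the old index so positions line up.
    left_sorted = left.sort_values(by=cols).reset_index(drop=True)
    right_sorted = right.sort_values(by=cols).reset_index(drop=True)
    # Raises AssertionError describing the first difference found.
    assert_frame_equal(left_sorted, right_sorted)
```

With the workaround above, this could be called as `assert_pandas_frames_equal_unordered(pandasDF1, pandasDF2)`. Note that `assert_frame_equal` also checks dtypes by default; pass `check_dtype=False` inside the helper if that is too strict for your tables.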

Regards, Sujan

sfc-gh-sghosh commented 5 months ago

Hello @valdAsync ,

An enhancement request has been raised to add this support to the Snowpark Python DataFrame API.

Regards, Sujan

valdAsync commented 5 months ago

Hello @sfc-gh-sghosh,

Thank you for your comment and the workaround you provided; it will come in handy. However, a more straightforward solution would be appreciated.

If you consider this a good first issue, I would like to give it a shot myself.

sfc-gh-sghosh commented 5 months ago

Hello @valdAsync ,

We are currently addressing the feature request and will keep you updated on its progress in this thread. In the meantime, feel free to utilize the workaround provided above until the feature is fully implemented.

Regards, Sujan

duongleh commented 1 month ago

Hi @sfc-gh-sghosh, I'm looking forward to this feature being implemented. Do you have any update?

sfc-gh-jdu commented 1 month ago

Sorry for the late update. This testing function will be added in the next Snowpark release.