pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs

Add Pandas `.compare()` Functionality #14373

Open ryantaylor406 opened 4 months ago

ryantaylor406 commented 4 months ago

Description

Pandas has a very nifty `.compare()` method that compares two identically labeled dataframes and returns only the rows and columns where they differ. This is incredibly helpful when debugging, testing new code, etc.

It would be incredible to have this functionality built into polars as it is quite slow in pandas with medium/large datasets.
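
For reference, here is a minimal sketch of the pandas behaviour being requested. The frames `left` and `right` and their values are made up for illustration; only `DataFrame.compare()` itself is the pandas API.

```python
import pandas as pd

# Two illustrative frames with identical labels; two cells differ.
left = pd.DataFrame(
    {"id": [1, 2, 3], "price": [10.0, 20.0, 30.0], "qty": [1, 2, 3]}
)
right = pd.DataFrame(
    {"id": [1, 2, 3], "price": [10.0, 25.0, 30.0], "qty": [1, 2, 4]}
)

# By default, compare() drops rows and columns that are entirely equal
# and labels the two sides "self" and "other" in a column MultiIndex.
diff = left.compare(right)
print(diff)
```

The `id` column is dropped from the result because it is identical in both frames, and cells that match within a kept row are shown as NaN.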

avimallu commented 4 months ago

Does assert_frame_equal help?

ryantaylor406 commented 4 months ago

> Does assert_frame_equal help?

The compare() function returns a new dataframe that shows where the rows & columns of two dataframes are different. The assert_frame_equal function only tells you whether the two dataframes are equal (it raises an AssertionError when they are not); it does not return the differences as data. So they're related but not the same (the compare function is more useful!)

fdosani commented 1 month ago

@ryantaylor406 Late to the party. Not sure if this would be helpful, but I maintain a library that lets you compare two Polars DataFrames and get a report back: https://capitalone.github.io/datacompy/polars_usage.html