Problem Description
I would like df.compare to accept "tolerance" thresholds to allow for approximate comparisons. This feature already exists in the assert_frame_equal utility, and it would be beneficial in compare to help users identify the rows and columns that are causing their assertion to fail. It would be helpful in many cases to allow users to filter out differences that are sufficiently small.
Feature Description
def compare(
    self,
    other: DataFrame,
    align_axis: Axis = 1,
    keep_shape: bool = False,
    keep_equal: bool = False,
    rtol: float | None = None,
    atol: float | None = None,
) -> DataFrame:
    """
    ...
    rtol : float | None, default None
        Relative tolerance. Numeric differences below this value will not be
        considered differences for the purposes of "keep_shape" and will be
        shown as NaN if "keep_equal" is False.
    atol : float | None, default None
        Absolute tolerance. Numeric differences below this value will not be
        considered differences for the purposes of "keep_shape" and will be
        shown as NaN if "keep_equal" is False.
    ...
    """
For implementation, the current comparison is essentially the following check: mask = ~((self == other) | (self.isna() & other.isna())). From a quick glance at _testing.assert_almost_equal, it appears we could implement this by calling that function on each element of the DataFrame, though I'm not sure whether it's okay to reference the _testing module outside of testing functions.
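As a rough illustration of the idea, the exact mask quoted above could be relaxed for numeric columns with numpy.isclose instead of anything from _testing. This is only a sketch under assumptions: tolerant_mask is a hypothetical helper, not part of pandas, and it assumes both frames share the same labels and use NumPy-backed numeric dtypes.

```python
import numpy as np
import pandas as pd


def tolerant_mask(left, right, rtol=0.0, atol=0.0):
    """Hypothetical helper: True where cells differ beyond the tolerance."""
    # The exact check quoted in the issue: equal values, or NaN on both sides.
    same = (left == right) | (left.isna() & right.isna())
    # Relax the check for columns that are numeric in both frames;
    # every other dtype keeps the exact comparison.
    numeric = left.select_dtypes(include="number").columns.intersection(
        right.select_dtypes(include="number").columns
    )
    for col in numeric:
        close = np.isclose(
            left[col].to_numpy(dtype=float),
            right[col].to_numpy(dtype=float),
            rtol=rtol,
            atol=atol,
        )
        same[col] = same[col] | close
    return ~same


left = pd.DataFrame({"a": [1.0, 2.0, np.nan], "b": ["x", "y", "z"]})
right = pd.DataFrame({"a": [1.0 + 1e-9, 2.5, np.nan], "b": ["x", "y", "w"]})

# Current behaviour: the 1e-9 difference in the first row is flagged.
exact = ~((left == right) | (left.isna() & right.isna()))
# With tolerances, only the 2.0 vs 2.5 difference remains in column "a".
approx = tolerant_mask(left, right, rtol=1e-5, atol=1e-8)
```

With rtol and atol both left at 0.0, the helper only treats exactly equal values as close, so it reproduces the exact mask.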
Alternative Solutions
This could also be implemented more directly with math.isclose calls, but those would need to be applied only to numeric columns.
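A minimal sketch of the math.isclose alternative, restricted to numeric columns as noted above. The helper name isclose_mask is hypothetical, and the sketch assumes both frames have the same shape, labels, and dtypes. Note that math.isclose takes rel_tol and abs_tol and applies a symmetric test, whereas numpy.isclose measures the relative tolerance against its second argument, so the two approaches can disagree near the threshold.

```python
import math

import pandas as pd


def isclose_mask(left, right, rel_tol=1e-9, abs_tol=0.0):
    """Hypothetical helper: True where math.isclose judges two cells close."""
    # Start with "not close" everywhere; non-numeric columns stay False.
    close = pd.DataFrame(False, index=left.index, columns=left.columns)
    # math.isclose only accepts real numbers, so restrict to numeric columns.
    for col in left.select_dtypes(include="number").columns:
        close[col] = [
            math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)
            for a, b in zip(left[col], right[col])
        ]
    return close


left = pd.DataFrame({"price": [10.0, 20.0], "label": ["a", "b"]})
right = pd.DataFrame({"price": [10.0000000001, 21.0], "label": ["a", "c"]})

close = isclose_mask(left, right, rel_tol=1e-6)
```

The element-wise Python loop is easy to read but will be much slower than a vectorized check on large frames, and math.isclose treats NaN as not close to anything, so the existing isna() handling would still be needed.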
Feature Type
[ ] Adding new functionality to pandas
[X] Changing existing functionality in pandas
[ ] Removing existing functionality in pandas
Additional Context
No response