adrian17 opened this issue 3 weeks ago
take
Thanks for the request!
Compared to DataFrame methods, what makes assert_frame_equal unique, such that it should support by-column arguments?
It does not seem to me to be maintainable to allow by-column specific arguments across the API for DataFrame methods, and therefore we should not do so here for API consistency. The alternative solution in the OP appears to me to be the right, sustainable, approach.
Hi @rhshadrach. Thanks for the comment. I have done some work on this, and I think the solution I've come up with is sustainable going forward. I'm just ironing out a few details. A few tests are failing, but they appear unrelated to assert_frame_equal. I should be ready to open a PR this week. Perhaps we can discuss whether the solution is compatible with the API during the PR review?
@specialkapa - without an answer to the above question, I am opposed to adding this feature. The issue I have with sustainability is not for this one particular feature, but rather having to add similar things to other methods for DataFrames.
That is a good point. Thanks for getting back to me.
(Email reply to Richard Shadrach's comment of Sun, 25 Aug 2024 at 16:30.)
Feature Type
[X] Adding new functionality to pandas
[ ] Changing existing functionality in pandas
[ ] Removing existing functionality in pandas
Problem Description
The tolerance used by our internal validation tool needs to depend on the metric being compared. For example, when obtaining results from an analytical database with a query like
We expect device_count to always be accurate, but score is expected to have random floating-point inaccuracies. My old code ran assert_frame_equal several times on different subsets of columns, which is cumbersome and doesn't express the intent well. I recently refactored it by extracting assert_frame_equal's implementation and adding extra arguments to support per-column customizable rtol and atol. It would be nice if such an ability were built into pandas.
Note that this overlaps a bit with feature request https://github.com/pandas-dev/pandas/issues/54861.
Feature Description
One way is to add extra arguments to assert_frame_equal, usable like so:
Or the entire comparison configuration (check_exact, check_datetimelike_compat, etc.) could be overridden per-series, for example
Alternative Solutions
The current way to do it with public APIs is to do something like