Open csadorf opened 1 year ago
@karthikeyann Looks like the libcudf NULL_EQUALS binary op isn't accounting for np.nan values.
I don't think this is a bug in libcudf (typically, NaNs do not compare equal). Possibly something we need to find a solution for in Python land.
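For reference, the NaN behavior mentioned above can be reproduced with plain Python floats, no cudf required. This is only a minimal sketch of the IEEE 754 semantics; the helper name `equal_nan_aware` is made up for illustration and is not part of cudf or libcudf.

```python
import math

nan = float("nan")

# IEEE 754: NaN is unordered, so == is False even when compared to itself.
nan_equals_itself = nan == nan


def equal_nan_aware(a: float, b: float) -> bool:
    """Hypothetical helper: treat two NaNs as equal, otherwise use ==."""
    if math.isnan(a) and math.isnan(b):
        return True
    return a == b


print(nan_equals_itself)          # ordinary comparison
print(equal_nan_aware(nan, nan))  # NaN-aware comparison
```

Any equality check built on elementwise `==` inherits the first behavior, so NaN-aware equality has to be opted into explicitly.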
The libcudf row-comparator has specialized support to report equality when comparing NaNs:
https://github.com/rapidsai/cudf/blob/5f83a8491603eadc1de5cb174016801c1cca5824/cpp/include/cudf/table/row_operators.cuh#L138-L143
I think we can make this consistent at least for NULL_EQUALS
https://github.com/rapidsai/cudf/blob/5f83a8491603eadc1de5cb174016801c1cca5824/cpp/src/binaryop/compiled/operation.cuh#L387-L400
I made a quick change to NullEquals in https://github.com/rapidsai/cudf/pull/12275 for discussion purposes and in case you want to try it out.
@wence- is #15731 relevant here?
In the sense that it was adding another specialised binop equality comparator, yes. But using it wouldn't fix this problem.
If we want to carve out new NANS_LIKE_NULL_EQUALS/NANS_LIKE_NULL_NOT_EQUALS binops, we could do so. Otherwise we could support the requirements of DataFrame.equals by calling nans_to_nulls on the columns first and then comparing with NULL_EQUALS.
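To make the second option concrete, here is a pure-Python sketch of the semantics only, not of the libcudf or cudf API. It assumes a column can be modeled as a list of floats with None standing in for null; the function names `nans_to_nulls_sketch`, `null_equals_sketch`, and `columns_equal_sketch` are hypothetical.

```python
import math
from typing import List, Optional

Column = List[Optional[float]]  # None models a null entry (assumption)


def nans_to_nulls_sketch(col: Column) -> Column:
    """Replace every NaN with a null (None); leave other values unchanged."""
    return [None if v is not None and math.isnan(v) else v for v in col]


def null_equals_sketch(lhs: Column, rhs: Column) -> List[bool]:
    """Elementwise equality where two nulls compare equal.

    One null against a non-null value is False; otherwise ordinary ==.
    """
    out = []
    for a, b in zip(lhs, rhs):
        if a is None or b is None:
            out.append(a is None and b is None)
        else:
            out.append(a == b)
    return out


def columns_equal_sketch(lhs: Column, rhs: Column) -> bool:
    """Convert NaNs to nulls first, then require every row to match."""
    if len(lhs) != len(rhs):
        return False
    return all(null_equals_sketch(nans_to_nulls_sketch(lhs),
                                  nans_to_nulls_sketch(rhs)))


a = [1.0, float("nan"), 3.0]
b = [1.0, float("nan"), 3.0]
print(null_equals_sketch(a, b))    # NaN row does not match without conversion
print(columns_equal_sketch(a, b))  # matches once NaNs are treated as nulls
```

One consequence of this approach is that a NaN in one column and a pre-existing null in the other at the same position would also compare equal after the conversion. Whether that is acceptable depends on how closely DataFrame.equals should distinguish the two.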
If we don't want to add to libcudf, we could have a PTX-jitted binop for this that we pass in via GENERIC_PTX or whatever it's called, I think.
Describe the bug
According to the documentation, the result of DataFrame.equals() should be True when the two dataframes are equal, even when they contain NaNs. However, this does not seem to be the case under certain circumstances.
Steps/Code to reproduce bug
The following snippet demonstrates the unexpected behavior:
It fails with an AssertionError.
Curiously, the following snippet does not fail:
Expected behavior
The result of DataFrame.equals() should be True when the values are the same and the NaNs are in the same places, as described by the method's documentation.
Environment overview (please complete the following information)
Environment details