Closed razajafri closed 4 years ago
So that readers don't have to compile and run your tests just to think about this bug, please print out the values you get in out
for each test and paste them here. Please include what Spark expects as well.
Thanks for looking @harrism. I have updated the bug with the test output. I will update it further with the entire output in the AM
It would be a lot easier to understand the issue if the expected/actual behavior were summarized in a table. Can you please fill in the Actual
column?
NaN > x
x | Expected | Actual |
---|---|---|
INFINITY | true | false |
1.02 | true | false |
5.0 | true | false |
NaN | false | false |
-43.2 | true | false |
-INFINITY | true | false |
NaN == x
x | Expected | Actual |
---|---|---|
INFINITY | false | false |
1.02 | false | false |
5.0 | false | false |
NaN | true | false |
-43.2 | false | false |
-INFINITY | false | false |
Pandas behaviour in the above cases:
In [8]: x = pd.Series([np.inf, 1.02, 5.0, np.nan, -43.2, -np.inf])
In [9]: np.nan > x
Out[9]:
0 False
1 False
2 False
3 False
4 False
5 False
dtype: bool
In [10]: np.nan == x
Out[10]:
0 False
1 False
2 False
3 False
4 False
5 False
dtype: bool
In [11]: np.nan != x
Out[11]:
0 True
1 True
2 True
3 True
4 True
5 True
dtype: bool
This looks to be in line with the IEEE 754 standard.
Since Spark's behavior does not conform to IEEE 754, I'm going to label this as a feature request rather than a bug. Supporting this behavior in libcudf will require adding new binop operators that support the non-conformant behavior, like SPARK_MAX/SPARK_MIN.
@jrhemstad I have updated the table.
@jrhemstad is this still valid, considering we have decided to abide by the NaN behavior set by IEEE 754?
Thanks for the reminder.
Per conversation in https://github.com/rapidsai/cudf/issues/4760, this should be implemented via composition of other libcudf features. Closing.
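For readers wondering what "composition" could look like, here is a minimal element-wise sketch in plain Python. It assumes the Spark semantics summarized in the tables above (NaN compares greater than every non-NaN value, and NaN equals NaN) and builds them from an IEEE 754 comparison plus an is-NaN check. The names `spark_gt` and `spark_eq` are hypothetical and are not libcudf or Spark APIs.

```python
import math


def spark_gt(lhs: float, rhs: float) -> bool:
    """lhs > rhs, with NaN treated as larger than any non-NaN value (assumed Spark semantics)."""
    # IEEE 754 `>` is False whenever either side is NaN, so only the
    # "lhs is NaN, rhs is not" case needs to be OR-ed in.
    return (math.isnan(lhs) and not math.isnan(rhs)) or (lhs > rhs)


def spark_eq(lhs: float, rhs: float) -> bool:
    """lhs == rhs, with NaN treated as equal to NaN (assumed Spark semantics)."""
    # IEEE 754 `==` is False for NaN == NaN, so that case is OR-ed in.
    return (math.isnan(lhs) and math.isnan(rhs)) or (lhs == rhs)


# Same values as the tables in this issue.
xs = [math.inf, 1.02, 5.0, math.nan, -43.2, -math.inf]
gt = [spark_gt(math.nan, x) for x in xs]
eq = [spark_eq(math.nan, x) for x in xs]
print(gt)
print(eq)
```

On a column library the same idea would presumably be a NaN mask combined with the ordinary comparison through logical AND/OR, rather than a dedicated binop.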
Describe the bug Spark expects NaNs to behave differently from how they behave in other languages. Currently, cudf does not behave the way Spark expects.
Steps/Code to reproduce bug
The above tests fail with the following error
Here is the output from the Java unit test displaying the entire column output
Expected behavior The above tests should pass.
Additional context The above tests are for float32, but similar tests should pass for float64.