rapidsai / cudf

cuDF - GPU DataFrame Library
https://docs.rapids.ai/api/cudf/stable/
Apache License 2.0

[BUG] Series.equals() raises error with ListColumns, while DataFrame.equals() works #6040

Open miguelusque opened 4 years ago

miguelusque commented 4 years ago

Describe the bug

When trying to compare two Series with ListColumns, the following error is displayed:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-11-6bd0d51bd16c> in <module>
      4 series2 = cudf.Series({1: [10, 20, 30]})
      5 
----> 6 series1.equals(series2)

/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/core/series.py in equals(self, other)
   1499         if other is None or len(self) != len(other):
   1500             return False
-> 1501         return self._binaryop(other, "eq").min()
   1502 
   1503     def ne(self, other, fill_value=None, axis=0):

/opt/conda/envs/rapids/lib/python3.7/contextlib.py in inner(*args, **kwds)
     72         def inner(*args, **kwds):
     73             with self._recreate_cm():
---> 74                 return func(*args, **kwds)
     75         return inner
     76 

/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/core/series.py in _binaryop(self, other, fn, fill_value, reflect)
   1081                     rhs = rhs.fillna(fill_value)
   1082 
-> 1083         outcol = lhs._column.binary_operator(fn, rhs, reflect=reflect)
   1084         result = lhs._copy_construct(data=outcol, name=result_name)
   1085         return result

AttributeError: 'ListColumn' object has no attribute 'binary_operator'

Steps/Code to reproduce bug

import cudf

series1 = cudf.Series({1: [10, 20, 30]})
series2 = cudf.Series({1: [10, 20, 30]})

series1.equals(series2)

Expected behaviour

No error should be raised. The same comparison with DataFrames works:

import cudf

df1 = cudf.DataFrame({1: [10, 20, 30], 2: [20, 30, 40]})
df2 = cudf.DataFrame({1: [10, 20, 30], 2: [20, 30, 40]})

df1.equals(df2)

Environment overview

DGX1

Environment details

cuDF version: 0.15.0a+4666.g1778921b0

shwina commented 4 years ago

@miguelusque Thanks for reporting. Binary operations are not yet implemented at the libcudf level for ListColumns. You should be seeing the same error for DataFrames as well:

In [8]: a = cudf.DataFrame({'a': [[1, 2], [3, 4]]}); b = cudf.DataFrame({'a': [[1, 2], [4, 5]]})

In [9]: a
Out[9]:
        a
0  [1, 2]
1  [3, 4]

In [10]: a == b  # will error

kkraus14 commented 4 years ago

@shwina Series.equals returns a scalar boolean, so it doesn't necessarily have to go through the binaryop machinery, though I suspect it does currently.
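
To illustrate the point above, here is a minimal pure-Python sketch, not cuDF code. `ToyListColumn`, `equals_via_binaryop`, and `equals_direct` are hypothetical names invented for this example. The first function mimics the failing path from the traceback (length check, then an element-wise "eq" on the column); the second shows how a scalar result could be produced without any element-wise binary operator. Treating two null rows as equal is an assumption of the sketch.

```python
from typing import Any, List, Optional


class ToyListColumn:
    """Hypothetical stand-in for a list column.

    It stores one Python list (or None) per row and, like the column in
    the traceback, deliberately has no `binary_operator` method.
    """

    def __init__(self, rows: List[Optional[List[Any]]]) -> None:
        self.rows = rows

    def __len__(self) -> int:
        return len(self.rows)


def equals_via_binaryop(lhs: ToyListColumn, rhs: Optional[ToyListColumn]) -> bool:
    """Mimics the failing path: needs an element-wise "eq" on the column."""
    if rhs is None or len(lhs) != len(rhs):
        return False
    # Raises AttributeError here, because ToyListColumn has no binary_operator.
    return all(lhs.binary_operator("eq", rhs))


def equals_direct(lhs: ToyListColumn, rhs: Optional[ToyListColumn]) -> bool:
    """Returns a single bool by comparing rows directly.

    Assumption: two None rows count as equal.
    """
    if rhs is None or len(lhs) != len(rhs):
        return False
    return all(a == b for a, b in zip(lhs.rows, rhs.rows))


if __name__ == "__main__":
    col1 = ToyListColumn([[10, 20, 30]])
    col2 = ToyListColumn([[10, 20, 30]])

    print(equals_direct(col1, col2))

    try:
        equals_via_binaryop(col1, col2)
    except AttributeError as err:
        print("binaryop path failed:", err)
```

In a real implementation the row-by-row comparison would have to run on the GPU rather than in a Python loop; the sketch only shows that the scalar result does not depend on a column-level "eq" operator.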

kkraus14 commented 3 years ago

pushing to 0.18

jrhemstad commented 3 years ago

This is labeled as a libcudf issue, but I don't believe that is correct.

github-actions[bot] commented 3 years ago

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.