mrocklin closed this issue 5 years ago.
Actually, that was on an old cudf build. Here is the exception that I get before I eventually get `Aborted`:
```
failures

left_nrows = 5, right_nrows = 5, left_nkeys = 4, right_nkeys = 4

    @pytest.mark.parametrize("left_nrows", param_nrows)
    @pytest.mark.parametrize("right_nrows", param_nrows)
    @pytest.mark.parametrize("left_nkeys", [4, 5])
    @pytest.mark.parametrize("right_nkeys", [4, 5])
    def test_join_inner(left_nrows, right_nrows, left_nkeys, right_nkeys):
        chunksize = 50
        np.random.seed(0)

        # cuDF
        left = gd.DataFrame(
            {
                "x": np.random.randint(0, left_nkeys, size=left_nrows),
                "a": np.arange(left_nrows),
            }.items()
        )
        right = gd.DataFrame(
            {
                "x": np.random.randint(0, right_nkeys, size=right_nrows),
                "a": 1000 * np.arange(right_nrows),
            }.items()
        )

        expect = left.set_index("x").join(
            right.set_index("x"), how="inner", sort=True, lsuffix="l", rsuffix="r"
        )
        expect = expect.to_pandas()

        # dask_cudf
        left = dgd.from_cudf(left, chunksize=chunksize)
        right = dgd.from_cudf(right, chunksize=chunksize)
        joined = left.set_index("x").join(
>           right.set_index("x"), how="inner", lsuffix="l", rsuffix="r"
        )

dask_cudf/tests/test_join.py:46:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
dask_cudf/core.py:339: in join
    meta = self._meta.join(other._meta, how=how, lsuffix=lsuffix, rsuffix=rsuffix)
../cudf/python/cudf/dataframe/dataframe.py:1234: in join
    rsuffix=rsuffix, method=method)
../cudf/python/cudf/dataframe/dataframe.py:1052: in merge
    method=method)
cudf/bindings/join.pyx:26: in cudf.bindings.join.join
    ???
cudf/bindings/join.pyx:122: in cudf.bindings.join.join
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
>   ???
E   cudf.bindings.GDFError.GDFError: CUDA ERROR. b'cudaErrorInvalidDevicePointer': b'invalid device pointer'
```
cc @dantegd looks like this could be at the Cython layer
It's an issue with merging on empty dataframes:

```python
import cudf

df = cudf.DataFrame({'x': []})
df.merge(df, on=['x'])
```
```
ERROR: CUDA Runtime call cudaPeekAtLastError() in line 566 of file /home/nfs/mrocklin/cudf/cpp/src/join/joining.cu failed with invalid device pointer (17).
---------------------------------------------------------------------------
GDFError                                  Traceback (most recent call last)
<ipython-input-3-a5f3db0f3305> in <module>
----> 1 df.merge(df, on=['x'])

~/cudf/python/cudf/dataframe/dataframe.py in merge(self, other, on, how, lsuffix, rsuffix, type, method)
   1050
   1051         cols, valids = cpp_join.join(lhs._cols, rhs._cols, on, how,
-> 1052                                      method=method)
   1053
   1054         df = DataFrame()

~/cudf/python/cudf/bindings/join.pyx in cudf.bindings.join.join()
~/cudf/python/cudf/bindings/join.pyx in cudf.bindings.join.join()
~/cudf/python/cudf/bindings/cudf_cpp.pyx in cudf.bindings.cudf_cpp.check_gdf_error()

GDFError: CUDA ERROR. b'cudaErrorInvalidDevicePointer': b'invalid device pointer'
```
Happy to move this to cudf if desired
(dask does this in order to get the dtypes and such for the output dataframe without doing any work)
Was about to ask if this happens in the actual merge or the meta calculation. Could you raise an issue in cuDF about handling empty dataframes in merges? Thanks!
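One possible shape for such a fix, sketched in pandas since the cudf internals are not shown here (the function name and the guard itself are hypothetical):

```python
import pandas as pd

def merge_with_empty_guard(lhs, rhs, on):
    # Hypothetical guard: short-circuit before dispatching to the GPU
    # join kernel when either side has zero rows, so no null device
    # pointers reach the C++ layer. The empty result still carries the
    # merged schema, which is all the meta computation needs.
    if len(lhs) == 0 or len(rhs) == 0:
        return lhs.head(0).merge(rhs.head(0), on=on)
    return lhs.merge(rhs, on=on)

df = pd.DataFrame({"x": []})
out = merge_with_empty_guard(df, df, on=["x"])
print(out.shape)  # (0, 1): empty, but the key column survives
```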
Will do. Sorry I didn't dive into this earlier.
No worries, I think the bigger thing here is our error messages are cryptic at best 😅
@mrocklin can you confirm that https://github.com/rapidsai/cudf/pull/691 fixes this issue?
Honestly, I haven't set up a nice build process on my machine yet, so I may be slow to test this (also playing catch-up today). This is on my radar and high priority for me, but I don't recommend blocking on my engagement here. If that PR fixes the cudf issue then I encourage you all to merge it.
I can raise more issues if the problem persists.
Will merge the PR and close this issue once CI reports green. If there are subsequent problems, let's open new issues to track them. Thanks!
This seems to be resolved. There are other join issues coming up that I'll discuss in https://github.com/rapidsai/dask-cudf/pull/67
Currently our joins segfault. This is probably most easily reproduced by running the current test suite.
I apply this patch to un-skip the tests
Then I run the tests.