Closed ChrisJar closed 3 years ago
Thanks @ChrisJar I can also reproduce. Here is perhaps a simpler reproducer:
from dask_cuda import LocalCUDACluster
from dask.distributed import Client
cluster = LocalCUDACluster(n_workers=1, jit_unspill=True)
client = Client(cluster)
import cudf, dask_cudf
from dask_sql import Context
c = Context()
df = cudf.DataFrame({"id":[1,4,4,5,3], "val":[4,6,6,3,8]})
ddf = dask_cudf.from_cudf(df, npartitions=1)
c.create_table("df", ddf)
query = "SELECT * FROM df ORDER BY id desc"
c.sql(query).compute()
Seeing _percentile come up in the traceback, I'm wondering if we need to add additional dispatches in Dask. @galipremsagar do you have thoughts here?
Looking into it
716 contains the fix to this issue. What I want to know from the dask-cuda experts is the expected result type of c.sql(query).compute(). I'm asking because a FrameProxyObject, a proxy object that acts as a pass-through to Frame-like objects, is returned:
>>> from dask_cuda import LocalCUDACluster
>>> from dask.distributed import Client
>>> cluster = LocalCUDACluster(n_workers=1, jit_unspill=True)
>>> client = Client(cluster)
>>> import cudf, dask_cudf
>>> from dask_sql import Context
>>> c = Context()
>>> df = cudf.DataFrame({"id":[1,4,4,5,3], "val":[4,6,6,3,8]})
>>> ddf = dask_cudf.from_cudf(df, npartitions=1)
>>> c.create_table("df", ddf)
>>> query = "SELECT * FROM df ORDER BY id desc"
>>> c.sql(query).compute()
>>> x = c.sql(query).compute()
>>> x.to_pandas()
   id  val
4   5    3
3   4    6
2   4    6
1   3    8
0   1    4
>>> type(x)
<class 'dask_cuda.proxify_device_objects.FrameProxyObject'>
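To illustrate why type(x) reports a proxy class while calls such as x.to_pandas() still work, here is a minimal sketch of a pass-through proxy in plain Python. The names PassThroughProxy and unwrap are hypothetical and exist only for this illustration. This is not how dask-cuda implements FrameProxyObject; it only shows the general idea of forwarding attribute access to a wrapped object.

```python
class PassThroughProxy:
    """Toy stand-in for a proxy object (hypothetical, not dask-cuda's code)."""

    def __init__(self, wrapped):
        self._wrapped = wrapped

    def __getattr__(self, name):
        # __getattr__ runs only when normal lookup fails, so any attribute
        # the proxy does not define itself is forwarded to the wrapped object.
        return getattr(self._wrapped, name)


def unwrap(obj):
    """Return the wrapped object for proxies and anything else unchanged."""
    return obj._wrapped if isinstance(obj, PassThroughProxy) else obj


rows = PassThroughProxy([5, 4, 4, 3, 1])
print(type(rows).__name__)          # PassThroughProxy: the proxy leaks to the caller
print(rows.count(4))                # 2: the call is forwarded to list.count
print(type(unwrap(rows)).__name__)  # list: the underlying object
```

The caller can use the proxy like the wrapped object for most purposes, but any check on the concrete type sees the proxy class, which matches the behavior reported above.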
716 contains the fix to this issue. But what's the expected result type for c.sql(query).compute() is what I want to know from dask-cuda experts, the reason I'm asking is a FrameProxyObject which is a proxy object that acts as a pass-through to Frame like objects will be returned:
Yes, that is as expected. The proxy objects leak into userspace unless DASK_JIT_UNSPILL_COMPATIBILITY_MODE=True is set.
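A minimal sketch of setting that variable from Python, reusing the cluster setup from the reproducer. The helper name start_cluster is hypothetical. Setting the variable before dask is imported is an assumption made here so that both this process and the worker processes that inherit its environment see the value; exporting it in the shell before launching Python should work as well.

```python
import os

# Assumption: set the variable before dask / dask_cuda are imported, so the
# value is visible when Dask reads its configuration from the environment.
os.environ["DASK_JIT_UNSPILL_COMPATIBILITY_MODE"] = "True"


def start_cluster():
    """Start the single-worker cluster from the reproducer (needs a GPU)."""
    # Imported lazily so the environment variable above is already in place.
    from dask_cuda import LocalCUDACluster
    from dask.distributed import Client

    cluster = LocalCUDACluster(n_workers=1, jit_unspill=True)
    return Client(cluster)
```

Calling start_cluster() and re-running the query should then show whether type(x) is still a proxy class.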
I get an unexpected error when performing the ORDER BY desc operation when using dask-sql with a dask-cuda cluster with JIT unspilling enabled. For example:
returns:
Environment:
dask - 2021.8.1
dask-sql - 0.3.10
cudf - 21.10
dask-cudf - 21.10
dask-cuda - 21.10