rapidsai / cudf

cuDF - GPU DataFrame Library
https://docs.rapids.ai/api/cudf/stable/
Apache License 2.0

[BUG] dask_cudf breaks with msgpack-python 1.0.0 after RMM conda install #4254

Closed taureandyernv closed 4 years ago

taureandyernv commented 4 years ago

Describe the bug I just installed RMM, which upgraded msgpack-python from 0.6.2 to 1.0.0. When running dask_cudf 0.11 or 0.12 with msgpack 1.0.0, some groupby queries fail with distributed.protocol.core - CRITICAL - Failed to deserialize and ValueError: tuple is not allowed for map key.

On a separate system running the 0.13 nightly packages, msgpack-python is at version 0.6.2 and the same query completes fine.

This may be similar to the Dask distributed issue opened 8 days ago: https://github.com/dask/distributed/issues/3491

Error output

distributed.protocol.core - CRITICAL - Failed to deserialize
Traceback (most recent call last):
  File "/home/taurean/miniconda3/envs/rapids12/lib/python3.6/site-packages/distributed/protocol/core.py", line 106, in loads
    header = msgpack.loads(header, use_list=False, **msgpack_opts)
  File "msgpack/_unpacker.pyx", line 195, in msgpack._cmsgpack.unpackb
ValueError: tuple is not allowed for map key
distributed.core - ERROR - tuple is not allowed for map key
Traceback (most recent call last):
  File "/home/taurean/miniconda3/envs/rapids12/lib/python3.6/site-packages/distributed/core.py", line 456, in handle_stream
    msgs = await comm.read()
  File "/home/taurean/miniconda3/envs/rapids12/lib/python3.6/site-packages/distributed/comm/tcp.py", line 212, in read
    frames, deserialize=self.deserialize, deserializers=deserializers
  File "/home/taurean/miniconda3/envs/rapids12/lib/python3.6/site-packages/distributed/comm/utils.py", line 69, in from_frames
    res = _from_frames()
  File "/home/taurean/miniconda3/envs/rapids12/lib/python3.6/site-packages/distributed/comm/utils.py", line 55, in _from_frames
    frames, deserialize=deserialize, deserializers=deserializers
  File "/home/taurean/miniconda3/envs/rapids12/lib/python3.6/site-packages/distributed/protocol/core.py", line 106, in loads
    header = msgpack.loads(header, use_list=False, **msgpack_opts)
  File "msgpack/_unpacker.pyx", line 195, in msgpack._cmsgpack.unpackb
ValueError: tuple is not allowed for map key

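The failure appears to come from msgpack-python 1.0.0 changing the default of strict_map_key from False to True, so maps whose keys are not str/bytes (such as the tuples in Distributed's message headers) now fail to deserialize. A minimal sketch outside of Dask, assuming that root cause:

```python
import msgpack

# A dict with a tuple key round-trips fine on msgpack 0.6.x defaults.
packed = msgpack.packb({("a", 1): "value"})

try:
    # On msgpack >= 1.0.0 the default strict_map_key=True rejects this.
    msgpack.unpackb(packed, use_list=False)
except ValueError as e:
    print(e)  # on msgpack >= 1.0.0: tuple is not allowed for map key

# Opting out restores the old behaviour:
obj = msgpack.unpackb(packed, use_list=False, strict_map_key=False)
print(obj)  # {('a', 1): 'value'}
```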
Steps/Code to reproduce bug

import dask_cudf as dcu
from dask.distributed import Client, wait
from dask_cuda import LocalCUDACluster

cluster = LocalCUDACluster()
print(cluster)
client = Client(cluster)
client

fn = 'test.csv'
lines = """id3,id4,id5,id6,v1,v2
id0000011793,51,10,59276,1,1
id0000006000,12,58,78315,4,1
id0000012244,25,9,27300,4,5
id0000006000,54,38,65416,2,3
id0000029319,72,92,19046,4,3
id0000068931,87,74,60479,3,2
id0000011793,6,32,90599,4,5
id0000033725,89,85,8657,3,3
id0000006000,12,26,19634,5,2
id0000011793,76,23,38595,5,4
"""
with open(fn, 'w') as fp:
    fp.write(lines)
x = dcu.read_csv(fn, n_partitions=2)
x['id3'] = x['id3'].astype('category')

#max v1 - min v2 by id3
ans = x.groupby(['id3']).agg({'v1': 'max', 'v2': 'min'}).compute()
ans['range_v1_v2'] = ans['v1'] - ans['v2']

Expected behavior The following output for ans:

                v1  v2  range_v1_v2
id3
id0000006000     5   1            4
id0000011793     5   1            4
id0000012244     4   5           -1
id0000029319     4   3            1
id0000033725     3   3            0
id0000068931     3   2            1
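For reference, the same aggregation on the ten rows above can be checked on CPU with plain pandas (pandas here is my substitution; the issue itself uses dask_cudf), and it produces the expected numbers:

```python
import io
import pandas as pd

# Same id3/v1/v2 values as the repro's test.csv, other columns dropped.
csv = """id3,v1,v2
id0000011793,1,1
id0000006000,4,1
id0000012244,4,5
id0000006000,2,3
id0000029319,4,3
id0000068931,3,2
id0000011793,4,5
id0000033725,3,3
id0000006000,5,2
id0000011793,5,4
"""
df = pd.read_csv(io.StringIO(csv))
ans = df.groupby("id3").agg({"v1": "max", "v2": "min"})
ans["range_v1_v2"] = ans["v1"] - ans["v2"]
print(ans)
```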

Environment details

msgpack-python    1.0.0    py36hc9558a2_0    conda-forge   <-- possible problem package

Additional context A similar issue was reported against Dask distributed 8 days ago: https://github.com/dask/distributed/issues/3491

pentschev commented 4 years ago

@taureandyernv it seems that https://github.com/dask/distributed/pull/3494 fixes this, could you check if doing pip install git+https://github.com/dask/distributed lets you move past this issue?
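My reading of that PR is that it only passes strict_map_key=False to msgpack when the installed version understands the option. A rough sketch of the same guard (names here are illustrative, not Distributed's actual code; the unpackb call mirrors the one in the traceback above):

```python
import msgpack

# strict_map_key defaults to True in msgpack 1.0.0 and rejects tuple map
# keys, so turn it off; the option only exists from roughly 0.6 onward.
msgpack_opts = {}
if msgpack.version >= (0, 6, 0):
    msgpack_opts["strict_map_key"] = False

def loads_header(header_bytes):
    # Decode a header the way distributed/protocol/core.py does.
    return msgpack.unpackb(header_bytes, use_list=False, **msgpack_opts)
```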

taureandyernv commented 4 years ago

@pentschev it does. :) Thanks! Should we add something to the RMM docs or requirements to fix the incompatibility?

pentschev commented 4 years ago

TBH, I'm not sure what the procedure for this is. Given this seems to mostly affect older releases, is there something we should do to fix it, @kkraus14?

jakirkham commented 4 years ago

This fix should be in the new Dask and Distributed 2.11.0 packages. Did you already try those?

jakirkham commented 4 years ago

Also we are patching old conda-forge packages to correctly constrain this.

Edit: This has been done 🙂

kkraus14 commented 4 years ago

Fixed upstream.