rapidsai / dask-cuda

Utilities for Dask and CUDA interactions
https://docs.rapids.ai/api/dask-cuda/stable/
Apache License 2.0

Assertion Error when creating Cluster and Client #499

Closed 9849842 closed 3 years ago

9849842 commented 3 years ago

I am using RAPIDS 0.15 and Python 3.7.6.
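
For completeness, the exact versions of the packages that appear in the traceback could be collected with something like the sketch below. The helper name `report_versions` is hypothetical (not part of any of these libraries), and it assumes each package exposes a `__version__` attribute, which is common but not guaranteed.

```python
import platform
from importlib import import_module


def report_versions(module_names):
    """Return a dict mapping each module name to a version string."""
    versions = {"python": platform.python_version()}
    for name in module_names:
        try:
            module = import_module(name)
        except ImportError:
            versions[name] = "not installed"
        except Exception as exc:
            # Some GPU packages can fail at import time for other reasons.
            versions[name] = "import failed ({})".format(type(exc).__name__)
        else:
            # Assumption: the package exposes __version__; fall back if not.
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    # Package names taken from the import lines and traceback in this report.
    packages = ["dask", "distributed", "dask_cuda", "zict"]
    for name, version in report_versions(packages).items():
        print("{}: {}".format(name, version))
```

Pasting this output alongside the traceback makes it easier to tell whether the installed `dask`, `distributed`, and `dask_cuda` versions belong to the same release.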

Here is the code I am using to create the cluster and client:

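from dask_cuda import LocalCUDACluster
from dask.distributed import Client

# Request five workers, each running a single thread.
cluster = LocalCUDACluster(n_workers=5, threads_per_worker=1)
# Connect a client to the local cluster.
client = Client(cluster)
client  # display the client summary in the notebook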

And here is the error I get after running that cell:

distributed.nanny - ERROR - Failed while trying to start worker process: 
distributed.nanny - ERROR - Failed while trying to start worker process: 
distributed.nanny - ERROR - Failed while trying to start worker process: 
distributed.nanny - ERROR - Failed while trying to start worker process: 
distributed.nanny - ERROR - Failed while trying to start worker process: 
distributed.nanny - ERROR - Failed while trying to start worker process: 
Task exception was never retrieved
future: <Task finished coro=<_wrap_awaitable() done, defined at /usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/asyncio/tasks.py:623> exception=AssertionError()>
Traceback (most recent call last):
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/asyncio/tasks.py", line 630, in _wrap_awaitable
    return (yield from awaitable.__await__())
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/core.py", line 275, in _
    await self.start()
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/nanny.py", line 295, in start
    response = await self.instantiate()
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/nanny.py", line 378, in instantiate
    result = await self.process.start()
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/nanny.py", line 581, in start
    msg = await self._wait_until_connected(uid)
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/nanny.py", line 696, in _wait_until_connected
    raise msg["exception"]
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/nanny.py", line 766, in run
    await worker
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/core.py", line 275, in _
    await self.start()
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/worker.py", line 1155, in start
    await self._register_with_scheduler()
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/worker.py", line 871, in _register_with_scheduler
    types={k: typename(v) for k, v in self.data.items()},
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/worker.py", line 871, in <dictcomp>
    types={k: typename(v) for k, v in self.data.items()},
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/_collections_abc.py", line 744, in __iter__
    yield (key, self._mapping[key])
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/dask_cuda/device_host_file.py", line 141, in __getitem__
    return self.host_buffer[key]
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/zict/buffer.py", line 78, in __getitem__
    return self.slow_to_fast(key)
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/zict/buffer.py", line 65, in slow_to_fast
    value = self.slow[key]
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/zict/func.py", line 38, in __getitem__
    return self.load(self.d[key])
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/protocol/serialize.py", line 569, in deserialize_bytes
    frames = merge_frames(header, frames)
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/protocol/utils.py", line 51, in merge_frames
    assert sum(lengths) == sum(map(nbytes, frames))
AssertionError
Task exception was never retrieved
future: <Task finished coro=<_wrap_awaitable() done, defined at /usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/asyncio/tasks.py:623> exception=AssertionError()>
Traceback (most recent call last):
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/asyncio/tasks.py", line 630, in _wrap_awaitable
    return (yield from awaitable.__await__())
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/core.py", line 275, in _
    await self.start()
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/nanny.py", line 295, in start
    response = await self.instantiate()
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/nanny.py", line 378, in instantiate
    result = await self.process.start()
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/nanny.py", line 581, in start
    msg = await self._wait_until_connected(uid)
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/nanny.py", line 696, in _wait_until_connected
    raise msg["exception"]
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/nanny.py", line 766, in run
    await worker
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/core.py", line 275, in _
    await self.start()
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/worker.py", line 1155, in start
    await self._register_with_scheduler()
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/worker.py", line 871, in _register_with_scheduler
    types={k: typename(v) for k, v in self.data.items()},
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/worker.py", line 871, in <dictcomp>
    types={k: typename(v) for k, v in self.data.items()},
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/_collections_abc.py", line 744, in __iter__
    yield (key, self._mapping[key])
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/dask_cuda/device_host_file.py", line 141, in __getitem__
    return self.host_buffer[key]
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/zict/buffer.py", line 78, in __getitem__
    return self.slow_to_fast(key)
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/zict/buffer.py", line 65, in slow_to_fast
    value = self.slow[key]
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/zict/func.py", line 38, in __getitem__
    return self.load(self.d[key])
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/protocol/serialize.py", line 569, in deserialize_bytes
    frames = merge_frames(header, frames)
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/protocol/utils.py", line 51, in merge_frames
    assert sum(lengths) == sum(map(nbytes, frames))
AssertionError
Task exception was never retrieved
future: <Task finished coro=<_wrap_awaitable() done, defined at /usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/asyncio/tasks.py:623> exception=AssertionError()>
Traceback (most recent call last):
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/asyncio/tasks.py", line 630, in _wrap_awaitable
    return (yield from awaitable.__await__())
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/core.py", line 275, in _
    await self.start()
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/nanny.py", line 295, in start
    response = await self.instantiate()
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/nanny.py", line 378, in instantiate
    result = await self.process.start()
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/nanny.py", line 581, in start
    msg = await self._wait_until_connected(uid)
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/nanny.py", line 696, in _wait_until_connected
    raise msg["exception"]
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/nanny.py", line 766, in run
    await worker
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/core.py", line 275, in _
    await self.start()
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/worker.py", line 1155, in start
    await self._register_with_scheduler()
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/worker.py", line 871, in _register_with_scheduler
    types={k: typename(v) for k, v in self.data.items()},
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/worker.py", line 871, in <dictcomp>
    types={k: typename(v) for k, v in self.data.items()},
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/_collections_abc.py", line 744, in __iter__
    yield (key, self._mapping[key])
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/dask_cuda/device_host_file.py", line 141, in __getitem__
    return self.host_buffer[key]
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/zict/buffer.py", line 78, in __getitem__
    return self.slow_to_fast(key)
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/zict/buffer.py", line 65, in slow_to_fast
    value = self.slow[key]
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/zict/func.py", line 38, in __getitem__
    return self.load(self.d[key])
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/protocol/serialize.py", line 569, in deserialize_bytes
    frames = merge_frames(header, frames)
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/protocol/utils.py", line 51, in merge_frames
    assert sum(lengths) == sum(map(nbytes, frames))
AssertionError
Task exception was never retrieved
future: <Task finished coro=<_wrap_awaitable() done, defined at /usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/asyncio/tasks.py:623> exception=AssertionError()>
Traceback (most recent call last):
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/asyncio/tasks.py", line 630, in _wrap_awaitable
    return (yield from awaitable.__await__())
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/core.py", line 275, in _
    await self.start()
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/nanny.py", line 295, in start
    response = await self.instantiate()
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/nanny.py", line 378, in instantiate
    result = await self.process.start()
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/nanny.py", line 581, in start
    msg = await self._wait_until_connected(uid)
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/nanny.py", line 696, in _wait_until_connected
    raise msg["exception"]
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/nanny.py", line 766, in run
    await worker
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/core.py", line 275, in _
    await self.start()
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/worker.py", line 1155, in start
    await self._register_with_scheduler()
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/worker.py", line 871, in _register_with_scheduler
    types={k: typename(v) for k, v in self.data.items()},
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/worker.py", line 871, in <dictcomp>
    types={k: typename(v) for k, v in self.data.items()},
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/_collections_abc.py", line 744, in __iter__
    yield (key, self._mapping[key])
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/dask_cuda/device_host_file.py", line 141, in __getitem__
    return self.host_buffer[key]
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/zict/buffer.py", line 78, in __getitem__
    return self.slow_to_fast(key)
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/zict/buffer.py", line 65, in slow_to_fast
    value = self.slow[key]
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/zict/func.py", line 38, in __getitem__
    return self.load(self.d[key])
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/protocol/serialize.py", line 569, in deserialize_bytes
    frames = merge_frames(header, frames)
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/protocol/utils.py", line 51, in merge_frames
    assert sum(lengths) == sum(map(nbytes, frames))
AssertionError
Task exception was never retrieved
future: <Task finished coro=<_wrap_awaitable() done, defined at /usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/asyncio/tasks.py:623> exception=AssertionError()>
Traceback (most recent call last):
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/asyncio/tasks.py", line 630, in _wrap_awaitable
    return (yield from awaitable.__await__())
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/core.py", line 275, in _
    await self.start()
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/nanny.py", line 295, in start
    response = await self.instantiate()
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/nanny.py", line 378, in instantiate
    result = await self.process.start()
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/nanny.py", line 581, in start
    msg = await self._wait_until_connected(uid)
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/nanny.py", line 696, in _wait_until_connected
    raise msg["exception"]
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/nanny.py", line 766, in run
    await worker
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/core.py", line 275, in _
    await self.start()
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/worker.py", line 1155, in start
    await self._register_with_scheduler()
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/worker.py", line 871, in _register_with_scheduler
    types={k: typename(v) for k, v in self.data.items()},
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/worker.py", line 871, in <dictcomp>
    types={k: typename(v) for k, v in self.data.items()},
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/_collections_abc.py", line 744, in __iter__
    yield (key, self._mapping[key])
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/dask_cuda/device_host_file.py", line 141, in __getitem__
    return self.host_buffer[key]
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/zict/buffer.py", line 78, in __getitem__
    return self.slow_to_fast(key)
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/zict/buffer.py", line 65, in slow_to_fast
    value = self.slow[key]
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/zict/func.py", line 38, in __getitem__
    return self.load(self.d[key])
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/protocol/serialize.py", line 569, in deserialize_bytes
    frames = merge_frames(header, frames)
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/protocol/utils.py", line 51, in merge_frames
    assert sum(lengths) == sum(map(nbytes, frames))
AssertionError
Task exception was never retrieved
future: <Task finished coro=<_wrap_awaitable() done, defined at /usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/asyncio/tasks.py:623> exception=AssertionError()>
Traceback (most recent call last):
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/asyncio/tasks.py", line 630, in _wrap_awaitable
    return (yield from awaitable.__await__())
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/core.py", line 275, in _
    await self.start()
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/nanny.py", line 295, in start
    response = await self.instantiate()
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/nanny.py", line 378, in instantiate
    result = await self.process.start()
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/nanny.py", line 581, in start
    msg = await self._wait_until_connected(uid)
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/nanny.py", line 696, in _wait_until_connected
    raise msg["exception"]
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/nanny.py", line 766, in run
    await worker
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/core.py", line 275, in _
    await self.start()
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/worker.py", line 1155, in start
    await self._register_with_scheduler()
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/worker.py", line 871, in _register_with_scheduler
    types={k: typename(v) for k, v in self.data.items()},
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/worker.py", line 871, in <dictcomp>
    types={k: typename(v) for k, v in self.data.items()},
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/_collections_abc.py", line 744, in __iter__
    yield (key, self._mapping[key])
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/dask_cuda/device_host_file.py", line 141, in __getitem__
    return self.host_buffer[key]
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/zict/buffer.py", line 78, in __getitem__
    return self.slow_to_fast(key)
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/zict/buffer.py", line 65, in slow_to_fast
    value = self.slow[key]
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/zict/func.py", line 38, in __getitem__
    return self.load(self.d[key])
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/protocol/serialize.py", line 569, in deserialize_bytes
    frames = merge_frames(header, frames)
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/protocol/utils.py", line 51, in merge_frames
    assert sum(lengths) == sum(map(nbytes, frames))
AssertionError
distributed.nanny - ERROR - Failed while trying to start worker process: 
tornado.application - ERROR - Exception in callback functools.partial(<bound method IOLoop._discard_future_result of <zmq.eventloop.ioloop.ZMQIOLoop object at 0x7fbf703b2690>>, <Task finished coro=<SpecCluster._correct_state_internal() done, defined at /usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/deploy/spec.py:320> exception=AssertionError()>)
Traceback (most recent call last):
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/tornado/ioloop.py", line 741, in _run_callback
    ret = callback()
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/tornado/ioloop.py", line 765, in _discard_future_result
    future.result()
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/deploy/spec.py", line 355, in _correct_state_internal
    await w  # for tornado gen.coroutine support
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/core.py", line 275, in _
    await self.start()
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/nanny.py", line 295, in start
    response = await self.instantiate()
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/nanny.py", line 378, in instantiate
    result = await self.process.start()
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/nanny.py", line 581, in start
    msg = await self._wait_until_connected(uid)
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/nanny.py", line 696, in _wait_until_connected
    raise msg["exception"]
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/nanny.py", line 766, in run
    await worker
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/core.py", line 275, in _
    await self.start()
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/worker.py", line 1155, in start
    await self._register_with_scheduler()
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/worker.py", line 871, in _register_with_scheduler
    types={k: typename(v) for k, v in self.data.items()},
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/worker.py", line 871, in <dictcomp>
    types={k: typename(v) for k, v in self.data.items()},
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/_collections_abc.py", line 744, in __iter__
    yield (key, self._mapping[key])
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/dask_cuda/device_host_file.py", line 141, in __getitem__
    return self.host_buffer[key]
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/zict/buffer.py", line 78, in __getitem__
    return self.slow_to_fast(key)
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/zict/buffer.py", line 65, in slow_to_fast
    value = self.slow[key]
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/zict/func.py", line 38, in __getitem__
    return self.load(self.d[key])
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/protocol/serialize.py", line 569, in deserialize_bytes
    frames = merge_frames(header, frames)
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/protocol/utils.py", line 51, in merge_frames
    assert sum(lengths) == sum(map(nbytes, frames))
AssertionError
distributed.nanny - ERROR - Failed while trying to start worker process: 
distributed.nanny - ERROR - Failed while trying to start worker process: 
distributed.nanny - ERROR - Failed while trying to start worker process: 
distributed.nanny - ERROR - Failed while trying to start worker process: 
distributed.nanny - ERROR - Failed while trying to start worker process: 
distributed.nanny - ERROR - Failed while trying to start worker process: 
Task exception was never retrieved
future: <Task finished coro=<_wrap_awaitable() done, defined at /usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/asyncio/tasks.py:623> exception=AssertionError()>
Traceback (most recent call last):
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/asyncio/tasks.py", line 630, in _wrap_awaitable
    return (yield from awaitable.__await__())
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/core.py", line 275, in _
    await self.start()
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/nanny.py", line 295, in start
    response = await self.instantiate()
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/nanny.py", line 378, in instantiate
    result = await self.process.start()
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/nanny.py", line 581, in start
    msg = await self._wait_until_connected(uid)
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/nanny.py", line 696, in _wait_until_connected
    raise msg["exception"]
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/nanny.py", line 766, in run
    await worker
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/core.py", line 275, in _
    await self.start()
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/worker.py", line 1155, in start
    await self._register_with_scheduler()
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/worker.py", line 871, in _register_with_scheduler
    types={k: typename(v) for k, v in self.data.items()},
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/worker.py", line 871, in <dictcomp>
    types={k: typename(v) for k, v in self.data.items()},
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/_collections_abc.py", line 744, in __iter__
    yield (key, self._mapping[key])
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/dask_cuda/device_host_file.py", line 141, in __getitem__
    return self.host_buffer[key]
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/zict/buffer.py", line 78, in __getitem__
    return self.slow_to_fast(key)
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/zict/buffer.py", line 65, in slow_to_fast
    value = self.slow[key]
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/zict/func.py", line 38, in __getitem__
    return self.load(self.d[key])
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/protocol/serialize.py", line 569, in deserialize_bytes
    frames = merge_frames(header, frames)
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/protocol/utils.py", line 51, in merge_frames
    assert sum(lengths) == sum(map(nbytes, frames))
AssertionError
Task exception was never retrieved
future: <Task finished coro=<_wrap_awaitable() done, defined at /usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/asyncio/tasks.py:623> exception=AssertionError()>
Traceback (most recent call last):
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/asyncio/tasks.py", line 630, in _wrap_awaitable
    return (yield from awaitable.__await__())
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/core.py", line 275, in _
    await self.start()
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/nanny.py", line 295, in start
    response = await self.instantiate()
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/nanny.py", line 378, in instantiate
    result = await self.process.start()
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/nanny.py", line 581, in start
    msg = await self._wait_until_connected(uid)
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/nanny.py", line 696, in _wait_until_connected
    raise msg["exception"]
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/nanny.py", line 766, in run
    await worker
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/core.py", line 275, in _
    await self.start()
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/worker.py", line 1155, in start
    await self._register_with_scheduler()
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/worker.py", line 871, in _register_with_scheduler
    types={k: typename(v) for k, v in self.data.items()},
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/worker.py", line 871, in <dictcomp>
    types={k: typename(v) for k, v in self.data.items()},
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/_collections_abc.py", line 744, in __iter__
    yield (key, self._mapping[key])
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/dask_cuda/device_host_file.py", line 141, in __getitem__
    return self.host_buffer[key]
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/zict/buffer.py", line 78, in __getitem__
    return self.slow_to_fast(key)
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/zict/buffer.py", line 65, in slow_to_fast
    value = self.slow[key]
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/zict/func.py", line 38, in __getitem__
    return self.load(self.d[key])
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/protocol/serialize.py", line 569, in deserialize_bytes
    frames = merge_frames(header, frames)
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/protocol/utils.py", line 51, in merge_frames
    assert sum(lengths) == sum(map(nbytes, frames))
AssertionError
Task exception was never retrieved
future: <Task finished coro=<_wrap_awaitable() done, defined at /usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/asyncio/tasks.py:623> exception=AssertionError()>
Traceback (most recent call last):
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/asyncio/tasks.py", line 630, in _wrap_awaitable
    return (yield from awaitable.__await__())
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/core.py", line 275, in _
    await self.start()
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/nanny.py", line 295, in start
    response = await self.instantiate()
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/nanny.py", line 378, in instantiate
    result = await self.process.start()
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/nanny.py", line 581, in start
    msg = await self._wait_until_connected(uid)
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/nanny.py", line 696, in _wait_until_connected
    raise msg["exception"]
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/nanny.py", line 766, in run
    await worker
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/core.py", line 275, in _
    await self.start()
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/worker.py", line 1155, in start
    await self._register_with_scheduler()
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/worker.py", line 871, in _register_with_scheduler
    types={k: typename(v) for k, v in self.data.items()},
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/worker.py", line 871, in <dictcomp>
    types={k: typename(v) for k, v in self.data.items()},
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/_collections_abc.py", line 744, in __iter__
    yield (key, self._mapping[key])
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/dask_cuda/device_host_file.py", line 141, in __getitem__
    return self.host_buffer[key]
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/zict/buffer.py", line 78, in __getitem__
    return self.slow_to_fast(key)
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/zict/buffer.py", line 65, in slow_to_fast
    value = self.slow[key]
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/zict/func.py", line 38, in __getitem__
    return self.load(self.d[key])
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/protocol/serialize.py", line 569, in deserialize_bytes
    frames = merge_frames(header, frames)
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/protocol/utils.py", line 51, in merge_frames
    assert sum(lengths) == sum(map(nbytes, frames))
AssertionError
Task exception was never retrieved
future: <Task finished coro=<_wrap_awaitable() done, defined at /usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/asyncio/tasks.py:623> exception=AssertionError()>
Traceback (most recent call last):
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/asyncio/tasks.py", line 630, in _wrap_awaitable
    return (yield from awaitable.__await__())
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/core.py", line 275, in _
    await self.start()
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/nanny.py", line 295, in start
    response = await self.instantiate()
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/nanny.py", line 378, in instantiate
    result = await self.process.start()
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/nanny.py", line 581, in start
    msg = await self._wait_until_connected(uid)
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/nanny.py", line 696, in _wait_until_connected
    raise msg["exception"]
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/nanny.py", line 766, in run
    await worker
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/core.py", line 275, in _
    await self.start()
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/worker.py", line 1155, in start
    await self._register_with_scheduler()
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/worker.py", line 871, in _register_with_scheduler
    types={k: typename(v) for k, v in self.data.items()},
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/worker.py", line 871, in <dictcomp>
    types={k: typename(v) for k, v in self.data.items()},
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/_collections_abc.py", line 744, in __iter__
    yield (key, self._mapping[key])
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/dask_cuda/device_host_file.py", line 141, in __getitem__
    return self.host_buffer[key]
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/zict/buffer.py", line 78, in __getitem__
    return self.slow_to_fast(key)
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/zict/buffer.py", line 65, in slow_to_fast
    value = self.slow[key]
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/zict/func.py", line 38, in __getitem__
    return self.load(self.d[key])
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/protocol/serialize.py", line 569, in deserialize_bytes
    frames = merge_frames(header, frames)
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/protocol/utils.py", line 51, in merge_frames
    assert sum(lengths) == sum(map(nbytes, frames))
AssertionError
Task exception was never retrieved
future: <Task finished coro=<_wrap_awaitable() done, defined at /usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/asyncio/tasks.py:623> exception=AssertionError()>
Traceback (most recent call last):
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/asyncio/tasks.py", line 630, in _wrap_awaitable
    return (yield from awaitable.__await__())
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/core.py", line 275, in _
    await self.start()
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/nanny.py", line 295, in start
    response = await self.instantiate()
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/nanny.py", line 378, in instantiate
    result = await self.process.start()
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/nanny.py", line 581, in start
    msg = await self._wait_until_connected(uid)
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/nanny.py", line 696, in _wait_until_connected
    raise msg["exception"]
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/nanny.py", line 766, in run
    await worker
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/core.py", line 275, in _
    await self.start()
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/worker.py", line 1155, in start
    await self._register_with_scheduler()
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/worker.py", line 871, in _register_with_scheduler
    types={k: typename(v) for k, v in self.data.items()},
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/worker.py", line 871, in <dictcomp>
    types={k: typename(v) for k, v in self.data.items()},
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/_collections_abc.py", line 744, in __iter__
    yield (key, self._mapping[key])
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/dask_cuda/device_host_file.py", line 141, in __getitem__
    return self.host_buffer[key]
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/zict/buffer.py", line 78, in __getitem__
    return self.slow_to_fast(key)
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/zict/buffer.py", line 65, in slow_to_fast
    value = self.slow[key]
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/zict/func.py", line 38, in __getitem__
    return self.load(self.d[key])
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/protocol/serialize.py", line 569, in deserialize_bytes
    frames = merge_frames(header, frames)
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/protocol/utils.py", line 51, in merge_frames
    assert sum(lengths) == sum(map(nbytes, frames))
AssertionError
Task exception was never retrieved
future: <Task finished coro=<_wrap_awaitable() done, defined at /usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/asyncio/tasks.py:623> exception=AssertionError()>
Traceback (most recent call last):
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/asyncio/tasks.py", line 630, in _wrap_awaitable
    return (yield from awaitable.__await__())
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/core.py", line 275, in _
    await self.start()
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/nanny.py", line 295, in start
    response = await self.instantiate()
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/nanny.py", line 378, in instantiate
    result = await self.process.start()
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/nanny.py", line 581, in start
    msg = await self._wait_until_connected(uid)
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/nanny.py", line 696, in _wait_until_connected
    raise msg["exception"]
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/nanny.py", line 766, in run
    await worker
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/core.py", line 275, in _
    await self.start()
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/worker.py", line 1155, in start
    await self._register_with_scheduler()
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/worker.py", line 871, in _register_with_scheduler
    types={k: typename(v) for k, v in self.data.items()},
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/worker.py", line 871, in <dictcomp>
    types={k: typename(v) for k, v in self.data.items()},
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/_collections_abc.py", line 744, in __iter__
    yield (key, self._mapping[key])
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/dask_cuda/device_host_file.py", line 141, in __getitem__
    return self.host_buffer[key]
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/zict/buffer.py", line 78, in __getitem__
    return self.slow_to_fast(key)
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/zict/buffer.py", line 65, in slow_to_fast
    value = self.slow[key]
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/zict/func.py", line 38, in __getitem__
    return self.load(self.d[key])
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/protocol/serialize.py", line 569, in deserialize_bytes
    frames = merge_frames(header, frames)
  File "/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/protocol/utils.py", line 51, in merge_frames
    assert sum(lengths) == sum(map(nbytes, frames))
AssertionError
distributed.nanny - ERROR - Failed while trying to start worker process: 
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-2-808055d8c35d> in <module>
----> 1 cluster = LocalCUDACluster(n_workers=6, threads_per_worker=1)
      2 client = Client(cluster)
      3 client

/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/dask_cuda/local_cuda_cluster.py in __init__(self, n_workers, threads_per_worker, processes, memory_limit, device_memory_limit, CUDA_VISIBLE_DEVICES, data, local_directory, protocol, enable_tcp_over_ucx, enable_infiniband, enable_nvlink, enable_rdmacm, ucx_net_devices, rmm_pool_size, rmm_managed_memory, **kwargs)
    271         self.cuda_visible_devices = CUDA_VISIBLE_DEVICES
    272         self.scale(n_workers)
--> 273         self.sync(self._correct_state)
    274 
    275     def new_worker_spec(self):

/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/deploy/cluster.py in sync(self, func, asynchronous, callback_timeout, *args, **kwargs)
    181             return future
    182         else:
--> 183             return sync(self.loop, func, *args, **kwargs)
    184 
    185     def _log(self, log):

/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/utils.py in sync(loop, func, callback_timeout, *args, **kwargs)
    338     if error[0]:
    339         typ, exc, tb = error[0]
--> 340         raise exc.with_traceback(tb)
    341     else:
    342         return result[0]

/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/utils.py in f()
    322             if callback_timeout is not None:
    323                 future = asyncio.wait_for(future, callback_timeout)
--> 324             result[0] = yield future
    325         except Exception as exc:
    326             error[0] = sys.exc_info()

/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/tornado/gen.py in run(self)
    760 
    761                     try:
--> 762                         value = future.result()
    763                     except Exception:
    764                         exc_info = sys.exc_info()

/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/deploy/spec.py in _correct_state_internal(self)
    353                 for w in workers:
    354                     w._cluster = weakref.ref(self)
--> 355                     await w  # for tornado gen.coroutine support
    356             self.workers.update(dict(zip(to_open, workers)))
    357 

/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/core.py in _()
    273                         )
    274                 else:
--> 275                     await self.start()
    276                     self.status = Status.running
    277             return self

/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/nanny.py in start(self)
    293 
    294         logger.info("        Start Nanny at: %r", self.address)
--> 295         response = await self.instantiate()
    296         if response == Status.running:
    297             assert self.worker_address

/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/nanny.py in instantiate(self, comm)
    376 
    377         else:
--> 378             result = await self.process.start()
    379         return result
    380 

/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/nanny.py in start(self)
    579             return
    580 
--> 581         msg = await self._wait_until_connected(uid)
    582         if not msg:
    583             return self.status

/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/nanny.py in _wait_until_connected(self, uid)
    694                 )
    695                 await self.process.join()
--> 696                 raise msg["exception"]
    697             else:
    698                 return msg

/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/nanny.py in run()
    764             """
    765             try:
--> 766                 await worker
    767             except Exception as e:
    768                 logger.exception("Failed to start worker")

/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/core.py in _()
    273                         )
    274                 else:
--> 275                     await self.start()
    276                     self.status = Status.running
    277             return self

/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/worker.py in start()
   1153         self._pending_plugins = ()
   1154 
-> 1155         await self._register_with_scheduler()
   1156 
   1157         self.start_periodic_callbacks()

/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/worker.py in _register_with_scheduler()
    869                         name=self.name,
    870                         nbytes={ts.key: ts.get_nbytes() for ts in self.tasks.values()},
--> 871                         types={k: typename(v) for k, v in self.data.items()},
    872                         now=time(),
    873                         resources=self.total_resources,

/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/worker.py in <dictcomp>()
    869                         name=self.name,
    870                         nbytes={ts.key: ts.get_nbytes() for ts in self.tasks.values()},
--> 871                         types={k: typename(v) for k, v in self.data.items()},
    872                         now=time(),
    873                         resources=self.total_resources,

/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/_collections_abc.py in __iter__()
    742     def __iter__(self):
    743         for key in self._mapping:
--> 744             yield (key, self._mapping[key])
    745 
    746 ItemsView.register(dict_items)

/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/dask_cuda/device_host_file.py in __getitem__()
    139             return self.device_buffer[key]
    140         elif key in self.host_buffer:
--> 141             return self.host_buffer[key]
    142         else:
    143             raise KeyError(key)

/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/zict/buffer.py in __getitem__()
     76             return self.fast[key]
     77         elif key in self.slow:
---> 78             return self.slow_to_fast(key)
     79         else:
     80             raise KeyError(key)

/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/zict/buffer.py in slow_to_fast()
     63 
     64     def slow_to_fast(self, key):
---> 65         value = self.slow[key]
     66         # Avoid useless movement for heavy values
     67         if self.weight(key, value) <= self.n:

/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/zict/func.py in __getitem__()
     36 
     37     def __getitem__(self, key):
---> 38         return self.load(self.d[key])
     39 
     40     def __setitem__(self, key, value):

/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/protocol/serialize.py in deserialize_bytes()
    567         header = {}
    568     frames = decompress(header, frames)
--> 569     frames = merge_frames(header, frames)
    570     return deserialize(header, frames)
    571 

/usr/local/share/anaconda3/envs/rapidsai/lib/python3.7/site-packages/distributed/protocol/utils.py in merge_frames()
     49 
     50     assert len(lengths) == len(writeables)
---> 51     assert sum(lengths) == sum(map(nbytes, frames))
     52 
     53     if all(len(f) == l for f, l in zip(frames, lengths)):

AssertionError: 

Here is my GPU setup:

Sat Jan 23 13:45:53 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.50       Driver Version: 430.50       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 00000000:00:06.0 Off |                  N/A |
|  0%   15C    P8     8W / 250W |    145MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 108...  Off  | 00000000:00:08.0 Off |                  N/A |
|  0%   16C    P8     8W / 250W |     10MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX 108...  Off  | 00000000:00:0A.0 Off |                  N/A |
|  0%   17C    P8     8W / 250W |     10MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  GeForce GTX 108...  Off  | 00000000:00:0C.0 Off |                  N/A |
|  0%   14C    P8     8W / 250W |     10MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   4  GeForce GTX 108...  Off  | 00000000:00:0E.0 Off |                  N/A |
|  0%   15C    P8     8W / 250W |     10MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   5  GeForce GTX 108...  Off  | 00000000:00:10.0 Off |                  N/A |
|  0%   18C    P8     8W / 250W |     10MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     10356      C   ...hare/anaconda3/envs/rapidsai/bin/python   135MiB |
+-----------------------------------------------------------------------------+

pentschev commented 3 years ago

This is probably due to recent changes in Dask/Distributed. I would suggest trying a newer version of RAPIDS if possible; the current stable version is 0.17. If for some reason you can't upgrade, you could try downgrading the Dask and Distributed packages to version 2.24.0. Your install is probably picking up a much newer version of them, since we don't pin dask-cuda to any specific version of those packages.

Keep in mind that rolling back to older versions of Dask and Distributed may also require older versions of other packages, and I can't guarantee that won't cause problems with Dask. The best solution would be to upgrade to RAPIDS 0.17.
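
Before upgrading or downgrading, it may help to confirm which versions the environment actually resolved. The following is a minimal sketch, not part of dask-cuda: the helper name `installed_versions` is hypothetical, and it assumes each package exposes a `__version__` attribute (it reports "unknown" otherwise).

```python
import importlib


def installed_versions(names):
    """Map each module name to its __version__, or None if it cannot be imported."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            # The module is not installed in this environment.
            versions[name] = None
            continue
        # Assumption: the package exposes __version__; fall back to "unknown".
        versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(["dask", "distributed", "dask_cuda"]).items():
        print(name, version)
```

If `dask` and `distributed` report versions much newer than 2.24.0 alongside RAPIDS 0.15, that would be consistent with the mismatch described above.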

9849842 commented 3 years ago

I have upgraded to RAPIDS 0.17, and now I'm getting this:

---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-1-3a2c34e881e6> in <module>
----> 1 from dask_cuda import LocalCUDACluster
      2 from dask.distributed import Client
      3 from dask import array as da
      4 from dask import dataframe as dd
      5 import xgboost as xgb

ModuleNotFoundError: No module named 'dask_cuda'

pentschev commented 3 years ago

Your installation is missing dask-cuda. Are you installing RAPIDS from conda or some other way? If you're installing from conda, the rapids=0.17 metapackage (see https://rapids.ai/start.html) includes dask-cuda, but if you're only installing a selection of packages (e.g., cuDF only), then you need to install the dask-cuda=0.17 package as well.

9849842 commented 3 years ago

I am installing from Conda, so I shouldn't be getting that error.

pentschev commented 3 years ago

Then please make sure you install the rapids=0.17 metapackage, which includes dask-cuda. If you're installing only a subset of it, also install dask-cuda=0.17.
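
One way to check whether the interpreter running the notebook can see dask-cuda at all is sketched below. The helper name `is_importable` is hypothetical. Printing `sys.executable` is included only on the assumption that the notebook kernel might be using a different environment from the one where RAPIDS was installed, which is not confirmed in this thread.

```python
import importlib.util
import sys


def is_importable(module_name):
    """Return True if the current interpreter can locate the top-level module."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Shows which Python binary (and therefore which environment) is in use.
    print("interpreter:", sys.executable)
    print("dask_cuda importable:", is_importable("dask_cuda"))
```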

9849842 commented 3 years ago

I am using the RAPIDS AMI on EC2

quasiben commented 3 years ago

What is the AMI number you are using?

Hmm, actually, I would recommend following the instructions at https://rapids.ai/cloud#AWS-EC2 for launching RAPIDS on EC2.

pentschev commented 3 years ago

This has gone stale, closing. Please feel free to reopen if there's more to be discussed here.