Open quasiben opened 3 years ago
Sounds nice. Recently I've been running the cugraph benchmarks on Summit, and as graph size goes up there are a lot of these messages:
distributed.core - INFO - Event loop was unresponsive in Worker for Xs. This is often caused by long-running GIL-holding functions or moving large chunks of data. This can cause timeouts and instability.
So in addition to memory utilization, data transport may also improve.
More specifically, when Dask stores data on a worker, the data could be packed and compressed with nvComp in the zict `__setitem__`, and when Dask needs to use that data (`__getitem__`), dask-cuda/nvComp will decompress.
I think it is an interesting idea with great potential in highly compressible workflows, but I don't think zict is a good match.
Dask tasks often extract items from collections of device data (e.g. a list of dataframes) without actually using the data. For instance, `getitem` would force a decompress of all items in the collection even though `getitem` itself doesn't really access any of the items.
I think JIT-Unspill and the ProxyObject are a better match; it should be fairly straightforward to add compression as a new serializer. In that case, we only decompress when a task actually accesses the data.
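To illustrate why the lazy approach helps, here is a minimal toy proxy (not the actual dask-cuda ProxyObject, and with zlib again standing in for nvComp) that holds a compressed buffer and decompresses only on first real access, so extracting an item from a collection touches no data:

```python
import zlib

class CompressedProxy:
    """Toy JIT-Unspill-style proxy: holds a compressed buffer and
    decompresses only when the data is actually needed."""

    def __init__(self, value: bytes):
        self._compressed = zlib.compress(value)
        self._materialized = None

    @property
    def is_materialized(self):
        return self._materialized is not None

    def materialize(self) -> bytes:
        # Decompression happens here, on first real access only
        if self._materialized is None:
            self._materialized = zlib.decompress(self._compressed)
        return self._materialized

# A collection of proxies: picking one item (the common "getitem on a
# list of frames" pattern) decompresses nothing.
parts = [CompressedProxy(b"frame-%d" % i * 100) for i in range(3)]
picked = parts[1]              # no decompression yet
print(picked.is_materialized)  # False
data = picked.materialize()    # decompress on actual access
```

With an eager zict-style store, indexing into `parts` would already have paid the decompression cost for every element.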
@madsbk do we have docs or maybe an example of JIT spilling? Maybe this would help @benjha in the near term :)
Yes, we have some info here: https://docs.rapids.ai/api/dask-cuda/nightly/spilling.html#jit-unspill
This issue has been labeled `inactive-30d` due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled `inactive-90d` if there is no activity in the next 60 days.
Still active
still interested
Hey @quasiben will the nvcomp contributions to kvikio be of any use here?
Not Ben, but yes.
Yes, is this something you'd be interested in coming back to? This is on hold for a bit longer while we are exploring better spilling in general: https://github.com/rapidsai/cudf/pull/10746/
@quasiben I'm available to revisit it when it is an important ask. However, I'm quite happy at node-rapids at this time and am definitely not looking for a different project! :)
I'm mostly just reviewing my github.com/notifications and being available.
This issue has been labeled `inactive-90d` due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.
Following the addition of the pack/unpack work, where cuDF tables can be stored as a single buffer, we'd like to explore compression of that buffer in hopes of better memory utilization on the device. nvComp is a CUDA library for generic (de-)compression on the GPU. One idea to decrease memory usage is to always store compressed data on the GPU and decompress only when it is being operated on and/or transferred between workers. More specifically, when Dask stores data on a worker, the data could be packed and compressed with nvComp in the zict `__setitem__`, and when Dask needs to use that data (`__getitem__`), dask-cuda/nvComp will decompress.

In an ideal world, we'd expect to store/operate on X times more memory, where X is the compression ratio. This will of course vary between dtypes (int, float, string, ...). We also probably want to experiment with compression before/after packing -- not sure what the impact will be, if any, but it's worth noting. Lastly, we don't know what the API will look like, but we can use this issue to continue the discussion.
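The before/after-packing question could be probed with a quick experiment along these lines; the buffers below are made-up stand-ins for packed column data, and zlib is a CPU stand-in for nvComp:

```python
import zlib

# Hypothetical buffers standing in for a table's column data
buffers = [bytes(range(256)) * 40, b"\x00" * 10_000, b"abc" * 3_000]
total = sum(len(b) for b in buffers)

# Compress each buffer separately (compress before packing) ...
separate = sum(len(zlib.compress(b)) for b in buffers)
# ... vs compress one packed (concatenated) buffer (compress after packing)
packed = len(zlib.compress(b"".join(buffers)))

ratio_separate = total / separate
ratio_packed = total / packed
print(f"separate: {ratio_separate:.1f}x, packed: {ratio_packed:.1f}x")
```

Real results will depend heavily on dtype and the nvComp codec chosen, so this only shows the shape of the measurement, not the outcome.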
To get things started, we will first need Python hooks for nvComp.
cc @thomcom