rapidsai / dask-cuda

Utilities for Dask and CUDA interactions
https://docs.rapids.ai/api/dask-cuda/stable/
Apache License 2.0

nvComp compression with Pack/Unpack #760

Open quasiben opened 3 years ago

quasiben commented 3 years ago

Following the addition of the pack/unpack work, where cuDF tables can be stored as a single buffer, we'd like to explore compression of that buffer in hopes of better memory utilization on the device. nvComp is a CUDA library for generic (de-)compression on the GPU. One idea to decrease memory usage is to always store compressed data on the GPU and decompress only when the data is being operated on and/or transferred between workers. More specifically, when Dask stores data on a worker, the data could be packed and compressed with nvComp in the zict mapping (__setitem__), and when Dask needs to use that data (__getitem__), dask-cuda/nvComp will decompress it.

In an ideal world, we'd expect to be able to store/operate on X times more data, where X is the compression ratio. This will of course vary between dtypes (int, float, string, ...). We also probably want to experiment with compressing before/after packing -- not sure what the impact will be, if any, but it's worth noting. Lastly, we don't know what the API will look like, but we can use this issue to continue the discussion.

To get things started we will first need python hooks for nvComp.
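
A rough sketch of the zict-style mapping described above. The `pack`, `unpack`, `compress` and `decompress` callables are hypothetical placeholders for cuDF's pack/unpack routines and the future nvComp Python hooks -- none of these bindings exist yet:

```python
from collections.abc import MutableMapping


class CompressedDeviceStore(MutableMapping):
    """Keep each value as a single packed, nvComp-compressed device buffer."""

    def __init__(self, pack, unpack, compress, decompress):
        # All four callables are hypothetical hooks to be provided later
        self._store = {}
        self._pack, self._unpack = pack, unpack
        self._compress, self._decompress = compress, decompress

    def __setitem__(self, key, value):
        # cuDF table -> single contiguous buffer -> compressed buffer, all on device
        self._store[key] = self._compress(self._pack(value))

    def __getitem__(self, key):
        # Decompress and unpack only when the data is actually requested
        return self._unpack(self._decompress(self._store[key]))

    def __delitem__(self, key):
        del self._store[key]

    def __iter__(self):
        return iter(self._store)

    def __len__(self):
        return len(self._store)
```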

cc @thomcom

benjha commented 3 years ago

Sounds nice,

Recently I've been running the cugraph benchmarks on Summit, and as graph size goes up there are a lot of these messages:

distributed.core - INFO - Event loop was unresponsive in Worker for Xs.  This is often caused by long-running GIL-holding functions or moving large chunks of data. This can cause timeouts and instability.

so in addition to memory utilization, data transport may also improve.

madsbk commented 3 years ago

More specifically, when Dask stores data on a worker, the data could be packed and compressed with nvComp in the zict mapping (__setitem__), and when Dask needs to use that data (__getitem__), dask-cuda/nvComp will decompress it.

I think it is an interesting idea with great potential for highly compressible workflows, but I don't think zict is a good match. Dask tasks often extract items from collections of device data (e.g. a list of dataframes) without actually using the data. For instance, a getitem task would force decompression of all items in the collection even though getitem itself doesn't really access any of the items.

I think JIT-Unspill and the ProxyObject are a better match; it should be fairly straightforward to add compression as a new serializer. In this case, we only decompress when a task actually accesses the data.
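
To make that concrete, a rough sketch of such a serializer registered as a Dask serialization family. The `nvcomp_compress`/`nvcomp_decompress` helpers are stand-ins for the nvComp Python hooks discussed above (not an existing API); the regular CUDA serializers are reused to obtain the device frames:

```python
from distributed.protocol.serialize import (
    deserialize,
    register_serialization_family,
    serialize,
)


def nvcomp_compress(frame):
    raise NotImplementedError("placeholder for the future nvComp Python hook")


def nvcomp_decompress(frame):
    raise NotImplementedError("placeholder for the future nvComp Python hook")


def nvcomp_dumps(x):
    # Serialize with the regular CUDA serializers to get device frames ...
    sub_header, frames = serialize(x, serializers=["cuda"])
    # ... then compress each frame on the GPU (hypothetical nvComp hook)
    compressed_frames = [nvcomp_compress(f) for f in frames]
    return {"sub-header": sub_header}, compressed_frames


def nvcomp_loads(header, frames):
    # Decompress on the GPU only when the proxied object is actually accessed
    device_frames = [nvcomp_decompress(f) for f in frames]
    return deserialize(header["sub-header"], device_frames)


# Make the new serializer available under the name "nvcomp"
register_serialization_family("nvcomp", nvcomp_dumps, nvcomp_loads)
```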

jakirkham commented 3 years ago

@madsbk do we have docs or maybe an example of JIT spilling? Maybe this would help @benjha in the near term :)

madsbk commented 3 years ago

@madsbk do we have docs or maybe an example of JIT spilling? Maybe this would help @benjha in the near term :)

Yes, we have some info here: https://docs.rapids.ai/api/dask-cuda/nightly/spilling.html#jit-unspill
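
For a quick start, it can be enabled when creating the cluster (assuming a reasonably recent dask-cuda release; see the linked docs for the CLI/env-var equivalents):

```python
from dask.distributed import Client
from dask_cuda import LocalCUDACluster

# jit_unspill=True wraps device objects in ProxyObject so spilling/unspilling
# happens lazily, only when a task actually accesses the data
cluster = LocalCUDACluster(jit_unspill=True)
client = Client(cluster)
```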

github-actions[bot] commented 2 years ago

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

quasiben commented 2 years ago

Still active

github-actions[bot] commented 2 years ago

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

quasiben commented 2 years ago

still interested

github-actions[bot] commented 2 years ago

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

github-actions[bot] commented 2 years ago

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

thomcom commented 2 years ago

Hey @quasiben, will the nvComp contributions to kvikio be of any use here?

jakirkham commented 2 years ago

Not Ben, but yes.

quasiben commented 2 years ago

Yes. Is this something you'd be interested in coming back to? This is on hold for a bit longer while we are exploring better spilling in general: https://github.com/rapidsai/cudf/pull/10746/
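
If we pick this back up, something along these lines using the kvikio nvCOMP bindings might be a starting point -- the class and method names here (LZ4Manager.compress / .decompress) are from memory and should be checked against the kvikio release in use:

```python
import cupy
import kvikio.nvcomp

# Stand-in for a packed cuDF buffer
data = cupy.arange(10_000_000, dtype="int64")

manager = kvikio.nvcomp.LZ4Manager(chunk_size=1 << 16)
compressed = manager.compress(data)            # device-to-device compression
decompressed = manager.decompress(compressed)  # may come back as a raw uint8 view

print(f"compression ratio: {data.nbytes / compressed.nbytes:.2f}x")
```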

thomcom commented 2 years ago

@quasiben I'm available to revisit it when it is an important ask. However, I'm quite happy at node-rapids at this time and am definitely not looking for a different project! :)

thomcom commented 2 years ago

I'm mostly just reviewing my github.com/notifications and staying available.

github-actions[bot] commented 2 years ago

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

github-actions[bot] commented 2 years ago

This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.