h2oai / db-benchmark

reproducible benchmark of database-like ops
https://h2oai.github.io/db-benchmark
Mozilla Public License 2.0
322 stars 85 forks

use dask-cudf to utilize multiple GPUs #116

Closed jangorecki closed 3 years ago

jangorecki commented 4 years ago

cudf uses only a single GPU, so it would be useful to employ dask-cudf rather than plain cudf. https://blog.dask.org/2019/01/29/cudf-joins
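
For illustration, a minimal sketch of what a multi-GPU grouped sum could look like with dask-cudf. This is not the benchmark's actual script: the function name `groupby_sum_multi_gpu`, the CSV path argument, and the column names `id1` and `v1` are assumptions made for the example. It relies on `dask_cuda.LocalCUDACluster`, which by default starts one worker per visible GPU.

```python
def groupby_sum_multi_gpu(csv_path, key="id1", value="v1"):
    """Sum `value` per `key` using every GPU visible to the process.

    Hypothetical helper: the name, `csv_path`, and the default column
    names are illustrative assumptions, not part of db-benchmark.
    """
    # RAPIDS imports are kept local so that defining this function
    # does not require a GPU or a RAPIDS installation.
    from dask.distributed import Client
    from dask_cuda import LocalCUDACluster
    import dask_cudf

    # By default LocalCUDACluster starts one worker per visible GPU.
    cluster = LocalCUDACluster()
    client = Client(cluster)
    try:
        # The CSV is read lazily into partitions spread across workers.
        ddf = dask_cudf.read_csv(csv_path)
        # The aggregation runs on all workers; compute() gathers the result.
        return ddf.groupby(key)[value].sum().compute()
    finally:
        client.close()
        cluster.close()
```

With plain cudf the same query would run on a single device; here the partitions are processed by one worker per GPU.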

jangorecki commented 4 years ago

Unfortunately, this can cause other issues, for example https://github.com/rapidsai/cudf/issues/3363#issuecomment-562159646

jangorecki commented 4 years ago

As pointed out by @datametrician, we should also be able to use off-vmemory data storage with dask-cudf, so it makes sense to use dask-cudf even for a single GPU.

jangorecki commented 4 years ago

waiting for https://github.com/rapidsai/cudf/issues/2288

jangorecki commented 4 years ago

Dask does not seem to be mandatory for spilling to main memory. Because the dask-cudf setup is poorly documented, this part will be solved separately: https://github.com/h2oai/db-benchmark/issues/129

datametrician commented 4 years ago

Without Dask, you still only use 1 GPU instead of both of them.

jangorecki commented 4 years ago

@datametrician yes, I am aware of that. The plan is to move to dask-cudf, so this issue stays open.

jangorecki commented 4 years ago

Using dask-cudf will additionally allow us to attempt the 1e9 data size by using the spill-to-disk feature, as explained in https://github.com/rapidsai/cudf/issues/3740#issuecomment-573091892
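
A rough sketch of how spilling might be configured, assuming the cluster is started through `dask_cuda.LocalCUDACluster`. The helper names, the limits, and the spill directory below are placeholders chosen for the example, not values taken from the linked comment or from the benchmark setup.

```python
def spilling_cluster_kwargs(device_memory_limit="12GB",
                            memory_limit="64GB",
                            spill_dir="/tmp/dask-spill"):
    """Build keyword arguments for dask_cuda.LocalCUDACluster.

    All default values are illustrative assumptions and should be
    tuned to the actual GPU and host memory sizes.
    """
    return {
        # Above this per-GPU threshold, workers spill device data to host memory.
        "device_memory_limit": device_memory_limit,
        # Per-worker host memory limit; beyond it, data is spilled to disk.
        "memory_limit": memory_limit,
        # Directory where workers store their local (spilled) files.
        "local_directory": spill_dir,
    }


def start_spilling_cluster(**overrides):
    """Start a local CUDA cluster with spilling enabled (hypothetical helper)."""
    # Imported lazily so the kwargs helper is usable without RAPIDS installed.
    from dask.distributed import Client
    from dask_cuda import LocalCUDACluster

    kwargs = {**spilling_cluster_kwargs(), **overrides}
    cluster = LocalCUDACluster(**kwargs)
    return Client(cluster)
```

Calling `start_spilling_cluster(device_memory_limit="10GB")` would override only the device threshold and keep the other placeholder defaults.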

jangorecki commented 3 years ago

https://github.com/rapidsai/cudf/issues/2288 has finally been resolved, and it looks like we can proceed to using dask_cudf to utilize both GPUs.