h2oai / db-benchmark

reproducible benchmark of database-like ops
https://h2oai.github.io/db-benchmark
Mozilla Public License 2.0
323 stars · 85 forks

extend GPU memory to run cuDF for medium and big data #97

Closed jangorecki closed 3 years ago

jangorecki commented 5 years ago

Even if it is possible to fix https://github.com/h2oai/db-benchmark/issues/94 without extending GPU memory, we still need more GPU memory to handle the 1e9 data (45 GB csv). We need more GPU cards, better GPU cards, or a better machine in general. For now we have cuDF results only for the 1e7 data.

datametrician commented 4 years ago

I would recommend 2x RTX 8000s. In addition, dask-cuDF would allow you to use both of them vs just using a single 1080 ti as you are doing now.

jangorecki commented 4 years ago

Before trying to move to new hardware I would like to resolve https://github.com/h2oai/db-benchmark/issues/94, so I can be sure that the present hardware, and later the new hardware, is properly utilized.

jangorecki commented 4 years ago

Assuming the required memory scales linearly with data size (and it looks like it does), 2x RTX 8000s will not allow us to compute the 1e9 groupby task, as we would need around 220 GB for that. On the other hand, even a single RTX 8000 should allow us to compute the 1e8 groupby, so we wouldn't need dask-cudf for it. As of now, using dask-cudf might not even help to resolve 1e8 on the current 2 GPUs, as explained in https://github.com/h2oai/db-benchmark/issues/94#issuecomment-564353134
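The back-of-envelope estimate above can be sketched as a linear extrapolation anchored at the observed ~220 GB peak for 1e9 rows (the 48 GB per-card figure is the published RTX 8000 memory size; the helper function and its name are mine, not from the benchmark code):

```python
# Linear extrapolation of peak GPU memory for the groupby task,
# anchored at the observed ~220 GB needed for the 1e9-row case.
def estimated_groupby_mem_gb(rows, anchor_rows=1e9, anchor_gb=220.0):
    """Estimate peak memory (GB), assuming it scales linearly with rows."""
    return anchor_gb * rows / anchor_rows

RTX_8000_GB = 48  # memory of a single Quadro RTX 8000 card

# 1e8 rows: ~22 GB, fits on a single RTX 8000
assert estimated_groupby_mem_gb(1e8) < RTX_8000_GB
# 1e9 rows: ~220 GB, far beyond 2x RTX 8000 (96 GB total)
assert estimated_groupby_mem_gb(1e9) > 2 * RTX_8000_GB
```

This is why the comment above concludes that dual RTX 8000s unlock 1e8 but not 1e9.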

The join task is another thing we should not forget about; it is more memory demanding, so eventually 2x RTX 8000s might be useful to compute the 1e8 join.

datametrician commented 4 years ago

I highly recommend moving to RTX 8000 regardless, but Dask-cuDF (as I said in the other issue) allows spilling to system memory.
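For reference, the Dask-cuDF spilling setup suggested here looks roughly like the sketch below. It requires a CUDA machine with dask-cuda and dask_cudf installed, so it is a configuration sketch rather than something runnable here; the memory limit and CSV path are illustrative, not taken from the benchmark scripts:

```python
# Sketch: GPU-to-host spilling with dask-cuda (assumes a CUDA machine
# with dask-cuda and dask_cudf installed; values are illustrative).
from dask_cuda import LocalCUDACluster
from dask.distributed import Client
import dask_cudf

# device_memory_limit is the per-GPU threshold at which dask-cuda
# starts spilling device buffers into host (main) memory.
cluster = LocalCUDACluster(device_memory_limit="10GB")
client = Client(cluster)

# Illustrative file name; the benchmark generates its own CSV inputs.
df = dask_cudf.read_csv("groupby_data.csv")
res = df.groupby("id1").agg({"v1": "sum"}).compute()
```

With the limit set, partitions that exceed device memory are moved to system RAM instead of raising an out-of-memory error, which is the mechanism being recommended in this comment.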

jangorecki commented 4 years ago

Running the medium data size was resolved by spilling data from GPU memory to main memory. Yet that was not enough for the big data case (50 GB), thus I filed a new FR for spilling data from main memory to disk: https://github.com/rapidsai/cudf/issues/3740. Ultimately we should upgrade the GPU cards, so I am leaving this issue open. Additionally, moving to dask-cudf is still on the roadmap; for now it is postponed until the cudf documentation is improved, the status of which can be tracked in https://github.com/h2oai/db-benchmark/issues/116

jangorecki commented 4 years ago

Unfortunately we need to fall back to running only the 1e7 data size until https://github.com/rapidsai/cudf/issues/2277 is resolved. This is because of the GPU memory corruption driver problem described in https://github.com/h2oai/db-benchmark/issues/129, which currently makes us unable to run cudf benchmarks. Due to that, the cuDF timings are already 1.5 months old.

jangorecki commented 3 years ago

I re-requested spilling to disk, this time using dask_cudf, in https://github.com/rapidsai/cudf/issues/3740

jangorecki commented 3 years ago

https://github.com/rapidsai/dask-cuda/issues/37

jangorecki commented 3 years ago

resolved by #219