rapidsai / deployment

RAPIDS Deployment Documentation
https://docs.rapids.ai/deployment/stable/

Add One Billion Row Challenge example notebook #319

Closed jacobtomlinson closed 7 months ago

jacobtomlinson commented 7 months ago

I spotted a blog post from the Coiled folks about the One Billion Row Challenge and thought I'd have a go at reproducing it and adding some GPU metrics on my workstation. Pandas, Dask and Polars all performed as expected, and dask-cudf managed to surpass them all. cudf ran into some string limitations and also hit memory problems, because my GPU memory wasn't big enough to hold the whole dataset and perform the groupby.
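For readers unfamiliar with the challenge: the input is a text file with one `station;temperature` measurement per line, and the task is to report the min, mean and max temperature per station. The sketch below is a minimal pure-Python version of that aggregation on a tiny in-memory sample. It assumes the standard challenge format, and `summarize` is a hypothetical helper, not code from the notebook. In pandas, Dask or cuDF the same logic is a single groupby with min/mean/max aggregations.

```python
import io


def summarize(lines):
    """Hypothetical helper: return {station: (min, mean, max)} for 'station;temperature' lines."""
    # Running aggregates per station, stored as [min, max, sum, count]
    stats = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        station, _, value = line.partition(";")
        temp = float(value)
        s = stats.get(station)
        if s is None:
            stats[station] = [temp, temp, temp, 1]
        else:
            s[0] = min(s[0], temp)
            s[1] = max(s[1], temp)
            s[2] += temp
            s[3] += 1
    # Emit stations in sorted order with the mean rounded to one decimal place
    return {
        station: (s[0], round(s[2] / s[3], 1), s[1])
        for station, s in sorted(stats.items())
    }


# Tiny made-up sample in the challenge's line format (not real benchmark data)
sample = """Hamburg;12.0
Bulawayo;8.9
Hamburg;34.2
Bulawayo;-1.1
"""
result = summarize(io.StringIO(sample))
print(result)
```

Keeping only running aggregates per station means memory grows with the number of stations rather than the number of rows, which is the same reason a partitioned groupby in dask-cudf can cope with a dataset that does not fit in GPU memory all at once.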

This might make a nice example notebook for single-node Dask deployments: the code is easy to understand, but it runs into some cudf limitations that require dask-cudf. And when you use cudf with Dask you get best-in-class performance.