I spotted a blog post from the Coiled folks about the One Billion Row Challenge and thought I'd have a go at reproducing it on my workstation and adding some GPU numbers. Pandas, Dask and Polars all performed as expected, and dask-cudf managed to beat them all. cudf on its own ran into some string limitations and had memory trouble, because my GPU memory wasn't big enough to hold the whole dataset and do the groupby.
This might make a nice example notebook for single-node Dask deployments because the code is easy to understand. On its own cudf runs into some limitations here, so the example needs dask-cudf, but when you use cudf with Dask you get best-in-class performance.
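To show why splitting the data into partitions gets around the memory problem, here is a rough pure-Python sketch of a partitioned groupby. It is not the dask-cudf API and not the code from the blog post. It only shows the pattern: each chunk produces partial min/max/sum/count per station, and the partials are merged at the end, so the whole dataset never has to sit in memory at once. The sample rows, the chunk size and all function names are made up for illustration. The `station;temperature` line format is the one the challenge uses.

```python
import io

# Tiny made-up sample in the challenge's "station;temperature" format.
SAMPLE = """Hamburg;12.0
Bulawayo;8.9
Hamburg;-3.5
Palembang;38.8
Bulawayo;20.1
Hamburg;7.0
"""


def read_chunks(fh, chunk_size):
    """Yield lists of at most chunk_size lines, one chunk in memory at a time."""
    chunk = []
    for line in fh:
        chunk.append(line)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def partial_stats(lines):
    """Groupby within one chunk: station -> [min, max, sum, count]."""
    stats = {}
    for line in lines:
        station, value = line.rstrip("\n").split(";")
        t = float(value)
        s = stats.get(station)
        if s is None:
            stats[station] = [t, t, t, 1]
        else:
            s[0] = min(s[0], t)
            s[1] = max(s[1], t)
            s[2] += t
            s[3] += 1
    return stats


def merge(total, part):
    """Fold one chunk's partial results into the running totals."""
    for station, (lo, hi, sm, n) in part.items():
        s = total.get(station)
        if s is None:
            total[station] = [lo, hi, sm, n]
        else:
            s[0] = min(s[0], lo)
            s[1] = max(s[1], hi)
            s[2] += sm
            s[3] += n
    return total


def aggregate(fh, chunk_size=2):
    """Return {station: (min, mean, max)} sorted by station name."""
    total = {}
    for chunk in read_chunks(fh, chunk_size):
        merge(total, partial_stats(chunk))
    return {
        station: (lo, sm / n, hi)
        for station, (lo, hi, sm, n) in sorted(total.items())
    }


result = aggregate(io.StringIO(SAMPLE))
for station, (lo, mean, hi) in result.items():
    print(f"{station}={lo:.1f}/{mean:.1f}/{hi:.1f}")
```

The mean is carried as a sum and a count rather than averaged per chunk, because averaging the chunk means would give the wrong answer when chunks hold different numbers of rows for a station. With Dask the chunking and merging are handled by the library, and with dask-cudf each partition's groupby runs on the GPU.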