[QST] Is cuDF slow if there are many columns? (pandas, numpy, polars, cuDF Comparison)

Ginger-Tec commented 5 months ago

I conducted performance tests on basic arithmetic operations (+) with two DataFrames of size (4303, 3766) using different libraries: pandas, numpy, polars, and cuDF. Here are the results:

Local Environment (i7-12 Windows)

pandas: 30.4ms
numpy: 30.9ms
polars: 8.18ms
cuDF: 1.18s

Colab Environment

pandas: 50.8ms
numpy: 49.4ms
polars: 84.5ms
cuDF: 1.09s

Analysis of cuDF Performance

Despite the expectation that cuDF (GPU-accelerated) should outperform the other libraries, it showed the slowest performance. Here are the potential reasons for this outcome:

Apache Arrow Columnar Memory Format:
- cuDF operates based on the Apache Arrow columnar memory format, which might introduce overhead in certain scenarios.
Small Row Count:
- The dataset used has only about 4300 rows. GPU acceleration is generally more beneficial for larger datasets, where the overhead of transferring data to the GPU and initializing computations is amortized over a larger number of operations.
Potential Bottlenecks:
- Data Transfer Overhead: The time taken to transfer data from CPU memory to GPU memory can be significant, especially for smaller datasets.
- Insufficient Row Count: With fewer rows, the advantages of parallel processing on the GPU are not fully realized.
- Column-Oriented Processing: The dataset has a high number of columns (3766), and the overhead of managing such a wide dataset may not be effectively offset by the parallelism of the GPU.

Conclusion

Given these factors, it is reasonable to conclude that cuDF may not perform optimally for datasets with a small number of rows and a large number of columns in simple arithmetic operations. The overhead associated with data transfer and initialization on the GPU, combined with the columnar processing model, can outweigh the benefits of GPU acceleration in this specific context.

Is it okay to organize the reason why cudf is slow in the above scenario as above? I think there is something wrong or wrong, so I would really appreciate it if you could let me know.

vyasr commented 5 months ago

Your analysis is correct. Wide dataframes (many columns, few rows) are not what cuDF is optimized for. That is a fundamental property of the Arrow format, and that property is accentuated on GPUs because of the performance characteristics of memory accesses on GPUs relative to CPUs. #14548 has a lot of good discussion on this topic, so I'd have a look there and see if that discussion matches your expectations. Feel free to follow up here if you have more questions.

Ginger-Tec commented 5 months ago

Thank you, @vyasr , for your quick and clear response. Thanks to you, I am confident in introducing cuDF in my presentation today!

rapidsai / cudf