Open jcampabadal-db opened 3 weeks ago
Can you try using the cudf.pandas.profile magic? https://docs.rapids.ai/api/cudf/stable/cudf_pandas/usage/#understanding-performance-the-cudf-pandas-profiler
I think this should tell you which operations are running on the GPU and which are running on CPU.
Thank you @lithomas1, will check that
@lithomas1 I have been working with @jcampabadal-db on this. I see extremely slow GPU performance on both Databricks DBR 13.3 ML (CUDA 11.7) and Databricks DBR 14.3 ML (CUDA 11.8) on an AWS EC2 g5.xlarge (A10G), running the same commands as in https://docs.rapids.ai/api/cudf/stable/cudf_pandas/usage/#understanding-performance-the-cudf-pandas-profiler
The output is below (it took several minutes to complete). How can we work around or resolve this performance issue?
/databricks/python/lib/python3.10/site-packages/cupy/cuda/compiler.py:233: PerformanceWarning: Jitify is performing a one-time only warm-up to populate the persistent cache, this may take a few seconds and will be improved in a future release...
jitify._init_module()
Total time elapsed: 225.300 seconds
3 GPU function calls in 224.665 seconds
1 CPU function calls in 0.012 seconds
Stats
┏━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ Function ┃ GPU ncalls ┃ GPU cumtime ┃ GPU percall ┃ CPU ncalls ┃ CPU cumtime ┃ CPU percall ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ DataFrame │ 1 │ 0.145 │ 0.145 │ 0 │ 0.000 │ 0.000 │
│ DataFrame.min │ 1 │ 224.520 │ 224.520 │ 0 │ 0.000 │ 0.000 │
│ DataFrame.groupby │ 1 │ 0.000 │ 0.000 │ 0 │ 0.000 │ 0.000 │
│ DataFrameGroupBy.filter │ 0 │ 0.000 │ 0.000 │ 1 │ 0.012 │ 0.012 │
└─────────────────────────┴────────────┴─────────────┴─────────────┴────────────┴─────────────┴─────────────┘
Not all pandas operations ran on the GPU. The following functions required CPU fallback:
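One way to narrow down a figure like the 224.520 s for DataFrame.min is to time the same call more than once. The Jitify warning in the output describes a one-time warm-up, so a much faster second call would point at warm-up cost rather than steady-state GPU speed. Below is a minimal sketch; `time_repeated` is a hypothetical helper, and the frame contents are illustrative rather than taken from this thread.

```python
import time


def time_repeated(fn, repeats=2):
    """Call fn `repeats` times and return the wall-clock seconds of each call."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return timings


if __name__ == "__main__":
    try:
        # Under cudf.pandas this import resolves to the accelerated module.
        import pandas as pd
    except ImportError:
        pd = None

    if pd is not None:
        # Illustrative data (assumption): any small frame works for this check.
        df = pd.DataFrame({"a": [0, 1, 2], "b": [3, 4, 3]})
        first, second = time_repeated(lambda: df.min(axis=1))
        print(f"first call:  {first:.3f} s")
        print(f"second call: {second:.3f} s")
```

To exercise the accelerator, the script can be run with `python -m cudf.pandas script.py`, or the same calls can be made in a notebook after `%load_ext cudf.pandas`.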
Also, when I follow https://docs.nvidia.com/spark-rapids/user-guide/23.12/getting-started/databricks.html
I sometimes run into an OOM error even when loading a small dataset:
import cudf
import requests
from io import StringIO
url = "https://github.com/plotly/datasets/raw/master/tips.csv"
content = requests.get(url).content.decode("utf-8")
tips_df = cudf.read_csv(StringIO(content))
MemoryError: std::bad_alloc: out_of_memory: CUDA error at: /__w/cudf/cudf/python/cudf/build/cp310-cp310-linux_x86_64/_deps/rmm-src/include/rmm/mr/device/cuda_memory_resource.hpp:60: cudaErrorMemoryAllocation out of memory
Are you using cuDF Pandas alongside Spark RAPIDS in a single application/workflow or is this independent of Spark?
Would be curious to know if you experience this error when following only this guide https://docs.rapids.ai/deployment/stable/platforms/databricks/ (or if it's perhaps related to some combination).
@beckernick thanks for the reply. This ticket actually describes the issues we encountered after following the guide linked above. The latest RAPIDS release supports Databricks only up to 13.3 ML, as described in https://nvidia.github.io/spark-rapids/docs/download.html; on newer runtimes the Databricks Spark cluster fails to boot up.
Describe the bug
cudf-cuda11 is not using the GPU when running on Databricks DBR 13.3 ML LTS with a GPU instance.
Steps/Code to reproduce bug
Using DBR 14.3 ML with GPU fails with error:
Internal error message: Spark error: Driver down cause: java.lang.IllegalArgumentException: This RAPIDS Plugin build does not support Spark build 3.5.0-databricks. Supported Spark versions: 3.1.1 {buildver=311}, 3.1.2 {buildver=312}, 3.1.3 {buildver=313}, 3.2.0 {buildver=320}, 3.2.1 {buildver=321}, 3.2.1-cloudera-3.2.7171000 {buildver=321cdh}, 3.2.2 {buildver=322}, 3.2.3 {buildver=323}, 3.2.4 {buildver=324}, 3.3.0 {buildver=330}, 3.3.0-cloudera-3.3.7180 {buildver=330cdh}, 3.3.0-databricks {buildver=330db}, 3.3.1 {buildver=331}, 3.3.2 {buildver=332}, 3.3.2-cloudera-3.3.7190 {buildver=332cdh}, 3.3.2-databricks {buildver=332db}, 3.3.3 {buildver=333}, 3.3.4 {buildver=334}, 3.4.0 {buildver=340}, 3.4.1 {buildver=341}, 3.4.1-databricks {buildver=341db}, 3.4.2 {buildver=342}, 3.5.0 {buildver=350}, 3.5.1 {buildver=351}. Consult the Release documentation at https://nvidia.github.io/spark-rapids/docs/download.html
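The error message itself lists the supported builds, so the mismatch can be checked mechanically. The sketch below parses a message of this shape and tests whether the requested build appears in the list. The function names are hypothetical, the regular expressions assume the exact wording shown above, and the sample message is a shortened copy of that error.

```python
import re


def supported_spark_builds(message):
    """Return the version strings that precede each {buildver=...} tag."""
    pattern = r"(\d+\.\d+\.\d+(?:-[\w.]+)*) \{buildver=\w+\}"
    return re.findall(pattern, message)


def requested_spark_build(message):
    """Return the build named after 'does not support Spark build', or None."""
    match = re.search(r"does not support Spark build (\S+)\. Supported", message)
    return match.group(1) if match else None


# Shortened copy of the error above, keeping only a few of the listed builds.
sample = (
    "This RAPIDS Plugin build does not support Spark build 3.5.0-databricks. "
    "Supported Spark versions: 3.1.1 {buildver=311}, "
    "3.2.1-cloudera-3.2.7171000 {buildver=321cdh}, "
    "3.4.1-databricks {buildver=341db}, 3.5.0 {buildver=350}, "
    "3.5.1 {buildver=351}. Consult the Release documentation"
)

requested = requested_spark_build(sample)
supported = supported_spark_builds(sample)
print(requested, "supported:", requested in supported)
```

In the message above, plain 3.5.0 is listed but 3.5.0-databricks is not, which matches the failure on DBR 14.3 ML.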
We are following these guides:
https://docs.rapids.ai/deployment/stable/platforms/databricks/
https://docs.nvidia.com/spark-rapids/user-guide/23.12/getting-started/databricks.html
Expected behavior
For the cudf-cuda11 package to use the GPU for pandas operations.
Environment overview (please complete the following information)
Here I load cudf and I made sure it shows <module 'pandas' (ModuleAccelerator(fast=cudf, slow=pandas))> when printing pd.
How can we debug why cuDF shows 0 per-GPU utilization and only per-GPU frame buffer utilization bytes? It seems to be using only the CPU. Please advise: cudf-cuda11 appears to support CUDA 11.2+, which the DBR release contains, and the library loads without errors.
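As a cross-check that is independent of the cluster metrics UI, GPU utilization and memory can be read directly from `nvidia-smi` on the driver node while a workload runs. This is a minimal sketch that assumes `nvidia-smi` is on the PATH; the helper names and dictionary keys are hypothetical. Utilization is sampled, so a very short operation on a tiny frame may still read as 0.

```python
import shutil
import subprocess

QUERY_FIELDS = "utilization.gpu,memory.used,memory.total"


def _to_int(field):
    """Convert one CSV field to int; return None for values like [N/A]."""
    try:
        return int(field.strip())
    except ValueError:
        return None


def parse_gpu_stats(csv_text):
    """Parse `nvidia-smi --format=csv,noheader,nounits` output, one dict per GPU."""
    rows = []
    for line in csv_text.strip().splitlines():
        util, used, total = (_to_int(f) for f in line.split(","))
        rows.append(
            {"util_pct": util, "mem_used_mib": used, "mem_total_mib": total}
        )
    return rows


def query_gpu_stats():
    """Run nvidia-smi once; return [] if the tool is not installed."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY_FIELDS}", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_gpu_stats(result.stdout)


if __name__ == "__main__":
    for index, stats in enumerate(query_gpu_stats()):
        print(index, stats)
```

Calling `query_gpu_stats()` before and after a larger cuDF operation shows whether memory use on the device changes, which can also help with the out-of-memory error reported in this thread.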
We are using this NVIDIA notebook to test the RAPIDS cuDF pandas accelerator:
https://colab.research.google.com/drive/12tCzP94zFG2BRduACucn5Q_OcX1TUKY3