cfangplus opened 4 years ago
I noticed the assembly-no-scala.xml file in the project and saw that the Scala library is excluded. Why is that?
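For context, a Maven assembly descriptor that excludes the Scala library from the final jar typically looks something like the sketch below. This is illustrative only; the ids and structure of the actual assembly-no-scala.xml in this repo may differ.

```xml
<!-- Sketch of an assembly descriptor that excludes scala-library from the
     packaged jar; the actual assembly-no-scala.xml may use different ids. -->
<assembly xmlns="http://maven.apache.org/ASSEMBLY/2.0.0">
  <id>no-scala</id>
  <formats>
    <format>jar</format>
  </formats>
  <includeBaseDirectory>false</includeBaseDirectory>
  <dependencySets>
    <dependencySet>
      <unpack>true</unpack>
      <excludes>
        <!-- Spark's runtime classpath already contains scala-library -->
        <exclude>org.scala-lang:scala-library</exclude>
      </excludes>
    </dependencySet>
  </dependencySets>
</assembly>
```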
Hi cfangplus,
What Spark version were you using? The examples here are not compatible with the latest Spark-3.0.0-preview yet. If you run into the same issue on Spark 2.x, please provide your environment details and more logs so I can try to reproduce the issue.
As for the Scala library, I think it's already provided by the Spark runtime.
Yes, I noticed that. I used spark-submit to submit the application to a Spark 3.0 cluster and the Scala library conflicted. Do you know how to fix this, or how to make this Scala project compatible with Spark 3.0?
I know this project has been developed since June 2019, when the CUDA version was 10.1. Now I have a GPU environment with CUDA 10.2. How can I get a cudf-0.9.2-cuda10-2.jar and libxgboost4j.so with CUDA 10.2 support?
The team is planning to support CUDA 10.2.
As for now, you could install both CUDA 10.1 & 10.2 on your server and run these examples with CUDA 10.1: https://stackoverflow.com/questions/41330798/install-multiple-versions-of-cuda-and-cudnn
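Assuming both toolkits are installed under the default `/usr/local/cuda-10.1` and `/usr/local/cuda-10.2` prefixes (an assumption; adjust the paths to your own layout), pointing a session at CUDA 10.1 is just a matter of environment variables, e.g.:

```shell
# Select CUDA 10.1 for the current shell. The paths below are the
# installer defaults and may differ on your system.
export CUDA_HOME=/usr/local/cuda-10.1
export PATH="$CUDA_HOME/bin:$PATH"
export LD_LIBRARY_PATH="$CUDA_HOME/lib64:${LD_LIBRARY_PATH:-}"

# With the toolkit installed, 'nvcc --version' should now report release 10.1.
```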
That's great. Now I have another question: why was this spark-examples project proposed? As we know, NVIDIA RAPIDS + Dask already provide distributed data processing, machine learning, and graph computing, so Apache Spark doesn't seem to be needed, right?
I think Joshua answered your question here: https://github.com/rapidsai/cudf/issues/3643
Also, as stated in the README.md, this repo provides docs and example applications that demonstrate the RAPIDS.ai GPU-accelerated XGBoost-Spark project.
@chuanlihao Thank you for your reply. I recently ran this program with CUDA 10.1 and the results are good. As we know, the core of XGBoost is written in C/C++ and exposed as a shared library to Python, JVM, R, and other language APIs. Does cuML have the capability to be used from Spark via JNI in the same way? Do you have any similar ideas or plans?
Currently cuML only has Python bindings. Technically we could apply a similar approach to cuML as was used for XGBoost. Please let us know your use cases for cuML on Spark.
@anfeng Thanks. I mean that we want to accelerate our Spark ML/graph applications with GPUs.
I followed the steps illustrated at https://github.com/rapidsai/spark-examples/blob/master/getting-started-guides/building-sample-apps/scala.md and built the Scala project. Then I used spark-submit to submit the application to the cluster and got a 'java.lang.NoClassDefFoundError: scala/Product$class' exception. It seems that the jar produced by the mvn command does not contain the Scala library. Please see the attached file for details.
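For what it's worth, `NoClassDefFoundError: scala/Product$class` usually indicates a Scala binary version mismatch rather than a missing jar: the synthetic `$class` trait-implementation classes exist in Scala 2.11 but were removed in Scala 2.12, which Spark 3.0 uses, so a jar compiled against 2.11 fails this way on a 2.12 runtime. Bundling scala-library would not fix that. A sketch of the kind of pom.xml change that would be needed is below; the property names are illustrative Maven conventions, not necessarily what this repo's pom actually uses.

```xml
<!-- Sketch: cross-build for Scala 2.12 to match a Spark 3.0 cluster.
     Property names are illustrative; the repo's pom.xml may differ. -->
<properties>
  <scala.version>2.12.10</scala.version>
  <scala.binary.version>2.12</scala.binary.version>
</properties>
<dependencies>
  <dependency>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-library</artifactId>
    <version>${scala.version}</version>
    <!-- 'provided': the Spark runtime already ships scala-library -->
    <scope>provided</scope>
  </dependency>
</dependencies>
```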