rapidsai / spark-examples

[ARCHIVED] Moved to github.com/NVIDIA/spark-xgboost-examples
https://github.com/NVIDIA/spark-xgboost-examples
Apache License 2.0

The scala project under examples would cause NoClassDefFoundError #70

Open cfangplus opened 4 years ago

cfangplus commented 4 years ago

I followed the steps described at https://github.com/rapidsai/spark-examples/blob/master/getting-started-guides/building-sample-apps/scala.md and built the Scala project. When I submitted the application to the cluster with spark-submit, it failed with 'java.lang.NoClassDefFoundError: scala/Product$class'. It seems that the jar produced by the mvn command does not contain the Scala library. Please see the attached file for details: NoClassDefFoundError

cfangplus commented 4 years ago

I noticed the assembly-no-scala.xml file in the project and found that the Scala library is excluded. Why is that?

chuanlihao commented 4 years ago

Hi cfangplus,

What Spark version were you using? The examples here are not yet compatible with the latest Spark-3.0.0-preview. If you run into the same issue on Spark 2.x, please provide your environment details and more logs so I can try to reproduce the issue.

As for the Scala library, I think it's already provided by the Spark runtime.
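Because the Scala library comes from the Spark runtime rather than from the application jar, the jar has to be compiled against the same Scala binary version that the runtime ships. Classes such as `scala.Product$class` are trait implementation classes that the Scala compiler generated up to 2.11; Scala 2.12 encodes traits with Java 8 default methods instead, so that class is no longer in the library. The following is a minimal diagnostic sketch, not part of this repository: the class name `ScalaBinaryProbe` and its methods are made up for illustration, and it only uses standard Java reflection to report which case a given classpath falls into.

```java
// Illustrative sketch only: ScalaBinaryProbe is a hypothetical helper,
// not something shipped with spark-examples, Spark, or XGBoost.
final class ScalaBinaryProbe {

    /** Returns true if the named class can be located on the current classpath. */
    static boolean isClassPresent(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers.
            Class.forName(className, false, ScalaBinaryProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** Turns the two probe results into a human-readable summary. */
    static String describe(boolean hasScalaLibrary, boolean hasTraitImplClass) {
        if (!hasScalaLibrary) {
            return "no Scala library on the classpath";
        }
        if (hasTraitImplClass) {
            // scala.Product$class exists only in Scala 2.11 and earlier.
            return "Scala 2.11 or earlier (trait implementation classes present)";
        }
        return "Scala 2.12 or later (no trait implementation classes)";
    }

    public static void main(String[] args) {
        boolean hasLibrary = isClassPresent("scala.Product");
        boolean hasTraitImpl = isClassPresent("scala.Product$class");
        System.out.println(describe(hasLibrary, hasTraitImpl));
    }
}
```

One way to use it, assuming a standard Spark distribution layout with the runtime jars under `$SPARK_HOME/jars`, is to compile the class and run it with those jars on the classpath, for example `java -cp "$SPARK_HOME/jars/*:." ScalaBinaryProbe`. If the result is "2.12 or later" while the application jar was built for Scala 2.11, an error like the one reported above is the expected outcome.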

cfangplus commented 4 years ago

Yes, I noticed that. I used spark-submit to submit the application to a Spark 3.0 cluster, and the Scala library conflicted. Do you know how to fix this, or how to make this Scala project compatible with Spark 3.0?

chuanlihao commented 4 years ago

Spark 3.0 support is currently in development. My suggestion is to use Spark 2.x until the 3.0-compatible release is available.

There is no easy way to fix the compatibility issue. Both the xgboost project and the examples project must be updated and rebuilt, which is complex.

cfangplus commented 4 years ago

I know this project has been in development since June 2019, when the CUDA version was 10.1. I now have a GPU environment with CUDA 10.2. How can I get cudf-0.9.2-cuda10-2.jar and libxgboost4j.so with CUDA 10.2 support?

chuanlihao commented 4 years ago

The team is planning to support CUDA 10.2.

For now, you could install both CUDA 10.1 and 10.2 on your server and run these examples with CUDA 10.1: https://stackoverflow.com/questions/41330798/install-multiple-versions-of-cuda-and-cudnn

cfangplus commented 4 years ago

That's great. Now I have another question: why was this spark-examples project proposed? As we know, NVIDIA RAPIDS + Dask can provide distributed data processing, machine learning, and graph computing, so Apache Spark does not seem to be needed, right?

chuanlihao commented 4 years ago

I think Joshua answered your question here: https://github.com/rapidsai/cudf/issues/3643

Also, as specified in README.md, this repo provides docs and example applications that demonstrate the RAPIDS.ai GPU-accelerated XGBoost-Spark project.

cfangplus commented 4 years ago

@chuanlihao Thank you for your reply. I recently ran this program with CUDA 10.1 and the results were good. As we know, the core of XGBoost is written in C/C++ and is exposed as a shared library to the Python, JVM, R, and other language APIs. Could cuML likewise be used from Spark via JNI? Do you have a similar idea or plan?

anfeng commented 4 years ago

Currently cuML only has Python bindings. Technically, we could apply an approach to cuML similar to the one used for XGBoost. Please let us know your use cases for cuML on Spark.
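To make the "similar approach" concrete, here is a rough sketch of the JVM side of a JNI binding in general. Everything in it is hypothetical: the library name `cuml_jni_sketch`, the class `NativeBindingSketch`, and the `predictNative` signature are invented for illustration and do not correspond to any existing cuML, RAPIDS, or XGBoost API. It only shows the usual pattern: load a shared library, declare a `native` method, and wrap it with argument checks.

```java
// Hypothetical sketch of a JNI binding pattern. None of these names exist in cuML.
final class NativeBindingSketch {

    // Made-up library name; on Linux this would resolve to libcuml_jni_sketch.so.
    private static final String LIB_NAME = "cuml_jni_sketch";

    // Records whether the shared library could be loaded, instead of failing at class load.
    private static final boolean LOADED = tryLoad();

    private static boolean tryLoad() {
        try {
            System.loadLibrary(LIB_NAME);
            return true;
        } catch (UnsatisfiedLinkError | SecurityException e) {
            return false;
        }
    }

    // Hypothetical entry point that a C/C++ shim would implement on the native side.
    private static native float[] predictNative(float[] features, int numRows, int numCols);

    static boolean isAvailable() {
        return LOADED;
    }

    /** Validates the row-major feature matrix on the JVM side before crossing into native code. */
    static float[] predict(float[] features, int numRows, int numCols) {
        if (features.length != numRows * numCols) {
            throw new IllegalArgumentException("features.length must equal numRows * numCols");
        }
        if (!LOADED) {
            throw new IllegalStateException("native library '" + LIB_NAME + "' is not loaded");
        }
        return predictNative(features, numRows, numCols);
    }

    public static void main(String[] args) {
        System.out.println("native library loaded: " + isAvailable());
    }
}
```

On a machine without such a library, `isAvailable()` returns false and `predict` fails with an `IllegalStateException` rather than an `UnsatisfiedLinkError` at call time. A real binding would also need the native shim itself and a way to ship the shared library to the Spark executors.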

cfangplus commented 4 years ago

@anfeng Thanks. I mean that we want to accelerate our Spark ML/Graph applications with GPUs.