oap-project / raydp

RayDP provides simple APIs for running Spark on Ray and integrating Spark with AI libraries.
Apache License 2.0

install raydp==1.6.0 with pip raised the "More than one SparkShimProvider found" error #402

Open evilpapa opened 3 months ago

evilpapa commented 3 months ago

I followed the instructions in the README to install raydp==1.6.0 with pip. When I ran the example from the project, the raydp-worker logs in the Ray dashboard showed the following error:

2024-03-05 16:25:24,603 ERROR DefaultRayRuntimeFactory [Thread-4]: Uncaught worker exception in thread Thread[Thread-4,5,main]
java.lang.IllegalStateException: More than one SparkShimProvider found: List(com.intel.raydp.shims.spark322.SparkShimProvider@3fd134bc, com.intel.raydp.shims.spark321.SparkShimProvider@41751eaa)
    at com.intel.raydp.shims.SparkShimLoader$.loadSparkShimProvider(SparkShimLoader.scala:56) ~[raydp-shims-common-1.6.0-SNAPSHOT.jar:?]
    at com.intel.raydp.shims.SparkShimLoader$.getSparkShimProvider(SparkShimLoader.scala:75) ~[raydp-shims-common-1.6.0-SNAPSHOT.jar:?]
    at com.intel.raydp.shims.SparkShimLoader$.getSparkShims(SparkShimLoader.scala:33) ~[raydp-shims-common-1.6.0-SNAPSHOT.jar:?]
    at org.apache.spark.executor.RayDPExecutor.$anonfun$startUp$3(RayDPExecutor.scala:119) ~[raydp-1.6.0-SNAPSHOT.jar:?]
    at org.apache.spark.executor.RayDPExecutor.$anonfun$serveAsExecutor$1(RayDPExecutor.scala:246) ~[raydp-1.6.0-SNAPSHOT.jar:?]
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:62) ~[spark-core_2.12-3.1.3.jar:3.1.3]
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:61) ~[spark-core_2.12-3.1.3.jar:3.1.3]
    at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_392]
    at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_392]
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1746) ~[hadoop-common-2.7.4.jar:?]
    at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:61) ~[spark-core_2.12-3.1.3.jar:3.1.3]
    at org.apache.spark.executor.RayDPExecutor.org$apache$spark$executor$RayDPExecutor$$serveAsExecutor(RayDPExecutor.scala:200) ~[raydp-1.6.0-SNAPSHOT.jar:?]
    at org.apache.spark.executor.RayDPExecutor$$anon$1.run(RayDPExecutor.scala:127) ~[raydp-1.6.0-SNAPSHOT.jar:?]
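
For reference, any script that starts a Spark session on Ray should hit this, since the shims are loaded during executor startup (RayDPExecutor.startUp in the trace above). A minimal sketch of what I ran, with illustrative app name and resource values rather than the exact README example:

import ray
import raydp

ray.init()

# init_spark launches the RayDPExecutor actors; SparkShimLoader runs during
# executor startup, which is where the exception above is thrown.
spark = raydp.init_spark(
    app_name="shim-repro",   # illustrative values
    num_executors=1,
    executor_cores=1,
    executor_memory="1GB",
)

spark.range(10).count()  # any job that exercises an executor

raydp.stop_spark()
ray.shutdown()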

Looking at the site-packages/raydp/jars directory, I noticed it contains both release and SNAPSHOT builds of the raydp-shims jars:

raydp-1.6.0-SNAPSHOT.jar
raydp-1.6.0.jar
raydp-agent-1.6.0-SNAPSHOT.jar
raydp-agent-1.6.0.jar
raydp-shims-common-1.6.0-SNAPSHOT.jar
raydp-shims-common-1.6.0.jar
raydp-shims-spark321-1.6.0-SNAPSHOT.jar
raydp-shims-spark322-1.6.0-SNAPSHOT.jar
raydp-shims-spark322-1.6.0.jar
raydp-shims-spark330-1.6.0-SNAPSHOT.jar
raydp-shims-spark330-1.6.0.jar
raydp-shims-spark340-1.6.0-SNAPSHOT.jar
raydp-shims-spark340-1.6.0.jar

After deleting raydp-shims-spark321-1.6.0-SNAPSHOT.jar, it works fine.
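
In case it helps others, here is a small sketch of that workaround, assuming the bundled jars live in the jars directory next to the installed raydp package (verify the printed list before deleting anything):

import os
import glob
import raydp

# The bundled jars sit inside the installed raydp package.
jars_dir = os.path.join(os.path.dirname(raydp.__file__), "jars")

# List the shim jars first to confirm which builds are present.
for jar in sorted(glob.glob(os.path.join(jars_dir, "raydp-shims-*.jar"))):
    print(os.path.basename(jar))

# Remove the jar whose SparkShimProvider conflicted in my environment;
# adjust the name if your listing differs.
stale = os.path.join(jars_dir, "raydp-shims-spark321-1.6.0-SNAPSHOT.jar")
if os.path.exists(stale):
    os.remove(stale)
    print("removed", os.path.basename(stale))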

When building from source, the resulting whl file doesn't have this issue. Could you update the package published to pip?
