oap-project / raydp

RayDP provides simple APIs for running Spark on Ray and integrating Spark with AI libraries.

Incompatible with Spark 3.4.3 and 3.5.0 #409

Open wuxiaocheng0506 opened this issue 3 weeks ago

wuxiaocheng0506 commented 3 weeks ago

I am using the latest nightly version, 1.7.0b20240501.dev0.

Initializing Spark and reading data into a Spark DataFrame both work, but ray.data.from_spark(df) hangs with Spark 3.4.3, and with Spark 3.5.0 it raises the following exception:

Caused by: java.lang.NoSuchMethodError: org.apache.spark.sql.util.ArrowUtils$.toArrowSchema(Lorg/apache/spark/sql/types/StructType;Ljava/lang/String;)Lorg/apache/arrow/vector/types/pojo/Schema;
    at org.apache.spark.sql.raydp.ObjectStoreWriter.$anonfun$save$1(ObjectStoreWriter.scala:108)
    at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2(RDD.scala:855)
    at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2$adapted(RDD.scala:855)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:328)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
    at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
    at org.apache.spark.scheduler.Task.run(Task.scala:141)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)
    at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
    at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    ... 1 more
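For context, a minimal reproduction sketch of the scenario described above; the app name, executor sizing, and the spark.range toy data are illustrative assumptions, not taken from the report:

```python
import ray
import raydp

ray.init()

# Start a Spark session on the Ray cluster (resource settings are placeholders).
spark = raydp.init_spark(
    app_name="raydp-409-repro",
    num_executors=1,
    executor_cores=1,
    executor_memory="1GB",
)

# Any DataFrame will do; spark.range stands in for the real data source.
df = spark.range(0, 1000)

# Reported behavior: hangs on Spark 3.4.3; raises NoSuchMethodError on 3.5.0.
ds = ray.data.from_spark(df)
print(ds.count())

raydp.stop_spark()
ray.shutdown()
```

The NoSuchMethodError itself shows that the two-argument overload ArrowUtils.toArrowSchema(StructType, String) called from ObjectStoreWriter no longer exists in Spark 3.5.0, the typical symptom of a jar compiled against an older Spark API.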

pang-wu commented 1 week ago

This PR should fix your problem: https://github.com/oap-project/raydp/pull/411

pang-wu commented 1 week ago

@wuxiaocheng0506 Would you mind trying the nightly build again?
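For anyone retrying, one quick way to upgrade and confirm which builds are actually in use (a sketch; the package names raydp-nightly and pyspark are my assumption about a pip-based setup):

```python
# Upgrade first, from a shell:
#   pip install --pre --upgrade raydp-nightly
# Then confirm the installed versions before re-running the repro:
from importlib.metadata import version

print("raydp-nightly:", version("raydp-nightly"))  # expect a 1.7.0b*.dev0 build
print("pyspark:", version("pyspark"))              # 3.4.3 or 3.5.0
```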

wuxiaocheng0506 commented 1 week ago

Tests passed with pyspark 3.5.0 and 3.4.3, thanks @pang-wu.