uber / RemoteShuffleService

Remote shuffle service for Apache Spark to store shuffle data on remote servers.

spark 3.1/3.2? #62

Open cpd85 opened 2 years ago

cpd85 commented 2 years ago

hi all, I saw in the readme that there is a spark30 branch supporting Spark 3.0.x. There also seems to be a spark31 branch, but I'm wondering: are there any plans to support Spark 3.2, or could it work out of the box with the spark31 branch?

hiboyang commented 2 years ago

Yeah, agree it is confusing here. Spark 3.1 and 3.2 have slight differences in their shuffle APIs, so Remote Shuffle Service needs to be changed accordingly. I used to work on Remote Shuffle Service when I was at Uber. I have since left Uber and no longer have write access to this repo.

What environment are you interested in running Remote Shuffle Service on, e.g. YARN or Kubernetes? If Kubernetes, I have another repo that makes Remote Shuffle Service compatible with Kubernetes for Spark 3.1 and 3.2.

cpd85 commented 2 years ago

@hiboyang thanks for the response -- I really appreciate it! For now I would love to be able to run on YARN; Kubernetes I would love to explore as well. If you point me towards the repo/changes you made for compatibility, maybe I could extend it to run on YARN as well?

hiboyang commented 2 years ago

I see. In that case, you could change <spark.version>2.4.3</spark.version> in pom.xml to a Spark 3 version. You will get some compile errors, and you can start from there.

I have been trying to find some time to provide an example, but I'm really busy these days :(

roligupt commented 2 years ago

@hiboyang I am looking to deploy remote shuffle service in my kubernetes cluster, preferably for spark 3.1.1. What's your recommendation?

avs-alatau commented 2 years ago

Hi!

Support for Spark 3.2 would be very interesting; Java 11 is also required there. I tried to change some parameters for Spark 3.2, for example:

<java.version>11</java.version>
<hadoop.version>3.2.2</hadoop.version>
<spark.version>3.2.0</spark.version>
<scala.version>2.12.15</scala.version>

but I get an error

[ERROR] /home/alatau/ssk/3.2/src/main/scala/org/apache/spark/shuffle/rss/RssStressTool.scala:144: not enough arguments for method registerShuffle: (shuffleId: Int, numMaps: Int, numReduces: Int)Unit.
Unspecified value parameter numReduces.
[ERROR]     mapOutputTrackerMaster.registerShuffle(appShuffleId.getShuffleId, numMaps)
[ERROR]                                           ^
[ERROR] one error found
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE

cpd85 commented 2 years ago

@avs-alatau as @hiboyang mentioned, there's a difference in the APIs, so it's not enough to just change spark.version -- you'll need to implement the new APIs as well. Bo has done the work here, but it's only running on k8s at the moment: https://github.com/hiboyang/RemoteShuffleService/tree/k8s-spark-3.2
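
For context, the compile error above comes from Spark 3.2 adding a numReduces parameter to MapOutputTrackerMaster.registerShuffle (the new signature is shown in the error message itself). A minimal sketch of the fix in RssStressTool would look something like this, where numReduces is just a placeholder for whatever reduce/partition count the tool already tracks:

// Spark 3.2 signature: registerShuffle(shuffleId: Int, numMaps: Int, numReduces: Int): Unit
// Pass the reducer count as the third argument; `numReduces` here is a placeholder name,
// not necessarily the variable actually used in RssStressTool.
mapOutputTrackerMaster.registerShuffle(appShuffleId.getShuffleId, numMaps, numReduces)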

avs-alatau commented 2 years ago

@cpd85 thanks for the link to the k8s branch, but in my environment it is only possible to configure for YARN at the moment.

cpd85 commented 2 years ago

@avs-alatau could you help me understand what you're asking for? The code doesn't exist (or isn't open source) for YARN. At the moment I'm working on fighting through these compilation issues to see if I can get a 3.2 client to communicate with a 2.4 server. I'll be happy to share the code if I end up getting it working.

avs-alatau commented 2 years ago

@cpd85 Thanks for the help. I have a Hadoop cluster with Spark 3.2. Spark jobs currently run through YARN, and there are some problems with that, which is why I am looking for an external shuffle service. I managed to set up Spark jobs on a test cluster with Spark 3.0, but since Spark 3.2 is what is installed in the production cluster, I am looking for an external shuffle service that supports it. If you manage to build an RSS version for Spark 3.2, I will be grateful.

cpd85 commented 2 years ago

@avs-alatau I haven't done too much testing, but I got this to work with a Spark 3.2 PageRank example app:

https://github.com/cpd85/RemoteShuffleService/tree/spark32
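
For anyone who wants to try that branch, here is a rough sketch of how a Spark 3.2 app would point at RSS. The shuffle manager class follows the repo README, but the spark.shuffle.rss.* registry keys and the host/port below are assumptions on my part, so double-check the README for the exact names:

import org.apache.spark.sql.SparkSession

// Minimal sketch: enable the RSS shuffle manager for a Spark 3.2 application.
// "rss-host:12223" and the spark.shuffle.rss.* keys are assumptions --
// check the repo README for the exact configuration names for your build.
val spark = SparkSession.builder()
  .appName("rss-spark32-smoke-test")
  .config("spark.shuffle.manager", "org.apache.spark.shuffle.RssShuffleManager")
  .config("spark.shuffle.rss.serviceRegistry.type", "standalone")
  .config("spark.shuffle.rss.serviceRegistry.server", "rss-host:12223")
  .getOrCreate()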