uber / RemoteShuffleService

Remote shuffle service for Apache Spark to store shuffle data on remote servers.
Other
321 stars 100 forks source link

Uber Remote Shuffle Service (RSS)

Uber Remote Shuffle Service provides the capability for Apache Spark applications to store shuffle data on remote servers. See more details on Spark community document: [SPARK-25299][DISCUSSION] Improving Spark Shuffle Reliability.

Please contact us (remoteshuffleservice@googlegroups.com) for any question or feedback.

Supported Spark Version

How to Build

Make sure JDK 8+ and maven is installed on your machine.

Build RSS Server

mvn clean package -Pserver -DskipTests

This command creates remote-shuffle-service-xxx-server.jar file for RSS server, e.g. target/remote-shuffle-service-0.0.9-server.jar.

Build RSS Client

mvn clean package -Pclient -DskipTests

This command creates remote-shuffle-service-xxx-client.jar file for RSS client, e.g. target/remote-shuffle-service-0.0.9-client.jar.

How to Run

Step 1: Run RSS Server

java -Dlog4j.configuration=log4j-rss-prod.properties -cp target/remote-shuffle-service-0.0.9-server.jar com.uber.rss.StreamServer -port 12222 -serviceRegistry standalone -dataCenter dc1

Step 2: Run Spark application with RSS Client

spark.jars=hdfs:///file/path/remote-shuffle-service-0.0.9-client.jar
spark.executor.extraClassPath=remote-shuffle-service-0.0.9-client.jar
spark.shuffle.manager=org.apache.spark.shuffle.RssShuffleManager
spark.shuffle.rss.serviceRegistry.type=standalone
spark.shuffle.rss.serviceRegistry.server=server1:12222
spark.shuffle.rss.dataCenter=dc1

Run with High Availability

Remote Shuffle Service could use a Apache ZooKeeper cluster and register live service instances in ZooKeeper. Spark applications will look up ZooKeeper to find and use active Remote Shuffle Service instances.

In this configuration, ZooKeeper serves as a Service Registry for Remote Shuffle Service, and we need to add those parameters when starting RSS server and Spark application.

Step 1: Run RSS Server with ZooKeeper as service registry

java -Dlog4j.configuration=log4j-rss-prod.properties -cp target/remote-shuffle-service-0.0.9-server.jar com.uber.rss.StreamServer -port 12222 -serviceRegistry zookeeper -zooKeeperServers zkServer1:2181 -dataCenter dc1

Step 2: Run Spark application with RSS Client and ZooKeeper service registry

spark.jars=hdfs:///file/path/remote-shuffle-service-0.0.9-client.jar
spark.executor.extraClassPath=remote-shuffle-service-0.0.9-client.jar
spark.shuffle.manager=org.apache.spark.shuffle.RssShuffleManager
spark.shuffle.rss.serviceRegistry.type=zookeeper
spark.shuffle.rss.serviceRegistry.zookeeper.servers=zkServer1:2181
spark.shuffle.rss.dataCenter=dc1