uber / RemoteShuffleService

Remote shuffle service for Apache Spark to store shuffle data on remote servers.

Failed to get all RSS servers #60

Closed Lobo2008 closed 2 years ago

Lobo2008 commented 2 years ago

Hi, I would like to test JavaWordCount with Uber's Remote Shuffle Service. I followed the README at https://github.com/uber/RemoteShuffleService to:

  1. Build the RSS client and server, which produced xx-client.jar and xx-server.jar.
  2. Run java -Dxxx -cp com.uber.rss.StreamServer -port 12222 -serviceRegistry standalone -dataCenter dc1 on one of my client nodes, the one I use to submit Spark applications.
  3. Set up my spark-defaults.conf as described in the README.
  4. Run ./spark-submit xxxx to submit an application.

The submission then failed with the exception: Failed to get all RSS servers

My questions are:

  1. Is RSS compatible with YARN? I only saw that -serviceRegistry supports standalone and zookeeper.
  2. If not, what else do I need to do to run on YARN with RSS?

Environment: all of our Spark applications run on YARN.

Command used to submit:

$SPARK_HOME/bin/spark-submit \
  --class org.apache.spark.examples.JavaWordCount \
  --conf spark.speculation=false \
  --jars /remote-shuffle-service-0.0.9-client.jar \
  --conf spark.driver.extraClassPath=remote-shuffle-service-0.0.9-client.jar \
  --conf spark.executor.extraClassPath=remote-shuffle-service-0.0.9-client.jar \
  --conf spark.shuffle.service.enabled=false \
  --conf spark.dynamicAllocation.enabled=false \
  $SPARK_HOME/examples/jars/spark-examples_2.11-2.4.5.jar \
  hdfs:///datas.txt

Part of the exception:

Exception in thread "main" com.uber.rss.exceptions.RssException: Failed to get all RSS servers
    at com.uber.rss.metadata.ServiceRegistryUtils.getReachableServers(ServiceRegistryUtils.java:82)
    at org.apache.spark.shuffle.RssShuffleManager$$anonfun$8.apply(RssShuffleManager.scala:396)
    at org.apache.spark.shuffle.RssShuffleManager$$anonfun$8.apply(RssShuffleManager.scala:395)
    at org.apache.spark.shuffle.RssServiceRegistry$.executeWithServiceRegistry(RssServiceRegistry.scala:80)
    at org.apache.spark.shuffle.RssShuffleManager.getRssServers(RssShuffleManager.scala:395)
    at org.apache.spark.shuffle.RssShuffleManager.registerShuffle(RssShuffleManager.scala:109)
    at org.apache.spark.ShuffleDependency.<init>(Dependency.scala:93)
    at org.apache.spark.rdd.ShuffledRDD.getDependencies(ShuffledRDD.scala:87)
    at org.apache.spark.rdd.RDD$$anonfun$dependencies$2.apply(RDD.scala:256)
    at org.apache.spark.rdd.RDD$$anonfun$dependencies$2.apply(RDD.scala:252)
    at scala.Option.getOrElse(Option.scala:121)

hiboyang commented 2 years ago

Did you add Spark conf like the following, to tell the RSS client in your Spark application to connect to your RSS server?

spark.shuffle.rss.serviceRegistry.type=standalone
spark.shuffle.rss.serviceRegistry.server=server1:12222
spark.shuffle.rss.dataCenter=dc1

Lobo2008 commented 2 years ago

Thank you! I forgot to change server1 to my own IP.
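For anyone hitting the same error, a quick way to rule out a wrong or placeholder address is to check that the host:port given in spark.shuffle.rss.serviceRegistry.server accepts TCP connections before submitting. The following is only a rough sketch using the Python standard library; the function names parse_host_port and is_reachable are made up for illustration and are not part of RSS.

```python
import socket


def parse_host_port(address):
    """Split a 'host:port' string into (host, port).

    Raises ValueError if the separator, the host, or a numeric port is missing.
    """
    host, sep, port = address.rpartition(":")
    if not sep or not host:
        raise ValueError(f"expected host:port, got {address!r}")
    return host, int(port)


def is_reachable(address, timeout=3.0):
    """Return True if a TCP connection to 'host:port' succeeds within timeout.

    This only shows that something is listening on the port; it does not
    prove that the listener is an RSS server.
    """
    host, port = parse_host_port(address)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

Calling is_reachable("server1:12222") with the placeholder left unchanged would most likely return False, since server1 is unlikely to resolve in a real cluster, and that would point to the same mistake as above.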

Regarding the two questions I asked above (in case anyone needs this): I found that standalone refers to RSS's own standalone service registry, not Spark's standalone mode.
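To make that distinction concrete, here is a small sketch that assembles a spark-submit argument list where the cluster manager is YARN while the RSS service registry type stays standalone. The three spark.shuffle.rss.* keys are the ones quoted earlier in this thread, and the shuffle manager class name is taken from the stack trace; everything else (the helper names, the example address, and the assumption that no further settings from the README are needed) is illustrative only.

```python
def rss_conf_args(registry_server, data_center="dc1"):
    """Build '--conf key=value' arguments for the RSS client settings.

    registry_server is the host:port of the RSS server started with
    -serviceRegistry standalone.
    """
    conf = {
        # Standard Spark property; class name as seen in the stack trace.
        "spark.shuffle.manager": "org.apache.spark.shuffle.RssShuffleManager",
        # 'standalone' here is the RSS registry type, unrelated to --master.
        "spark.shuffle.rss.serviceRegistry.type": "standalone",
        "spark.shuffle.rss.serviceRegistry.server": registry_server,
        "spark.shuffle.rss.dataCenter": data_center,
    }
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


def spark_submit_command(registry_server):
    """Return a spark-submit argument list that targets YARN and uses RSS."""
    return ["spark-submit", "--master", "yarn", *rss_conf_args(registry_server)]
```

For example, spark_submit_command("10.0.0.5:12222") (a hypothetical address) yields a list starting with spark-submit --master yarn, followed by the four --conf pairs; the application jar and its arguments would still have to be appended.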

mayurdb commented 2 years ago

Closing this. Feel free to create a new issue if needed.