uber / RemoteShuffleService

Remote shuffle service for Apache Spark to store shuffle data on remote servers.

hit exception writing heading bytes XXXXX #76

Closed Lobo2008 closed 1 year ago

Lobo2008 commented 2 years ago

Running a 1TB~3TB Spark application, it always fails after running for several hours. Below is the exception:

Stage 0:>                                                       (0 + 0) / 1000]22/08/06 13:07:28 ERROR ApplicationMaster: User class threw exception: org.apache.spark.SparkException: Job aborted due to stage failure: 
Aborting TaskSet 0.0 because task 886 (partition 886)
cannot run anywhere due to node and executor blacklist.
Most recent failure:
Lost task 107.1 in stage 0.0 (TID 1219, 10.203.23.201, executor 463): com.uber.rss.exceptions.RssNetworkException: writeRowGroup: hit exception writing heading bytes 13586, DataBlockSyncWriteClient 82 [/XXXXXX.201:47560 -> MY_RSS_HOST/10.XXXXX.230:12202 (XXXXXXX)], SocketException (Broken pipe)
    at com.uber.rss.clients.DataBlockSyncWriteClient.writeData(DataBlockSyncWriteClient.java:133)
    at com.uber.rss.clients.PlainShuffleDataSyncWriteClient.writeDataBlock(PlainShuffleDataSyncWriteClient.java:40)
    at com.uber.rss.clients.ServerIdAwareSyncWriteClient.writeDataBlock(ServerIdAwareSyncWriteClient.java:73)
    at com.uber.rss.clients.ReplicatedWriteClient.lambda$writeDataBlock$2(ReplicatedWriteClient.java:82)
    at com.uber.rss.clients.ReplicatedWriteClient.runAllActiveClients(ReplicatedWriteClient.java:154)
    at com.uber.rss.clients.ReplicatedWriteClient.writeDataBlock(ReplicatedWriteClient.java:78)
    at com.uber.rss.clients.MultiServerSyncWriteClient.writeDataBlock(MultiServerSyncWriteClient.java:124)
    at com.uber.rss.clients.LazyWriteClient.writeDataBlock(LazyWriteClient.java:99)
    at org.apache.spark.shuffle.RssShuffleWriter$$anonfun$sendDataBlocks$1.apply(RssShuffleWriter.scala:166)
    at org.apache.spark.shuffle.RssShuffleWriter$$anonfun$sendDataBlocks$1.apply(RssShuffleWriter.scala:161)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
    at org.apache.spark.shuffle.RssShuffleWriter.sendDataBlocks(RssShuffleWriter.scala:161)
    at org.apache.spark.shuffle.RssShuffleWriter.write(RssShuffleWriter.scala:108)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
    at org.apache.spark.scheduler.Task.run(Task.scala:123)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:415)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1403)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:421)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.SocketException: Broken pipe
    at java.net.SocketOutputStream.socketWrite0(Native Method)
    at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:109)
    at java.net.SocketOutputStream.write(SocketOutputStream.java:141)
    at com.uber.rss.clients.DataBlockSyncWriteClient.writeData(DataBlockSyncWriteClient.java:131)
hiboyang commented 2 years ago

There is a max bytes limit in the shuffle server to protect the server, see https://github.com/uber/RemoteShuffleService/blob/master/src/main/java/com/uber/rss/execution/ShuffleExecutor.java#L81

You could change that value if your shuffle data exceeds that limit.
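
For context, the protection works roughly like the sketch below. This is a simplified, hypothetical illustration of a per-application write-byte cap, not the actual ShuffleExecutor code; only the 3TB default and the RssTooMuchDataException name come from the project, while the class and method names here are invented for illustration.

    // Hypothetical sketch of a per-application write-byte cap on a shuffle server.
    // Not the real ShuffleExecutor logic; it only illustrates rejecting writes
    // once an app has pushed more bytes than the configured limit.
    import java.util.concurrent.atomic.AtomicLong;

    class AppWriteLimiter {
        // Mirrors the 3TB default (DEFAULT_APP_MAX_WRITE_BYTES) mentioned above.
        static final long APP_MAX_WRITE_BYTES = 3L * 1024 * 1024 * 1024 * 1024;

        private final AtomicLong bytesWritten = new AtomicLong();

        // Called before persisting a data block; fails once the app's cumulative
        // bytes on this server exceed the limit (RSS raises RssTooMuchDataException).
        void checkAndRecord(String appId, long blockBytes) {
            long total = bytesWritten.addAndGet(blockBytes);
            if (total > APP_MAX_WRITE_BYTES) {
                throw new IllegalStateException("App " + appId + " wrote " + total
                    + " bytes, exceeding the per-server limit of " + APP_MAX_WRITE_BYTES);
            }
        }
    }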

Lobo2008 commented 2 years ago

Thanks, I'll try it.

mayurdb commented 2 years ago

Hi @Lobo2008, as Bo mentioned, let us know whether the max app shuffle data size per server is the issue or not. If it is, you should see an RssTooMuchDataException in the stack trace.

If that's not the issue, please check

Lobo2008 commented 2 years ago

Hi @mayurdb

(screenshot attached)

cpd85 commented 2 years ago

I think that DEFAULT_APP_MAX_WRITE_BYTES is actually per server, so if you write 3TB of data but distribute it evenly across multiple servers, you would not run into the issue.
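
A quick back-of-the-envelope check of that reasoning, assuming an even distribution and a hypothetical server count, could look like this:

    // Rough estimate of per-server shuffle write volume, assuming data spreads
    // evenly across RSS servers. The server count and sizes are hypothetical.
    public class PerServerEstimate {
        public static void main(String[] args) {
            long totalShuffleBytes = 3L * 1024 * 1024 * 1024 * 1024; // ~3TB job
            long perServerLimit    = 3L * 1024 * 1024 * 1024 * 1024; // 3TB per-server cap
            int  rssServers        = 10;                             // hypothetical

            long perServerBytes = totalShuffleBytes / rssServers;    // ~0.3TB each
            System.out.println("Per-server write: " + perServerBytes + " bytes ("
                + (perServerBytes <= perServerLimit ? "under" : "over") + " the limit)");
        }
    }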

Lobo2008 commented 2 years ago

> I think that DEFAULT_APP_MAX_WRITE_BYTES is actually per server, so if you write 3TB of data but distribute it evenly across multiple servers, you would not run into the issue.

I guess so.

Lobo2008 commented 2 years ago

Hi @mayurdb

  • It's the latest version; I cloned and compiled the master branch in April 2022.
  • No RssTooMuchDataException ever occurred, only RssNetworkException.
  • I have re-run the app without changing the size Bo mentioned (I'll try that later), and so far it runs well. I'll post the details once the application finishes or fails.
  • I wonder whether DEFAULT_APP_MAX_WRITE_BYTES=3TB limits the shuffle size of a single stage or the cumulative shuffle write size of all stages in one application. Stage 6 has 3TB of shuffle but still works fine.

(screenshot attached)

The application finished successfully. However, I found that the hit exception writing heading bytes exception was caused by one or more RSS servers running out of disk space.
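
Since the failures came from RSS servers running out of disk, a simple way to confirm it is to check the free space under each server's shuffle data directory, for example with a small check like the one below. This is a minimal sketch; /data/rss is a hypothetical path, so use the directory the server is actually configured with.

    // Minimal free-disk-space check for an RSS server's shuffle data directory.
    // The fallback path below is a hypothetical placeholder, not the project's default.
    import java.io.File;

    public class DiskSpaceCheck {
        public static void main(String[] args) {
            File dataDir = new File(args.length > 0 ? args[0] : "/data/rss");
            long freeGb  = dataDir.getUsableSpace() / (1024L * 1024 * 1024);
            long totalGb = dataDir.getTotalSpace()  / (1024L * 1024 * 1024);
            System.out.println(dataDir + ": " + freeGb + " GiB free of " + totalGb + " GiB");
        }
    }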

hiboyang commented 2 years ago

Cool, glad you found the cause, and thanks for the update!