uber / RemoteShuffleService

Remote shuffle service for Apache Spark to store shuffle data on remote servers.
Other
321 stars 100 forks source link

Rss shuffle data size is much larger than external shuffle service #90

Closed YutingWang98 closed 1 year ago

YutingWang98 commented 1 year ago

Hi team,

I noticed that the RSS genereates almost 2 times more shuffle data than external shuffle service. For example, a job stage with rss has a shuffle write size of 720 TB, but external shuffle service only has 370 TB, and they are using the same inputs and codes. We also tried with some other job, and got the similar result.

We first suspect this may cause by no compression, but I took a look at the RssShuffleManager, and seems like there is compression/decompression on client side using LZ4(https://github.com/uber/RemoteShuffleService/blob/master/src/main/scala/org/apache/spark/shuffle/RssShuffleWriter.scala#L205, https://github.com/uber/RemoteShuffleService/blob/master/src/main/scala/org/apache/spark/shuffle/rss/BlockDownloaderPartitionRecordIterator.scala#L167).

Any ideas about possible causes of such a big margin of shuffle data? Thanks.

hiboyang commented 1 year ago

Hi @YutingWang98, this is interesting finding! For the size 720 TB/370 TB, is this disk file size, or the stats shown in Spark UI?

YutingWang98 commented 1 year ago

Hi @hiboyang, thank you for the reply! It is the stats from spark UI stage tab ('Shuffle Write Size / Records' value). Also, I just did more tests on a same job and here are my findings:

Remote shuffle service

External shuffle service

So, I think switching to zstd might be helpful.

hiboyang commented 1 year ago

Nice to know zstd has better compression ratio. I created a PR to support zstd in RSS: https://github.com/uber/RemoteShuffleService/pull/91/files

YutingWang98 commented 1 year ago

Thank you! WIll test our job with this new change. Also I think in spark compression, the zstd compression level 'spark.io.compression.zstd.level' is set to 1 as default. But I saw you are using level 3 as default. Is there any specific reason for it?

hiboyang commented 1 year ago

No specific reason :) I changed it to level 1 in the PR, but forgot to reply the comment here.

YutingWang98 commented 1 year ago

I saw your changes in the pull request! Thanks so much.