uber / RemoteShuffleService

Remote shuffle service for Apache Spark to store shuffle data on remote servers.

How to evaluate rss cluster size? #92

Open Lobo2008 opened 1 year ago

Lobo2008 commented 1 year ago

Hi, we have about 20,000 daily Spark applications. Together these apps produce about 100 TB of shuffle write/read data per day. A peak app produces 6 TB, but most produce less than 100 GB.

We ran some of our production Spark apps on a test RSS cluster and found that RSS nodes consume noticeably more memory than CPU/disk/etc., which even caused some RSS nodes to break down.

Now we would like to run all apps on RSS. Any suggestions on RSS cluster size and machine selection with replicas=2? E.g. memory, CPU, number of nodes, disks, memory-to-CPU ratio, etc. Any suggestion will help.

One more question: mappers send data to the StreamServer, the StreamServer stores it in memory and then flushes it to disk, with little computation involved, so RSS consumes far more memory than CPU. Is that right?

hiboyang commented 1 year ago

Hi @Lobo2008, how much memory usage do you see on the RSS server? The RSS server receives shuffle data in small blocks from Spark mappers and writes each block to disk. It does not cache a large amount of data in memory, so I am curious how much memory you see the RSS server using.
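To make the point above concrete, here is a minimal, purely illustrative sketch (not the actual RSS/StreamServer code, and all names are hypothetical): a writer that appends each small incoming block straight to a per-partition file, so peak memory is bounded by the size of one block rather than the total shuffle volume.

```python
# Illustrative sketch only; hypothetical stand-in for the RSS write path,
# not the real StreamServer implementation.
import os
import tempfile


class BlockWriter:
    """Appends small shuffle blocks to per-partition files on disk."""

    def __init__(self, root):
        self.root = root
        self.bytes_written = 0

    def write_block(self, partition_id, block):
        # Only the current `block` is ever held in memory; it is
        # appended to disk immediately instead of being cached.
        path = os.path.join(self.root, f"partition-{partition_id}.data")
        with open(path, "ab") as f:
            f.write(block)
        self.bytes_written += len(block)


root = tempfile.mkdtemp()
writer = BlockWriter(root)
for i in range(1000):
    writer.write_block(i % 4, b"x" * 1024)  # 1 KB blocks from mappers
print(writer.bytes_written)  # total bytes landed on disk
```

Under this model, memory stays flat as shuffle volume grows, which is why sustained high memory usage on an RSS node would be surprising and worth profiling.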

Normally the bottleneck of the RSS server will be disk I/O and network bandwidth, since Spark applications write/read a lot of data through it. You could start with 10 to 50 Spark executors mapped to one RSS server, then observe disk/network metrics on the RSS server and adjust accordingly.
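If disk I/O is the bottleneck, the advice above can be turned into a rough back-of-envelope estimate. The sketch below is an assumption-laden illustration, not guidance from this thread: the per-server disk bandwidth, peak-to-average ratio, and target utilization are all placeholders you would replace with your own measurements.

```python
# Back-of-envelope RSS cluster sizing from disk bandwidth alone.
# All inputs are assumptions to be tuned from real metrics.
def estimate_rss_servers(daily_shuffle_tb, replicas, server_disk_mb_s,
                         peak_to_avg=3.0, utilization=0.7):
    """Estimate the number of RSS servers needed.

    daily_shuffle_tb : total shuffle written per day, in TB
                       (reads are assumed to add a similar load)
    replicas         : each block is written `replicas` times
    server_disk_mb_s : sustained disk bandwidth per server, MB/s
    peak_to_avg      : assumed peak-hour load vs. daily average
    utilization      : target fraction of disk bandwidth to use
    """
    # Average write rate in MB/s: TB/day -> MB/s, doubled to account
    # for reads, multiplied by the replication factor.
    avg_mb_s = daily_shuffle_tb * 1024 * 1024 / 86400 * 2 * replicas
    peak_mb_s = avg_mb_s * peak_to_avg
    servers = peak_mb_s / (server_disk_mb_s * utilization)
    return max(1, round(servers))


# Using the numbers from this issue (100 TB/day, replicas=2) and an
# assumed 500 MB/s of sustained disk bandwidth per server:
print(estimate_rss_servers(100, 2, 500))  # -> 42
```

Network bandwidth deserves the same treatment, and whichever resource yields the larger server count is the one to size for; then validate against the observed disk/network metrics as suggested above.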