uber / RemoteShuffleService

Remote shuffle service for Apache Spark to store shuffle data on remote servers.

Where are shuffle files stored by default? Is Alluxio storage supported, and how can it be implemented? #68

Open liangrui1988 opened 2 years ago

liangrui1988 commented 2 years ago

Where are shuffle files stored by default? Is Alluxio storage supported, and how can it be implemented?

mayurdb commented 2 years ago

RSS currently supports only local storage. To plug in another storage backend, have a look at com.uber.rss.storage.ShuffleStorage.
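To illustrate what "plugging in other storage" could look like, here is a minimal sketch of a storage abstraction with two interchangeable backends. Everything in it is an assumption for illustration: `SketchShuffleStorage`, `LocalDiskStorage`, `InMemoryStorage`, and `ShuffleStorageDemo` are hypothetical names, and the method set is not the actual API of com.uber.rss.storage.ShuffleStorage, so check that interface in the repo before implementing anything. The in-memory class only stands in for a remote backend; no Alluxio client calls are shown.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical interface for illustration only.
// It is NOT the real com.uber.rss.storage.ShuffleStorage API.
interface SketchShuffleStorage {
    boolean exists(String path);
    OutputStream openForWrite(String path) throws IOException;
    InputStream openForRead(String path) throws IOException;
}

// Backend that keeps shuffle files under a single local root directory.
final class LocalDiskStorage implements SketchShuffleStorage {
    private final Path root;

    LocalDiskStorage(Path root) {
        this.root = root;
    }

    public boolean exists(String path) {
        return Files.exists(root.resolve(path));
    }

    public OutputStream openForWrite(String path) throws IOException {
        Path target = root.resolve(path);
        Path parent = target.getParent();
        if (parent != null) {
            Files.createDirectories(parent); // create missing parent directories
        }
        return Files.newOutputStream(target);
    }

    public InputStream openForRead(String path) throws IOException {
        return Files.newInputStream(root.resolve(path));
    }
}

// In-memory stand-in for a remote backend (assumption: a real one would
// delegate these three calls to the remote file system's client).
final class InMemoryStorage implements SketchShuffleStorage {
    private final Map<String, byte[]> files = new ConcurrentHashMap<>();

    public boolean exists(String path) {
        return files.containsKey(path);
    }

    public OutputStream openForWrite(String path) {
        return new ByteArrayOutputStream() {
            @Override
            public void close() throws IOException {
                super.close();
                files.put(path, toByteArray()); // publish the bytes on close
            }
        };
    }

    public InputStream openForRead(String path) throws IOException {
        byte[] data = files.get(path);
        if (data == null) {
            throw new FileNotFoundException(path);
        }
        return new ByteArrayInputStream(data);
    }
}

final class ShuffleStorageDemo {
    // Writes the payload through the given backend and reads it back.
    static byte[] roundTrip(SketchShuffleStorage storage, String path, byte[] payload) {
        try {
            try (OutputStream out = storage.openForWrite(path)) {
                out.write(payload);
            }
            ByteArrayOutputStream copy = new ByteArrayOutputStream();
            try (InputStream in = storage.openForRead(path)) {
                byte[] buf = new byte[4096];
                int n;
                while ((n = in.read(buf)) != -1) {
                    copy.write(buf, 0, n);
                }
            }
            return copy.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the caller in `ShuffleStorageDemo.roundTrip` depends only on the interface, swapping the local-disk backend for another one does not change the calling code, which is the general idea behind a pluggable storage layer.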

liangrui1988 commented 2 years ago

Thank you. A few more questions: in spark.shuffle.rss.dataCenter = dc1, what does dc1 point to? Does local storage mean disk storage? How do you specify multiple disk directories?

cpd85 commented 2 years ago

@liangrui1988 I'm considering adding this support to RemoteShuffleService. Would you be interested in me contributing it to this repo?

hiboyang commented 2 years ago

@liangrui1988 I'm considering adding this support to RemoteShuffleService. Would you be interested in me contributing it to this repo?

Yeah, you are welcome to contribute!

hiboyang commented 2 years ago

Thank you. A few more questions: in spark.shuffle.rss.dataCenter = dc1, what does dc1 point to? Does local storage mean disk storage? How do you specify multiple disk directories?

"DataCenter = dc1" is just a tag to distinguish different remote shuffle service instances. You can leave it at the default value dc1 or set it to any string value.

Yes, local storage means local disk storage.

It does not support multiple disk directories right now, but feel free to open a PR for that.