uber / RemoteShuffleService

Remote shuffle service for Apache Spark to store shuffle data on remote servers.

Using remote shuffle service with Spark operator #67

Open roligupt opened 2 years ago

roligupt commented 2 years ago

@hiboyang Have you tried using the remote shuffle service with the Spark operator (the Spark on K8s operator)?

I tested it with the client jar in my 'SparkApplication' image and it works as expected.

However, I would like to include the client jar in my Spark operator image instead, so that every job I submit through the operator uses the client jar from the operator and I don't have to include it in every job image.

I'm pretty sure this can be done, but it would probably need code changes in the remote shuffle service?

datapunchorg commented 2 years ago

Hi @roligupt, if you use Spark on Kubernetes (spark operator), the remote shuffle service client jar file must be inside the Spark application image, because the jar file is loaded when the Spark driver starts. Otherwise, there will be an error.
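
Since the jar has to be present in the application image before the driver starts, one option is to fail the image build early when it is missing. Below is a minimal sketch of such a check, not something from this repository. The jar name pattern and the helper names (`find_client_jars`, `check_image_has_client_jar`) are hypothetical, and `/opt/spark/jars` is only assumed as the jars directory; adjust both to match your image.

```python
from pathlib import Path
from typing import List

# Assumption: file name pattern of the client jar; change it to match
# the artifact you actually copy into the image.
CLIENT_JAR_PATTERN = "remote-shuffle-service-*client*.jar"


def find_client_jars(jars_dir: str, pattern: str = CLIENT_JAR_PATTERN) -> List[Path]:
    """Return files directly under jars_dir that match pattern, sorted by path."""
    return sorted(Path(jars_dir).glob(pattern))


def check_image_has_client_jar(jars_dir: str = "/opt/spark/jars") -> None:
    """Raise FileNotFoundError if no matching client jar is found in jars_dir."""
    if not find_client_jars(jars_dir):
        raise FileNotFoundError(
            f"no file matching {CLIENT_JAR_PATTERN!r} found in {jars_dir}"
        )


if __name__ == "__main__":
    # Intended to be run inside the application image, e.g. as a build step.
    check_image_has_client_jar()
    print("client jar found")
```

Running this as the last step of the application image build would surface a missing jar at build time rather than at driver start.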

hiboyang commented 2 years ago

Oops, I just noticed that I replied using my other GitHub account. That @datapunchorg is still me.