ooyala / spark-jobserver

REST job server for Spark. Note that this is *not* the mainline open source version. For that, go to https://github.com/spark-jobserver/spark-jobserver. This fork now serves as a semi-private repo for Ooyala.

Add support for classpaths #34

Open wienczny opened 10 years ago

wienczny commented 10 years ago

It would be great if you could define a classpath for your application consisting of several jars. This would allow reusing library jars like mllib or graphx.

This could be implemented in one of two ways:

  1. You could upload more than one jar to /jars/appname/1 .. n
  2. When creating a context you can define a classpath by referencing uploaded jars.

This could help mitigate https://github.com/ooyala/spark-jobserver/issues/5

velvia commented 10 years ago

I posted this in the other issue thread:

Hey guys,

There is a better way to solve the large file problem.

1) Use this setting: dependent-jar-uris = ["local://opt/foo/my-foo-lib.jar"] (note: this has to be under each context's settings; see "context settings" in the README). This setting means: in addition to your job jar, add the above jar to the context classpath. local means that this jar must be present on every node. You could make it http, for example, but then you would have to wait for this jar to download at context start.

2) Add the large dependent jar to the classpath when starting job server as well as to spark-conf.sh on the different nodes.

Obviously, 2) is a much bigger pain.
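
For illustration, here is a minimal sketch of what the dependent-jar-uris setting from option 1) might look like for a single named context. It is a small Python helper that renders a HOCON-style fragment. The helper name context_settings, the context name my-context, and the spark.contexts.<name> path are assumptions made for this example and are not confirmed in this thread; check the "context settings" section of the README for the exact location.

```python
import json


def context_settings(context_name, jar_uris):
    """Render a HOCON-style fragment listing extra jars for one context.

    Assumption: per-context settings live under spark.contexts.<name>.
    """
    # json.dumps produces a quoted, comma-separated list, which HOCON accepts.
    uris = json.dumps(list(jar_uris))
    return (
        "spark.contexts.{name} {{\n"
        "  dependent-jar-uris = {uris}\n"
        "}}\n"
    ).format(name=context_name, uris=uris)


# The jar URI is the one quoted in the comment above, copied unchanged.
print(context_settings("my-context", ["local://opt/foo/my-foo-lib.jar"]))
```

Several library jars can be passed in the same list, which is roughly the classpath behaviour asked for in this issue, as long as each jar is reachable at its URI from every node.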

wienczny commented 10 years ago

What I'd like to suggest is a combination of 1 and 2. It would be great if the job-server could handle the distribution of dependency jars. The issue with 1 is that you have to distribute the jars to the workers. If you don't require a shared fs between those nodes, this is a job that the jobserver should do, as it already does for the app jar. This would simplify the deployment of applications with lots of rarely changing dependencies.