ooyala / spark-jobserver

REST job server for Spark. Note that this is *not* the mainline open source version. For that, go to https://github.com/spark-jobserver/spark-jobserver. This fork now serves as a semi-private repo for Ooyala.

Can't upload large jar file. #5

Open samuel281 opened 10 years ago

samuel281 commented 10 years ago

Hi, I'm currently using this new spark-jobserver. Almost everything works really well, but I have a small issue.

Sometimes I need to package a jar with its dependencies, which makes the jar much larger. I don't want to restart spark-jobserver to change its classpath every time I build a new jar that depends on newly added libraries.

But the request content length limit is too low (it says 8388608 bytes), so I'm afraid I won't be able to upload jars that large...
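As a quick check on that number: 8388608 bytes is exactly 8 MiB, which is what a HOCON size value of 8m means (a bare m suffix is read as mebibytes, i.e. powers of two). The limit being hit is therefore consistent with an 8m setting somewhere in the spray-can configuration:

```latex
8\,\text{m} = 8 \times 1024 \times 1024\ \text{bytes} = 8 \times 2^{20}\ \text{bytes} = 8\,388\,608\ \text{bytes}
```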

I tried changing the spray-can configuration, but it didn't work. Is there any way to change this limit?

velvia commented 10 years ago

@samuel281 Hi there,

So you tried setting spray.can.parsing.max-content-length = 16m, and restarting? And it doesn't work? I'm definitely OK with raising the limit, if you don't mind figuring out how to do it.

At Ooyala, we actually don't upload large jars. Instead, we separate our jars into two layers: the business logic, which is really small, and a "platform" module, which we append to the classpath. Since the platform doesn't change very often, this is acceptable for now.

There is actually a setting in the job server to get around this: it lets you add a "side jar" alongside the main jar, so you don't have to mess with classpaths. However, it doesn't work at the moment. I might fix it if I have time, since I'm working on something related.

samuel281 commented 10 years ago

Thanks! I guess there was a typo or something. I'll try again and report back.

pmarnik commented 10 years ago

I have tested it: changing this value via the conf file does not work, but setting it via a system property does.

> re-start config/local.conf --- -Dspray.can.parsing.max-content-length=16m

velvia commented 10 years ago

Have you tried changing it in src/main/resources/application.conf? Maybe there's some weird timing issue...

Evan Chan, Staff Engineer, ev@ooyala.com

pmarnik commented 10 years ago

Yes, that was my first attempt after realizing this value isn't read from local.conf. Other values, such as the HTTP port, are read from local.conf successfully.

maasg commented 10 years ago

I can add that changing the value in src/main/resources/application.conf worked fine for us.

wienczny commented 10 years ago

@maasg could you provide an example, please?

This is what I tried, but it doesn't work:

spray.can {
  server {
    # uncomment the next line for making this an HTTPS example
    # ssl-encryption = on
    idle-timeout = 20 s
    request-timeout = 15 s
    pipelining-limit = 2 # for maximum performance (prevents StopReading / ResumeReading messages to the IOBridge)
    # Needed for HTTP/1.0 requests with missing Host headers
    default-host-header = "spray.io:8765"
  }
  parsing {
    max-content-length = 200m
  }
}
maasg commented 10 years ago

I should add that the correct setting to change is spray.can.server.parsing.max-content-length. Setting spray.can.parsing.max-content-length has no effect on the server and client children of the config, as explained in the docs:

The (default) configuration of the HTTP message parser for the server and the client. IMPORTANT: These settings (i.e. children of spray.can.parsing) can't be directly overridden in application.conf to change the parser settings for client and server altogether (see https://github.com/spray/spray/issues/346). Instead, override the concrete settings beneath spray.can.server.parsing and spray.can.client.parsing where these settings are copied to.

See: http://spray.io/documentation/1.1-M8/spray-can/configuration/
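Going by the documentation quoted above, a minimal override sets the concrete server key rather than spray.can.parsing itself. This is a sketch assuming the spray-can 1.x key names as quoted; 200m is only an example size, and the client line is left commented out since only the server side should matter for jar uploads:

```hocon
# Override the concrete parser setting that the server actually uses,
# not the shared spray.can.parsing template it was copied from.
spray.can.server.parsing.max-content-length = 200m

# Only relevant if spray's HTTP client must also accept large bodies:
# spray.can.client.parsing.max-content-length = 200m
```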

wienczny commented 10 years ago

Phew, that was fast ;-p

Here is my working config ;-)

# check the reference.conf in spray-can/src/main/resources for all defined settings
spray.can.server {
  # uncomment the next line for making this an HTTPS example
  # ssl-encryption = on
  idle-timeout = 20 s
  request-timeout = 15 s
  pipelining-limit = 2 # for maximum performance (prevents StopReading / ResumeReading messages to the IOBridge)
  # Needed for HTTP/1.0 requests with missing Host headers
  default-host-header = "spray.io:8765"
  parsing.max-content-length = 200m
}
maasg commented 10 years ago

@wienczny try this:

spray.can {
  server {
    # ...
    parsing {
      max-content-length = 100m
    }
  }
}

maasg commented 10 years ago

@wienczny I was busy adding that comment when you posted yours :-). I realized I had to play with the settings to get them right; I just needed to remember exactly what it was...

velvia commented 10 years ago

Hey guys,

There is a better way to solve the large-file problem.

1) Use this setting: dependent-jar-uris = ["local://opt/foo/my-foo-lib.jar"] (note: this has to go under each context's settings; see "context settings" in the README). This setting tells the job server to add the above jar to the context's classpath in addition to your job jar. The local scheme means the jar must already be present on every node. (You could use http instead, for example, but then you would have to wait for the jar to download when the context starts.)

2) Add the large dependent jar to the classpath when starting the job server, as well as to spark-conf.sh on each node.

Obviously, 2) is a much bigger pain.
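For option 1, here is a sketch of what the context settings might look like. It assumes the default spark.context-settings block described in the README's "context settings" section (per-context entries would carry the same key); the URI is the example path from above, and the other keys in that block are omitted:

```hocon
spark {
  # Assumed location: default settings applied to contexts the job server creates
  context-settings {
    # Jars added to the context's classpath alongside the uploaded job jar.
    # The local scheme means the jar must already exist at this path on every node.
    dependent-jar-uris = ["local://opt/foo/my-foo-lib.jar"]
  }
}
```

With the large library deployed this way, only the small business-logic jar needs to go through the upload endpoint, so the content-length limit stops mattering.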

prayagupa commented 9 years ago

It's true that updating parsing.max-content-length in spark-jobserver's config (job-server/src/main/resources/application.conf) works fine:

# check the reference.conf in spray-can/src/main/resources for all defined settings
spray.can.server {
  # uncomment the next line for making this an HTTPS example
  # ssl-encryption = on
  idle-timeout = 60 s
  request-timeout = 40 s
  pipelining-limit = 2 # for maximum performance (prevents StopReading / ResumeReading messages to the IOBridge)
  # Needed for HTTP/1.0 requests with missing Host headers
  default-host-header = "spray.io:8765"

  parsing.max-content-length = 200m
}

With this change I could upload a 12M jar successfully:

[2014-12-23 23:57:19,297] INFO  spark.jobserver.JarManager [] [akka://JobServer/user/jar-manager] - Storing jar for app smartad, 12117543 bytes
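To put the logged size in context, 12117543 bytes is a little over 11.5 MiB: too large for the 8388608-byte (8 MiB) limit from the original report, and comfortably under the 200m limit configured above:

```latex
\frac{12\,117\,543\ \text{bytes}}{2^{20}\ \text{bytes/MiB}} \approx 11.56\ \text{MiB}

8\ \text{MiB} = 8\,388\,608\ \text{bytes} \;<\; 12\,117\,543\ \text{bytes} \;<\; 209\,715\,200\ \text{bytes} = 200\ \text{MiB}
```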