pishen / sbt-lighter

SBT plugin for Apache Spark on AWS EMR
Apache License 2.0

Add ability to specify a custom log4j.properties file using spark-submit #33

Open adityav opened 6 years ago

adityav commented 6 years ago

By default, EMR logs everything at INFO level, which generates massive amounts of logs. One way to counter this is to specify log4j properties in the EMR config. However, this file can only be specified once, at cluster launch, which makes it a pain when testing or debugging jobs.

Another way is to pass a log4j.properties file to spark-submit, like this: spark-submit --files path/to/log4j.properties
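As a rough sketch of what such an invocation might look like in full: the script below only assembles and prints the command, it does not run it. The main class, the jar name, and the choice of cluster deploy mode are hypothetical placeholders, not anything the plugin is known to generate. The two `-Dlog4j.configuration` options assume a Spark build that uses log4j 1.x and a YARN cluster, where files shipped with `--files` land in each container's working directory under their base name.

```shell
#!/usr/bin/env bash
# Sketch only: build a spark-submit command line that ships a custom
# log4j.properties. Nothing is submitted; the command is just printed.

LOG4J_FILE="path/to/log4j.properties"   # local path on the machine running spark-submit
MAIN_CLASS="com.example.Main"           # hypothetical placeholder
APP_JAR="my-app-assembly.jar"           # hypothetical placeholder

# Assumption: on YARN, a file passed via --files is visible to the
# containers under its base name, so only the file name is referenced.
LOG4J_NAME="$(basename "$LOG4J_FILE")"

SUBMIT_ARGS=(
  --deploy-mode cluster                 # assumption: driver also runs in a YARN container
  --files "$LOG4J_FILE"
  --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=$LOG4J_NAME"
  --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=$LOG4J_NAME"
  --class "$MAIN_CLASS"
  "$APP_JAR"
)

# Print the assembled command, shell-quoted, for inspection.
printf 'spark-submit'
printf ' %q' "${SUBMIT_ARGS[@]}"
printf '\n'
```

Keeping the arguments in an array makes it easy to see which parts would have to be configurable if the plugin exposed them.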

Is it possible to add this to the plugin? It would be greatly appreciated. More info in this Stack Overflow thread: https://stackoverflow.com/a/42523811

pishen commented 6 years ago

But if you want to use --files, you have to somehow place your own log4j.properties file on the EMR instance where spark-submit is called, right? How would you achieve that?

BTW, wouldn't it be easier to launch another cluster for testing and terminate it afterward?

Maybe I can try to turn the arguments to spark-submit into an sbt setting for people who want to customize them (like changing --deploy-mode, or adding --jars or --files), but this may not be a complete solution to your problem.

Any further ideas are welcome.