Open adityav opened 6 years ago
But if you want to use --files, you have to somehow prepare your own log4j.properties file on the EMR instance where spark-submit is called, right? How would you achieve that?
BTW, won't it be easier to launch another cluster for testing and terminate it afterward?
Maybe I can try to turn the arguments to spark-submit into an SBT setting for people who want to customize them (like changing --deploy-mode, or adding --jars or --files), but this may not be a complete solution to your problem?
Any further ideas are welcome.
By default, EMR logs everything at INFO level, which generates massive amounts of logs. One way to counter this is to specify log4j-properties in the EMR config. However, this can only be specified once, at cluster launch, which makes it a pain when testing or debugging jobs.
Another way is to pass a log4j.properties file like this:
spark-submit --files path/to/log4j.properties
Is it possible to add this to the plugin? It would be greatly appreciated. More info in this Stack Overflow thread: https://stackoverflow.com/a/42523811
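For reference, a rough sketch of what the --files approach could look like end to end. It assumes a Spark version that still reads log4j 1.x property files and cluster deploy mode, so the shipped file lands in each container's working directory. The temp-file location, `com.example.Main`, and `my-app.jar` are hypothetical placeholders, not anything the plugin provides.

```shell
#!/bin/sh
# Sketch only: assumes Spark still uses log4j 1.x style property files.
LOG4J_FILE="${TMPDIR:-/tmp}/log4j.properties"

# Raise the root log level from INFO to WARN and keep logging to stderr.
cat > "$LOG4J_FILE" <<'EOF'
log4j.rootCategory=WARN, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
EOF

# com.example.Main and my-app.jar are hypothetical placeholders.
# Printed with echo so the sketch runs without Spark; drop "echo" to submit.
echo spark-submit \
  --deploy-mode cluster \
  --files "$LOG4J_FILE" \
  --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j.properties" \
  --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j.properties" \
  --class com.example.Main \
  my-app.jar
```

The two --conf lines point the driver and executor JVMs at the file shipped by --files. In client deploy mode the driver would likely need a path to the local copy instead.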