Open echeran opened 11 years ago
I've noticed (perhaps) related issues in pure Cascading. Configuration properties supplied to the FlowConnector don't always get passed into the JobConf; the behaviour seems inconsistent and unpredictable. It would be good to have visibility into the JobConf and explicit, guaranteed control over it.
(As posted on the mailing list: https://groups.google.com/forum/#!topic/cascalog-user/Rq_O33VsDyc )
I've come across a similar issue: the options for child JVMs specified in with-job-conf are not "sticking". I experienced GC issues in a reducer of one of my Cascalog jobs for the first time last week. I found the with-job-conf macro and wrapped the query execution form with it, to no avail:
The relevant parts of my project.clj
But in the logging output from the reducer in question, regardless of what I specified in with-job-conf, I always saw this:
2013-07-12 17:25:55,216 INFO cascading.flow.hadoop.FlowMapper: child jvm opts: -Xmx1073741824
Further details:
I saw Robin's workaround, which seems to just modify the site-hadoop.xml. It would be great if the with-job-conf settings "stuck" so as not to have to tweak site settings for per-job needs (especially since I don't manage the Hadoop cluster).
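For readers unfamiliar with the macro, the kind of wrapping described above looks roughly like the following minimal sketch. The namespace name, the sample data, and the -Xmx2048m value are placeholders chosen for illustration; this is not the code from the job discussed here, and it assumes the Hadoop 1.x property name mapred.child.java.opts.

```clojure
(ns example.job-conf          ; hypothetical namespace
  (:use cascalog.api))

;; Hypothetical in-memory source, for illustration only.
(def words [["a"] ["b"] ["c"]])

(defn run-query []
  ;; with-job-conf takes a map of job settings and a body; queries
  ;; executed inside the body are expected to pick those settings up.
  (with-job-conf {"mapred.child.java.opts" "-Xmx2048m"}
    (?<- (stdout)
         [?word]
         (words ?word))))
```

If the setting took effect, the "child jvm opts" log line from the task would be expected to show the value passed here rather than the cluster default.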