uci-cbcl / genomix

Parallel genome assembly using Hyracks

rerun with lower hadoop max tasks #19

Closed Nan-Zhang closed 11 years ago

Nan-Zhang commented 11 years ago
jakebiesinger commented 11 years ago

@JavierJia you mentioned today that this could be done by changing a setting in the JobConf. I've changed the title to include the work to make this happen.

JavierJia commented 11 years ago

I think the only way is to change Hadoop's mapred-site.xml to limit the maximum number of mappers. It can be set there, but we have to set it before starting the Hadoop cluster.
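For illustration, a minimal sketch of what such a setting might look like in mapred-site.xml. It assumes Hadoop 1.x-style property names (newer releases rename the property with a `mapreduce.` prefix), and the value `2` is only a placeholder, not a number taken from this thread. The limit applies per TaskTracker and is read when the daemon starts, which is why it has to be in place before the cluster is brought up.

```xml
<!-- mapred-site.xml: sketch only, assuming Hadoop 1.x property names -->
<configuration>
  <property>
    <!-- Maximum number of map tasks a single TaskTracker runs at once -->
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <!-- Placeholder value for illustration; tune to the cores per node -->
    <value>2</value>
  </property>
</configuration>
```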

jakebiesinger commented 11 years ago

Hmmm... so we can do it for local MR jobs that use the miniMR cluster, but can't control the HDFS cluster that's already running. Bummer.

Jake Biesinger Graduate Student Xie Lab, UC Irvine

anbangx commented 11 years ago

Look into it more... @sigmod says that it should be 2x faster :smile:

jakebiesinger commented 11 years ago

Closing this issue with a reference to #10 and #16 for future sources of optimization.