Default behaviour of SGE inhibits the load balancer from shutting down nodes

It seems that the default behaviour of SGE is to use "load_formula = np_load_avg" (see qconf -ssconf), which balances jobs across nodes by sending each new job to the least-loaded host.

For example:

My cluster currently has three nodes up and the queue is empty.
Three new jobs come in. These will most likely be spread across the three nodes, one job per node.
Since all three nodes now have processes on them, the load balancer cannot shut down any of the nodes, even though the cluster is under-utilised.
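
To make the difference concrete, here is a toy simulation of the two placement strategies. It is not SGE's scheduler, only a sketch under simple assumptions: every job takes one slot, every node has the same number of slots, and the names place_jobs, idle_nodes, and the policy labels are hypothetical.

```python
def place_jobs(num_nodes, num_jobs, slots_per_node, policy):
    """Return the number of jobs placed on each node under a toy policy."""
    jobs = [0] * num_nodes
    for _ in range(num_jobs):
        # Only nodes with a free slot can accept a job.
        candidates = [i for i in range(num_nodes) if jobs[i] < slots_per_node]
        if not candidates:
            break  # cluster is full; remaining jobs would wait in the queue
        if policy == "least_loaded":
            # Spread: pick the node with the fewest running jobs.
            target = min(candidates, key=lambda i: jobs[i])
        elif policy == "fill_up":
            # Pack: pick the busiest node that still has a free slot.
            target = max(candidates, key=lambda i: jobs[i])
        else:
            raise ValueError("unknown policy: " + policy)
        jobs[target] += 1
    return jobs


def idle_nodes(jobs):
    """Count nodes with no jobs, i.e. nodes that could be shut down."""
    return sum(1 for j in jobs if j == 0)


if __name__ == "__main__":
    for policy in ("least_loaded", "fill_up"):
        placement = place_jobs(3, 3, 4, policy)
        print(policy, placement, "idle nodes:", idle_nodes(placement))
```

With three nodes, three jobs, and an assumed four slots per node, the spreading policy leaves no node idle, while the packing policy leaves two nodes free to be shut down.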

I'd suggest modifying the SGE setup to use the "fill up host" configuration described at: http://wiki.gridengine.info/wiki/index.php/StephansBlog
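
As a rough sketch of how that change could be scripted: the helper below rewrites one parameter in the text printed by qconf -ssconf. The helper name set_sconf_param is hypothetical, and the value "slots" for load_formula is my assumption about what the fill-up recipe uses; please check it against the linked post before applying it to a real cluster.

```python
def set_sconf_param(sconf_text, key, value):
    """Return scheduler-config text with the line for `key` set to `value`.

    Expects the "name  value" line format printed by `qconf -ssconf`.
    """
    out = []
    found = False
    for line in sconf_text.splitlines():
        parts = line.split(None, 1)
        if parts and parts[0] == key:
            # Replace the whole line, keeping a simple aligned layout.
            out.append(f"{key:<33} {value}")
            found = True
        else:
            out.append(line)
    if not found:
        raise KeyError(key)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Trimmed, illustrative sample; real output has more parameters.
    sample = (
        "algorithm                         default\n"
        "queue_sort_method                 load\n"
        "load_formula                      np_load_avg\n"
    )
    # ASSUMPTION: "slots" is the fill-up value; verify before use.
    print(set_sconf_param(sample, "load_formula", "slots"), end="")
```

In practice the input would come from saving the output of qconf -ssconf to a file, and the edited file would be loaded back with qconf -Msconf, which needs SGE admin rights.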

Even better would be to configure SGE to send jobs to the most recently booted node first, so that older nodes empty out and can be shut down first (hopefully before their current billed hour is up). I'm not yet sure whether this is possible.
