radical-cybertools / radical.owms

Tiered Resource OverlaY

The wall time of pilots needs to be calculated dynamically #20

Open mturilli opened 10 years ago

mturilli commented 10 years ago

Currently, the wall time for each job of a pilot is set to 5 hours by default. This has undesirable consequences. Consider a pilot with 256 cores. When scheduled on india.futuregrid.org it will result in 32 jobs of 8 cores each. Because of india's policies, up to 20 of the 32 jobs will run once nodes are available. Using the round robin scheduler, we will have 160 tasks running and 96 waiting for the 12 remaining jobs to be run by india's scheduler. With a fixed wall time, the jobs running those 160 tasks will stay active for 5 hours. Considering that the runtime of each task I run with the current workload is 1 minute, we have an overhead of 4 hours and 59 minutes. Furthermore, the current workload management will fail for tasks with a runtime above 5 hours.
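The numbers in this scenario can be reproduced with a few lines of Python. This is only a worked restatement of the figures quoted above; it assumes one 1-minute task per core, and the variable names are illustrative, not part of TROY.

```python
# Worked arithmetic for the scenario above; all figures come from the comment.
PILOT_CORES = 256
CORES_PER_JOB = 8
MAX_RUNNING_JOBS = 20       # jobs india lets run at once, per the comment
WALL_TIME_MIN = 5 * 60      # default wall time of each job, in minutes
TASK_RUNTIME_MIN = 1        # runtime of each task in the current workload

total_jobs = PILOT_CORES // CORES_PER_JOB          # jobs the pilot is split into
running_jobs = min(total_jobs, MAX_RUNNING_JOBS)   # jobs that get nodes
waiting_jobs = total_jobs - running_jobs           # jobs still queued

# Assumption: one task per core, distributed round robin over all jobs.
running_tasks = running_jobs * CORES_PER_JOB
waiting_tasks = waiting_jobs * CORES_PER_JOB

# Time a running job stays allocated after its 1-minute tasks have finished.
idle_min = WALL_TIME_MIN - TASK_RUNTIME_MIN
idle_h, idle_m = divmod(idle_min, 60)

print(f"{total_jobs} jobs: {running_jobs} running, {waiting_jobs} waiting")
print(f"{running_tasks} tasks running, {waiting_tasks} tasks waiting")
print(f"overhead per running job: {idle_h} h {idle_m} min")
```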

Smarter CU scheduling might be used to address the problem of the 4:59 overhead. Would the current load balancing scheduler plugin reschedule the pending CUs to jobs that are already running? Note that this would not address the problem of tasks with runtime greater than the wall time limit of the queue.

andre-merzky commented 10 years ago

Hi Matteo,

I am not sure I understand this part:

Consider a pilot with 256 cores. When scheduled on india.futuregrid.org it will result in 32 jobs of 8 cores each. 

A pilot with 256 cores should result in one job which spans 32 nodes of 8 cores each -- so I don't think you would see a partial pilot instantiated? What am I missing?

mturilli commented 10 years ago

I agree; see ticket #25. The problem is that at the moment TROY does not create a single job with 32 nodes but 32 jobs with a single node each. In such a situation, the issue I report in this ticket becomes a real impairment: the wall time of the pilot is set not to the estimated runtime but to an arbitrary, predefined value. What if the runtime of each task is more than 5 hours? What if it is just a minute? The wall time of the job scheduled by TROY needs to be defined dynamically on the basis of the estimate given with the workload description and, when available, of the information provided by the resource layer -- length of the queue, priority, policy and so on.
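A minimal sketch of what such a dynamic wall time could look like, assuming the workload description carries a runtime estimate per task and the resource layer can report the queue's maximum wall time. The function `estimate_wall_time`, its parameters, and the safety factor are hypothetical and are not part of TROY or Sinon; the wave-based estimate (one task per core, longest task sets the pace) is one simple choice among many.

```python
import math
from typing import Optional, Sequence


def estimate_wall_time(task_runtimes_min: Sequence[float],
                       cores: int,
                       queue_max_min: Optional[float] = None,
                       safety_factor: float = 1.2) -> int:
    """Return a wall time in minutes for a job with `cores` cores.

    Hypothetical helper: tasks are assumed to run one per core, in
    successive "waves", each wave lasting as long as the longest task.
    """
    if not task_runtimes_min:
        raise ValueError("no task runtime estimates given")
    if cores < 1:
        raise ValueError("cores must be at least 1")

    longest = max(task_runtimes_min)
    waves = math.ceil(len(task_runtimes_min) / cores)
    estimate = math.ceil(waves * longest * safety_factor)

    if queue_max_min is not None:
        # A task longer than the queue limit cannot run in this queue at all.
        if longest > queue_max_min:
            raise ValueError("a task exceeds the queue's wall time limit")
        # Otherwise never request more than the queue allows.
        estimate = min(estimate, int(queue_max_min))
    return estimate


# 256 one-minute tasks on a single 8-core job, queue limit of 5 hours.
print(estimate_wall_time([1] * 256, cores=8, queue_max_min=300))
```

With an estimate like this, the two cases raised above are handled differently from a fixed default: a 1-minute workload requests minutes rather than hours, and a task longer than the queue limit is rejected up front instead of failing at run time.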

andre-merzky commented 10 years ago

Sure, I agree. This is in fact in sync with a complaint I have about Sinon -- Sinon requires a walltime to be specified -- but gives no guidance on what a sensible or even valid walltime would be for a given queue. Thus Troy picks a kind of random one (as does BJ btw).

IMHO, Sinon should use the max walltime of the given queue as default value... But, of course, eventually Troy needs more intelligence and information sources to deal sensibly with overlay planning...

mturilli commented 10 years ago

I believe the wall time should depend on the estimated runtime of the tasks of a workload. Always using the queue maxtime leads to extreme 'pessimization' of resource utilization - as already discussed in the past, 'pessimization' was indeed the word Mark used. As a consequence, I do not think we should push the wall time decision down to the pilot layer but, instead, up to the application layer, limited to a request for an estimate of task runtime. We should then deduce the rest within TROY by putting together different sources of information.

On Mon, Jan 27, 2014 at 8:38 AM, Andre Merzky notifications@github.com wrote:

Sure, I agree. This is in fact in sync with a complaint I have about Sinon -- Sinon requires a walltime to be specified -- but gives no guidance on what a sensible or even valid walltime would be for a given queue. Thus Troy picks a kind of random one (as does BJ btw).

IMHO, Sinon should use the max walltime of the given queue as default value...

Dr Matteo Turilli, Department of Electrical and Computer Engineering, Rutgers University

andre-merzky commented 10 years ago

Walltime is notoriously hard to predict for tasks. How long does 'mandelbrot 12 43 simple 3' need? If the user provides that information, great. If we can derive it, wonderful. But in the standard case there might be no such information. It is for those cases that I would put the control over lifetime in the hands of the user (and expect her to cancel things cleanly), or would cancel things in Troy automatically once there is no load anymore -- but I think any a-priori pilot lifetime will be wrong -- either too long, which is exactly the waste you talk about and which we want to avoid, or too short, which is even worse...

My $0.02 anyways... ;)
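The "cancel once there is no load anymore" option from the comment above could be sketched roughly as follows. This is an illustration only: `IdleWatch`, its fields, and the grace period are hypothetical names, and how Troy would obtain the pending and running counts or actually cancel a pilot is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IdleWatch:
    """Tracks how long a pilot has had no pending or running tasks."""
    grace_s: float = 60.0               # assumed grace period before cancelling
    idle_since: Optional[float] = None  # timestamp when the pilot became idle

    def update(self, pending: int, running: int, now: float) -> bool:
        """Return True once the pilot has been idle for at least grace_s."""
        if pending > 0 or running > 0:
            # Any load resets the idle timer.
            self.idle_since = None
            return False
        if self.idle_since is None:
            self.idle_since = now
        return now - self.idle_since >= self.grace_s


watch = IdleWatch(grace_s=60.0)
print(watch.update(pending=3, running=8, now=0.0))   # still loaded
print(watch.update(pending=0, running=0, now=10.0))  # idle timer starts
print(watch.update(pending=0, running=0, now=70.0))  # idle for 60 s
```

The grace period is there so that a pilot is not cancelled in the short gap between two batches of tasks; whether a timer like this or explicit user control is preferable is exactly the open question in this thread.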