radical-cybertools / radical.owms

Tiered Resource OverlaY

Critical: Fix data staging in/out for strategy_basic_early_binding #34

Closed: mturilli closed this issue 10 years ago

mturilli commented 10 years ago

This is critical for the demo. We need to run the GROMACS workload with both early and late binding strategies.

andre-merzky commented 10 years ago

TROY cannot sensibly provide staging for the first partition in early binding: at that point the target resource for the pilot is not known, thus the target resource for the CU is not known, and thus TROY does not know where to stage to.

One could hack around it, but that would be very ugly and wouldn't work in all cases. I'd rather not mess up TROY in that respect until we have a clearer understanding of what 'data staging before resource assignment' actually means.

Makes sense?

A.

PS: we can run the demo in both binding cases (and I did): we first run the late binding case, leave the data lying around, and then run the early binding case on the same target machines. That is cheating, of course...

mturilli commented 10 years ago

I suspect there is a misunderstanding about the relationship between early/late binding and data staging. We are not talking about early data staging.

In our current implementation, early and late binding refer to the binding of CUs to pilots. As far as I understand, early binding is the binding of CUs to pilots that are not yet in state 'Running', while in late binding CUs are bound to pilots that are already in state 'Running'.
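Under this definition, whether a binding is early or late depends only on the pilot's state at the moment a CU is bound. A minimal Python sketch of that rule, assuming a hypothetical `PilotState` enum and `binding_kind` helper (illustrative names, not TROY's actual API):

```python
from enum import Enum


class PilotState(Enum):
    # Hypothetical pilot states; only RUNNING matters for the distinction.
    NEW = "New"
    PENDING = "Pending"
    RUNNING = "Running"


def binding_kind(pilot_state: PilotState) -> str:
    """Classify a CU-to-pilot binding by the pilot's state at bind time.

    Late binding: the pilot is already 'Running' when the CU is bound.
    Early binding: the pilot is in any other state (not yet running).
    """
    return "late" if pilot_state is PilotState.RUNNING else "early"


print(binding_kind(PilotState.PENDING))  # early
print(binding_kind(PilotState.RUNNING))  # late
```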

From a data point of view, the requirement is that when the CUs are executed by the agent of the pilot(s) to which they are bound, the kernel and, where applicable, its input file(s) must be physically located on the target resource, i.e. both must be staged and accessible to each other.

In an early binding scenario, we do not need to transfer the data until the kernel is also transferred, namely until the pilot has been scheduled and then executed, at which point the target resource is known.
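Both comments agree on one precondition: data can only be staged once the target resource of the CU's pilot is known. A minimal sketch of that check, where `Pilot`, `ComputeUnit`, `staging_target` and the resource name are hypothetical, chosen for illustration only:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Pilot:
    # None until the pilot has been scheduled onto a concrete resource.
    resource: Optional[str] = None


@dataclass
class ComputeUnit:
    kernel: str
    pilot: Pilot
    inputs: List[str] = field(default_factory=list)


def staging_target(cu: ComputeUnit) -> str:
    """Return the resource to which the CU's kernel and inputs must be staged.

    Raises if the CU's pilot has not been scheduled yet, because the
    destination of the transfer is still unknown.
    """
    if cu.pilot.resource is None:
        raise RuntimeError("target resource unknown: pilot not scheduled yet")
    return cu.pilot.resource


pilot = Pilot()
cu = ComputeUnit(kernel="md_kernel", pilot=pilot, inputs=["input.tpr"])
pilot.resource = "cluster.example.org"  # hypothetical resource, set by scheduling
print(staging_target(cu))  # cluster.example.org
```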

On Fri, Jan 31, 2014 at 5:28 AM, Andre Merzky <notifications@github.com> wrote:

Reply to this email directly or view it on GitHub: https://github.com/saga-project/troy/issues/34#issuecomment-33775360

Dr Matteo Turilli
Department of Electrical and Computer Engineering
Rutgers University

andre-merzky commented 10 years ago

Once a pilot 'has been scheduled and then executed', it can immediately start to run CUs, as those are already bound to the pilot in the early binding case. So there is simply no time to transfer the data for the CUs...

One could, of course, bind CUs to pilots and then schedule the pilots, but not execute them. In that case all scheduling is done, I know the target resources, and I could then stage data. So:

  create workload
  create overlay
  translate overlay
  translate workload
  schedule workload
  schedule overlay
  stage data
  provision overlay
  provision workload

But there is not much earliness left in this early binding trace, is there? Still, this can be done, so please let me know if this is what we want.
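Andre's constraint on this trace can be stated as: staging must come after the overlay is scheduled (so the target resource is known) and before the overlay is provisioned (because a provisioned pilot immediately starts its pre-bound CUs). A sketch that checks a trace, written as the list of step names above, against that constraint; `satisfies_early_staging` is a hypothetical helper:

```python
from typing import List

# Andre's proposed early binding trace, one step per entry.
ANDRE_EARLY_TRACE = [
    "create workload",
    "create overlay",
    "translate overlay",
    "translate workload",
    "schedule workload",
    "schedule overlay",
    "stage data",
    "provision overlay",
    "provision workload",
]


def satisfies_early_staging(trace: List[str]) -> bool:
    """True if data is staged after the overlay is scheduled and before
    the overlay (and hence any pre-bound CU) can start running."""
    pos = {step: i for i, step in enumerate(trace)}
    return pos["schedule overlay"] < pos["stage data"] < pos["provision overlay"]


print(satisfies_early_staging(ANDRE_EARLY_TRACE))  # True
```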

mturilli commented 10 years ago

Looks almost good to me. Data staging needs to be done before workload provisioning. So I would suggest:

  create workload
  create overlay
  translate overlay
  translate workload
  schedule workload
  schedule overlay
  provision overlay
  stage data
  provision workload
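Matteo's variant moves staging after overlay provisioning while keeping it before workload provisioning, i.e. before any CU is actually handed to the running pilot. A sketch encoding that constraint, with `stages_in_time` as a hypothetical helper name:

```python
from typing import List

# Matteo's suggested early binding trace, one step per entry.
MATTEO_EARLY_TRACE = [
    "create workload",
    "create overlay",
    "translate overlay",
    "translate workload",
    "schedule workload",
    "schedule overlay",
    "provision overlay",
    "stage data",
    "provision workload",
]


def stages_in_time(trace: List[str]) -> bool:
    """True if data is staged once the overlay is provisioned (target
    resource known) but before the workload is provisioned (CUs submitted)."""
    pos = {step: i for i, step in enumerate(trace)}
    return pos["provision overlay"] < pos["stage data"] < pos["provision workload"]


print(stages_in_time(MATTEO_EARLY_TRACE))  # True
```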

Why do you see a lack of 'earliness' in this trace? The late equivalent is:

  create workload
  create overlay
  translate overlay
  schedule overlay
  provision overlay
  translate workload
  schedule workload
  stage data
  provision workload

As per my previous comment, I see the earliness in the relationship between CUs and the overlay (i.e., the pilots), not between data and the overlay.
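Read this way, a trace is early or late depending only on whether CUs are bound to pilots ("schedule workload") before the pilots are provisioned; staging sits in the same place in both traces. A sketch of that classification over the two traces above, with `binding_mode` as a hypothetical helper:

```python
from typing import List

EARLY_TRACE = [
    "create workload", "create overlay", "translate overlay",
    "translate workload", "schedule workload", "schedule overlay",
    "provision overlay", "stage data", "provision workload",
]

LATE_TRACE = [
    "create workload", "create overlay", "translate overlay",
    "schedule overlay", "provision overlay", "translate workload",
    "schedule workload", "stage data", "provision workload",
]


def binding_mode(trace: List[str]) -> str:
    """'early' if CUs are bound to pilots before the pilots are provisioned,
    'late' otherwise. The position of 'stage data' plays no role."""
    pos = {step: i for i, step in enumerate(trace)}
    return "early" if pos["schedule workload"] < pos["provision overlay"] else "late"


print(binding_mode(EARLY_TRACE))  # early
print(binding_mode(LATE_TRACE))   # late
# In both traces staging is the step right before workload provisioning.
print(EARLY_TRACE[-2:] == LATE_TRACE[-2:])  # True
```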

PS: A stronger version of the late trace would be:

  create overlay
  translate overlay
  schedule overlay
  provision overlay
  create workload
  translate workload
  schedule workload
  stage data
  provision workload

But that is for another ticket and maybe for TROY 2 :)
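What makes this trace "stronger" is that the workload is only created after the overlay has been provisioned, not merely bound afterwards. A sketch of that distinction, using a hypothetical `is_strong_late` helper:

```python
from typing import List

# The stronger late binding trace: the workload only exists once the overlay is up.
STRONG_LATE_TRACE = [
    "create overlay", "translate overlay", "schedule overlay",
    "provision overlay", "create workload", "translate workload",
    "schedule workload", "stage data", "provision workload",
]

# The plain late binding trace, for comparison.
LATE_TRACE = [
    "create workload", "create overlay", "translate overlay",
    "schedule overlay", "provision overlay", "translate workload",
    "schedule workload", "stage data", "provision workload",
]


def is_strong_late(trace: List[str]) -> bool:
    """True if the workload is created only after the overlay is provisioned."""
    pos = {step: i for i, step in enumerate(trace)}
    return pos["provision overlay"] < pos["create workload"]


print(is_strong_late(STRONG_LATE_TRACE))  # True
print(is_strong_late(LATE_TRACE))         # False
```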

On Fri, Jan 31, 2014 at 8:45 AM, Andre Merzky <notifications@github.com> wrote:

Reply to this email directly or view it on GitHub: https://github.com/saga-project/troy/issues/34#issuecomment-33794756

andre-merzky commented 10 years ago

This is now implemented.