mturilli opened 10 years ago
The reason is that we happily poll for the workload state, which is 'DISPATCHED' (all CUs are PENDING), but never bother to check the overlay state. This is a performance issue, and an issue of programming paradigms. Ideally, we would have notifications on pilot state changes, so that TROY gets informed when the overlay goes MIA. That will come in saga pilot eventually (it is getting closer). Without notifications, we could alternate between polling the pilot state and the workload state, but that makes the examples more complex and adds latency.
Either way, I agree that this needs addressing...
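The pull-based alternative (alternating between workload-state and pilot-state polls) can be sketched as follows. This is a minimal sketch under stated assumptions: `PilotState`, `WorkloadState`, `wait_for_workload`, the two getter callables, and the rule "fail once every known pilot is in a terminal state" are hypothetical and are not part of the TROY or saga pilot API.

```python
import time
from enum import Enum


class PilotState(Enum):
    # Hypothetical pilot states; the real TROY/saga pilot names may differ.
    PENDING = "PENDING"
    RUNNING = "RUNNING"
    DONE = "DONE"
    FAILED = "FAILED"
    CANCELED = "CANCELED"


class WorkloadState(Enum):
    # Hypothetical workload states; DISPATCHED means all CUs are still PENDING.
    DISPATCHED = "DISPATCHED"
    DONE = "DONE"
    FAILED = "FAILED"


# Pilot states in which a pilot can no longer execute pending CUs.
DEAD_PILOT_STATES = {PilotState.DONE, PilotState.FAILED, PilotState.CANCELED}


def wait_for_workload(get_workload_state, get_pilot_states,
                      poll_interval=1.0, sleep=time.sleep):
    """Alternate between polling workload state and pilot state.

    Returns the final workload state, or WorkloadState.FAILED as soon as
    every known pilot is dead, instead of waiting on DISPATCHED forever.
    Each iteration costs one extra pilot-state query: that is the added
    latency and complexity of the pull-based approach.
    """
    while True:
        state = get_workload_state()
        if state in (WorkloadState.DONE, WorkloadState.FAILED):
            return state
        pilots = get_pilot_states()
        # An empty list means no pilot is known yet, so keep waiting.
        if pilots and all(p in DEAD_PILOT_STATES for p in pilots):
            return WorkloadState.FAILED
        sleep(poll_interval)


# Simulated scenario: the workload stays DISPATCHED, the pilot fails on the third poll.
pilot_history = iter([[PilotState.PENDING], [PilotState.RUNNING], [PilotState.FAILED]])
result = wait_for_workload(
    get_workload_state=lambda: WorkloadState.DISPATCHED,
    get_pilot_states=lambda: next(pilot_history),
    sleep=lambda _: None,  # skip real sleeping in the demo
)
print(result)  # WorkloadState.FAILED
```

Passing a no-op `sleep` keeps the demo instant; in practice `poll_interval` trades responsiveness against the extra load placed on the pilot layer.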
Hi Andre, many thanks for the insightful details. Do you have a timeline for implementing notifications in sagapilot? If the timeline is long, we may need to evaluate in detail the latency and complexity overhead of the alternative, poll-based approach.
(In reply to Andre Merzky, Sun, Feb 16, 2014 at 4:17 PM.)
Dr Matteo Turilli, Department of Electrical and Computer Engineering, Rutgers University
Thinking about it, we might need the polling approach anyway, for BigJob. I will think of something, but probably not over the next week or so. In our call in ~10 days, can we go over the open tickets and prioritize them (e.g. sort them into milestones)?
OK, thank you. Re milestones: sure, this is what I am doing, and I would be more than happy to do it together.
(In reply to Andre Merzky, Mon, Feb 17, 2014 at 2:29 AM.)
Currently, TROY does not catch or act upon errors reported by the pilot layer. Examples:
In both cases, TROY reports the pilots in state 'DISPATCHED' and happily runs forever.
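For this failure mode, the notification-based approach discussed in the thread could look roughly like the sketch below: the workload subscribes to pilot state changes and moves to FAILED once no pilot can run its CUs, instead of waiting on DISPATCHED. Everything here (`Pilot`, `Workload`, `add_callback`, `set_state`, `complete`, the state strings, and the "all pilots terminal means failure" rule) is a hypothetical illustration, not the TROY or saga pilot API.

```python
import threading
from typing import Callable, List, Optional

# Pilot states in which a pilot can no longer execute pending CUs (assumed names).
TERMINAL_PILOT_STATES = {"DONE", "FAILED", "CANCELED"}


class Pilot:
    """Hypothetical pilot handle that pushes state changes to subscribers.

    Not the saga pilot API: it only illustrates the notification model.
    """

    def __init__(self, pid: str) -> None:
        self.pid = pid
        self.state = "PENDING"
        self._callbacks: List[Callable[["Pilot", str], None]] = []

    def add_callback(self, cb: Callable[["Pilot", str], None]) -> None:
        self._callbacks.append(cb)

    def set_state(self, new_state: str) -> None:
        # In a real system the pilot layer would trigger this on a state change.
        self.state = new_state
        for cb in self._callbacks:
            cb(self, new_state)


class Workload:
    """Hypothetical workload that fails fast once all of its pilots are dead."""

    def __init__(self, pilots: List[Pilot]) -> None:
        self.state = "DISPATCHED"
        self.pilots = pilots
        self._finished = threading.Event()
        for p in pilots:
            p.add_callback(self._on_pilot_state)

    def _on_pilot_state(self, pilot: Pilot, new_state: str) -> None:
        # Act on pilot errors instead of reporting DISPATCHED forever.
        if self.state == "DISPATCHED" and all(
            p.state in TERMINAL_PILOT_STATES for p in self.pilots
        ):
            self.state = "FAILED"
            self._finished.set()

    def complete(self) -> None:
        # Would be called when all CUs have finished successfully.
        self.state = "DONE"
        self._finished.set()

    def wait(self, timeout: Optional[float] = None) -> str:
        # Blocks without polling until a notification or completion wakes it up.
        self._finished.wait(timeout)
        return self.state


p1, p2 = Pilot("p1"), Pilot("p2")
workload = Workload([p1, p2])
p1.set_state("FAILED")
print(workload.state)  # DISPATCHED: p2 could still run the CUs
# Simulate the pilot layer reporting a failure from another thread.
threading.Timer(0.1, p2.set_state, args=("FAILED",)).start()
print(workload.wait(timeout=5))  # FAILED
```

Compared with polling, the waiting side never queries pilot state at all; the cost moves into the pilot layer, which has to deliver the notifications reliably.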