mllg / batchtools

Tools for computation on batch systems
https://mllg.github.io/batchtools/
GNU Lesser General Public License v3.0

waitForJobs - some more granularity ? #146

michaelmayer2 closed 7 years ago

michaelmayer2 commented 7 years ago

I am really happy with batchtools; it does exactly what we need. I used BatchJobs before and have now moved to batchtools.

However, a small issue recently became apparent to me. I can see that the waitForJobs routine has already been refactored and improved a lot, so I am wondering whether I am doing something wrong or missing something:

My code submits a couple of tasks; they get scheduled and start running fine. However, some of them fail because they are killed by the scheduler (Elapsed Time Exceeded). That is fine with me.

However, if waitForJobs detects such a situation (case 3 in the repeat loop of waitForJobs), it immediately fails and returns, so the remainder of my code continues. That code then hits the registry cleanup, which is bound to fail since there are still jobs running.

I managed to hack a second repeat loop into case 3 that waits until there are no jobs left in the queue, and this seems to do the trick for us.

Is there any other way to make sure all jobs are finished in waitForJobs, irrespective of whether they are killed or finish normally?

Many thanks,

Michael.

mllg commented 7 years ago

Michael,

I'm currently at a workshop but will look into this issue next week.

Best, Michel

mllg commented 7 years ago

I've introduced a new argument stop.on.expire in #147. Does this work for you?
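A minimal sketch of how the new argument might be used, under some assumptions: the temporary registry, the toy squaring function, and the default interactive backend are placeholders for illustration only, not the setup from this issue. On a real cluster the registry would be configured with the scheduler's cluster functions. With `stop.on.expire = FALSE`, `waitForJobs()` should keep waiting for the remaining jobs instead of returning as soon as an expired job is detected.

```r
library(batchtools)

# Assumption: a temporary registry (file.dir = NA). With the default
# interactive backend, jobs run in the current session, so nothing can
# actually expire in this toy setup.
reg <- makeRegistry(file.dir = NA, seed = 1)

# Hypothetical toy jobs standing in for the real workload.
ids <- batchMap(function(x) x^2, x = 1:3, reg = reg)
submitJobs(ids, reg = reg)

# Keep waiting until every job has terminated, even if some of them
# have expired (e.g. were killed by the scheduler) in the meantime.
all_ok <- waitForJobs(ids, stop.on.expire = FALSE, reg = reg)

# Inspect the outcome before cleaning up the registry.
expired <- findExpired(reg = reg)
done <- findDone(reg = reg)

removeRegistry(wait = 0, reg = reg)
```

Checking `findExpired()` after the wait is one way to decide whether the killed jobs need to be resubmitted before the registry is removed.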

mllg commented 7 years ago

This is now merged. Report back if you encounter any problems.

michaelmayer2 commented 7 years ago

Works for me! Thanks for fixing this.

Michael.