luuloi opened 7 years ago

I have recently run Bpipe to submit jobs to PBSpro, and I am getting complaints about the high frequency of polling against the cluster. Are there any options to reduce it, and how should those options be specified in the user's Bpipe config file?

Any help, suggestions, or advice would be welcome!

Thank you, Loi
For reference, so far today (it's currently 12:20pm) Loi has made 520k requests to our PBS Pro server.
Wow, that is way more than I would have expected. The polling frequency is supposed to be controlled by some parameters and to use an exponential backoff (so, fast at first and then slowing down for longer jobs). See here:
In theory you should be able to set the parameters there - `minimumCommandStatusPollInterval`, `maxCommandStatusPollInterval`, `commandStatusBackoffPeriod`. But I am actually wondering if there's some other error going on that is causing it to poll way more often than that.
Might be worth looking in the log to see if there are any polling errors mentioned.
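A sketch of what tuning those in `bpipe.config` might look like - the values are just illustrative, and I'm assuming millisecond units and top-level placement here, so double-check against the docs:

```
// Illustrative bpipe.config sketch - slow the status polling right down.
// Units (milliseconds) and top-level placement are assumptions.
minimumCommandStatusPollInterval=5000   // never poll a job more often than every 5s
maxCommandStatusPollInterval=300000     // back off to at most one poll per 5 minutes
commandStatusBackoffPeriod=180000       // controls how quickly polling relaxes toward the maximum
```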
From our end, we're seeing an SSH session on a login node that's running `qstat -x -f` periodically against a number of jobs. While each individual job is only being queried every 5-10 seconds by the look of it, this still adds up when there are a number of jobs. When I looked, it was doing ~4-5 queries per second, and since each of these `qstat` invocations generates 4 requests (connect, auth, stat_server, stat_job) to the PBS server, this equates to over a million requests per day (~4.5 queries/s × 4 requests × 86,400 s ≈ 1.6 million).
If you batch the requests into a single `qstat` command for each poll (i.e. pass it all the job ids you're interested in), you'll do 3 requests per job instead of four (with a single stat_server at the start) - dropping the number of requests by 25%. An even better option would be to use the PBS API directly (instead of the command line tools); that way you have direct control over the commands issued and can reduce it to a single stat_job request per job.
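For example, one batched poll covering several jobs (the job ids here are placeholders):

```sh
# One qstat invocation per poll cycle, covering all jobs of interest,
# instead of one invocation per job.
qstat -x -f 1234 1235 1236
```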
But - do you really need to be querying the status of the jobs every 5-10 seconds?? Even every 5-10 minutes would likely be often enough...
Even better again, you could use PBS's ability to send e-mails on state change (e.g. start, end, abort) to become properly event-driven and avoid polling altogether.
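For instance, submitting with mail options (the address is a placeholder):

```sh
# Ask PBS to send mail when the job begins (b), ends (e), or aborts (a).
qsub -m abe -M you@example.org job.sh
```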
Wow, it definitely should not be querying that often. If the pipeline has very high concurrency and a lot of short jobs then perhaps I could imagine it - but it should quickly back off to the point where it is polling once every 3 minutes for each job.
The frequency of polling is a tradeoff between a few different factors: how quickly follow-up jobs get submitted once their dependencies finish, the risk of missing a state change altogether (for example, a short job finishing and dropping out of the job history between polls), and the load placed on the queue manager.
We've contemplated non-polling solutions in the past but never come up with an idea that didn't have other severe drawbacks (primarily, exhausting open file handles on the login server). I must admit, I never thought of using the email system though!
Definitely, pooling the polls so that each job is not polled separately should be a priority.
First though, it would be good to find out if there's a bug causing this kind of polling. If you're up for it, clone the source for Bpipe out of GitHub, change this line to be `log.info`, and then build Bpipe. Then you can run the pipeline and see exactly what the polling activity is on Bpipe's end.
I'm not sure I understand your reasons for fast polling, sorry.
How does missing a state change break your pipeline? If attempting to stat a job fails because it has finished and been removed from history, then you know that it has finished (and thus can look at its output). Many sites run without job history at all, and so as soon as a job finishes it disappears.
The best way to hide latency in follow-up submissions is to use the dependency system provided by PBS. You can submit jobs with `-W depend=afterok:12345` (where `12345` is the job id of the dependency) so that PBS knows to start considering the follow-up job as soon as the first finishes successfully - and to delete the follow-up job if something goes wrong.
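In practice that looks something like this (the script names are placeholders):

```sh
# Submit the first stage; qsub prints the new job's id (e.g. 1234.server) on stdout.
JOBID=$(qsub stage1.sh)

# The follow-up stays held until stage1 exits successfully, and PBS deletes
# it if stage1 fails instead of leaving it queued forever.
qsub -W depend=afterok:"${JOBID}" stage2.sh
```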
Thinking about it more, the e-mail solution isn't that great, since e-mail doesn't have any guarantees on delivery timeframes. But what about something like a TCP socket that the jobs connect to as soon as they start and send a message to that effect? You'd only even have as many file descriptors open as jobs starting simultaneously (plus one), so the likelihood of running out is minimal at most. You could even add a trap to your job script to send a similar message on termination so that it runs even if PBS kills the job for some reason (e.g. exceeded walltime or memory). The only problem is that you'd need IP connectivity between the cluster's compute nodes and wherever Bpipe is running (not always possible).
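As a rough sketch of what I mean inside the job script (the listener host/port are placeholders, and this assumes `nc` is available on the compute nodes):

```sh
#!/bin/bash
# Send start/end notifications to a TCP listener instead of being polled.
notify() { echo "$1 ${PBS_JOBID}" | nc -w 5 "$NOTIFY_HOST" "$NOTIFY_PORT"; }

trap 'exit 143' TERM     # turn a PBS kill (e.g. walltime exceeded) into a normal exit
trap 'notify END' EXIT   # fires on every exit path, including the TERM case above
notify START

# ... the real job commands go here ...
```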
@luuloi, I don't know how to use Bpipe (nor do I have a pipeline to run through it). Would you be able to test ssadedin's suggestion of rebuilding Bpipe wherever you're running it with more logging?
I am doing it and will get back to you soon.
Thanks for all the discussion on this.
I'm considering two separate actions to try and alleviate this problem:

- consolidating the status queries, so that each poll issues a single `qstat` covering all outstanding jobs rather than one invocation per job
- backing off the polling intervals more aggressively, so that long-running jobs are polled much less often

I think with these two in place the number of `qstat` queries should fall by at least an order of magnitude without degrading any essential Bpipe functionality.
One question about it though: does it actually cause less load on the queue manager if `qstat` queries N job ids in a single command rather than running N separate `qstat` invocations? I wouldn't want to do all the work to implement consolidated status querying if it doesn't actually reduce the ultimate load on the queue manager!