ssadedin / bpipe

Bpipe - a tool for running and managing bioinformatics pipelines
http://docs.bpipe.org/

bpipe HEAD (with sge_fixes) complains about local concurrency when running sge jobs #165

Open gdevenyi opened 8 years ago

gdevenyi commented 8 years ago

I ran a small number of subjects (6) through bpipe with an sge cluster config. I also have concurrency requirements set in the local pipeline (it uses threads).

The pipeline did not run the stage, complaining: ERROR: Concurrency required to execute stage beastmask is 8, which is greater than the maximum configured for this pipeline (6). Use the -n flag to allow higher concurrency.

But in this case -n specifies the number of jobs submitted to the cluster, not the number of procs.
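
To make the mismatch concrete, here is a minimal sketch of the setup being described (the stage body and file names are hypothetical; uses(threads: ...) and the -n flag are documented Bpipe features):

    // pipeline.groovy - a stage declaring that it needs 8 threads
    beastmask = {
        uses(threads: 8) {
            exec "some_masking_tool $input $output"  // hypothetical command
        }
    }

    run { "%.mnc" * [ beastmask ] }

With bpipe run -n 6 pipeline.groovy subject*.mnc this fails with the error above, even though the heavy work would run on SGE nodes rather than locally.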

gdevenyi commented 8 years ago

Followup:

Related to this is the relationship between the "procs" specification in bpipe.config and the "-n" parameter on the command line.

I found that when I have -n 40 (because I want to run 40 subjects in parallel through my pipeline) and I get to a stage which has procs=4 specified, I only get 10 jobs submitted to the cluster at a time.

I understand threads/procs should be connected on a local run, but on a cluster run, these are decoupled.
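
The 10-job ceiling follows from treating -n as a thread budget: each job reserves procs=4, so only 40 / 4 = 10 jobs fit inside -n 40. A minimal bpipe.config for that scenario might look like this (the stage name is hypothetical; executor and procs are standard Bpipe config options):

    // bpipe.config
    executor="sge"
    commands {
        my_stage {
            procs=4   // each submitted job asks the scheduler for 4 slots
        }
    }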

ssadedin commented 8 years ago

Just following this up ... yes, the -n flag is currently a bit oversimplified, because it affects two different things that can sometimes be independent: the number of jobs running concurrently, and the total number of threads those jobs are allowed to consume.

Currently -n controls both of them - that is, if you specify -n 40, Bpipe won't run more than 40 jobs concurrently (even if you specify that the jobs themselves use no threads). On the other hand, it also won't consume more than 40 threads (even if you are just running 1 job). Perhaps we should break these things into separate flags.
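
Purely to illustrate what that split might look like (neither option below exists in Bpipe; the flag names are hypothetical):

    # hypothetical: cap submitted jobs and local thread usage independently
    bpipe run --max-jobs 500 --max-threads 16 pipeline.groovy subject*.mnc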

In practice there are some reasons to limit the job concurrency. For example, maybe you want to be polite to the cluster and not have hundreds of jobs waiting in the queue. Or you may be concerned about the resource consumption of Bpipe itself - internally it uses a thread and a few file handles for each concurrent job. Some cluster login nodes have very low limits for these and in those cases Bpipe will run out of file handles if it attempts to submit hundreds or thousands of jobs at the same time.
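
For example, you can see the open-file limit a login node imposes with a standard shell builtin:

    # per-process open-file limit on this node
    ulimit -n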

For your scenario, you'd probably need to set the -n value to the total number of cores available in the cluster. In fact, you might as well set it even higher, because the cluster queuing system is going to do the work of scheduling the jobs - Bpipe should just submit every job to the cluster as soon as it is ready to run and impose no concurrency control of its own.
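
Concretely, that means something like the following (the pipeline and input names are hypothetical; -n is the flag discussed here):

    # let the SGE queue do the scheduling; make -n effectively a non-limit
    bpipe run -n 1000 pipeline.groovy subject*.mnc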

gdevenyi commented 8 years ago

I do indeed regularly run into these resource issues with bpipe, so I need to limit its threads to quite a bit below what my cluster can support.

I'd be very much in favour of decoupling -n to allow for better specification of local vs cluster jobs.

gdevenyi commented 7 years ago

Hi,

I've recently run into some further issues with the "-n" confusion. I've implemented a system-load watchdog for the machines to keep an eye out for crashes, but now, if I specify "-n" as the size of the cluster (~300 slots), bpipe spikes the local load since it also launches 300 processes locally, and the watchdog daemon triggers its overload condition and restarts the machine.
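
One way to confirm this is to look at the thread count of the local bpipe process (the pgrep pattern is an assumption about how the process appears in the process table):

    # NLWP = number of threads held by the process
    ps -o nlwp= -p $(pgrep -f bpipe | head -1)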

ssadedin commented 7 years ago

I agree about the need to decouple these. In the cluster environment it definitely becomes a problem for Bpipe to use so many threads. They should (nearly) all be idle, but just launching them at all can be a problem. Unfortunately, the thread-per-parallel-branch model is fairly deep in the design at the moment.

Moving to an asynchronous model (a small thread pool asynchronously controlling and servicing a large pool of parallel segments) is something I want to do - data is getting bigger and bigger, along with sample sizes, and cloud computing facilities mean it's not out of the question that people rent clusters with thousands of nodes for an hour to get a job done fast. Bpipe should be able to scale to that without using tens of thousands of threads. However, it's clearly one of those "bigger things" that will require some internal restructuring to get there.
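
As a minimal Groovy sketch of that target model (none of this is Bpipe's actual code, and the polling stub is hypothetical): a small fixed pool services the state of many jobs, instead of one blocked thread per parallel branch.

    import java.util.concurrent.Executors
    import java.util.concurrent.TimeUnit

    // hypothetical stub: a real version would poll the scheduler, e.g. parse qstat output
    def isJobFinished = { String id -> new Random().nextInt(10) == 0 }

    def jobIds = (1..300).collect { "job-$it" }     // 300 parallel segments
    def pool = Executors.newScheduledThreadPool(2)  // serviced by just 2 threads

    pool.scheduleWithFixedDelay({
        jobIds.removeAll { id -> isJobFinished(id) }  // drop completed jobs
        if (jobIds.isEmpty()) pool.shutdown()         // all done: release the pool
    } as Runnable, 0, 10, TimeUnit.SECONDS)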

gdevenyi commented 7 years ago

Ah, I see. That is indeed a big restructuring.

For now, I've greatly relaxed my short-term load-average tests, which seems to have resolved the startup issue.