rescalante-lilly / ruffus

Automatically exported from code.google.com/p/ruffus
MIT License

@jobs_limit does not override multiprocessing limit #75

Closed: GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago
I want to allow tasks that do heavy processing on a cluster to run more jobs 
concurrently than those that do their work locally. Here is an incomplete 
snippet to show what I mean:

@jobs_limit(32)
@transform(...)
def heavy_remote_job(input, output):
    ...

@transform(...)
def heavy_local_job(input, output):
    ...

cmdline.run(options, multiprocess=4)

This does not work: @jobs_limit() apparently only reduces the number of jobs a 
task runs concurrently below the multiprocess limit; it cannot raise it above 
that limit.
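
For completeness, here is a self-contained version of the snippet above. The 
file patterns, suffix arguments, and job bodies are placeholders I made up for 
illustration, not my real pipeline:

from ruffus import transform, jobs_limit, suffix, cmdline

parser = cmdline.get_argparse(description="jobs_limit vs multiprocess demo")
options = parser.parse_args()

@jobs_limit(32)
@transform("*.input", suffix(".input"), ".remote")
def heavy_remote_job(input, output):
    # Placeholder body: the real task submits work to the cluster.
    open(output, "w").close()

@transform("*.input", suffix(".input"), ".local")
def heavy_local_job(input, output):
    # Placeholder body: the real task does CPU-heavy local work.
    open(output, "w").close()

# Despite @jobs_limit(32), heavy_remote_job never runs more than
# four jobs at once: multiprocess=4 is the global ceiling.
cmdline.run(options, multiprocess=4)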

Original issue reported on code.google.com by ke...@eng.ucsd.edu on 23 Jan 2015 at 8:35

GoogleCodeExporter commented 9 years ago
The following workaround does work, however. Maybe this is better than changing 
the semantics of @jobs_limit, since changing them could break existing pipelines.

@jobs_limit(32, 'remote_jobs')
@transform(...)
def heavy_remote_job(input, output):
    ...

@jobs_limit(1, 'local_jobs')
@transform(...)
def heavy_local_job(input, output):
    ...

@jobs_limit(1, 'local_jobs')
@transform(...)
def another_local_job(input, output):
    ...

pipeline_run(multiprocess=32)
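
If I understand @jobs_limit correctly, tasks that pass the same name share a 
single limit, so heavy_local_job and another_local_job never run more than one 
job at a time between them, while remote jobs can use the full pool of 32. A 
self-contained version, again with made-up file patterns and placeholder bodies:

from ruffus import transform, jobs_limit, suffix, pipeline_run

@jobs_limit(32, 'remote_jobs')
@transform("*.input", suffix(".input"), ".remote")
def heavy_remote_job(input, output):
    # Placeholder: submits work to the cluster.
    open(output, "w").close()

@jobs_limit(1, 'local_jobs')
@transform("*.input", suffix(".input"), ".local")
def heavy_local_job(input, output):
    # Placeholder: CPU-heavy local work.
    open(output, "w").close()

@jobs_limit(1, 'local_jobs')
@transform(heavy_local_job, suffix(".local"), ".done")
def another_local_job(input, output):
    # Placeholder: more CPU-heavy local work; shares the
    # 'local_jobs' limit with heavy_local_job above.
    open(output, "w").close()

# The global limit is set to the largest per-task limit; the local
# tasks are throttled back down by the named @jobs_limit.
pipeline_run(multiprocess=32)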

Original comment by ke...@eng.ucsd.edu on 23 Jan 2015 at 9:26

GoogleCodeExporter commented 9 years ago
I should have made it clear that the multiprocess or multithread parameter sets 
the global parallel limit for your script; @jobs_limit can only reduce that 
limit for particular tasks, never raise it.
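
Stated informally (my paraphrase of the rule, not code from ruffus itself), the 
effective concurrency of a task is the smaller of the global limit and its own 
@jobs_limit:

def effective_jobs(multiprocess, jobs_limit=None):
    # Informal model only: the global multiprocess value is a hard
    # ceiling that a task's @jobs_limit can lower but never raise.
    return multiprocess if jobs_limit is None else min(multiprocess, jobs_limit)

assert effective_jobs(4, jobs_limit=32) == 4   # the original report: still capped at 4
assert effective_jobs(32, jobs_limit=1) == 1   # the workaround: throttled to 1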

Alas, for the sake of clarity, this is a "feature" not a bug :-)

Original comment by bunbu...@gmail.com on 30 Jan 2015 at 7:00