pydoit / doit

task management & automation tool
http://pydoit.org
MIT License

Support of LSF or GRID? #415

Open texan4ever opened 2 years ago

texan4ever commented 2 years ago

I found DOIT while looking for alternatives to Make (I find the Make syntax extremely cryptic and difficult to enhance 6 months later). I am really enjoying DOIT. It is powerful and easy to read.

However, some of my usage involves tasks that can easily run for days and have to be run on a CPU farm.

What are the chances of DOIT being enhanced to submit tasks to a CPU farm that is controlled by LSF or GRID?


schettino72 commented 2 years ago

That's feasible.

The first step would be to have some sample code to work on: an example computation plus pipeline that you would submit to a CPU farm. From that we can discuss the doit interface and implementation.

Can you provide some sample code of how you usually use a CPU farm?

texan4ever commented 2 years ago

Thanks for responding back.

LSF and GRID have commands used to submit jobs to the farm. The user would most likely need to specify the following global information at the top of the dodo.py file:

Memory and CPU resources:

The challenging part would be determining when a job completes. qsub returns 0/1 to indicate whether the job was successfully submitted, and prints the job number to stdout: Your job 1285583 ("") has been submitted
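To make the submission step concrete, here is a minimal sketch of pulling the job number out of that stdout message. The function name `parse_job_id` is hypothetical, and the pattern assumes the message always begins with "Your job" followed by the number, as in the sample above; other schedulers or versions may word it differently.

```python
import re


def parse_job_id(qsub_stdout):
    """Return the job number from qsub's submission message, or None.

    Assumes the message looks like:
        Your job 1285583 ("") has been submitted
    """
    match = re.search(r"Your job (\d+)", qsub_stdout)
    return int(match.group(1)) if match else None
```

Returning None instead of raising lets the caller decide how to treat unexpected output, for example by failing the task.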

As a user I am able to run qstat to see what jobs are running. Or I can run qstat -j to see the status of a job (but that gives very verbose output). So there would most likely need to be a way to specify how often doit polls the queue.

Output of qstat:

job-ID   prior    job name  user  state  submit/start at
1285669  0.47662                  r      01/31/2022 16:19:16
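The polling idea could be sketched roughly as follows. This is not an existing doit feature. All function names are hypothetical, and it assumes that a job is finished once plain `qstat` no longer lists its ID in the first column. The `qstat` and `sleep` parameters are injectable so the loop can be tried without a grid installation.

```python
import subprocess
import time


def job_in_queue(qstat_output, job_id):
    """Return True if any line of qstat output starts with the given job ID."""
    for line in qstat_output.splitlines():
        fields = line.split()
        if fields and fields[0] == str(job_id):
            return True
    return False


def run_qstat():
    """Run plain `qstat` and return its stdout (needs a grid installation)."""
    result = subprocess.run(["qstat"], capture_output=True, text=True, check=True)
    return result.stdout


def wait_for_job(job_id, poll_seconds=60, qstat=run_qstat, sleep=time.sleep):
    """Block until job_id is no longer listed; return how many polls were made."""
    polls = 1
    while job_in_queue(qstat(), job_id):
        sleep(poll_seconds)
        polls += 1
    return polls
```

Only the first column is inspected because the name and user columns can be empty, as in the sample output above. The sketch does not tell a successful job from a failed one; that would need the job's own exit status or log.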

Two examples for Oracle Grid Engine:

Example 1: submit runfoo.csh and request 5gig of RAM and 8 CPUs:
qsub -N 'name of job' -P 'name-of-queue' -cwd -V -pe mt 16 -o stdout.log -l 'os_bit=64,mem_free=50G' runfoo.csh

Example 2: submit the Linux command "tar cvfz file.tgz somedirectory". Request 1gig of RAM and use the default of 1 CPU. NOTE: multiple -l options can be used.
qsub -N 'name of job' -P 'name of queue' -cwd -V -l 'os_bit=64' -l 'mem_free=1G' "tar cvfz file.tgz somedirectory > tar.log"
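As a starting point for discussion, the second example might be expressed in a dodo.py along these lines. This is only a sketch: `qsub_command` and `task_archive` are hypothetical names, the job name "archive" is made up, and the flags simply mirror the example above. It relies on doit running an action given as a list of strings without a shell. The task would finish as soon as qsub returns, so waiting for the farm job itself would still need a polling step.

```python
def qsub_command(name, queue, resources, command):
    """Build a qsub argument list with one -l option per requested resource."""
    argv = ["qsub", "-N", name, "-P", queue, "-cwd", "-V"]
    for key, value in resources.items():
        argv += ["-l", f"{key}={value}"]
    argv.append(command)
    return argv


def task_archive():
    """Hypothetical doit task that submits the tar job to the farm."""
    submit = qsub_command(
        name="archive",
        queue="name of queue",
        resources={"os_bit": "64", "mem_free": "1G"},
        command="tar cvfz file.tgz somedirectory > tar.log",
    )
    return {
        "actions": [submit],  # list of strings: executed without a shell
        "verbosity": 2,       # show qsub's "Your job ..." message
    }
```

Keeping the resources in a dict makes it straightforward to move shared defaults to the top of the dodo.py file, as suggested earlier in the thread.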