zhaojinhong / pdsh-ops_tools

Automatically exported from code.google.com/p/pdsh
GNU General Public License v2.0

Add torque module #2

Closed: GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago

Hi!
Since we're still using Torque as resource manager on the compute clusters at 
our site, I created a torque-module (similar to the slurm-module; in fact, I 
modified the slurm-module to instead talk to torque/pbs-server).
Is it something that you would be interested in including in pdsh?
The module is not thoroughly tested yet, but afaik it works at my place. 
Autotools and spec-stuff also seem to work (on CentOS5).

/Mattias

Original issue reported on code.google.com by mark.gro...@gmail.com on 10 Sep 2010 at 4:32

GoogleCodeExporter commented 9 years ago
Hi, the patch should apply to pdsh-2.22, and after ./bootstrap, it seems to 
work.

[c3-slaba@beda-s1 pdsh-2.22]$ patch -p1 < ../with-torque.patch 
patching file config/ac_torque.m4
patching file configure.ac
patching file doc/pdsh.1.in
patching file pdsh.spec
patching file src/modules/Makefile.am
patching file src/modules/torque.c

I've done some initial testing on el4+torque-2.1 and el5+torque-2.3. Pdsh-rpms
were created with "rpmbuild --with torque ...".

There are a few open questions that I can think of.
* In Makefile.am, would it perhaps be more appropriate to explicitly add an
AM_CPPFLAGS at the beginning of the file, and then change the "AM_CPPFLAGS =",
into "AM_CPPFLAGS +="?
* mod-slurm and mod-torque share the -j-option. Should there perhaps be some 
mechanism to make mod-slurm and mod-torque mutually exclusive, or at least make 
the behavior deterministic if you are using slurm and torque simultaneously? 
Now the modules have the same priority.
* I dropped the "-j all" special option in mod-torque, probably because I
didn't consider it useful to me.
* With the current version, jobids picked up from the PBS_JOBID environment 
variable are never sanitized, but rather just sent directly to libtorque.
* Queued jobs, not yet running (i.e., there are no reserved compute nodes) are 
ignored silently.
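
On the PBS_JOBID point above, a minimal sketch of what a sanity check could look like before the id is handed to libtorque. The helper name jobid_looks_valid and the accepted format (a numeric sequence number, optionally followed by ".server") are assumptions for illustration and are not part of the patch; array-job ids are not handled by this simplified check.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of the torque patch: accept only job ids of
 * the form "<digits>" or "<digits>.<server>", where <server> consists of
 * alphanumerics, '-', '_' and '.'. Anything else is rejected. */
bool jobid_looks_valid(const char *id)
{
    size_t i = 0;

    if (id == NULL || !isdigit((unsigned char)id[0]))
        return false;

    /* Skip the leading numeric sequence number. */
    while (isdigit((unsigned char)id[i]))
        i++;

    if (id[i] == '\0')
        return true;            /* bare sequence number */
    if (id[i] != '.' || id[i + 1] == '\0')
        return false;           /* the rest must be ".<server>" */

    /* Check every character of the server part. */
    for (i++; id[i] != '\0'; i++) {
        unsigned char c = (unsigned char)id[i];
        if (!isalnum(c) && c != '-' && c != '_' && c != '.')
            return false;
    }
    return true;
}
```

A module could call such a check on the value of getenv("PBS_JOBID") and fall back to ignoring the variable when it returns false.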

Original comment by don.fanucci on 10 Sep 2010 at 7:18

Attachments:

GoogleCodeExporter commented 9 years ago
Oh, and torque.c:482 should be converted from an fprintf(stderr, ...) call into
an errx() call, I guess.

Original comment by don.fanucci on 10 Sep 2010 at 7:27

GoogleCodeExporter commented 9 years ago
Thanks, I will take a look at your patch today.
In reference to your queries:

 1. I'll check out AM_CPPFLAGS in Makefile.am. I haven't looked at that file in a long time.

 2. Whenever any two modules supply the same option to pdsh,
    the first one loaded "wins" and the second fails.
    As of pdsh-2.22, modules are loaded in strcmp() order,
    so the 'slurm' module will always be loaded if both the slurm
    and torque modules are present. As of pdsh-2.21, there is now
    a '-M module' option that will force load one module over the
    other, so you would have to do 'pdsh -M torque -j JOBID ...'
    Ideally, you would only install the slurm module on slurm clusters
    and the torque/pbs module on Torque clusters... but in the future
    pdsh will hopefully have support for a config file that could automatically
    determine which module to load.

 3. (-j all option dropped) That is fine. I added -j all to the slurm module
    because there were many cases where it was useful, e.g. running a command
    on all nodes running slurm jobs looking for a particular problem, etc.
    If someone requests -j all, I'm sure it will be easy to add.

  Your last two comments from Comment 1 seem fine to me... 
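
To illustrate point 2 above, here is a minimal model of "modules are loaded in strcmp() order and the first one to supply an option wins". This is a sketch only: struct module, option_owner and the one-option-per-module simplification are hypothetical and do not reflect pdsh's actual module loader.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of a module: a name and the single option it supplies. */
struct module {
    const char *name;
    char opt;
};

/* qsort comparator: order modules by name using strcmp(). */
static int cmp_modules(const void *a, const void *b)
{
    const struct module *ma = a;
    const struct module *mb = b;
    return strcmp(ma->name, mb->name);
}

/* Sort the modules into load order, then return the name of the first module
 * that supplies `opt`. Later modules supplying the same option lose.
 * Returns NULL if no module supplies the option. */
const char *option_owner(struct module *mods, size_t n, char opt)
{
    qsort(mods, n, sizeof mods[0], cmp_modules);
    for (size_t i = 0; i < n; i++)
        if (mods[i].opt == opt)
            return mods[i].name;
    return NULL;
}
```

With modules named "torque" and "slurm" both supplying 'j', this model picks "slurm", because "slurm" sorts before "torque".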

Original comment by mark.gro...@gmail.com on 10 Sep 2010 at 8:38

GoogleCodeExporter commented 9 years ago
> In Makefile.am, would it perhaps be more appropriate
> to explicitly add an AM_CPPFLAGS at the beginning of the file, and then change the
> "AM_CPPFLAGS =", into "AM_CPPFLAGS +="?

Yes, after looking at this I'm not sure why AM_CPPFLAGS wasn't used in the way you
suggest. However, I think perhaps the better solution is to use per-target CPPFLAGS,
which I think are supported via target_CPPFLAGS, so e.g.

 torque_la_CPPFLAGS = $(TORQUE_CPPFLAGS)

mark

Original comment by mark.gro...@gmail.com on 10 Sep 2010 at 10:17

GoogleCodeExporter commented 9 years ago
Ok, I've come up with the following 4 extra (minor) patches that apply on top
of your torque patch. Please give them a review and if they look ok to you, I'll
push your torque module into the trunk (with these patches).

Original comment by mark.gro...@gmail.com on 10 Sep 2010 at 10:29

Attachments:

GoogleCodeExporter commented 9 years ago
The four patches look good! (I also applied them, built a new set of RPMs, and
did a little bit of testing).

Have a nice weekend! :)

/Mattias

Original comment by don.fanucci on 11 Sep 2010 at 8:11

GoogleCodeExporter commented 9 years ago
This issue was closed by revision r1231.

Original comment by mark.gro...@gmail.com on 13 Sep 2010 at 5:35