zhaojinhong/pdsh-ops_tools

+-------------+
| Description |
+-------------+

Pdsh is a multithreaded remote shell client which executes commands on multiple remote hosts in parallel. Pdsh can use several different remote shell services, including standard "rsh", Kerberos IV, and ssh.

See the man page in the doc directory for usage information.

+---------------+
| Configuration |
+---------------+

Pdsh uses GNU autoconf for configuration. Dynamically loadable modules for each shell service (as well as for other features) are compiled according to the configuration. By default, rsh, Kerberos IV, and SDR (for IBM SPs) are compiled if they exist on the system.

The README.modules file distributed with pdsh contains a description of each module available, as well as its requirements and/or conflicts.

If your system does not support dynamically loadable modules, you may compile modules in statically using the --enable-static-modules option.

To configure in additional feature modules:

./configure [options]

--without-rsh
    Disable support for BSD rcmd(3) (standard rsh).

--with-ssh
    Enable support of ssh(1) remote shell service.

--with-machines=/path/to/machines
    Use a flat file list of machine names for -a instead of genders,
    nodeattr, or SDRGetObjects.

--with-qshell
    Enable support for running parallel jobs on the Quadrics Elan
    interconnect via the qshell service option (-R qsh) and qshell
    daemon. See README.QsNet for more information.

--with-genders
    Enable support of a genders database through the genders(3)
    library. For pdsh's -i option to function properly, the genders
    database must have alternate node names listed as the value of
    the "altname" attribute.

--with-dshgroups
    Enable support of dsh-style group files in ~/.dsh/group/groupname
    or /etc/dsh/group/groupname. Allows use of -g/-X to target or
    exclude hosts in dsh group files.

--with-netgroup
    Enable use of netgroups (via /etc/netgroup or NIS) to build lists
    of target hosts using -g/-X to include/exclude hosts.

--with-nodeattr=/path/to/nodeattr
    Enable support of a genders database through the nodeattr(1)
    command. This is primarily for older systems that do not yet have
    genders(3) library support. For pdsh's -i option to function
    properly, the genders database must have alternate node names
    listed as the value of the "altname" attribute and the nodeattr
    command must have the -r option available.

--with-nodeupdown
    Enable support of dynamic elimination of down nodes through the
    nodeupdown(3) library.

--with-mrsh
    Enable support of mrsh(1) remote shell service.

--with-mqshell
    Enable support for running parallel jobs on the Quadrics Elan
    interconnect via the mqshell service option (-R mqsh) and mqshell
    daemon. Mqshell is identical to qshell but adds munge
    authentication (the authentication used by mrsh).

--with-rms
    Support running pdsh under RMS allocation.

--with-slurm
    Support running pdsh under SLURM allocation.

--with-fanout=N
    Specify default fanout (default is 32).

--with-timeout=N
    Set default connect timeout (default is 10 seconds).

--with-readline
    Use the GNU readline library to parse input in interactive mode.

--without-pam
    Disable PAM support in the qshell and mqshell daemons. PAM support
    is enabled by default.

Note that a number of the above configuration options may "conflict" with each other because they perform identical operations. For example, genders and nodeattr both support the -g option. If several installed modules support identical options, each such option defaults to one particular module. Static compilation of modules will fail if conflicting modules are selected. See the man page in this directory for details on which modules conflict.
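As a rough illustration of the conflict rule above, the following Python sketch reports which command-line options are provided by more than one selected module. It is not part of pdsh: the MODULE_OPTIONS table and the find_conflicts name are hypothetical, and the table lists only the options this README mentions for each module, so it is incomplete. README.modules and the man page remain the authoritative sources.

```python
from collections import defaultdict

# Options each module provides, limited to what this README states.
# Illustrative and incomplete: not read from pdsh itself.
MODULE_OPTIONS = {
    "machines": {"-a"},
    "genders": {"-a", "-i", "-g"},
    "nodeattr": {"-a", "-i", "-g"},
    "dshgroups": {"-g", "-X"},
    "netgroup": {"-g", "-X"},
}


def find_conflicts(selected):
    """Map each option to the selected modules that provide it,
    keeping only options claimed by more than one module."""
    providers = defaultdict(list)
    for module in sorted(selected):
        for option in MODULE_OPTIONS.get(module, ()):
            providers[option].append(module)
    return {opt: mods for opt, mods in providers.items() if len(mods) > 1}


if __name__ == "__main__":
    conflicts = find_conflicts({"genders", "nodeattr"})
    for option, modules in sorted(conflicts.items()):
        print(f"{option}: {', '.join(modules)}")
```

A check like this could be run before choosing --enable-static-modules, since a static build fails when conflicting modules are selected.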

+------------+
| INSTALLING |
+------------+

    make
    make install

By default, pdsh is now installed without setuid permissions. This is because, for the majority of the rcmd connect protocols, root permissions are not necessarily needed. If you are using either of the "rcmd/rsh" or "rcmd/qsh" modules, you will need to change the permissions of pdsh and pdcp to be setuid root after the install. For example:

    chown root PREFIX/bin/pdsh PREFIX/bin/pdcp
    chmod 4755 PREFIX/bin/pdsh PREFIX/bin/pdcp

If you compile qshell and/or mqshell with PAM support, remember to update your PAM configuration files to support the "qshell" and/or "mqshell" service names. There are sample xinetd(8) config files for qshd and mqshd in the etc/ directory. Also be sure to read the README.QsNet file in this directory.

+---------+
| GOTCHAS |
+---------+

Watch out for the following gotchas:

1) When executing remote commands via rsh, krb4, qsh, or ssh, pdsh uses one reserved socket for each active connection, or two if it is maintaining a separate connection for stderr. It obtains these sockets by calling rresvport(), which normally draws from a pool of 256 sockets. You may exhaust these if multiple pdsh's are running simultaneously on a machine, or if the fanout is set too high. Mrsh and mqsh do not use reserved ports, and therefore are not affected by this problem as severely.

2) When pdsh is using a remote shell service that is wrapped with TCP wrappers, there are three areas where bottlenecks can be created: IDENT, DNS, and SYSLOG. If your hosts.allow includes "user@", e.g. "in.rshd : ALL@ALL : ALLOW" and TCP wrappers is configured to support IDENT, each simultaneous remote shell connection will result in an IDENT query back to the source. For large fanouts this can quickly overwhelm the source. Similarly, if TCP wrappers is configured to query the DNS on every connection, pdsh may overwhelm the DNS server. Finally, if every remote shell connection results in a remote syslog entry, syslogd on your loghost may be overwhelmed and logs may grow excessively long.

If local security policy permits, consider configuring TCP wrappers to avoid calling IDENT, DNS, or SYSLOG on every remote shell connection. Configuring without the "PARANOID" option (which requires all connections to be registered in the DNS), permitting a simple list of IP addresses or a subnet (no names, and no user@ prefix), and setting the SYSLOG severity for the remote shell service to a level that is not remotely logged will avoid these pitfalls. If these actions are not possible, you may wish to reduce pdsh's default fanout (configure --with-fanout=N).
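To make the reserved-socket arithmetic in gotcha 1 concrete, here is a small Python sketch. It assumes the pool size of 256 quoted above (the real figure may differ between platforms, and other programs also draw from the pool), and the function names are hypothetical, so treat the results as upper bounds rather than guarantees.

```python
RESERVED_POOL = 256  # pool size quoted in gotcha 1; an assumption, may vary


def ports_needed(fanout, separate_stderr=True):
    """Reserved ports a single pdsh run holds at full fanout."""
    per_connection = 2 if separate_stderr else 1
    return fanout * per_connection


def max_concurrent_runs(fanout, separate_stderr=True, pool=RESERVED_POOL):
    """How many pdsh runs at this fanout fit in the pool at the same time."""
    return pool // ports_needed(fanout, separate_stderr)


if __name__ == "__main__":
    # Default fanout of 32 with a separate stderr connection.
    print(ports_needed(32), max_concurrent_runs(32))
```

Under these assumptions, the default fanout of 32 with separate stderr connections holds 64 reserved ports, so about four simultaneous runs would exhaust the pool.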

+---------------------+
| THEORY OF OPERATION |
+---------------------+

We will generalize for the common remote shell service rsh. The following is similar for all other shell services (ssh, krb4, qsh, etc.), but other shell services may include additional security or features.

A thread is created for each rsh connection to a node. Each thread opens a connection using an MT-safe rcmd-like function, relays the remote command's stdout and stderr streams, then terminates.

The mainline starts fanout number of rsh threads and waits on a condition variable that is signalled by the rsh threads as they terminate. When the condition variable is signalled, the main thread starts a new rsh thread to maintain the fanout, until all remote commands have been executed.
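The fanout loop described above can be sketched in Python as follows. This is an illustration of the pattern only, not pdsh's C implementation: run_with_fanout, task, and the state dictionary are hypothetical names, and task(host) stands in for opening the connection and running the remote command.

```python
import threading


def run_with_fanout(hosts, task, fanout=32):
    """Run task(host) for every host with at most `fanout` threads active.

    Returns (results, peak), where peak is the largest number of
    worker threads that were active at the same time.
    """
    if fanout < 1:
        raise ValueError("fanout must be at least 1")

    cond = threading.Condition()
    state = {"active": 0, "peak": 0}
    results = {}

    def worker(host):
        try:
            out = task(host)
        except Exception as exc:  # keep the failure instead of losing it
            out = exc
        with cond:
            results[host] = out
            state["active"] -= 1
            cond.notify()  # signal the main thread that a slot is free

    threads = []
    for host in hosts:
        with cond:
            # The main thread waits here until a finishing worker signals.
            while state["active"] >= fanout:
                cond.wait()
            state["active"] += 1
            state["peak"] = max(state["peak"], state["active"])
        thread = threading.Thread(target=worker, args=(host,))
        thread.start()
        threads.append(thread)

    for thread in threads:
        thread.join()
    return results, state["peak"]


if __name__ == "__main__":
    names = [f"node{i}" for i in range(8)]
    output, peak = run_with_fanout(names, lambda h: f"{h}: ok", fanout=3)
    print(peak <= 3, len(output))
```

The slot counter is updated only while the condition's lock is held, so the number of active workers cannot exceed the fanout.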

A timeout thread is created that monitors the state of the threads and terminates any that take too much time connecting or, if requested on the command line, take too long to complete.
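One way to picture the timeout thread is the Python sketch below. It is a hedged illustration rather than pdsh's code: the Conn record, the state names, and the expired and watchdog functions are all hypothetical. The 10-second connect timeout mirrors the default given under --with-timeout, and the command timeout is off unless requested, as described above. Python cannot forcibly terminate a thread, so the sketch only flags expired connections.

```python
import threading
import time
from dataclasses import dataclass


@dataclass
class Conn:
    host: str
    state: str        # "connecting", "connected", or "done"
    started: float    # monotonic time at which the current state began
    timed_out: bool = False


def expired(conn, now, connect_timeout=10.0, command_timeout=None):
    """Return True if conn has spent too long in its current state."""
    elapsed = now - conn.started
    if conn.state == "connecting":
        return elapsed > connect_timeout
    if conn.state == "connected" and command_timeout is not None:
        return elapsed > command_timeout
    return False


def watchdog(conns, lock, stop, interval=1.0, **timeouts):
    """Timeout thread body: periodically flag connections that expired."""
    while not stop.wait(interval):  # wait() returns True once stop is set
        now = time.monotonic()
        with lock:
            for conn in conns:
                if not conn.timed_out and expired(conn, now, **timeouts):
                    # A real client would abort the connection here.
                    conn.timed_out = True
```

The worker threads would update state and started under the same lock as they move from connecting to connected to done.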

Typing ^C causes pdsh to list threads that are in the connected state. Another ^C immediately following the first one terminates the program.

+--------+
| AUTHOR |
+--------+

Jim Garlick <garlick@llnl.gov>

Please send suggestions, bug reports, or just a note letting me know that you are using pdsh (it would be interesting to hear how many nodes are in your cluster).

+------+
| NOTE |
+------+

This product includes software developed by the University of California, Berkeley and its contributors. Modifications have been made and bugs are probably mine.

The PDSH software package has no affiliation with the Democratic Party of Albania (www.pdsh.org).
