ucsf-wynton / wynton-website-hpc

The Official Wynton HPC User Website
https://wynton.ucsf.edu/hpc/

SGE: Add qrsh example of how to launch multi-host subprocesses #110

Open

HenrikBengtsson opened 1 year ago

HenrikBengtsson commented 1 year ago

A user said in an email:

My code doesn't actually use openMPI for communication (and none is needed for single gpu jobs), the only reason I use mpirun is because it's the only way (afaik) to invoke the same job across multiple allocated nodes on SGE. ...

Coincidentally, a few weeks ago, I figured out how to launch multi-host subprocesses using qrsh instead of mpirun. Here's an example - it would be nice to be able to simplify it further:

#!/usr/bin/env bash
#$ -S /bin/bash
#$ -cwd
#$ -j y

#' Reads PE_HOSTFILE and returns an array of hostnames, where each
#' hostname is repeated the number of times given by the second column.
#' For example,
#'
#' opt88 3 short.q@opt88 UNDEFINED
#' iq242 2 short.q@iq242 UNDEFINED
#' opt116 1 short.q@opt116 UNDEFINED
#'
#' returns array (opt88 opt88 opt88 iq242 iq242 opt116)
read_pe_hostfile_expanded() {
    local -a hosts rows args
    local row kk

    [[ -n "$PE_HOSTFILE" ]] || { >&2 echo "ERROR: Environment variable 'PE_HOSTFILE' is not set"; exit 1; }
    [[ -f "$PE_HOSTFILE" ]] || { >&2 echo "ERROR: No such file: ${PE_HOSTFILE}"; exit 1; }

    ## Parse PE_HOSTFILE file
    mapfile -t rows < "$PE_HOSTFILE"

    for row in "${rows[@]}"; do
        read -r -a args <<< "${row}"
        # shellcheck disable=SC2034
        for kk in $(seq "${args[1]}"); do
            hosts+=("${args[0]}")
        done
    done

    echo "${hosts[@]}"
}

## Parse PE_HOSTFILE into a Bash array, where each hostname is repeated
## the number of times corresponding to the number of slots on that host
read -r -a hosts < <(read_pe_hostfile_expanded)

## Command to run on each "worker"
cmd='echo "begin"; hostname; date; echo "done"'

## Launch command on each worker in the PE_HOSTFILE
for host in "${hosts[@]}"; do
  qrsh -inherit -nostdin -V "${host}" "$cmd"
done

## End-of-job summary, if running as a job
[[ -n "$JOB_ID" ]] && qstat -j "$JOB_ID"  # This is useful for debugging and usage purposes,
                                          # e.g. "did my job exceed its memory request?"
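For testing outside an SGE job, the expansion logic can be exercised on its own with a fake PE_HOSTFILE. Here is a minimal sketch (hostnames and slot counts are made up, matching the example in the function's comments):

```shell
#!/usr/bin/env bash

## Create a fake PE_HOSTFILE with the example content from above
pe_hostfile=$(mktemp)
cat > "${pe_hostfile}" <<'EOF'
opt88 3 short.q@opt88 UNDEFINED
iq242 2 short.q@iq242 UNDEFINED
opt116 1 short.q@opt116 UNDEFINED
EOF

## Expand: repeat each hostname per its slot count (second column)
hosts=()
while read -r host nslots _; do
    for _ in $(seq "${nslots}"); do
        hosts+=("${host}")
    done
done < "${pe_hostfile}"

echo "${hosts[@]}"
## Outputs: opt88 opt88 opt88 iq242 iq242 opt116

rm "${pe_hostfile}"
```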
yala commented 1 year ago

@HenrikBengtsson This seems to run the $cmd serially on each of the hosts instead of in parallel

Do you know of an easy way to run these in parallel? I tried running the $cmd (which I replaced with my Python job) in the background, but it looks like it gets killed immediately (I assume when qrsh exits?)

HenrikBengtsson commented 1 year ago

This seems to run the $cmd serially on each of the hosts instead of in parallel

Oh... yes, you're right.

Do you know of an easy way to run these in parallel? ...

We can use standard shell tools for this, i.e. `&` and `wait`: calling `somecmd &` runs `somecmd` in the background, and `wait` waits for all such background tasks to complete. Here is an updated version:

#!/usr/bin/env bash
#$ -S /bin/bash
#$ -cwd
#$ -j y

echo "Call: $0 ..."
echo "Script name: $(basename "${BASH_SOURCE[0]}")"
echo "Arguments: $*"
echo "PID: $$"

module load CBI r
Rscript demo_pe_mpi_qrsh.R

#' Reads PE_HOSTFILE and returns an array of hostnames, where each
#' hostname is repeated the number of times given by the second column.
#' For example,
#'
#' opt88 3 short.q@opt88 UNDEFINED
#' iq242 2 short.q@iq242 UNDEFINED
#' opt116 1 short.q@opt116 UNDEFINED
#'
#' returns array (opt88 opt88 opt88 iq242 iq242 opt116)
read_pe_hostfile_expanded() {
    local -a hosts rows args
    local row kk

    [[ -n "$PE_HOSTFILE" ]] || { >&2 echo "ERROR: Environment variable 'PE_HOSTFILE' is not set"; exit 1; }
    [[ -f "$PE_HOSTFILE" ]] || { >&2 echo "ERROR: No such file: ${PE_HOSTFILE}"; exit 1; }

    ## Parse PE_HOSTFILE file
    mapfile -t rows < "$PE_HOSTFILE"

    for row in "${rows[@]}"; do
        read -r -a args <<< "${row}"
        # shellcheck disable=SC2034
        for kk in $(seq "${args[1]}"); do
            hosts+=("${args[0]}")
        done
    done

    echo "${hosts[@]}"
}

read -r -a hosts < <(read_pe_hostfile_expanded)
#echo "hosts=${hosts[*]}"
#echo "nhosts=${#hosts[@]}"

cmd='echo "begin"; hostname; date; echo "done"'

echo "Launching ${#hosts[@]} parallel tasks ..."
echo " - task: $cmd"
for host in "${hosts[@]}"; do
  echo "- launch: qrsh -inherit -nostdin -V ${host} \"$cmd\" &"
  qrsh -inherit -nostdin -V "${host}" "$cmd" &
done
echo "Launching ${#hosts[@]} parallel tasks ... done"

## Wait for all tasks to complete
echo "Waiting for ${#hosts[@]} parallel tasks to complete ..."
wait
echo "Waiting for ${#hosts[@]} parallel tasks to complete ... done"

## End-of-job summary, if running as a job
[[ -n "$JOB_ID" ]] && qstat -j "$JOB_ID"  # This is useful for debugging and usage purposes,
                                          # e.g. "did my job exceed its memory request?"

echo "Call: $0 ... done"
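The `&`/`wait` pattern above can be illustrated on its own, without SGE, by replacing qrsh with a plain command group. A minimal sketch (the temporary "results" file is just for demonstration):

```shell
#!/usr/bin/env bash

results=$(mktemp)

## '&' launches each task in the background, so the loop does not block
for task in 1 2 3; do
    { sleep 0.1; echo "task ${task} done" >> "${results}"; } &
done

## 'wait' blocks until all background tasks have completed
wait

echo "all tasks done:"
cat "${results}"
```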

It's probably useful to put all of that into a new shell function, say qrsh_run, to make it neater. I'll do that next.

HenrikBengtsson commented 1 year ago

Here's the version with a qrsh_run function, which should make it clearer how it works:

#!/usr/bin/env bash
#$ -S /bin/bash
#$ -cwd
#$ -j y

#-----------------------------------------------------------------
# SGE utility functions
#-----------------------------------------------------------------
sge_debug() {
    ${SGE_DEBUG:-false} && >&2 echo "$@"
}

#' Reads PE_HOSTFILE and returns an array of hostnames, where each
#' hostname is repeated the number of times given by the second column.
#' For example,
#'
#' opt88 3 short.q@opt88 UNDEFINED
#' iq242 2 short.q@iq242 UNDEFINED
#' opt116 1 short.q@opt116 UNDEFINED
#'
#' returns array (opt88 opt88 opt88 iq242 iq242 opt116)
read_pe_hostfile_expanded() {
    local -a hosts rows args
    local row kk

    [[ -n "$PE_HOSTFILE" ]] || { >&2 echo "ERROR: Environment variable 'PE_HOSTFILE' is not set"; exit 1; }
    [[ -f "$PE_HOSTFILE" ]] || { >&2 echo "ERROR: No such file: ${PE_HOSTFILE}"; exit 1; }

    ## Parse PE_HOSTFILE file
    mapfile -t rows < "$PE_HOSTFILE"

    for row in "${rows[@]}"; do
        read -r -a args <<< "${row}"
        # shellcheck disable=SC2034
        for kk in $(seq "${args[1]}"); do
            hosts+=("${args[0]}")
        done
    done

    echo "${hosts[@]}"
}

#' Calls a command on parallel workers allotted by SGE
#'
#' This function identifies the parallel workers that SGE has
#' given to the current job by parsing the file given by the
#' 'PE_HOSTFILE' environment variable.  It then uses:
#'
#'   qrsh -inherit -nostdin -V <worker-hostname> <command>
#'
#' to launch the <command> on each parallel worker.
#'
#' Example:
#' qrsh_run 'echo "begin"; hostname; date; echo "done"'
qrsh_run() {
    local -a hosts
    read -r -a hosts < <(read_pe_hostfile_expanded)

    ## Nothing to do?
    [[ ${#hosts[@]} == 0 ]] && return 0

    sge_debug "Launching ${#hosts[@]} parallel tasks ..."
    sge_debug " - task: $*"
    for host in "${hosts[@]}"; do
        sge_debug "- launch: qrsh -inherit -nostdin -V ${host} \"$*\" &"
        qrsh -inherit -nostdin -V "${host}" "$@" &
    done
    sge_debug "Launching ${#hosts[@]} parallel tasks ... done"

    ## Wait for all tasks to complete
    sge_debug "Waiting for ${#hosts[@]} parallel tasks to complete ..."
    wait
    sge_debug "Waiting for ${#hosts[@]} parallel tasks to complete ... done"
}    

#-----------------------------------------------------------------
# Main script
#-----------------------------------------------------------------
echo "Call: $0 ..."
echo "Script name: $(basename "${BASH_SOURCE[0]}")"
echo "Arguments: $*"
echo "PPID: ${PPID}"

## Launch command on all parallel workers allotted by SGE
qrsh_run 'echo "begin"; hostname; date; echo "done"'

## Launch another set of parallel tasks after the above have completed
qrsh_run 'echo "begin 2nd round"; hostname; date; echo "done"'

## End-of-job summary, if running as a job
[[ -n "$JOB_ID" ]] && qstat -j "$JOB_ID"  # This is useful for debugging and usage purposes,
                                          # e.g. "did my job exceed its memory request?"

echo "Call: $0 ... done"
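One caveat worth noting: a bare `wait` always returns exit code 0, so `qrsh_run` as written will not report whether any of the qrsh tasks failed. To propagate failures, one can record each background PID and `wait` on them individually. A sketch of that pattern, with `bash -c` standing in for `qrsh` (the latter requires a live SGE allocation):

```shell
#!/usr/bin/env bash

## Launch tasks in the background, recording their PIDs via "$!"
pids=()
bash -c 'exit 0' &
pids+=("$!")
bash -c 'exit 1' &   ## this task fails on purpose
pids+=("$!")

## 'wait <pid>' returns that specific task's exit code
failures=0
for pid in "${pids[@]}"; do
    wait "${pid}" || failures=$((failures + 1))
done

echo "number of failed tasks: ${failures}"
## Outputs: number of failed tasks: 1
```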