soravux / scoop

SCOOP (Scalable COncurrent Operations in Python)
https://github.com/soravux/scoop
GNU Lesser General Public License v3.0
636 stars, 87 forks

Is there any possibility of supporting the IBM LSF system? #71

Closed crazyzlj closed 6 years ago

crazyzlj commented 6 years ago

Hi, I have developed my algorithm using DEAP with SCOOP and tested it on our small cluster, which uses the PBS system. Now I want to run it on our large cluster, which has IBM LSF installed. There I cannot specify the accessible hosts in advance, and SCOOP does not detect the LSF system. I then found the following note:

> SCOOP natively supports Sun Grid Engine (SGE), Torque (PBS-compatible, Moab, Maui) and SLURM. That means that a minimum launch file is needed while the framework recognizes automatically the nodes assigned to your task.

So I wonder: is there any possibility of supporting the IBM LSF system?

Thank you very much!

crazyzlj commented 6 years ago

After some trial and error, I finally found a solution. The key point is to create the hostfile dynamically. Here are the steps; I hope they are helpful to others.

    source ~/.bash_profile

    # Path to executable
    cd /wps/home/zhulj/demo/scoop_test

    # Add local Python installation path
    export PATH=/home/zhulj/soft/python-2.7.13/bin:$PATH

    # Start a new hostfile from scratch
    SCOOPHOST_FILE=hosts$LSB_JOBID
    rm -f "$SCOOPHOST_FILE"
    touch "$SCOOPHOST_FILE"
    # This line goes to the job output, not into the hostfile
    echo "# SCOOP hostfile created by LSF on $(date)"

    # Check that the hostfile could be created
    if [ ! -f "$SCOOPHOST_FILE" ]; then
        echo "$0: can't create $SCOOPHOST_FILE"
        exit 1
    fi

    # LSB_MCPU_HOSTS alternates host names and processor counts
    HOST=""
    NUM_PROC=""
    FLAG=""
    TOTAL_CPUS=0
    for TOKEN in $LSB_MCPU_HOSTS
    do
        if [ -z "$FLAG" ]; then  # -z means the string is empty
            HOST="$TOKEN"
            FLAG="0"
        else
            NUM_PROC="$TOKEN"
            FLAG="1"
        fi
        if [ "$FLAG" = "1" ]; then
            _x=0
            if [ $_x -lt $NUM_PROC ]; then
                TOTAL_CPUS=$(expr "$TOTAL_CPUS" + "$NUM_PROC")
                echo "$HOST $NUM_PROC" >> "$SCOOPHOST_FILE"
            fi
            # Get ready for the next host
            FLAG=""
            HOST=""
            NUM_PROC=""
        fi
    done

    echo "Your SCOOP boot hostfile looks like:"
    cat "$SCOOPHOST_FILE"  # print the hostfile announced on the previous line
    echo "TOTAL_CPUS: ${TOTAL_CPUS}"

    # Python script
    script=onemax_island_scoop.py

    # SCOOP command
    python -m scoop --debug --hostfile "$SCOOPHOST_FILE" $script > testscoop$LSB_JOBID.stdout.log
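For readers who would rather build the hostfile in Python than in shell, the token-pairing loop could be sketched roughly as below. This is only a sketch, not part of SCOOP or LSF: `parse_mcpu_hosts` and `format_hostfile` are hypothetical helper names, the `localhost 1` fallback is a made-up value for running outside a job, and it assumes `LSB_MCPU_HOSTS` holds alternating host names and processor counts, which is how the script treats it.

```python
import os
import sys


def parse_mcpu_hosts(value):
    """Turn 'hostA 12 hostB 8' into [('hostA', 12), ('hostB', 8)]."""
    tokens = value.split()
    if len(tokens) % 2 != 0:
        raise ValueError("expected alternating host/count tokens: %r" % value)
    pairs = []
    for host, count in zip(tokens[0::2], tokens[1::2]):
        slots = int(count)
        if slots > 0:  # same guard as the shell script: skip hosts with 0 slots
            pairs.append((host, slots))
    return pairs


def format_hostfile(pairs):
    """Render one 'host count' line per host, with no header or comment lines."""
    return "".join("%s %d\n" % (host, slots) for host, slots in pairs)


if __name__ == "__main__":
    # LSB_MCPU_HOSTS is set by LSF inside a job; the fallback is a made-up
    # value so the sketch also runs on a workstation.
    value = os.environ.get("LSB_MCPU_HOSTS", "localhost 1")
    sys.stdout.write(format_hostfile(parse_mcpu_hosts(value)))
```

Inside a job, the output could be redirected to a file and that file passed to `python -m scoop --hostfile`, in place of the shell loop.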

+ `cd` to the directory containing the job script `deap_scoop_lsf_demo.lsf`.
+ Submit the job with `bsub`, specifying the number of processors.
  ```shell
  $ bsub -n 48 ./deap_scoop_lsf_demo.lsf
  ```

The generated hostfile looks something like this. Be careful: the file must not contain any other lines.

  b10n07.cluster.com 12
  b06n03.cluster.com 12
  b07n12.cluster.com 8
  b07n06.cluster.com 12
  b07n03.cluster.com 4
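
As a quick sanity check, the per-host counts in this sample add up to the 48 processors requested with `bsub -n 48`. A small sketch of that check follows; `count_workers` is a hypothetical helper, and it deliberately rejects any line that is not exactly a host name followed by an integer count.

```python
SAMPLE = """\
b10n07.cluster.com 12
b06n03.cluster.com 12
b07n12.cluster.com 8
b07n06.cluster.com 12
b07n03.cluster.com 4
"""


def count_workers(text):
    """Sum the per-host counts; raise ValueError on any malformed line."""
    total = 0
    for line in text.splitlines():
        host, slots = line.split()  # fails unless the line has exactly two fields
        total += int(slots)         # fails unless the count is an integer
    return total


print(count_workers(SAMPLE))  # 48
```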
louisabraham commented 6 years ago

It would still be great to support LSF! Can this issue be reopened?

crazyzlj commented 6 years ago

> It would still be great to support LSF! Can this issue be reopened?

For now, the workaround I posted above works, but it would be great if SCOOP supported LSF natively.