gptune / GPTune

Running GPTune with slurm #11

Closed jaehoonkoo closed 2 years ago

jaehoonkoo commented 2 years ago

Hi,

Can you please take a look at the following?

GPTune is installed using spack. I am trying to run it through a slurm script, but it hangs at the first MLA iteration.

The slurm script is:

#!/bin/bash
#SBATCH --job-name=1_gptune
#SBATCH --nodes=1
#SBATCH --gres=gpu:1
#SBATCH --time=1:00:00

module load gcc/9.2.0-r4tyw54
source activate /home/jkoo/.conda/envs/gptune/
cd ~/spack
. share/spack/setup-env.sh 
spack load gptune 
cd /lcrc/project/EE-ECP/jkoo/code/gptune/examples/GPTune-mmm-block
rm -rf gptune.db
mpirun -n 1 python demo.py -nrun 5 -ntask 1 -perfmodel 0 -optimization GPTune 

and it hangs:

0.09075927734375]]
OUTPUT:%f 0.4654837137840879
store_func_eval
problem.constants
None
NSmin:  2
NS:  5
MLA iteration:  0
exec /lcrc/project/EE-ECP/jkoo/code/gptune/GPTune/lcm.py args None nproc 1
LCM spawn time:  3.379835216

Any idea how to solve this? Thank you.

jaehoonkoo commented 2 years ago

@liuyangzhuan @younghyunc,

Changing to 'Model_GPy_LCM' makes things work!

tid: 0
    t:100.000000 
    Ps  [[2], [99], [1], [39], [20]]
    Os  [[0.40963721656436747], [0.6048918367567865], [0.5335197447922925], [0.439591445832899], [0.36180930655483146]]
    Popt  [20] Oopt  0.36180930655483146 nth  4
tid: 1
    t:200.000000 
    Ps  [[65], [19], [1], [39], [100]]
    Os  [[0.5936419796004477], [0.5709873240695884], [0.9867563595014704], [0.5881904490251465], [0.6543501531632334]]
    Popt  [19] Oopt  0.5709873240695884 nth  1
tid: 2
    t:300.000000 
    Ps  [[49], [91], [100], [12], [99]]
    Os  [[0.6942004363957837], [0.42248335325520425], [0.3579727827188147], [0.5220334144700532], [0.034208860815704734]]
    Popt  [99] Oopt  0.034208860815704734 nth  4

liuyangzhuan commented 2 years ago

Wonderful. I still believe that there is a problem with using openmpi via GPTune, either due to a non-functioning openmpi in spack, or a wrong mpirun used to launch gptune. Let me know if the use of openmpi is still a problem. I'm closing the issue now.
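
One quick way to check for the "wrong mpirun" case mentioned above is to print which executables the batch job actually resolves. This is only a sketch: the helper name which_launcher is made up for illustration, and it assumes the check is placed in the slurm script right after spack load gptune.

```shell
# Hypothetical helper (not part of GPTune or spack): report which executable
# a launcher name resolves to on the current PATH.
which_launcher() {
    local name="$1" path
    if path="$(command -v "$name")"; then
        echo "$name -> $path"
    else
        echo "$name -> not found on PATH"
        return 1
    fi
}

# Run these inside the batch script, after "spack load gptune".
which_launcher mpirun || true
which_launcher python || true
```

If the mpirun path printed here does not live under the spack install tree, the job is likely picking up a different MPI than the one GPTune was built against.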

liuyangzhuan commented 2 years ago

Just some extra comments regarding openmpi. To make sure openmpi is working correctly:

First, try spack test run gptune. If this finishes correctly, it means the spack openmpi is working.

Second, to make sure the correct versions of openmpi and other dependencies are used in your own application, copy the run_env.sh generated by spack test (typically at ./opt/spack/XXX/XXX/gptune-2.1.0-XXX/.spack/test) to your application directory, then source it with . run_env.sh and use $MPIRUN to launch GPTune. See https://github.com/gptune/GPTune/blob/master/examples/GPTune-Demo/run_examples.sh for an example.
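
Applied to the slurm script from the original post, the second step might look roughly like the sketch below. It assumes run_env.sh has already been copied into the application directory and that it sets MPIRUN, as described above. The APP_DIR variable and the load_gptune_env helper are illustrative names, not something GPTune or spack provides.

```shell
#!/bin/bash
#SBATCH --job-name=1_gptune
#SBATCH --nodes=1
#SBATCH --gres=gpu:1
#SBATCH --time=1:00:00

# Assumption: run_env.sh (from the spack test directory) was copied here.
APP_DIR="${APP_DIR:-$PWD}"

# Hypothetical helper: source run_env.sh and verify that it defined MPIRUN.
load_gptune_env() {
    local env_file="$1/run_env.sh"
    if [ ! -f "$env_file" ]; then
        echo "run_env.sh not found in $1" >&2
        return 1
    fi
    . "$env_file"
    if [ -z "${MPIRUN:-}" ]; then
        echo "MPIRUN is not set by $env_file" >&2
        return 1
    fi
}

# Launch only when the environment loaded cleanly; same demo arguments as
# the original script, with $MPIRUN replacing the bare mpirun.
if load_gptune_env "$APP_DIR"; then
    cd "$APP_DIR" || exit 1
    $MPIRUN -n 1 python demo.py -nrun 5 -ntask 1 -perfmodel 0 -optimization GPTune
fi
```

The point of going through $MPIRUN rather than a bare mpirun is that the launcher then comes from the same environment that spack test used, instead of whatever happens to be first on PATH in the batch job.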