robertfeldt / BlackBoxOptim.jl

Black-box optimization for Julia

parallel eval does not use all available workers #233

Open floswald opened 7 months ago

floswald commented 7 months ago

Hello,

I am almost never able to take advantage of all available workers on a cluster. Here I have 20 workers, but the xnes algorithm only ever uses 13. I also tried dxnes, but that uses 14 out of 20. Is there anything in the algorithm that determines how many workers to use?

I tried this:

[oswald@stella01 git]$ git clone git@github.com:robertfeldt/BlackBoxOptim.jl.git
Cloning into 'BlackBoxOptim.jl'...
Warning: the ECDSA host key for 'github.com' differs from the key for the IP address '140.82.121.4'
Offending key for IP in /gpfs/users/oswald/.ssh/known_hosts:1
Matching host key in /gpfs/users/oswald/.ssh/known_hosts:9
remote: Enumerating objects: 7111, done.
remote: Counting objects: 100% (351/351), done.
remote: Compressing objects: 100% (183/183), done.
remote: Total 7111 (delta 187), reused 306 (delta 165), pack-reused 6760
Receiving objects: 100% (7111/7111), 2.01 MiB | 2.05 MiB/s, done.
Resolving deltas: 100% (5031/5031), done.
[oswald@stella01 git]$ vim BlackBoxOptim.jl/examples/rosenbrock_parallel.jl 

[oswald@stella01 BlackBoxOptim.jl]$ cat examples/rosenbrock_parallel.jl 
using Distributed

# Now add 20 procs that can exec in parallel (obviously it depends on your CPU
# what you actually gain from this though)

addprocs(20, exeflags = "--project=.")
@everywhere using Pkg
@everywhere Pkg.instantiate()

# Ensure BlackBoxOptim loaded on all workers
@everywhere using BlackBoxOptim

# define the function to optimize on all workers. Parallel eval only gives a gain
# if function to optimize is slow. For this example we introduce a fake sleep
# to make it slow since the function is actually very quick to eval...
@everywhere function slow_rosenbrock(x)
  sleep(1) # Fake a slower func to be optimized...
  println("evaluation on worker")
  return BlackBoxOptim.rosenbrock(x)
end

# First run without any parallel procs used in eval
#opt1 = bbsetup(slow_rosenbrock; Method=:xnes, SearchRange = (-5.0, 5.0),
#               NumDimensions = 50, MaxFuncEvals = 5000)
#el1 = @elapsed res1 = bboptimize(opt1)
#t1 = round(el1, digits=3)

# When Workers= option is given, BlackBoxOptim enables parallel
# evaluation of fitness using the specified worker processes
opt2 = bbsetup(slow_rosenbrock; Method=:xnes, SearchRange = (-5.0, 5.0),
               NumDimensions = 50, MaxFuncEvals = 40, Workers = workers())
el2 = @elapsed res2 = bboptimize(opt2)
t2 = round(el2, digits=3)

#println("Time: serial = $(t1)s, parallel = $(t2)s")
#if t2 < t1
#  println("Speedup is $(round(t1/t2, digits=1))x")
#else
#  println("Slowdown is $(round(t2/t1, digits=1))x")
#end

[oswald@stella01 BlackBoxOptim.jl]$ cat dante.run
#!/bin/bash 
#SBATCH --job-name=rosenbrock
#SBATCH --output=rosen.out
#SBATCH --error=rosen.err
#SBATCH --partition=ncpushort
#SBATCH --nodes=1
#SBATCH --cpus-per-task=20
#SBATCH --mem-per-cpu=1G   # memory per cpu-core

julia --project=. -e 'using Pkg; Pkg.instantiate(); include("examples/rosenbrock_parallel.jl")'

[oswald@stella01 BlackBoxOptim.jl]$ sbatch dante.run
Submitted batch job 1030059

[oswald@stella01 BlackBoxOptim.jl]$ cat rosen.err
The latest version of Julia in the `release` channel is 1.10.2+0.x64.linux.gnu. You currently have `1.8.5+0.x64.linux.gnu` installed. Run:

  juliaup update

in your terminal shell to install Julia 1.10.2+0.x64.linux.gnu and update the `release` channel to that version.
[oswald@stella01 BlackBoxOptim.jl]$ cat rosen.out
evaluation on worker
Starting optimization with optimizer XNESOpt{Float64, RandomBound{ContinuousRectSearchSpace}}
0.00 secs, 0 evals, 0 steps
sigma=1.0 |trace(ln_B)|=0.0
      From worker 3:    evaluation on worker
      From worker 10:   evaluation on worker
      From worker 14:   evaluation on worker
      From worker 11:   evaluation on worker
      From worker 2:    evaluation on worker
      From worker 9:    evaluation on worker
      From worker 13:   evaluation on worker
      From worker 7:    evaluation on worker
      From worker 12:   evaluation on worker
      From worker 8:    evaluation on worker
      From worker 5:    evaluation on worker
      From worker 4:    evaluation on worker
      From worker 6:    evaluation on worker
1.84 secs, 13 evals, 1 steps, fitness=576669.977892522
sigma=0.9999706009697786 |trace(ln_B)|=-5.854691731421724e-18
      From worker 3:    evaluation on worker
      From worker 2:    evaluation on worker
      From worker 4:    evaluation on worker
      From worker 6:    evaluation on worker
      From worker 5:    evaluation on worker
      From worker 11:   evaluation on worker
      From worker 7:    evaluation on worker
      From worker 9:    evaluation on worker
      From worker 8:    evaluation on worker
      From worker 10:   evaluation on worker
      From worker 12:   evaluation on worker
      From worker 13:   evaluation on worker
      From worker 14:   evaluation on worker
3.11 secs, 26 evals, 2 steps, fitness=576669.977892522
sigma=0.9996374110028836 |trace(ln_B)|=6.071532165918825e-18
      From worker 2:    evaluation on worker
      From worker 5:    evaluation on worker
      From worker 3:    evaluation on worker
      From worker 4:    evaluation on worker
      From worker 6:    evaluation on worker
      From worker 7:    evaluation on worker
      From worker 8:    evaluation on worker
      From worker 9:    evaluation on worker
      From worker 10:   evaluation on worker
      From worker 11:   evaluation on worker
      From worker 12:   evaluation on worker
      From worker 13:   evaluation on worker
      From worker 14:   evaluation on worker
4.32 secs, 39 evals, 3 steps, fitness=462136.298735016
sigma=0.9995845644566158 |trace(ln_B)|=1.5395670849294163e-17
      From worker 2:    evaluation on worker
      From worker 3:    evaluation on worker
      From worker 4:    evaluation on worker
      From worker 6:    evaluation on worker
      From worker 5:    evaluation on worker
      From worker 7:    evaluation on worker
      From worker 11:   evaluation on worker
      From worker 8:    evaluation on worker
      From worker 9:    evaluation on worker
      From worker 10:   evaluation on worker
      From worker 12:   evaluation on worker
      From worker 13:   evaluation on worker
      From worker 14:   evaluation on worker

Optimization stopped after 4 steps and 5.48 seconds
Termination reason: Max number of function evaluations (40) reached
Steps per second = 0.73
Function evals per second = 9.49
Improvements/step = NaN
Total function evaluations = 52

Best candidate found: [2.07323, -3.68329, 1.03099, -4.62028, 3.07159, 2.03989, -3.13307, -2.52309, 2.80827, 1.75317, 3.69806, 3.58054, -0.784089, -1.84108, 2.82848, 3.37218, -4.50288, -2.44541, 2.78737, 1.16678, -2.65538, 1.41732, 2.56764, -1.3611, -1.04387, 0.277611, -3.16395, -2.97312, 1.31683, -1.63275, 1.54157, 1.3299, 0.310397, 1.72371, -4.03617, 0.267421, -2.4304, -0.323388, -4.91322, 2.67858, -0.843714, -4.12674, 0.791436, 0.942429, -3.72488, 1.89244, -0.744755, 1.45, -1.49314, -2.10453]

Fitness: 381745.415040534

robertfeldt commented 7 months ago

Hmm, it has been a long time since I worked with that code, so I'm not 100% sure, but there is a parameter that decides how many samples are drawn in each "round" of the NES algorithms. That parameter probably limits how many evaluations can run in parallel. I'm not fully sure it's the lambda parameter, though; maybe you can try that, and later today I can refresh my memory and check whether it really has an effect.
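
As a rough illustration of why the per-round sample count would cap parallelism, here is a minimal sketch in plain Julia. It assumes the NES optimizers fall back to the common rule lambda = 4 + 3*floor(log(n)) when lambda is not set; that default is an assumption and is not confirmed anywhere in this thread, although it does reproduce the 13 evaluations per step and the 52 total evaluations seen in the xnes log above. The helper names (default_lambda, busy_workers, steps_needed) are hypothetical and are not part of BlackBoxOptim.

```julia
# Assumption (not confirmed in this thread): when `lambda` is not given, the
# number of samples per step is 4 + 3*floor(log(n)) for an n-dimensional problem.
default_lambda(ndims::Integer) = 4 + 3 * floor(Int, log(ndims))

# Each step evaluates `lambda` candidates, so at most that many workers
# can be busy at once, however many workers are available.
busy_workers(lambda::Integer, nworkers::Integer) = min(lambda, nworkers)

# Evaluations come in whole steps of `lambda`, so the evaluation budget
# can only be checked between steps and may be overshot.
steps_needed(maxevals::Integer, lambda::Integer) = cld(maxevals, lambda)

lambda = default_lambda(50)   # NumDimensions = 50 in the example script
println("lambda       = ", lambda)
println("busy workers = ", busy_workers(lambda, 20), " of 20")
println("steps        = ", steps_needed(40, lambda))
println("total evals  = ", steps_needed(40, lambda) * lambda)
```

Under that assumption, passing a lambda at least as large as the number of workers should keep every worker busy in each step, at the cost of a larger population per step.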

floswald commented 7 months ago

Bingo! 🎉 Thanks for the super fast help.

For reference:

[oswald@stella01 BlackBoxOptim.jl]$ cat examples/rosenbrock_parallel.jl 
using Distributed

# Now add 20 procs that can exec in parallel (obviously it depends on your CPU
# what you actually gain from this though)

addprocs(20, exeflags = "--project=.")
@everywhere using Pkg
@everywhere Pkg.instantiate()

# Ensure BlackBoxOptim loaded on all workers
@everywhere using BlackBoxOptim

# define the function to optimize on all workers. Parallel eval only gives a gain
# if function to optimize is slow. For this example we introduce a fake sleep
# to make it slow since the function is actually very quick to eval...
@everywhere function slow_rosenbrock(x)
  sleep(1) # Fake a slower func to be optimized...
  println("evaluation on worker")
  return BlackBoxOptim.rosenbrock(x)
end

# First run without any parallel procs used in eval
#opt1 = bbsetup(slow_rosenbrock; Method=:xnes, SearchRange = (-5.0, 5.0),
#               NumDimensions = 50, MaxFuncEvals = 5000)
#el1 = @elapsed res1 = bboptimize(opt1)
#t1 = round(el1, digits=3)

# When Workers= option is given, BlackBoxOptim enables parallel
# evaluation of fitness using the specified worker processes
opt2 = bbsetup(slow_rosenbrock; Method=:dxnes, SearchRange = (-5.0, 5.0),
               NumDimensions = 50, MaxFuncEvals = 40, Workers = workers(), lambda = 20)
el2 = @elapsed res2 = bboptimize(opt2)
t2 = round(el2, digits=3)

#println("Time: serial = $(t1)s, parallel = $(t2)s")
#if t2 < t1
#  println("Speedup is $(round(t1/t2, digits=1))x")
#else
#  println("Slowdown is $(round(t2/t1, digits=1))x")
#end

This does in fact use all 20 workers:

[oswald@stella01 BlackBoxOptim.jl]$ cat rosen.out
evaluation on worker
Starting optimization with optimizer BlackBoxOptim.DXNESOpt{Float64, RandomBound{ContinuousRectSearchSpace}}
0.00 secs, 0 evals, 0 steps
σ=1.0 η[x]=1.0 η[σ]=0.0 η[B]=0.0 |tr(ln_B)|=0.0 |path|=NaN speed=NaN
      From worker 13:   evaluation on worker
      From worker 19:   evaluation on worker
      From worker 15:   evaluation on worker
      From worker 21:   evaluation on worker
      From worker 11:   evaluation on worker
      From worker 12:   evaluation on worker
      From worker 18:   evaluation on worker
      From worker 2:    evaluation on worker
      From worker 16:   evaluation on worker
      From worker 20:   evaluation on worker
      From worker 17:   evaluation on worker
      From worker 14:   evaluation on worker
      From worker 10:   evaluation on worker
      From worker 4:    evaluation on worker
      From worker 8:    evaluation on worker
      From worker 9:    evaluation on worker
      From worker 7:    evaluation on worker
      From worker 5:    evaluation on worker
      From worker 6:    evaluation on worker
      From worker 3:    evaluation on worker
2.34 secs, 20 evals, 1 steps, fitness=520500.861006177
σ=0.9660885717136757 η[x]=1.0 η[σ]=1.1666666666666667 η[B]=0.00390625 |tr(ln_B)|=1.2739375526704677e-18 |path|=NaN speed=NaN
      From worker 2:    evaluation on worker