parallel eval does not use all available workers

Hello,

I am almost never able to use all available workers on a cluster. Here I have 20 workers, but the xnes algorithm only ever uses 13. I also tried dxnes, but that uses 14 out of 20. Is there anything in the algorithm that determines how many workers are used?

I tried this:

[oswald@stella01 git]$ git clone git@github.com:robertfeldt/BlackBoxOptim.jl.git
Cloning into 'BlackBoxOptim.jl'...
Warning: the ECDSA host key for 'github.com' differs from the key for the IP address '140.82.121.4'
Offending key for IP in /gpfs/users/oswald/.ssh/known_hosts:1
Matching host key in /gpfs/users/oswald/.ssh/known_hosts:9
remote: Enumerating objects: 7111, done.
remote: Counting objects: 100% (351/351), done.
remote: Compressing objects: 100% (183/183), done.
remote: Total 7111 (delta 187), reused 306 (delta 165), pack-reused 6760
Receiving objects: 100% (7111/7111), 2.01 MiB | 2.05 MiB/s, done.
Resolving deltas: 100% (5031/5031), done.
[oswald@stella01 git]$ vim BlackBoxOptim.jl/examples/rosenbrock_parallel.jl 

[oswald@stella01 BlackBoxOptim.jl]$ cat examples/rosenbrock_parallel.jl 
using Distributed

# Now add 20 procs that can exec in parallel (obviously it depends on your CPU
# what you actually gain from this though)

addprocs(20, exeflags = "--project=.")
@everywhere using Pkg
@everywhere Pkg.instantiate()

# Ensure BlackBoxOptim loaded on all workers
@everywhere using BlackBoxOptim

# define the function to optimize on all workers. Parallel eval only gives a gain
# if function to optimize is slow. For this example we introduce a fake sleep
# to make it slow since the function is actually very quick to eval...
@everywhere function slow_rosenbrock(x)
  sleep(1) # Fake a slower func to be optimized...
  println("evaluation on worker")
  return BlackBoxOptim.rosenbrock(x)
end

# First run without any parallel procs used in eval
#opt1 = bbsetup(slow_rosenbrock; Method=:xnes, SearchRange = (-5.0, 5.0),
#               NumDimensions = 50, MaxFuncEvals = 5000)
#el1 = @elapsed res1 = bboptimize(opt1)
#t1 = round(el1, digits=3)

# When Workers= option is given, BlackBoxOptim enables parallel
# evaluation of fitness using the specified worker processes
opt2 = bbsetup(slow_rosenbrock; Method=:xnes, SearchRange = (-5.0, 5.0),
               NumDimensions = 50, MaxFuncEvals = 40, Workers = workers())
el2 = @elapsed res2 = bboptimize(opt2)
t2 = round(el2, digits=3)

#println("Time: serial = $(t1)s, parallel = $(t2)s")
#if t2 < t1
#  println("Speedup is $(round(t1/t2, digits=1))x")
#else
#  println("Slowdown is $(round(t2/t1, digits=1))x")
#end

[oswald@stella01 BlackBoxOptim.jl]$ cat dante.run
#!/bin/bash 
#SBATCH --job-name=rosenbrock
#SBATCH --output=rosen.out
#SBATCH --error=rosen.err
#SBATCH --partition=ncpushort
#SBATCH --nodes=1
#SBATCH --cpus-per-task=20
#SBATCH --mem-per-cpu=1G   # memory per cpu-core

julia --project=. -e 'using Pkg; Pkg.instantiate(); include("examples/rosenbrock_parallel.jl")'

[oswald@stella01 BlackBoxOptim.jl]$ sbatch dante.run
Submitted batch job 1030059

[oswald@stella01 BlackBoxOptim.jl]$ cat rosen.err
The latest version of Julia in the `release` channel is 1.10.2+0.x64.linux.gnu. You currently have `1.8.5+0.x64.linux.gnu` installed. Run:

  juliaup update

in your terminal shell to install Julia 1.10.2+0.x64.linux.gnu and update the `release` channel to that version.
[oswald@stella01 BlackBoxOptim.jl]$ cat rosen.out
[oswald@stella01 BlackBoxOptim.jl]$ cat rosen.out
evaluation on worker
Starting optimization with optimizer XNESOpt{Float64, RandomBound{ContinuousRectSearchSpace}}
0.00 secs, 0 evals, 0 steps
sigma=1.0 |trace(ln_B)|=0.0
      From worker 3:    evaluation on worker
      From worker 10:   evaluation on worker
      From worker 14:   evaluation on worker
      From worker 11:   evaluation on worker
      From worker 2:    evaluation on worker
      From worker 9:    evaluation on worker
      From worker 13:   evaluation on worker
      From worker 7:    evaluation on worker
      From worker 12:   evaluation on worker
      From worker 8:    evaluation on worker
      From worker 5:    evaluation on worker
      From worker 4:    evaluation on worker
      From worker 6:    evaluation on worker
1.84 secs, 13 evals, 1 steps, fitness=576669.977892522
sigma=0.9999706009697786 |trace(ln_B)|=-5.854691731421724e-18
      From worker 3:    evaluation on worker
      From worker 2:    evaluation on worker
      From worker 4:    evaluation on worker
      From worker 6:    evaluation on worker
      From worker 5:    evaluation on worker
      From worker 11:   evaluation on worker
      From worker 7:    evaluation on worker
      From worker 9:    evaluation on worker
      From worker 8:    evaluation on worker
      From worker 10:   evaluation on worker
      From worker 12:   evaluation on worker
      From worker 13:   evaluation on worker
      From worker 14:   evaluation on worker
3.11 secs, 26 evals, 2 steps, fitness=576669.977892522
sigma=0.9996374110028836 |trace(ln_B)|=6.071532165918825e-18
      From worker 2:    evaluation on worker
      From worker 5:    evaluation on worker
      From worker 3:    evaluation on worker
      From worker 4:    evaluation on worker
      From worker 6:    evaluation on worker
      From worker 7:    evaluation on worker
      From worker 8:    evaluation on worker
      From worker 9:    evaluation on worker
      From worker 10:   evaluation on worker
      From worker 11:   evaluation on worker
      From worker 12:   evaluation on worker
      From worker 13:   evaluation on worker
      From worker 14:   evaluation on worker
4.32 secs, 39 evals, 3 steps, fitness=462136.298735016
sigma=0.9995845644566158 |trace(ln_B)|=1.5395670849294163e-17
      From worker 2:    evaluation on worker
      From worker 3:    evaluation on worker
      From worker 4:    evaluation on worker
      From worker 6:    evaluation on worker
      From worker 5:    evaluation on worker
      From worker 7:    evaluation on worker
      From worker 11:   evaluation on worker
      From worker 8:    evaluation on worker
      From worker 9:    evaluation on worker
      From worker 10:   evaluation on worker
      From worker 12:   evaluation on worker
      From worker 13:   evaluation on worker
      From worker 14:   evaluation on worker

Optimization stopped after 4 steps and 5.48 seconds
Termination reason: Max number of function evaluations (40) reached
Steps per second = 0.73
Function evals per second = 9.49
Improvements/step = NaN
Total function evaluations = 52

Best candidate found: [2.07323, -3.68329, 1.03099, -4.62028, 3.07159, 2.03989, -3.13307, -2.52309, 2.80827, 1.75317, 3.69806, 3.58054, -0.784089, -1.84108, 2.82848, 3.37218, -4.50288, -2.44541, 2.78737, 1.16678, -2.65538, 1.41732, 2.56764, -1.3611, -1.04387, 0.277611, -3.16395, -2.97312, 1.31683, -1.63275, 1.54157, 1.3299, 0.310397, 1.72371, -4.03617, 0.267421, -2.4304, -0.323388, -4.91322, 2.67858, -0.843714, -4.12674, 0.791436, 0.942429, -3.72488, 1.89244, -0.744755, 1.45, -1.49314, -2.10453]

Fitness: 381745.415040534

[oswald@stella01 BlackBoxOptim.jl]$ cat rosen.err
The latest version of Julia in the `release` channel is 1.10.2+0.x64.linux.gnu. You currently have `1.8.5+0.x64.linux.gnu` installed. Run:

  juliaup update

in your terminal shell to install Julia 1.10.2+0.x64.linux.gnu and update the `release` channel to that version.
[oswald@stella01 BlackBoxOptim.jl]$

Bingo! 🎉 Thanks for the super fast help.

For reference:

[oswald@stella01 BlackBoxOptim.jl]$ cat examples/rosenbrock_parallel.jl 
using Distributed

# Now add 20 procs that can exec in parallel (obviously it depends on your CPU
# what you actually gain from this though)

addprocs(20, exeflags = "--project=.")
@everywhere using Pkg
@everywhere Pkg.instantiate()

# Ensure BlackBoxOptim loaded on all workers
@everywhere using BlackBoxOptim

# define the function to optimize on all workers. Parallel eval only gives a gain
# if function to optimize is slow. For this example we introduce a fake sleep
# to make it slow since the function is actually very quick to eval...
@everywhere function slow_rosenbrock(x)
  sleep(1) # Fake a slower func to be optimized...
  println("evaluation on worker")
  return BlackBoxOptim.rosenbrock(x)
end

# First run without any parallel procs used in eval
#opt1 = bbsetup(slow_rosenbrock; Method=:xnes, SearchRange = (-5.0, 5.0),
#               NumDimensions = 50, MaxFuncEvals = 5000)
#el1 = @elapsed res1 = bboptimize(opt1)
#t1 = round(el1, digits=3)

# When Workers= option is given, BlackBoxOptim enables parallel
# evaluation of fitness using the specified worker processes
opt2 = bbsetup(slow_rosenbrock; Method=:dxnes, SearchRange = (-5.0, 5.0),
               NumDimensions = 50, MaxFuncEvals = 40, Workers = workers(), lambda = 20)
el2 = @elapsed res2 = bboptimize(opt2)
t2 = round(el2, digits=3)

#println("Time: serial = $(t1)s, parallel = $(t2)s")
#if t2 < t1
#  println("Speedup is $(round(t1/t2, digits=1))x")
#else
#  println("Slowdown is $(round(t2/t1, digits=1))x")
#end

This does in fact use all 20 workers:

[oswald@stella01 BlackBoxOptim.jl]$ cat rosen.out
evaluation on worker
Starting optimization with optimizer BlackBoxOptim.DXNESOpt{Float64, RandomBound{ContinuousRectSearchSpace}}
0.00 secs, 0 evals, 0 steps
σ=1.0 η[x]=1.0 η[σ]=0.0 η[B]=0.0 |tr(ln_B)|=0.0 |path|=NaN speed=NaN
      From worker 13:   evaluation on worker
      From worker 19:   evaluation on worker
      From worker 15:   evaluation on worker
      From worker 21:   evaluation on worker
      From worker 11:   evaluation on worker
      From worker 12:   evaluation on worker
      From worker 18:   evaluation on worker
      From worker 2:    evaluation on worker
      From worker 16:   evaluation on worker
      From worker 20:   evaluation on worker
      From worker 17:   evaluation on worker
      From worker 14:   evaluation on worker
      From worker 10:   evaluation on worker
      From worker 4:    evaluation on worker
      From worker 8:    evaluation on worker
      From worker 9:    evaluation on worker
      From worker 7:    evaluation on worker
      From worker 5:    evaluation on worker
      From worker 6:    evaluation on worker
      From worker 3:    evaluation on worker
2.34 secs, 20 evals, 1 steps, fitness=520500.861006177
σ=0.9660885717136757 η[x]=1.0 η[σ]=1.1666666666666667 η[B]=0.00390625 |tr(ln_B)|=1.2739375526704677e-18 |path|=NaN speed=NaN
      From worker 2:    evaluation on worker
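
A note on why this works, as far as I can tell from the logs: each step evaluates one batch of `lambda` candidates, so no more than `lambda` workers can be busy at once. The xnes run above did 13 evals per step (13, 26, 39, then 52 in total, which is also why it overshoots `MaxFuncEvals = 40`), while the dxnes run with `lambda = 20` did 20 evals in its first step. The sketch below is only my guess at a default population size that matches the 13 I saw for 50 dimensions. The formula and the helper names `assumed_default_lambda` and `busy_workers` are assumptions made up for illustration and are not taken from the package source. The 14 I saw with dxnes would fit the same value rounded up to an even number, but that is a guess too.

```julia
# Hypothetical helper: a default population size that would explain the
# 13 evaluations per step seen with xnes at NumDimensions = 50.
# ASSUMPTION: inferred from the logs above, not copied from BlackBoxOptim.
assumed_default_lambda(ndims::Integer) = 4 + 3 * floor(Int, log(ndims))

# Hypothetical helper: one step evaluates `lambda` candidates, so at most
# that many workers can be busy at the same time.
busy_workers(lambda::Integer, nworkers::Integer) = min(lambda, nworkers)

lambda_xnes = assumed_default_lambda(50)
println("assumed default lambda for 50 dims: ", lambda_xnes)
println("busy workers out of 20:             ", busy_workers(lambda_xnes, 20))
println("busy workers with lambda = 20:      ", busy_workers(20, 20))
```

So to keep every worker busy, pass a `lambda` that is at least the number of workers, as in the `lambda = 20` call above. Writing it as `lambda = nworkers()` (from `Distributed`) should keep the two in sync if the worker count changes.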

robertfeldt / BlackBoxOptim.jl

parallel eval does not use all available workers #233