floswald opened this issue 7 months ago
Hmm, it's been a long time since I worked with that code, so I'm not 100% sure here, but there is a parameter that decides how many samples are drawn in each "round" (generation) of the NES algorithms. That one probably limits how many evaluations can run in parallel. I'm not fully sure it's the lambda parameter, though. Maybe you can try it, and later today I can refresh my memory and check whether it really has an effect.
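To make the reasoning concrete, here is a small simplified model in Julia. It assumes (not confirmed from the BlackBoxOptim source) that each generation draws exactly `lambda` candidates and hands them out to the workers, so no more than `lambda` evaluations exist to run concurrently. The helpers `busy_workers` and `generation_time` are hypothetical illustrations, not part of the BlackBoxOptim API, and the model ignores communication and scheduling overhead.

```julia
# Hypothetical model (not BlackBoxOptim's API): one NES generation draws
# `lambda` candidates and evaluates them on the available workers.
# Only `lambda` evaluations exist per generation, so at most
# min(lambda, n_workers) workers can be busy at the same time.
busy_workers(lambda::Int, n_workers::Int) = min(lambda, n_workers)

# Wall-clock time of one generation when every evaluation takes `t_eval`
# seconds and the batch is processed in waves of `n_workers` evaluations
# (overhead ignored).
generation_time(lambda::Int, n_workers::Int, t_eval::Real) =
    cld(lambda, n_workers) * t_eval

n_workers = 20
for lambda in (13, 20, 40)
    println("lambda = ", lambda, ": ", busy_workers(lambda, n_workers),
            " of ", n_workers, " workers busy, ",
            generation_time(lambda, n_workers, 1.0), " s per generation")
end
```

Under this model, setting lambda equal to the worker count keeps every worker busy in each generation, while going beyond it adds another wave of evaluations per generation rather than more parallelism.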
Bingo! 🎉 Thanks for the super-fast help.
For reference:
[oswald@stella01 BlackBoxOptim.jl]$ cat examples/rosenbrock_parallel.jl
using Distributed
# Now add 20 procs that can execute in parallel (how much you actually gain
# from this obviously depends on your CPU)
addprocs(20, exeflags = "--project=.")
@everywhere using Pkg
@everywhere Pkg.instantiate()
# Ensure BlackBoxOptim loaded on all workers
@everywhere using BlackBoxOptim
# define the function to optimize on all workers. Parallel eval only gives a gain
# if function to optimize is slow. For this example we introduce a fake sleep
# to make it slow since the function is actually very quick to eval...
@everywhere function slow_rosenbrock(x)
    sleep(1) # Fake a slower func to be optimized...
    println("evaluation on worker")
    return BlackBoxOptim.rosenbrock(x)
end
# First run without any parallel procs used in eval
#opt1 = bbsetup(slow_rosenbrock; Method=:xnes, SearchRange = (-5.0, 5.0),
# NumDimensions = 50, MaxFuncEvals = 5000)
#el1 = @elapsed res1 = bboptimize(opt1)
#t1 = round(el1, digits=3)
# When Workers= option is given, BlackBoxOptim enables parallel
# evaluation of fitness using the specified worker processes
opt2 = bbsetup(slow_rosenbrock; Method=:dxnes, SearchRange = (-5.0, 5.0),
    NumDimensions = 50, MaxFuncEvals = 40, Workers = workers(), lambda = 20)
el2 = @elapsed res2 = bboptimize(opt2)
t2 = round(el2, digits=3)
#println("Time: serial = $(t1)s, parallel = $(t2)s")
#if t2 < t1
# println("Speedup is $(round(t1/t2, digits=1))x")
#else
# println("Slowdown is $(round(t2/t1, digits=1))x")
#end
This does in fact use all 20 workers:
[oswald@stella01 BlackBoxOptim.jl]$ cat rosen.out
evaluation on worker
Starting optimization with optimizer BlackBoxOptim.DXNESOpt{Float64, RandomBound{ContinuousRectSearchSpace}}
0.00 secs, 0 evals, 0 steps
σ=1.0 η[x]=1.0 η[σ]=0.0 η[B]=0.0 |tr(ln_B)|=0.0 |path|=NaN speed=NaN
From worker 13: evaluation on worker
From worker 19: evaluation on worker
From worker 15: evaluation on worker
From worker 21: evaluation on worker
From worker 11: evaluation on worker
From worker 12: evaluation on worker
From worker 18: evaluation on worker
From worker 2: evaluation on worker
From worker 16: evaluation on worker
From worker 20: evaluation on worker
From worker 17: evaluation on worker
From worker 14: evaluation on worker
From worker 10: evaluation on worker
From worker 4: evaluation on worker
From worker 8: evaluation on worker
From worker 9: evaluation on worker
From worker 7: evaluation on worker
From worker 5: evaluation on worker
From worker 6: evaluation on worker
From worker 3: evaluation on worker
2.34 secs, 20 evals, 1 steps, fitness=520500.861006177
σ=0.9660885717136757 η[x]=1.0 η[σ]=1.1666666666666667 η[B]=0.00390625 |tr(ln_B)|=1.2739375526704677e-18 |path|=NaN speed=NaN
From worker 2: evaluation on worker
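A possible explanation for the 13 workers reported in the original question (xnes on 50 dimensions with the default lambda) is the default population size. The sketch below compares two population-size formulas. The first, 4 + 3⌊ln d⌋, is a hypothetical reconstruction chosen because it gives exactly 13 for d = 50; whether BlackBoxOptim's xnes uses it is an assumption to verify against the library source. The second, 4 + ⌊3 ln d⌋, is the default commonly recommended for CMA-ES. Neither formula is claimed to explain the 14 workers seen with dxnes.

```julia
# Hypothetical reconstruction of xnes's default population size;
# check the BlackBoxOptim source before relying on it.
assumed_default_lambda(d::Int) = 4 + 3 * floor(Int, log(d))

# Population size commonly recommended for CMA-ES: 4 + floor(3 ln d).
cmaes_lambda(d::Int) = 4 + floor(Int, 3 * log(d))

d = 50          # NumDimensions in the example
n_workers = 20  # addprocs(20)
for (name, lam) in (("4 + 3*floor(ln d)", assumed_default_lambda(d)),
                    ("4 + floor(3 ln d)", cmaes_lambda(d)))
    # Each generation has only `lam` candidates, so at most that many
    # workers can be busy at once.
    println(rpad(name, 20), "lambda = ", lam,
            ", busy workers = ", min(lam, n_workers))
end
```

Either way, the default population size for 50 dimensions stays below 20. Passing lambda explicitly, as in the script above, is what lets every worker receive an evaluation in each generation.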
Hello,
I find that I am almost never able to take advantage of all the available workers on a cluster. Here I have 20 workers, but the xnes algorithm only ever uses 13. I also tried dxnes, but that uses 14 out of 20. Is there anything in the algorithm that determines how many workers are used? I tried this: