Open pedm opened 6 years ago
Hmm, unfortunately I have very little experience running this with ClusterManagers, but I think you might need to use the preliminary code in one of our PRs to get this working. Maybe @alyst knows the current status better? We certainly want to get this up and running as soon as we have made the move to 0.7/1.0, so it would be great to have more detailed test cases and feedback from you at that point, @pedm.
Ah, sorry for being terse. PR = pull request. We have two different ones related to parallel evaluation during optimization; see:
https://github.com/robertfeldt/BlackBoxOptim.jl/pull/46
and
https://github.com/robertfeldt/BlackBoxOptim.jl/pull/25
The former (46) is closer to a final stage but still needs some work. We will probably prioritize getting a stable version up for 0.7/1.0 as soon as they are out, but @alyst knows more about the status of these PRs.
Hi,
I also find that BlackBoxOptim does not use all available workers when used on a cluster with ClusterManagers.
With the example below, only workers 1 - 14 are used. Any idea why this is the case? Thank you.
using Distributed
using ClusterManagers
addWorkers = true
OnCluster = true
n_nodes = 2
n_cores_per_node = 10
maxNumberWorkers = round(Int, n_nodes*n_cores_per_node)
if addWorkers == true
    if OnCluster == true && n_nodes > 1
        print("Multiple nodes: using SlurmManager")
        addprocs(SlurmManager(maxNumberWorkers))
    else
        print("Single node")
        addprocs(maxNumberWorkers)
    end
end
@everywhere using Distributed
# Check the way workers are spread on nodes
# (relevant if on a cluster)
#------------------------------------------
hosts = []
pids = []
for i in workers()
    host, pid = fetch(@spawnat i (gethostname(), getpid()))
    println("Hello I am worker $(i), my host is $(host)")
    push!(hosts, host)
    push!(pids, pid)
end
# check the number of workers:
#----------------------------
currentWorkers = nworkers()
println("Number of workers = $(currentWorkers)")
@everywhere using BlackBoxOptim
@everywhere function slow_rosenbrock(x)
    sleep(0.001) # Fake a slower func to be optimized...
    println("I am worker $(myid())")
    println("I am worker $(gethostname())")
    return BlackBoxOptim.rosenbrock(x)
end
opt = bboptimize(slow_rosenbrock, Method=:dxnes, SearchRange = (-5.0, 5.0),
                 NumDimensions = 50, MaxFuncEvals = 100000, Workers = workers())
res = best_candidate(opt)
print("Minimizer: $(res)")
print("Best fitness: $(best_fitness(opt))")
julia> versioninfo()
Julia Version 1.5.2
Commit 539f3ce943 (2020-09-23 23:17 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) CPU E5504 @ 2.00GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, nehalem)
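One way to check how the workers are spread over hosts before calling bboptimize is to group worker ids by hostname. The following is only a minimal sketch that uses the Distributed standard library and nothing from BlackBoxOptim; the helper name workers_by_host is made up for illustration, and the local addprocs(2) call merely stands in for addprocs(SlurmManager(...)) on a cluster.

```julia
using Distributed

# Assumption: start two local workers if none exist yet. On a Slurm cluster
# this would be addprocs(SlurmManager(n)) from ClusterManagers instead.
nprocs() == 1 && addprocs(2)

# Hypothetical helper: ask every worker for its hostname and
# group the worker ids by host.
function workers_by_host()
    byhost = Dict{String,Vector{Int}}()
    for w in workers()
        host = remotecall_fetch(gethostname, w)
        push!(get!(byhost, host, Int[]), w)
    end
    return byhost
end

for (host, ids) in workers_by_host()
    println("$(host): $(length(ids)) workers -> $(ids)")
end
```

If every expected node shows up here with the expected number of workers, the workers themselves were started correctly, and the question becomes which of them the optimizer actually dispatches evaluations to.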
Hello,
I am working with BlackBoxOptim on a Slurm cluster, using ClusterManagers to manage the additional processes. Unfortunately, BlackBoxOptim does not run the distance function on the additional nodes.
In this situation, the distance function is only run on the processes that are on the same node as the master. Do you know if there is any way to fix this? A small code sample is below, containing everything but the distance function. Thank you!